Tudor Lapusan's Blog


Spark 2.3.0 implements vertical row display for DataFrame/Dataset

When your DataFrame/Dataset has a large number of columns and you want to display all of them, the printed result is not very readable. In this short post, I will use Spark 2.3.0 (PySpark) and Apache Zeppelin. As you can see in the image above, it’s hard to tell which column each value belongs to. With many more columns, the results become even harder to understand. Until Spark 2.3.0, the only solution I’m aware of is to select fewer columns
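To give a rough idea of what a vertical display looks like, here is a minimal pure-Python sketch. In PySpark 2.3.0 and later the built-in call is `df.show(vertical=True)`; the `format_vertical` helper and the sample data below are hypothetical stand-ins for illustration only, not Spark's implementation, and the layout only approximates Spark's output.

```python
def format_vertical(columns, rows):
    """Render each row as a block with one "column | value" line per field.

    Hypothetical helper for illustration; it only mimics the general shape
    of a vertical row display and is not part of Spark.
    """
    # Pad every column name to the same width so the separators line up.
    width = max((len(name) for name in columns), default=0)
    lines = []
    for index, row in enumerate(rows):
        # Header line marking the start of a record.
        lines.append(f"-RECORD {index}".ljust(width + 12, "-"))
        for name, value in zip(columns, row):
            lines.append(f" {name.ljust(width)} | {value}")
    return "\n".join(lines)


# Made-up sample data, assumed only for this example.
columns = ["id", "name", "country"]
rows = [(1, "Ana", "RO"), (2, "Bob", "UK")]
print(format_vertical(columns, rows))
```

Each record takes one line per column, so a value always sits next to its column name no matter how many columns there are.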

Read the full post


Perfect fit: Apache Spark, Zeppelin and Docker

The goal of this article is to show how easily you can start working with Apache Spark using Apache Zeppelin and Docker. I first played with Docker when Cloudera announced the new quickstart option for trying Apache Hadoop with Cloudera. It was a really nice experience, and I was surprised by Docker’s characteristics. For me, the most powerful one was the ability to share containers between users, for example through Docker Hub. Here is the official definition of containers

Read the full post


Start working with Apache Spark

In the last year, Apache Spark has received a lot of attention in the big data and data science fields, and more and more jobs have started to appear. So, if you are reading this page, you are well on your way to working on some cool and challenging projects. In this post I’m going to show you how easy it can be to start working with Apache Spark. All you need is a PC, an IDE (e.g. Eclipse, IntelliJ IDEA) and basic Java knowledge. This post

Read the full post