Tudor Lapusan's Blog

Spark 2.3.0 implements vertical row display for DataFrame/Dataset

When a DataFrame/Dataset has a large number of columns and you want to display all of them, the printed result is not very readable. In this short post, I will use Spark 2.3.0 (pyspark) and Apache Zeppelin.

[Image: bad pretty format]

As you can see in the image above, it's hard to tell which column each value belongs to. With even more columns, the results become harder still to understand. Until Spark 2.3.0, the only solution I was aware of was to select fewer columns and display them.
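
For reference, here is a minimal sketch of that older workaround. The helper name `show_subset` is my own, not part of Spark, and it assumes `df` is an existing DataFrame:

```python
def show_subset(df, columns, n=20):
    """Display only the given columns of a Spark DataFrame."""
    # DataFrame.select accepts column names as positional arguments,
    # so the list is unpacked before the call.
    df.select(*columns).show(n)
```

For example, `show_subset(df, ["id", "city"], 5)` prints the first five rows with just those two columns (the column names here are hypothetical).
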
Recently I looked over the show() documentation in pyspark and was surprised to see another argument, vertical.

[Image: show]

As you can read above, if we set vertical=True, each row is printed vertically, with one column value per line, like this:

[Image: vertical]
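
If you want to try it yourself, here is a minimal sketch of the call. The sample columns and rows are made up for illustration (they are not the data from my notebook), and `show_vertical` and `demo` are hypothetical helper names. In Zeppelin a `spark` session is already provided, so `getOrCreate()` should simply reuse it.

```python
# Hypothetical sample data: wide enough that the default layout gets hard to read.
COLUMNS = ["id", "first_name", "last_name", "city", "country", "job_title"]
ROWS = [
    (1, "Ana", "Pop", "Cluj-Napoca", "Romania", "Data Engineer"),
    (2, "John", "Smith", "London", "United Kingdom", "Data Scientist"),
]


def show_vertical(df, n=20):
    """Print the first n rows with one column value per line (Spark 2.3.0+)."""
    # truncate=False keeps long values intact; vertical=True switches the layout.
    df.show(n, truncate=False, vertical=True)


def demo():
    """Build a small DataFrame and print it in both layouts."""
    # Imported here so show_vertical can be reused wherever a DataFrame
    # already exists, without this module requiring pyspark at import time.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")  # assumption: a local session is enough for a demo
        .appName("vertical-show")
        .getOrCreate()
    )
    df = spark.createDataFrame(ROWS, COLUMNS)
    df.show()          # default horizontal layout
    show_vertical(df)  # vertical layout
```

Calling `demo()` prints the usual table first, then one block per row, with each block listing the column names next to their values.
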

This way, we can easily see which column each value belongs to. The only disadvantage is that we need to scroll a lot to browse through many rows.

In conclusion, I was impressed to see this new display option, and to see that the Spark community is active and tries to cover more developer needs.