When your DataFrame/Dataset has a large number of columns and you want to display all of them, the output is not very readable. In this short post, I will use Spark 2.3.0 (pyspark) and Apache Zeppelin.
As you can see in the above image, it’s hard to tell which column each value belongs to. With many more columns, the results are even harder to understand. Before Spark 2.3.0, the only solution I was aware of was to select fewer columns and display them.
Recently I looked over the show() documentation for pyspark and was surprised to see another possible argument, vertical.
As you can read from above, if we set vertical=True, each row is printed vertically, with one line per column value, like this:
This way, we can easily see which column each value belongs to. The only disadvantage is that we need to scroll a lot to browse through many rows.
In conclusion, I was impressed to see this new display option, and to see that the Spark community is active and tries to cover more developer needs.