 

fetch more than 20 rows and display full value of column in spark-shell

I am using CassandraSQLContext from spark-shell to query data from Cassandra. I want to know two things: first, how to fetch more than 20 rows using CassandraSQLContext, and second, how to display the full value of a column. As you can see below, by default it appends dots to the string values.

Code :

val csc = new CassandraSQLContext(sc)
csc.setKeyspace("KeySpace")
val maxDF = csc.sql("SQL_QUERY")
maxDF.show

Output:

+--------------------+--------------------+-----------------+--------------------+
|                  id|                Col2|             Col3|                Col4|
+--------------------+--------------------+-----------------+--------------------+
|8wzloRMrGpf8Q3bbk...|              Value1|                X|                  K1|
|AxRfoHDjV1Fk18OqS...|              Value2|                Y|                  K2|
|FpMVRlaHsEOcHyDgy...|              Value3|                Z|                  K3|
|HERt8eFLRtKkiZndy...|              Value4|                U|                  K4|
|nWOcbbbm8ZOjUSNfY...|              Value5|                V|                  K5|
+--------------------+--------------------+-----------------+--------------------+
asked Jun 10 '16 by Naresh


People also ask

How can I show more than 20 rows in Spark?

By default, Spark with Scala, Java, or Python (PySpark) displays only 20 rows of a DataFrame with show(), and each column value is truncated to 20 characters. To fetch/display more than 20 rows and the full column values, you need to pass arguments to the show() method.
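For example, in the Scala spark-shell (a minimal sketch, assuming a DataFrame named maxDF like the one in the question):

// print up to 50 rows instead of the default 20
maxDF.show(50)

// print 50 rows and do not truncate long column values
maxDF.show(50, false)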

How do I show full column content in a Spark DataFrame?

To show the full column content, use the show() function. show(): displays the DataFrame. n: the number of rows to display. truncate: controls whether column content is truncated; set it to false to display the full column content (by default it is true).
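As a quick Scala sketch (again assuming the question's maxDF), compare the default output with truncation disabled:

// default: string values longer than 20 characters end in "..."
maxDF.show()

// truncate = false prints the full content of every column
maxDF.show(false)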

How do you select top 10 rows in PySpark?

In Spark/PySpark, you can use the show() action to display the top/first N (5, 10, 100, ...) rows of the DataFrame on the console or in a log. There are also several Spark actions such as take(), tail(), collect(), head(), and first() that return the top or last n rows as a list of Rows (Array[Row] in Scala).
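A short Scala sketch of those actions (assuming the question's maxDF; note that tail() requires Spark 3.0 or later):

import org.apache.spark.sql.Row

val top10: Array[Row]   = maxDF.take(10)   // first 10 rows
val firstRow: Row       = maxDF.first()    // the very first row
val head5: Array[Row]   = maxDF.head(5)    // first 5 rows
val last5: Array[Row]   = maxDF.tail(5)    // last 5 rows (Spark 3.0+)
val allRows: Array[Row] = maxDF.collect()  // every row; be careful with large tables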


2 Answers

If you want to print the whole value of a column, in Scala, you just need to set the truncate argument of the show method to false:

maxDF.show(false)

and if you wish to show more than 20 rows:

// example showing 30 rows of maxDF, untruncated
maxDF.show(30, false)

For PySpark, you'll need to specify the argument name:

maxDF.show(truncate=False)
answered Oct 13 '22 by eliasah


You won't get it in a nice tabular form; instead it is returned as a Scala Array[Row] object:

maxDF.take(50) 
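If you still want to read the result, one option (a sketch) is to print each returned Row yourself:

// take() returns Array[Row]; print one row per line
maxDF.take(50).foreach(println)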
answered Oct 13 '22 by WoodChopper