I would like to capture the result of show() in PySpark, similar to here and here. I was not able to find a solution for PySpark, only Scala.
df.show()
#+----+-------+
#| age| name|
#+----+-------+
#|null|Michael|
#| 30| Andy|
#| 19| Justin|
#+----+-------+
The ultimate purpose is to capture this output as a string inside my logger.info. I tried logger.info(df.show()), which only displays the table on the console.
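As far as I can tell, that is because show() prints to stdout and returns None, so there is nothing for the logger to record:
result = df.show()  # the table is printed to stdout as a side effect
print(result)       # None; show() has no return value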
You can build a helper function using the same approach as shown in the post you linked, Capturing the result of explain() in pyspark. Just examine the source code for show() and observe that it calls self._jdf.showString().
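For reference, the body of show() in pyspark/sql/dataframe.py looks roughly like this in the 2.3-era source (docstring omitted); note that it prints the string rather than returning it:
def show(self, n=20, truncate=True, vertical=False):
    if isinstance(truncate, bool) and truncate:
        print(self._jdf.showString(n, 20, vertical))
    else:
        print(self._jdf.showString(n, int(truncate), vertical))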
The answer depends on which version of Spark you are using, as the number of arguments to show() has changed over time. In version 2.3, the vertical argument was added.
def getShowString(df, n=20, truncate=True, vertical=False):
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20, vertical)
    else:
        return df._jdf.showString(n, int(truncate), vertical)
As of version 1.5, the truncate argument was added.
def getShowString(df, n=20, truncate=True):
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20)
    else:
        return df._jdf.showString(n, int(truncate))
The show function was first introduced in version 1.3.
def getShowString(df, n=20):
    return df._jdf.showString(n)
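If the same code has to run against more than one Spark version, you could also pick the right call at runtime. This is only a sketch, assuming pyspark.__version__ is importable (it appeared around Spark 2.1, so the oldest branch is mostly illustrative):
import pyspark

def getShowString(df, n=20, truncate=True, vertical=False):
    # Parse the "major.minor" prefix out of a version string such as "2.4.8".
    major, minor = (int(x) for x in pyspark.__version__.split(".")[:2])
    # Mirror show()'s handling of truncate: True means a 20-character limit,
    # anything else is coerced to an integer column width (0 = no truncation).
    width = 20 if truncate is True else int(truncate)
    if (major, minor) >= (2, 3):
        return df._jdf.showString(n, width, vertical)
    elif (major, minor) >= (1, 5):
        return df._jdf.showString(n, width)
    else:
        return df._jdf.showString(n)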
Now use the helper function as follows:
x = getShowString(df) # default arguments
print(x)
#+----+-------+
#| age| name|
#+----+-------+
#|null|Michael|
#| 30| Andy|
#| 19| Justin|
#+----+-------+
Or in your case:
logger.info(getShowString(df))
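Putting it all together, an end-to-end sketch (the session setup, sample data, and logging configuration here are illustrative assumptions; getShowString is any of the helpers defined above):
import logging
from pyspark.sql import SparkSession

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

spark = SparkSession.builder.appName("capture-show").getOrCreate()
df = spark.createDataFrame(
    [(None, "Michael"), (30, "Andy"), (19, "Justin")],
    ["age", "name"],
)

# Embed the rendered table in a single log record.
logger.info("DataFrame contents:\n%s", getShowString(df))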