How to set display precision in PySpark Dataframe show

How do you set the display precision in PySpark when calling .show()?

Consider the following example:

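from math import sqrt
import pyspark.sql.functions as f
from pyspark.sql import SparkSession

# Get (or create) a session; older setups exposed this as sqlCtx
spark = SparkSession.builder.getOrCreate()

# Two columns: square roots of 100..104 and of 200..204
data = list(zip(
    map(sqrt, range(100, 105)),
    map(sqrt, range(200, 205))
))
df = spark.createDataFrame(data, ["col1", "col2"])
df.select([f.avg(c).alias(c) for c in df.columns]).show()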

Which outputs:

#+------------------+------------------+
#|              col1|              col2|
#+------------------+------------------+
#|10.099262230352151|14.212583322380274|
#+------------------+------------------+

How can I change it so that it only displays 3 digits after the decimal point?

Desired output:

#+------+------+
#|  col1|  col2|
#+------+------+
#|10.099|14.213|
#+------+------+

This is a PySpark version of an existing Scala question. I'm posting it here because I could not find an answer when searching for PySpark solutions, and I think it may help others in the future.

asked Feb 16 '18 18:02 by pault



1 Answer

Round

The easiest option is to use pyspark.sql.functions.round():

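# Note: this import shadows Python's built-in round() in the current namespace
from pyspark.sql.functions import avg, round
# Average each column, then round the result to 3 decimal places
df.select([round(avg(c), 3).alias(c) for c in df.columns]).show()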
#+------+------+
#|  col1|  col2|
#+------+------+
#|10.099|14.213|
#+------+------+

This will maintain the values as numeric types.

Format Number

The functions are the same in Scala and Python; only the import differs.

You can use format_number to format a number to the desired number of decimal places. As the official API documentation states:

Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.

from pyspark.sql.functions import avg, format_number
df.select([format_number(avg(c), 3).alias(c) for c in df.columns]).show()
#+------+------+
#|  col1|  col2|
#+------+------+
#|10.099|14.213|
#+------+------+

The transformed columns are of StringType, and a comma is used as the thousands separator, which becomes visible with larger values:

#+-----------+--------------+
#|       col1|          col2|
#+-----------+--------------+
#|500,100.000|50,489,590.000|
#+-----------+--------------+

As stated in the Scala version of this answer, you can use regexp_replace to replace the comma with any string you want. Its documentation reads:

Replace all substrings of the specified string value that match regexp with rep.

from pyspark.sql.functions import avg, format_number, regexp_replace
df.select(
    [regexp_replace(format_number(avg(c), 3), ",", "").alias(c) for c in df.columns]
).show()
#+----------+------------+
#|      col1|        col2|
#+----------+------------+
#|500100.000|50489590.000|
#+----------+------------+
answered Sep 22 '22 16:09 (6 revs, 2 users 72%)