Plotting Histogram for all columns in a Data Frame

I am trying to draw histograms for all of the columns in my data frame. I have imported pyspark and matplotlib; df is my data frame and plt is matplotlib.pyplot.

I was able to plot a histogram for an individual column like this:

bins, counts = df.select('ColumnName').rdd.flatMap(lambda x: x).histogram(20)
plt.hist(bins[:-1], bins=bins, weights=counts)

But I run into problems when I try to plot all of the columns. Here is the for loop I have so far:

for x in range(0, len(df.columns)):
    bins, counts = df.select(x).rdd.flatMap(lambda x: x).histogram(20)
    plt.hist(bins[:-1], bins=bins, weights=counts)

How would I do it? Thanks in advance.

asked Apr 11 '18 16:04

Ram

1 Answer

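As an alternative to the for loop, you can use the pandas hist method. A PySpark DataFrame does not have hist itself, so convert it to pandas first. Note that toPandas() collects every row to the driver, so this only suits data that fits in memory:

df.toPandas().hist(bins=30, figsize=(15, 10))

This plots a histogram for each numeric column of the DataFrame. The bins and figsize arguments only customize the output.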

answered Oct 15 '22 23:10

Farid
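
For reference, the loop from the question passes an integer position to select, while the working single-column call passes a column name, and every plt.hist call draws onto the same axes. Below is a minimal sketch of one way to keep the RDD histogram approach, assuming df is a PySpark DataFrame. The helper names numeric_columns and plot_all_histograms, and the ncols layout parameter, are made up for illustration and are not part of pyspark or matplotlib. Only plain integer and floating-point columns are kept; decimal columns would need a cast to double first.

```python
import math

# Spark SQL type names, as reported by df.dtypes, treated as numeric here.
# Decimal columns are left out on purpose: cast them to double before plotting.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def numeric_columns(dtypes):
    """Return column names from a df.dtypes-style list of (name, type) pairs
    whose type is one of NUMERIC_TYPES."""
    return [name for name, dtype in dtypes if dtype in NUMERIC_TYPES]


def plot_all_histograms(df, bins=20, ncols=3):
    """Draw one histogram per numeric column of a PySpark DataFrame.

    Hypothetical helper: one subplot per column, laid out in a grid.
    """
    # Imported here so numeric_columns stays usable without matplotlib.
    import matplotlib.pyplot as plt

    names = numeric_columns(df.dtypes)
    if not names:
        raise ValueError("DataFrame has no numeric columns to plot")

    nrows = math.ceil(len(names) / ncols)
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False
    )
    axes = axes.ravel()

    for ax, name in zip(axes, names):
        # RDD.histogram returns (bucket edges, counts); the bucketing runs
        # on the cluster and only these two small lists reach the driver.
        edges, counts = (
            df.select(name).rdd.flatMap(lambda row: row).histogram(bins)
        )
        # Re-draw the precomputed buckets: one weighted point per bucket.
        ax.hist(edges[:-1], bins=edges, weights=counts)
        ax.set_title(name)

    # Hide grid cells that have no column assigned.
    for ax in axes[len(names):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig
```

Calling plot_all_histograms(df) followed by plt.show() would then display the grid. Unlike toPandas(), this keeps the raw rows on the cluster, at the cost of one Spark job per column.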

Tags: python, apache-spark, apache-spark-sql, pyspark
