
Pandas dataframe.hist() change title size on subplot?

I am manipulating a DataFrame with pandas in Python. My data has 10,000 rows and 20 columns, and I am visualizing it like this:

df.hist(figsize=(150,150))

However, if I make figsize bigger, each subplot's title (the name of the column) gets really small, or the graphs overlap each other, which makes them impossible to distinguish.

Is there any clever way to fix it?

Thank you!

jayko03 asked Sep 13 '17 03:09



2 Answers

There may be cleaner ways, but here are two.

1) You could set the title size on each subplot directly. Note that df.hist() returns an array of Axes, not a Figure:

axes = df.hist(figsize=(50, 30))
for ax in axes.ravel():
    ax.title.set_size(32)
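
For reference, here is a self-contained sketch of the per-subplot approach. The synthetic data, the column names col_0 ... col_19, and the headless Agg backend are assumptions added for illustration; they are not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 10000 rows x 20 numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10000, 20)),
                  columns=[f"col_{i}" for i in range(20)])

# DataFrame.hist returns a 2-D NumPy array of Axes, one per histogram.
axes = df.hist(figsize=(50, 30))

# Flatten the grid and set the title font size on every subplot.
for ax in axes.ravel():
    ax.title.set_size(32)

# Collect the resulting title sizes to confirm the change took effect.
title_sizes = {ax.title.get_fontsize() for ax in axes.ravel()}
```

A plain for loop is used here because the calls are made only for their side effect; a list comprehension would build a throwaway list of None values.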


2) Another way is to change matplotlib's default rcParams:

import matplotlib

params = {'axes.titlesize':'32',
          'xtick.labelsize':'24',
          'ytick.labelsize':'24'}
matplotlib.rcParams.update(params)
df.hist(figsize=(50, 30))



Default Issue

This is the default behavior, with very small labels and titles in the subplots.

matplotlib.rcParams.update(matplotlib.rcParamsDefault)  # to revert to default settings
df.hist(figsize=(50, 30))
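
If changing the defaults globally and reverting them afterwards feels heavy, the overrides can probably be scoped with matplotlib.rc_context instead. This is a minimal sketch, not part of the original answer; the synthetic data and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 1000 rows x 20 numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 20)))

size_before = matplotlib.rcParams["axes.titlesize"]

# The overrides are active only inside the with-block.
with matplotlib.rc_context({"axes.titlesize": 32,
                            "xtick.labelsize": 24,
                            "ytick.labelsize": 24}):
    axes = df.hist(figsize=(50, 30))
    title_size_inside = axes.ravel()[0].title.get_fontsize()

# On exit, rcParams are restored to their previous values.
size_after = matplotlib.rcParams["axes.titlesize"]
```

Saving or showing the figure inside the with-block is the safer choice, since some settings may only be read when the figure is drawn.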


Zero answered Sep 18 '22 10:09


I would not recommend making the figure much larger than 10 inches in each dimension. That should in any case be more than enough to host 20 subplots, and keeping the figure at a moderate size keeps the font size reasonable relative to the plots.
To prevent plot titles from overlapping, you can simply call plt.tight_layout().

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(1000,20))
df.hist(figsize=(10,9), ec="k")

plt.tight_layout()
plt.show()
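
A variation on the same idea, sketched under assumptions: the layout argument of DataFrame.hist fixes the grid shape explicitly, and the figure is saved to an in-memory buffer rather than shown. The 4 x 5 grid, the Agg backend, and the buffer are illustrative choices, not part of the original answer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 20))

# layout=(rows, columns) fixes the grid; 4 x 5 holds all 20 histograms.
axes = df.hist(figsize=(10, 9), layout=(4, 5), ec="k")
fig = axes.ravel()[0].figure

# Adjust subplot spacing so titles and tick labels do not collide.
fig.tight_layout()

# Render to an in-memory PNG instead of opening a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
```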


ImportanceOfBeingErnest answered Sep 20 '22 10:09