Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas bar plot -- specify bar color by column

Is there a simply way to specify bar colors by column name using Pandas DataFrame.plot(kind='bar') method?

I have a script that generates multiple DataFrames from several different data files in a directory. For example it does something like this:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pds

data_files = ['a', 'b', 'c', 'd']

df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])

df1.plot(kind='bar', ax=plt.subplot(121))
df2.plot(kind='bar', ax=plt.subplot(122))

plt.show()

With the following output:

Output

Unfortunately, the column colors aren't consistent for each label in the different plots. Is it possible to pass in a dictionary of (filenames:colors), so that any particular column always has the same color. For example, I could imagine creating this by zipping up the filenames with the Matplotlib color_cycle:

data_files = ['a', 'b', 'c', 'd']
colors = plt.rcParams['axes.color_cycle']
print zip(data_files, colors)

[('a', u'b'), ('b', u'g'), ('c', u'r'), ('d', u'c')]

I could figure out how to do this directly with Matplotlib: I just thought there might be a simpler, built-in solution.

Edit:

Below is a partial solution that works in pure Matplotlib. However, I'm using this in an IPython notebook that will be distributed to non-programmer colleagues, and I'd like to minimize the amount of excessive plotting code.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pds

data_files = ['a', 'b', 'c', 'd']
mpl_colors = plt.rcParams['axes.color_cycle']
colors = dict(zip(data_files, mpl_colors))

def bar_plotter(df, colors, sub):
    ncols = df.shape[1]
    width = 1./(ncols+2.)
    starts = df.index.values - width*ncols/2.
    plt.subplot(120+sub)
    for n, col in enumerate(df):
        plt.bar(starts + width*n, df[col].values, color=colors[col],
                width=width, label=col)
    plt.xticks(df.index.values)
    plt.grid()
    plt.legend()

df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])

bar_plotter(df1, colors, 1)
bar_plotter(df2, colors, 2)

plt.show()

Desired Output

like image 822
Ryan Avatar asked Sep 05 '14 15:09

Ryan


People also ask

How do you change the color of bars on a bar graph in Python?

You can change the color of bars in a barplot using color argument. RGB is a way of making colors. You have to to provide an amount of red, green, blue, and the transparency value to the color argument and it returns a color.

How do I color a specific bar in Matplotlib?

To set different colors for bars in a Bar Plot using Matplotlib PyPlot API, call matplotlib. pyplot. bar() function, and pass required color values, as list, to color parameter of bar() function. Of course, there are other named parameters, but for simplicity, only color parameter is given.


2 Answers

You can pass a list as the colors. This will require a little bit of manual work to get it to line up, unlike if you could pass a dictionary, but may be a less cluttered way to accomplish your goal.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pds

data_files = ['a', 'b', 'c', 'd']

df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])

color_list = ['b', 'g', 'r', 'c']


df1.plot(kind='bar', ax=plt.subplot(121), color=color_list)
df2.plot(kind='bar', ax=plt.subplot(122), color=color_list[1:])

plt.show()

enter image description here

EDIT Ajean came up with a simple way to return a list of the correct colors from a dictionary:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pds

data_files = ['a', 'b', 'c', 'd']
color_list = ['b', 'g', 'r', 'c']
d2c = dict(zip(data_files, color_list))

df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])

df1.plot(kind='bar', ax=plt.subplot(121), color=map(d2c.get,df1.columns))
df2.plot(kind='bar', ax=plt.subplot(122), color=map(d2c.get,df2.columns))

plt.show()
like image 55
DataSwede Avatar answered Oct 10 '22 17:10

DataSwede


Pandas version 1.1.0 makes this easier. You can pass a dictionary to specify different color for each column in the pandas.DataFrame.plot.bar() function:

enter image description here

Here is an example:

df1 = pd.DataFrame({'a': [1.2, .8, .9], 'b': [.2, .9, .7]})
df2 = pd.DataFrame({'b': [0.2, .5, .4], 'c': [.5, .6, .7], 'd': [1.1, .6, .7]})
color_dict = {'a':'green', 'b': 'red', 'c':'blue', 'd': 'cyan'}
df1.plot.bar(color = color_dict)
df2.plot.bar(color = color_dict)
like image 32
Kumar Avatar answered Oct 10 '22 19:10

Kumar