Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assign line colors in pandas

Tags:

python

pandas

I am trying to plot some data in pandas and the inbuilt plot function conveniently plots one line per column. What I want to do is to manually assign each line a color based on a classification I make.

The following works:

df = pd.DataFrame({'1': [1, 2, 3, 4], '2': [1, 2, 1, 2]})
s = pd.Series(['c','y'], index=['1','2'])
df.plot(color = s)

But when my indices are integers it no longer works and throws as KeyError:

df = pd.DataFrame({1: [1, 2, 3, 4], 2: [1, 2, 1, 2]})
s = pd.Series(['c','y'], index=[1,2])
df.plot(color = s)

The way I understand it is that when an integer index is used it somehow has to start from 0. That is my guess since the following works as well:

df = pd.DataFrame({0: [1, 2, 3, 4], 1: [1, 2, 1, 2]})
s = pd.Series(['c','y'], index=[1,0])
df.plot(color = s)

My question is:

  • What is happening here?
  • Assuming I have an integer index that does not start from 0 or is not formed of successive numbers, how can I make this work without having to convert the index to string or reindex starting from 0?

EDIT:

I realised that even in the first case, the code doesn't do what I expected it to do. It seems like pandas matches the index of DataFrame and Series only if both are integer indices starting from 0. If that isn't the case, a KeyError is thrown or if the index is a str the order of the elements is used.

Is this correct? And is there a way to match the Series and DataFrame indices? Or do I have to make sure I pass a list of colours in the right order?

like image 350
GebitsGerbils Avatar asked Sep 11 '15 14:09

GebitsGerbils


People also ask

How do I change the color of a line in Python?

You can change the color by using hex strings ('#008000'). By using the RGB or RGBA tuples ((0,1,0,1)). Another way is by using grayscale intensities as a string ('0.8'). By simply mentioning the initials of the colors.

How do you plot a color in python?

For example, you can set the color, marker, linestyle, and markercolor with: plot(x, y, color='green', linestyle='dashed', marker='o', markerfacecolor='blue', markersize=12).

How do you color a column in pandas?

With your one line of code, can you apply to several columns with different colors for each column? @sqllearner you can apply the same color to several columns just by adding them to the subset, like df. style. set_properties(**{'background-color': 'red'}, subset=['A', 'C']).

How do I add color to Matplotlib?

The usual way to set the line color in matplotlib is to specify it in the plot command. This can either be done by a string after the data, e.g. "r-" for a red line, or by explicitely stating the color argument.


2 Answers

To set color for each line you can use the parameter style. For example:

df = pd.DataFrame({'A': [1, 2, 4], 'B': [1, 3, 9]})
df.plot(style={'A': 'r', 'B': 'g'})

enter image description here

Using the shortcut string notation in the form marker|line|color you can also set marker and line types:

df = pd.DataFrame({'A': [1, 2, 4], 'B': [1, 3, 9]})
df.plot(style={'A': '*:r', 'B': '+--g'})

enter image description here

like image 153
Mykola Zotko Avatar answered Sep 17 '22 22:09

Mykola Zotko


What is happening here?

The keyword argument color is inherited from matplotlib.pyplot.plot(). The details in the documentation don't make it clear that you can put in a list of colors when plotting. Given that color is a keyword argument from matplotlib, I'd recommend not using a Pandas Series to hold the color values.

How can I make this work?

Use a list instead of a Series. If you were using a Series with an index meant to match the columns of your DataFrame to specific colors, you will need to sort the Series first. If the columns are not in order, you will need to sort the columns as well.

# Option 1
s = s.sort_index()
df.plot(color = s.values) # as per Fiabetto's answer

# Option 2
df.plot(color = ['c', 'y']) # other method
like image 21
thecircus Avatar answered Sep 19 '22 22:09

thecircus