I'm using seaborn and pandas to create some bar plots from different (but related) data. The two datasets share a common category used as a hue, and as such I would like to ensure that the bar color for this category matches in the two graphs. How can I go about this?
A basic example is as follows:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

sns.set_style('darkgrid')
fig, ax = plt.subplots()

a = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'Total', 'Total'],
                  'Scenario': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
                  'Duration': [4, 3, 5, 4, 9, 7]})
g = sns.barplot(data=a, x='Scenario', y='Duration', hue='Program', ci=None)
plt.tight_layout()
plt.savefig('3 progs.png')
plt.clf()

b = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'C', 'C', 'Total', 'Total'],
                  'Scenario': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'],
                  'Duration': [4, 3, 5, 4, 3, 2, 12, 9]})
g = sns.barplot(data=b, x='Scenario', y='Duration', hue='Program', ci=None)
plt.tight_layout()
plt.savefig('4 progs.png')
Producing the two graphs:
In this example, I would like to ensure that the Total category uses the same color in both graphs (e.g. black).
hue: (optional) The column name used for colour encoding.
data: (optional) A DataFrame, array, or list of arrays holding the dataset to plot. If x and y are absent, it is interpreted as wide-form; otherwise it is expected to be long-form.
In seaborn, the hue parameter determines which column of the data frame is used for colour encoding. The official documentation for lmplot gives an example of this: adding `hue="smoker"` tells seaborn to colour the data points for smokers and non-smokers differently.
color: The color parameter sets the color of the bars. By default, depending on your seaborn version, each bar of a countplot may be drawn in a different color from seaborn's default palette. If you want all of the bars to have the same color (which I recommend), pass it via the color parameter.
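A minimal sketch of the color parameter, assuming a countplot of the `Program` column from the question's `b` frame (only that column is reproduced here). No hue is assigned, so every bar should receive the single color passed in; black is used because seaborn's default bar desaturation leaves it unchanged.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a script run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Program column of the question's second data frame
b = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'C', 'C', 'Total', 'Total']})

fig, ax = plt.subplots()
# color= applies one color to every bar; no hue variable is assigned
sns.countplot(data=b, x='Program', color='k', ax=ax)

# Collect each bar's face color as an RGBA tuple
face_colors = [p.get_facecolor() for p in ax.patches]
```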
The easiest way to make sure the same categories get the same colors in both plots is to specify the colors manually when creating each plot.
# First bar plot
ax = sns.barplot(data=a, x='Scenario', y='Duration', hue='Program', ci=None,
                 palette=["C0", "C1", "k"])
# ...
# Second bar plot
ax2 = sns.barplot(data=b, x='Scenario', y='Duration', hue='Program', ci=None,
                  palette=["C0", "C1", "C2", "k"])
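The color "C2" (the third color of the default color cycle) is only needed in the second plot, where a Program C exists. Note that the list colors are assigned to the hue levels by position, so this only works because A, B and Total appear in the same order in both data frames.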
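Because a list palette is matched to hue levels purely by position, it helps to see that mapping spelled out. The sketch below uses a hypothetical helper, `level_colors`, that pairs colors with levels in order of first appearance, which is how seaborn orders a plain string column when no `hue_order` is given. It is an illustration of the ordering, not a seaborn function.

```python
import pandas as pd

# Program columns of the question's two data frames
a = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'Total', 'Total']})
b = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'C', 'C', 'Total', 'Total']})


def level_colors(df, colors, column='Program'):
    """Hypothetical helper: pair list colors with hue levels by order of
    first appearance, mirroring how a list palette lines up with the levels."""
    levels = list(df[column].unique())
    return dict(zip(levels, colors))


print(level_colors(a, ["C0", "C1", "k"]))
# {'A': 'C0', 'B': 'C1', 'Total': 'k'}
print(level_colors(b, ["C0", "C1", "C2", "k"]))
# {'A': 'C0', 'B': 'C1', 'C': 'C2', 'Total': 'k'}

# Reversing the rows changes which level receives which list entry
print(level_colors(b.iloc[::-1], ["C0", "C1", "C2", "k"]))
# {'Total': 'C0', 'C': 'C1', 'B': 'C2', 'A': 'k'}
```

The last case shows the weakness of the list approach: if the rows arrive in a different order, Total no longer lands on black. Passing `hue_order` explicitly, or using a dictionary, avoids this.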
Instead of a list, you may also use a dictionary mapping values from the hue column to colors.
palette = {"A": "C0", "B": "C1", "C": "C2", "Total": "k"}
ax = sns.barplot(data=a, x='Scenario', y='Duration', hue='Program', palette=palette)
# ...
ax2 = sns.barplot(data=b, x='Scenario', y='Duration', hue='Program', palette=palette)
In both cases, the output would look like this:
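Rendering both frames side by side with the dictionary palette gives that result, and it can also be checked without looking at the images. This sketch assumes the Agg backend and a hypothetical helper, `count_black_bars`, that counts bars whose face color is opaque black; each frame should contain exactly two Total bars (one per scenario).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a script run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

a = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'Total', 'Total'],
                  'Scenario': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
                  'Duration': [4, 3, 5, 4, 9, 7]})
b = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'C', 'C', 'Total', 'Total'],
                  'Scenario': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'],
                  'Duration': [4, 3, 5, 4, 3, 2, 12, 9]})

# One shared mapping; the extra "C" key is simply unused for frame `a`
palette = {"A": "C0", "B": "C1", "C": "C2", "Total": "k"}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
sns.barplot(data=a, x='Scenario', y='Duration', hue='Program', palette=palette, ax=ax1)
sns.barplot(data=b, x='Scenario', y='Duration', hue='Program', palette=palette, ax=ax2)
fig.tight_layout()


def count_black_bars(ax):
    """Hypothetical helper: count bars whose face color is opaque black."""
    return sum(np.allclose(p.get_facecolor(), (0, 0, 0, 1)) for p in ax.patches)
```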
Finally, you may create this dictionary automatically from the values in the hue column. The advantage here is that you need to know neither the colors nor the values in the respective data frames beforehand.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

sns.set_style('darkgrid')
fig, ax = plt.subplots()

a = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'Total', 'Total'],
                  'Scenario': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
                  'Duration': [4, 3, 5, 4, 9, 7]})
b = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'C', 'C', 'Total', 'Total'],
                  'Scenario': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'],
                  'Duration': [4, 3, 5, 4, 3, 2, 12, 9]})

# Every Program value from both frames (Series.append was removed in pandas 2.0)
unique = pd.concat([a["Program"], b["Program"]]).unique()
palette = dict(zip(unique, sns.color_palette(n_colors=len(unique))))
palette.update({"Total": "k"})  # force Total to black

ax = sns.barplot(data=a, x='Scenario', y='Duration', hue='Program', ci=None, palette=palette)
plt.tight_layout()

plt.figure()
ax2 = sns.barplot(data=b, x='Scenario', y='Duration', hue='Program', ci=None, palette=palette)
plt.tight_layout()
plt.show()
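The same idea can be wrapped in a reusable function for any number of data frames. The sketch below uses a hypothetical helper, `shared_palette`; unlike the inline version, it skips levels that have a fixed override, so Total does not use up one of the default colors.

```python
import pandas as pd
import seaborn as sns


def shared_palette(frames, column, overrides=None):
    """Hypothetical helper: build one hue-level -> color dict covering every
    value of `column` across `frames`, with `overrides` taking precedence."""
    overrides = dict(overrides or {})
    # All levels in order of first appearance across the frames
    levels = pd.concat([df[column] for df in frames]).unique()
    # Levels with a fixed color do not consume a slot of the default palette
    free = [lvl for lvl in levels if lvl not in overrides]
    palette = dict(zip(free, sns.color_palette(n_colors=len(free))))
    palette.update(overrides)
    return palette


a = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'Total', 'Total']})
b = pd.DataFrame({'Program': ['A', 'A', 'B', 'B', 'C', 'C', 'Total', 'Total']})

palette = shared_palette([a, b], "Program", overrides={"Total": "k"})
print(list(palette))  # ['A', 'B', 'C', 'Total']
```

The default color cycle has ten entries, so with more levels than that `sns.color_palette(n_colors=...)` repeats colors; a named palette such as `sns.color_palette("husl", n_colors=...)` avoids the repetition.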