How to plot a count bar chart with a Pandas DF, grouping by one categorical column and colouring by another

Tags: python, pandas

I have a dataframe that looks roughly like this:

  Property   Name    industry
1  123     name1    industry 1
1  144     name1    industry 1
2  456     name2    industry 1
3  789     name3    industry 2
4  367     name4    industry 2
.  ...     ...      ... 
.  ...     ...      ... 
n  123     name1    industry 1

I want to make a bar plot showing how many rows there are for each Name, with the bars coloured by industry. I've tried something like this:

ax = df['Name'].value_counts().plot(kind='bar',
                                    figsize=(14,8),
                                    title="Number for each Owner Name")
ax.set_xlabel("Owner Names")
ax.set_ylabel("Frequency")

I get the following:

[Image: "almost there", https://i.stack.imgur.com/he7iz.png]

My question is: how do I colour the bars according to the industry column in the dataframe (and add a legend)?

Thanks!

471

asked Feb 23 '18 01:02

tlanigan

3 Answers

This is my answer:

import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np

# FIG_SIZE was not defined in the original snippet; (14, 8) is assumed from the question.
FIG_SIZE = (14, 8)


def plot_bargraph_with_groupings(df, groupby, colourby, title, xlabel, ylabel):
    """
    Plots the frequency of rows grouped by one column and coloured by another.
    df: dataframe
    groupby: the column to group by
    colourby: the column to colour by
    title: the graph title
    xlabel: the x label
    ylabel: the y label
    """
    # Map each unique colourby value to a colour from the Paired colormap.
    categories = df[colourby].unique()
    palette = plt.cm.Paired(np.arange(len(categories)))
    ind_col_map = {cat: tuple(rgba) for cat, rgba in zip(categories, palette)}

    # Map each groupby value to its colourby value, then pick one colour per bar
    # in the same order that value_counts() will draw the bars.
    unique_comb = df[[groupby, colourby]].drop_duplicates()
    name_ind_map = dict(zip(unique_comb[groupby], unique_comb[colourby]))
    counts = df[groupby].value_counts()
    colours = [ind_col_map[name_ind_map[name]] for name in counts.index]

    # Make the bar graph.
    ax = counts.plot(kind='bar',
                     figsize=FIG_SIZE,
                     title=title,
                     color=colours)

    # Build the legend from ind_col_map, one patch per colourby value.
    legend_list = [mpatches.Patch(color=colour, label=key)
                   for key, colour in ind_col_map.items()]

    # Attach the legend and axis labels.
    ax.legend(handles=legend_list)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return ax


129

answered Oct 19 '22 21:10

tlanigan

Use seaborn.countplot

import seaborn as sns
sns.set(style="darkgrid")
titanic = sns.load_dataset("titanic")
ax = sns.countplot(x="class", data=titanic)

See the seaborn documentation: https://seaborn.pydata.org/generated/seaborn.countplot.html
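The titanic example counts rows per category but does not colour by a second column. A minimal sketch of how that might look for the question's data, assuming columns called Name and industry and a few made-up rows (the `sample` frame is hypothetical): `hue` picks the colouring column, `dodge=False` keeps each bar at full width, and seaborn adds the legend itself.

```python
import pandas as pd
import seaborn as sns

# Hypothetical rows shaped like the table in the question.
sample = pd.DataFrame({
    "Property": [123, 144, 456, 789, 367, 123],
    "Name": ["name1", "name1", "name2", "name3", "name4", "name1"],
    "industry": ["industry 1", "industry 1", "industry 1",
                 "industry 2", "industry 2", "industry 1"],
})

# Draw the most frequent names first, as value_counts() would.
order = sample["Name"].value_counts().index

# hue colours each bar by industry; dodge=False avoids splitting the
# bars into narrow per-industry slots, since each name has one industry.
ax = sns.countplot(data=sample, x="Name", hue="industry",
                   order=order, dodge=False)
ax.set_xlabel("Owner Names")
ax.set_ylabel("Frequency")
```

Call `plt.show()` from `matplotlib.pyplot` afterwards to display the figure in a script.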

answered Oct 19 '22 21:10

UDAY RAO

This may be slightly more complicated than necessary, but it works. First define two mappings, one from name to industry and one from industry to colour (the sample shows only two industries, so adjust the dictionary to your data; the keys must match the values in the industry column exactly):

ind_col_map = {
    "industry 1": "red",
    "industry 2": "blue"
}

unique_comb = df[["Name","industry"]].drop_duplicates()
name_ind_map = {x:y for x, y in zip(unique_comb["Name"],unique_comb["industry"])}

Then the color can be generated by using the above two mappings:

c = df['Name'].value_counts().index.map(lambda x: ind_col_map[name_ind_map[x]])
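To check the two mappings before plotting, here is a minimal sketch on made-up rows shaped like the question's table (the `df` contents and the variable names `counts` and `colours` are assumptions for illustration). It uses a list comprehension instead of `index.map`, which yields the same colours in the same order.

```python
import pandas as pd

# Hypothetical rows shaped like the table in the question.
df = pd.DataFrame({
    "Property": [123, 144, 456, 789, 367, 123],
    "Name": ["name1", "name1", "name2", "name3", "name4", "name1"],
    "industry": ["industry 1", "industry 1", "industry 1",
                 "industry 2", "industry 2", "industry 1"],
})

ind_col_map = {"industry 1": "red", "industry 2": "blue"}

# One row per (Name, industry) pair, then a Name -> industry lookup.
unique_comb = df[["Name", "industry"]].drop_duplicates()
name_ind_map = dict(zip(unique_comb["Name"], unique_comb["industry"]))

# value_counts() sorts by frequency, so the colours must follow its
# index order rather than the row order of df.
counts = df["Name"].value_counts()
colours = [ind_col_map[name_ind_map[name]] for name in counts.index]

# Show which colour each bar will get.
print(dict(zip(counts.index, colours)))
```

If a name ever appears with more than one industry, `name_ind_map` keeps only the last pair seen, so this approach assumes each name belongs to exactly one industry.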

Finally, pass these colours to your plotting call through the color argument:

import matplotlib.pyplot as plt

ax = df['Name'].value_counts().plot(kind='bar',
                                    figsize=(14,8),
                                    title="Number for each Owner Name", color=c)
ax.set_xlabel("Owner Names")
ax.set_ylabel("Frequency")
plt.show()

[Image: https://i.stack.imgur.com/8CQAB.png]
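The question also asks for a legend, which the steps above do not add. One possible way, sketched here under the same assumptions (columns called Name and industry, a hand-written colour dictionary), is to build one proxy patch per industry with `matplotlib.patches.Patch` and hand those to `ax.legend`. The function name `count_bars_with_legend` and the `sample` frame are hypothetical.

```python
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import pandas as pd


def count_bars_with_legend(df, ind_col_map):
    """Bar chart of row counts per Name, coloured by industry, with a legend."""
    unique_comb = df[["Name", "industry"]].drop_duplicates()
    name_ind_map = dict(zip(unique_comb["Name"], unique_comb["industry"]))

    # One colour per bar, in the order value_counts() draws the bars.
    counts = df["Name"].value_counts()
    colours = [ind_col_map[name_ind_map[name]] for name in counts.index]

    ax = counts.plot(kind="bar", figsize=(14, 8),
                     title="Number for each Owner Name", color=colours)
    ax.set_xlabel("Owner Names")
    ax.set_ylabel("Frequency")

    # The bars carry no per-industry labels, so build one proxy patch
    # per industry and pass those to the legend explicitly.
    handles = [mpatches.Patch(color=colour, label=industry)
               for industry, colour in ind_col_map.items()]
    ax.legend(handles=handles, title="industry")
    return ax


# Hypothetical rows shaped like the table in the question.
sample = pd.DataFrame({
    "Property": [123, 144, 456, 789, 367, 123],
    "Name": ["name1", "name1", "name2", "name3", "name4", "name1"],
    "industry": ["industry 1", "industry 1", "industry 1",
                 "industry 2", "industry 2", "industry 1"],
})
ax = count_bars_with_legend(sample, {"industry 1": "red", "industry 2": "blue"})
```

Call `plt.show()` afterwards to display the figure.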

answered Oct 19 '22 20:10

TYZ
