Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Split Column String and Plot unique values

I have a dataframe Df that looks like this:

                        Country  Year  
0                Australia, USA  2015   
1            USA, Hong Kong, UK  1982   
2                           USA  2012   
3                           USA  1994   
4                   USA, France  2013   
5                         Japan  1988   
6                         Japan  1997   
7                           USA  2013   
8                        Mexico  2000   
9                       USA, UK  2005   
10                          USA  2012   
11                      USA, UK  2014   
12                          USA  1980   
13                          USA  1992   
14                          USA  1997   
15                          USA  2003   
16                          USA  2004   
17                          USA  2007    
18                 USA, Germany  2009   
19                        Japan  2006   
20                        Japan  1995  

I want to make a bar chart for the Country column, if i try this

Df.Country.value_counts().plot(kind='bar')

I get this plot

enter image description here

which is incorrect because it doesn't separate the countries. My goal is to obtain a bar chart that plots the count of each country in the column, but to achieve that, first i have to somehow split the string in each row (if needed) and then plot the data. I know i can use Df.Country.str.split(', ') to split the strings, but if i do this i can't plot the data.

Anyone has an idea how to solve this problem?

like image 362
Isak Baizley Avatar asked Jun 23 '26 16:06

Isak Baizley


1 Answers

You could use the vectorized Series.str.split method to split the Countrys:

In [163]: df['Country'].str.split(r',\s+', expand=True)
Out[163]: 
            0          1     2
0   Australia        USA  None
1         USA  Hong Kong    UK
2         USA       None  None
3         USA       None  None
4         USA     France  None
...

If you stack this DataFrame to move all the values into a single column, then you can apply value_counts and plot as before:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
{'Country': ['Australia, USA', 'USA, Hong Kong, UK', 'USA', 'USA', 'USA, France', 'Japan', 'Japan', 'USA', 'Mexico', 'USA, UK', 'USA', 'USA, UK', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA, Germany', 'Japan', 'Japan'],
 'Year': [2015, 1982, 2012, 1994, 2013, 1988, 1997, 2013, 2000, 2005, 2012, 2014, 1980, 1992, 1997, 2003, 2004, 2007, 2009, 2006, 1995]})
counts = df['Country'].str.split(r',\s+', expand=True).stack().value_counts()
counts.plot(kind='bar')
plt.show()

like image 136
unutbu Avatar answered Jun 26 '26 06:06

unutbu



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!