
Plot key count per unique value count in pandas

I have a set of data from which I want to plot the number of keys per unique id count (x=unique_id_count, y=key_count), and I'm trying to learn how to take advantage of pandas.

In this case:

unique_ids 1 = key count 2

unique_ids 2 = key count 1

import pandas as pd
from pandas import DataFrame

key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")

df = DataFrame({'keys': key_items, 'ids': id_data})

I've managed to mangle the data into what I want by pulling it out of the dataframe, restructuring it, and building a new dataframe. In this case it's probably better to do it all in Python without pandas...

from collections import defaultdict

# collect the ids seen for each key
unique_values = defaultdict(list)
for key, v in zip(df["keys"], df["ids"]):
    unique_values[key].append(v)

# count the distinct ids per key
unique_values_count = {}
for k, values in unique_values.items():
    unique_values_count[k] = [len(set(values))]

# reformat for plotting
key_col = ("a", "b", "c")
id_col = [unique_values_count[k][0] for k in key_col]

df2 = DataFrame({"keys": key_col, "unique_id_count": id_col})
df2.groupby("unique_id_count").size().plot(kind="bar")

Is there a better way to do this more directly using the initial dataframe?

monkut asked Feb 28 '13 03:02


2 Answers

How about just using value_counts() directly?

df['ids'].value_counts().plot.bar()

Aziz Alto answered Oct 09 '22 10:10


s = df.groupby("keys").ids.agg(lambda x: len(x.unique()))
s.value_counts().plot(kind="bar")
HYRY answered Oct 09 '22 09:10

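For reference, here is a minimal sketch of the same groupby idea that can be checked against the numbers in the question (unique_ids 1 = key count 2, unique_ids 2 = key count 1). It assumes a reasonably recent pandas and swaps the lambda for the built-in nunique(); the names unique_id_count and key_count are only illustrative, and the plot call is left out so the snippet runs without matplotlib.

```python
import pandas as pd

# Sample data from the question
key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# Number of distinct ids seen for each key
unique_id_count = df.groupby("keys")["ids"].nunique()

# Number of keys that share each distinct-id count, ordered by that count
key_count = unique_id_count.value_counts().sort_index()

# Label the result for plotting (illustrative names, not from the answers)
key_count = key_count.rename_axis("unique_id_count").rename("key_count")

print(unique_id_count.to_dict())  # {'a': 1, 'b': 2, 'c': 1}
print(key_count.to_dict())        # {1: 2, 2: 1}
```

If matplotlib is installed, key_count.plot(kind="bar") should then draw the bar chart with unique_id_count on the x axis and key_count on the y axis. The sort_index() call is there only so the bars appear in order of the unique id count rather than in order of frequency.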