I have a set of data from which I want to plot the number of keys per unique id count (x=unique_id_count, y=key_count), and I'm trying to learn how to take advantage of pandas.
In this case, the expected result is:
unique_id_count 1 = key count 2 (keys "a" and "c" each have 1 unique id)
unique_id_count 2 = key count 1 (key "b" has 2 unique ids)
from pandas import *

key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = DataFrame({'keys': key_items, 'ids': id_data})
I've managed to mangle the data into what I want by pulling it out of the dataframe, restructuring it, and building a new dataframe, but at that point it would probably be simpler to do it all in plain Python without pandas...
from collections import defaultdict

unique_values = defaultdict(list)
for row in df.itertuples(index=False):
    # column order matches the dict used to build df: ('keys', 'ids')
    key = row[0]
    v = row[1]
    unique_values[key].append(v)

unique_values_count = {}
for k, values in unique_values.items():
    unique_values_count[k] = [len(set(values))]

# reformat for plotting
key_col = ("a", "b", "c")
id_col = [unique_values_count[k][0] for k in key_col]
df2 = DataFrame({"keys": key_col, "unique_id_count": id_col})
df2.groupby("unique_id_count").size().plot(kind="bar")
Is there a better way to do this more directly using the initial dataframe?
You can use the nunique() method to count the number of unique values in a pandas Series or DataFrame; combined with groupby(), it gives the number of unique ids per key directly.
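A minimal sketch of that approach, assuming the df built from the question's key_items and id_data:

import pandas as pd

key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# unique ids per key: a -> 1, b -> 2, c -> 1
unique_ids_per_key = df.groupby("keys")["ids"].nunique()

# number of keys per unique id count: 1 -> 2 keys, 2 -> 1 key
unique_ids_per_key.value_counts().plot(kind="bar")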
You can also count with the value_counts() method: on a single column (a Series) it counts how many times each value occurs, and on a whole DataFrame (in recent pandas versions) it counts each unique row.
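A small illustration of both forms; the toy frame here is made up, and DataFrame.value_counts() requires pandas 1.1 or newer:

import pandas as pd

toy = pd.DataFrame({"keys": ["a", "a", "b"], "ids": ["X", "Y", "X"]})

# Series.value_counts(): occurrences of each value in one column (X -> 2, Y -> 1)
print(toy["ids"].value_counts())

# DataFrame.value_counts(): occurrences of each unique row (pandas >= 1.1)
print(toy.value_counts())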
How about just using value_counts() directly?
import pandas as pd

pd.value_counts(df['ids']).plot.bar()

# unique ids per key: a -> 1, b -> 2, c -> 1
s = df.groupby("keys").ids.agg(lambda x: len(x.unique()))
# number of keys per unique id count
pd.value_counts(s).plot(kind="bar")