
How to use .apply() to combine a column of dictionaries into one dictionary?

Tags:

python

pandas

I have a column of dictionaries within a pandas data frame.

import pandas as pd

srs_tf = pd.Series([{'dried': 1, 'oak': 2}, {'fruity': 2, 'earthy': 2}, {'tones': 2, 'oak': 4}])
srs_b = pd.Series([2, 4, 6])
df = pd.DataFrame({'tf': srs_tf, 'b': srs_b})

df

                           tf  b
0      {'dried': 1, 'oak': 2}  2
1  {'fruity': 2, 'earthy': 2}  4
2      {'tones': 2, 'oak': 4}  6

These dictionaries represent word frequencies in wine descriptions (example input dictionary: {'savory': 1, 'dried': 3, 'thyme': 1, 'notes':..}). I need to build a single output dictionary from this column that contains every key from the input dictionaries and maps each key to the number of input dictionaries in which it appears. For example, the word 'dried' is a key in 850 of the input dictionaries, so the output dictionary would contain {.. 'dried': 850...}.

I want to use the data frame .apply() method for this, but I believe I am using it incorrectly.

def worddict(row, description_counter):
    for key in row['tf'].keys():
        if key in description_counter.keys():
            description_counter[key] += 1
        else:
            description_counter[key] = 1
    return description_counter

description_counter = {}

output_dict = df_wine_list.apply(lambda x: worddict(x, description_counter), axis = 1)

Two things. I think my axis should be 0 rather than 1, but when I try that I get this error: KeyError: ('tf', 'occurred at index Unnamed: 0')

When I use axis = 1, my function returns a column of identical dictionaries rather than a single dictionary.

mpollinger asked Oct 22 '25 17:10


1 Answer

You can use itertools.chain and collections.Counter:

from collections import Counter
from itertools import chain

Counter(chain.from_iterable(df['tf']))
# Counter({'dried': 1, 'earthy': 1, 'fruity': 1, 'oak': 2, 'tones': 1})

Or, with a generator expression:

Counter(y for x in df['tf'] for y in x)
# Counter({'dried': 1, 'earthy': 1, 'fruity': 1, 'oak': 2, 'tones': 1})
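Both versions work because iterating over a dictionary yields its keys, and a key can appear at most once per dictionary, so each dictionary contributes at most 1 to a word's count. The per-description frequencies stored as values are ignored. A minimal sketch of that, using the sample data from the question as a plain list (an assumption made to keep it self-contained; iterating df['tf'] yields the same dictionaries), with doc_freq as an illustrative name:

```python
from collections import Counter
from itertools import chain

# Sample dictionaries from the question, held in a plain list.
dicts = [
    {'dried': 1, 'oak': 2},
    {'fruity': 2, 'earthy': 2},
    {'tones': 2, 'oak': 4},
]

# chain.from_iterable iterates each dict in turn, which yields its keys only.
doc_freq = Counter(chain.from_iterable(dicts))

# 'oak' appears in two dictionaries, so its count is 2 (not 2 + 4 = 6).
print(dict(doc_freq))
# {'dried': 1, 'oak': 2, 'fruity': 1, 'earthy': 1, 'tones': 1}
```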

You can also use Index.value_counts:

pd.concat(map(pd.Series, df['tf'])).index.value_counts().to_dict()
# {'dried': 1, 'earthy': 1, 'fruity': 1, 'oak': 2, 'tones': 1}
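As for the original .apply() attempt, here is a sketch of what is probably happening, assuming the sample frame from the question. With axis=0 the function receives each column as a Series indexed by row labels, so row['tf'] looks for a row labelled 'tf' and raises the KeyError. With axis=1 the function does receive rows, but it returns the same description_counter object every time, so the result is a column of references to one dictionary. That shared dictionary, once apply has finished, already holds the counts you want:

```python
import pandas as pd

srs_tf = pd.Series([{'dried': 1, 'oak': 2}, {'fruity': 2, 'earthy': 2}, {'tones': 2, 'oak': 4}])
srs_b = pd.Series([2, 4, 6])
df = pd.DataFrame({'tf': srs_tf, 'b': srs_b})

def worddict(row, description_counter):
    # Count each key once per row, i.e. once per input dictionary.
    for key in row['tf']:
        description_counter[key] = description_counter.get(key, 0) + 1
    return description_counter

description_counter = {}

# axis=1 passes one row at a time; every call returns the same dict.
result = df.apply(lambda row: worddict(row, description_counter), axis=1)

# The accumulated dict is the single output dictionary.
output_dict = description_counter
print(output_dict)
```

This relies on apply calling the function exactly once per row as a side effect, which is fragile, so the Counter versions above are the safer choice.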
cs95 answered Oct 24 '25 08:10


