I have multiple dictionaries that contain word frequency counts for a series of text files. I'm trying to find a way of collating them into a dataframe (so one dict = one text file = one row in the df), but I am fairly inexperienced with Python and unsure how to proceed.
I have approximately 50 text files/dictionaries, but for simplicity, say I have the following:
mydict = {'red': 2,'blue': 1,'yellow': 3}
mydict2 = {'blue': 1,'orange': 3,'red': 1}
mydict3 = {'purple': 1,'green': 3,'brown': 2}
How can I create a dataframe with the full list of colours as columns, the dictionaries/text files as rows, and the respective counts as the data points, with any colour that does not appear in a particular row (dictionary) registered as zero?
I would have included a coding attempt, but I do not know how to begin with the task.
You can make a pandas Series from each dictionary and then combine them with pd.concat.
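import pandas as pd

mydicts = [mydict, mydict2, mydict3]
# One Series per dict; axis=1 aligns them on the union of colour keys (missing -> NaN)
df = pd.concat([pd.Series(d) for d in mydicts], axis=1).fillna(0).T
df = df.sort_index(axis=1)  # alphabetical columns; otherwise the order depends on the pandas version
df.index = ['mydict', 'mydict2', 'mydict3']
print(df)

which gives:

         blue  brown  green  orange  purple  red  yellow
mydict    1.0    0.0    0.0     0.0     0.0  2.0     3.0
mydict2   1.0    0.0    0.0     3.0     0.0  1.0     0.0
mydict3   0.0    2.0    3.0     0.0     1.0  0.0     0.0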
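With around 50 files, typing the index by hand gets tedious. A minimal sketch, assuming the counts are kept in one dict keyed by file name (a hypothetical layout; the file names below are placeholders): pd.DataFrame.from_dict with orient='index' uses the outer keys as row labels and the inner keys as columns. Since the NaN placeholders force a float dtype, astype(int) after fillna(0) turns the counts back into integers.

```python
import pandas as pd

# Hypothetical layout: word counts keyed by the text file they came from
counts = {
    'file1.txt': {'red': 2, 'blue': 1, 'yellow': 3},
    'file2.txt': {'blue': 1, 'orange': 3, 'red': 1},
    'file3.txt': {'purple': 1, 'green': 3, 'brown': 2},
}

# orient='index': outer keys become row labels, inner keys become columns
df = (
    pd.DataFrame.from_dict(counts, orient='index')
    .fillna(0)           # colours missing from a file count as zero
    .astype(int)         # NaN had forced floats; restore integer counts
    .sort_index(axis=1)  # alphabetical column order
)
print(df)
```

In a real script, the counts dict could be filled in a loop over the files, one key per file, before building the frame.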
Alternatively, use pd.DataFrame.from_records():
In [6]: mydicts = [mydict, mydict2, mydict3]

In [7]: pd.DataFrame.from_records(mydicts).fillna(0).sort_index(axis=1)
Out[7]:
   blue  brown  green  orange  purple  red  yellow
0   1.0    0.0    0.0     0.0     0.0  2.0     3.0
1   1.0    0.0    0.0     3.0     0.0  1.0     0.0
2   0.0    2.0    3.0     0.0     1.0  0.0     0.0
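The plain DataFrame constructor also accepts a list of dicts and, unlike the output above, can label the rows directly through its index argument. A minimal sketch, assuming one hypothetical file name per dictionary in the same order as mydicts:

```python
import pandas as pd

mydict = {'red': 2, 'blue': 1, 'yellow': 3}
mydict2 = {'blue': 1, 'orange': 3, 'red': 1}
mydict3 = {'purple': 1, 'green': 3, 'brown': 2}
mydicts = [mydict, mydict2, mydict3]

# Hypothetical row labels, one per dictionary, in the same order as mydicts
filenames = ['file1.txt', 'file2.txt', 'file3.txt']

# Each dict becomes one row; keys missing from a dict become NaN
df = pd.DataFrame(mydicts, index=filenames)
# Zero-fill, restore integer counts, and sort the colour columns
df = df.fillna(0).astype(int).sort_index(axis=1)
print(df)
```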