I am trying to work with the table generated by nltk.ConditionalFreqDist but I can't seem to find any documentation on either writing the table to a csv file or exporting to other formats. I'd love to work with it in a pandas dataframe object, which is also really easy to write to a csv. The only thread I could find recommended pickling the CFD object which doesn't really solve my problem.
I wrote the following function to convert an nltk.ConditionalFreqDist object to a pd.DataFrame:
def nltk_cfd_to_pd_dataframe(cfd):
""" Converts an nltk.ConditionalFreqDist object into a pandas DataFrame object. """
df = pd.DataFrame()
for cond in cfd.conditions():
col = pd.DataFrame(pd.Series(dict(cfd[cond])))
col.columns = [cond]
df = df.join(col, how = 'outer')
df = df.fillna(0)
return df
But if I am going to do that, perhaps it would make sense to just write a new ConditionalFreqDist function that produces a pd.DataFrame in the first place. But before I reinvent the wheel, I wanted to see if there are any tricks that I am missing - either in NLTK or elsewhere to make the ConditionalFreqDist object talk with other formats and most importantly to export it to csv files.
Thanks.
You can treat an FreqDist as a dict, and create a dataframe from there using from_dict
fdist = nltk.FreqDist( ... )
df_fdist = pd.DataFrame.from_dict(fdist, orient='index')
df_fdist.columns = ['Frequency']
df_fdist.index.name = 'Term'
print(df_fdist)
df_fdist.to_csv(...)
output:
Frequency
Term
is 70464
a 26429
the 15079
pd.DataFrame(freq_dist.items(), columns=['word', 'frequency'])
Ok, so I went ahead and wrote a conditional frequency distribution function that takes a list of tuples like the nltk.ConditionalFreqDist
function but returns a pandas Dataframe object. Works faster than converting the cfd object to a dataframe:
def cond_freq_dist(data):
""" Takes a list of tuples and returns a conditional frequency distribution as a pandas dataframe. """
cfd = {}
for cond, freq in data:
try:
cfd[cond][freq] += 1
except KeyError:
try:
cfd[cond][freq] = 1
except KeyError:
cfd[cond] = {freq: 1}
return pd.DataFrame(cfd).fillna(0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With