
How to add words from a text file to a dictionary depending on the name?

I have a text file containing the script of Act 1 of Romeo and Juliet, and I want to count how many times each speaker says each word.

Here is the text: http://pastebin.com/X0gaxAPK

There are 3 people speaking in the text: Gregory, Sampson, and Abraham.

Basically I want to make three dictionaries, one per speaker (if that's the best way to do it?), fill each one with the words that speaker says, and then count how many times they say each word over the entire script.

How would I go about doing this? I think I can figure out the word count, but I'm a bit confused about how to separate who says what and put the words into a different dictionary for each person.

My output should look something like this (the numbers are made up, it's just an example):

Gregory - 
25: the
15: a
5: from
3: while
1: hello
etc

Here the number is how often that speaker says the word in the file.

Right now I have code that reads the text file, strips the punctuation, and compiles the text into a list. I also don't want to use any outside modules. I'd like to do it the old-fashioned way to learn, thanks.

You don't have to post exact code, just explain what I need to do and hopefully I can figure it out. I'm using Python 3.

Goose asked Dec 05 '25 18:12



1 Answer

import collections
import string

# Maps each speaker to a Counter of the words they say.
c = collections.defaultdict(collections.Counter)
speaker = None

with open('/tmp/spam.txt') as f:
    for line in f:
        if not line.strip():
            # Empty line: the current speaker has finished.
            speaker = None
            continue
        if line.count(' ') == 0 and line.strip().endswith(':'):
            # A lone "Name:" line means a new speaker; you might want to refine this test.
            speaker = line.strip()[:-1]
            continue
        # Lines read while speaker is None are counted under the None key.
        c[speaker].update(x.strip(string.punctuation).lower() for x in line.split())

Example output:

In [1]: run /tmp/spam.py

In [2]: c.keys()
Out[2]: [None, 'Abraham', 'Gregory', 'Sampson']

In [3]: c['Gregory'].most_common(10)
Out[3]: 
[('the', 7),
 ('thou', 6),
 ('to', 6),
 ('of', 4),
 ('and', 4),
 ('art', 3),
 ('is', 3),
 ('it', 3),
 ('no', 3),
 ('i', 3)]
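
Since the question asks for the old-fashioned way, here is a minimal sketch of the same idea using only plain dicts instead of collections. It assumes the same layout as the code above (a lone "Name:" line starts a speech, a blank line ends it). The function names count_words_by_speaker and format_counts and the sample lines are made up for illustration, and string is still imported from the standard library for its punctuation constant. Unlike the code above, this version skips text that has no speaker and ignores tokens that are only punctuation.

```python
import string


def count_words_by_speaker(lines):
    """Return {speaker: {word: count}} using only plain dicts."""
    counts = {}
    speaker = None
    for line in lines:
        text = line.strip()
        if not text:
            # Blank line: the current speech has ended.
            speaker = None
            continue
        if " " not in text and text.endswith(":"):
            # A lone "Name:" line starts a new speech (assumed layout).
            speaker = text[:-1]
            continue
        if speaker is None:
            # Unattributed text (e.g. stage directions) is skipped.
            continue
        words = counts.setdefault(speaker, {})
        for token in text.split():
            word = token.strip(string.punctuation).lower()
            if word:  # ignore tokens that were only punctuation
                words[word] = words.get(word, 0) + 1
    return counts


def format_counts(speaker, words):
    """Format one speaker's counts as 'count: word', most frequent first."""
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(words.items(), key=lambda item: (-item[1], item[0]))
    out = [speaker + " -"]
    for word, count in ranked:
        out.append("{}: {}".format(count, word))
    return "\n".join(out)


# Made-up sample in the assumed layout; replace with the lines of your file,
# e.g. count_words_by_speaker(open(path)) for a path of your choosing.
sample = [
    "Sampson:\n",
    "Gregory, on my word, we'll not carry coals.\n",
    "\n",
    "Gregory:\n",
    "No, for then we should be colliers.\n",
]
result = count_words_by_speaker(sample)
print(format_counts("Gregory", result["Gregory"]))
```

The outer dict plays the role of the three separate dictionaries: result["Gregory"] is Gregory's own word-count dict, and a new speaker gets one automatically the first time their name appears.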
wim answered Dec 07 '25 08:12



