Finding least common elements in a list

Tags: python, list

I want to generate an ordered list of the least common words within a large body of text, with the least common word appearing first along with a value indicating how many times it appears in the text.

I scraped the text from some online journal articles, then simply assigned and split:

article_one = """ large body of text """.split()
=> ["large", "body", "of", "text"]

It seems like a regex would be appropriate for the next steps, but being new to programming, I'm not well versed in them. If the best answer includes a regex, could someone point me to a good regex tutorial other than pydoc?

asked Jan 31 '13 01:01

Benjamin James

2 Answers

How about a shorter, simpler version with a defaultdict? Counter is nice but needs Python 2.7, whereas this works from Python 2.5 and up :)

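import collections

# Missing words default to a count of 0
counter = collections.defaultdict(int)
article_one = """ large body of text """

for word in article_one.split():
    counter[word] += 1

# x[::-1] turns (word, count) into (count, word), so the sort is by
# ascending count, with ties broken alphabetically.
# Python 2 syntax; on Python 3 use print(sorted(counter.items(), ...)).
print sorted(counter.iteritems(), key=lambda x: x[::-1])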
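For readers on Python 3, the same idea can be sketched roughly as follows. The function name `word_counts_ascending` and the sample sentence are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict


def word_counts_ascending(text):
    """Return (word, count) pairs, rarest first; ties broken alphabetically."""
    counts = defaultdict(int)  # missing words start at 0
    for word in text.split():
        counts[word] += 1
    # Sort on (count, word) so the least common words come first.
    return sorted(counts.items(), key=lambda pair: (pair[1], pair[0]))


# Illustrative input only (an assumption, not the asker's data).
article_one = "the cat and the dog and the bird"
print(word_counts_ascending(article_one))
# [('bird', 1), ('cat', 1), ('dog', 1), ('and', 2), ('the', 3)]
```

Note that `split()` only breaks on whitespace, so punctuation stays attached to words and "The" and "the" are counted separately unless the text is normalized first.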

answered Oct 20 '22 03:10

Wolph

According to the documentation for the Counter class in the collections module:

c.most_common()[:-n-1:-1]       # n least common elements

So the code for the least common element in a list is:

from collections import Counter
Counter( mylist ).most_common()[:-2:-1]

The two least common elements are:

from collections import Counter
Counter( mylist ).most_common()[:-3:-1]
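
Applied to the word-count question, the slice could be wrapped in a small helper like the one below. This is only a sketch: the helper name `least_common`, the sample text, and the regex used to tokenize are assumptions chosen for illustration, and a real article may need a more careful pattern.

```python
import re
from collections import Counter


def least_common(items, n):
    """Return the n least common (item, count) pairs, rarest first."""
    # most_common() sorts by descending count; the slice walks it backwards.
    return Counter(items).most_common()[:-n - 1:-1]


# Illustrative input only (an assumption, not the asker's data).
text = "The cat saw the dog. The dog saw a bird."
# Hypothetical tokenizer: lowercase, then keep runs of letters and apostrophes.
words = re.findall(r"[a-z']+", text.lower())
print(least_common(words, 2))
```

Here the regex is only used to strip punctuation and normalize case before counting; the counting itself does not need one. When several words share the same count, which of them appear in the result depends on the order in which `most_common()` lists the ties.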

python-3.x

answered Oct 20 '22 01:10

dwalsh84
