I'm trying to write a Python script that takes a string and counts how often each pair of consecutive words occurs. Let's say:
string = " i have no idea how to write this script. i have an idea."
output =
['i', 'have'] 2
['have', 'no'] 1
['no', 'idea'] 1
['idea', 'how'] 1
['how', 'to'] 1
['to', 'write'] 1
...
I'm trying to do this in Python without importing collections or using Counter from collections. What I have is below. I'm trying to use re.findall(#whatpatterndoiuse, string) to iterate through the string and compare words, but I'm having difficulty figuring out how.
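import re
punctuation = re.compile(r'[^\w\s]')  # matches anything that is not a word character or whitespace
word_list = re.split(r'\s+', string.lower().strip())  # strip() avoids empty strings at the ends
freq_dic = {}  # empty dictionary
for word in word_list:
    word = punctuation.sub("", word)  # drop punctuation such as the trailing "."
    freq_dic[word] = freq_dic.get(word, 0) + 1
freq_list = sorted(freq_dic.items())  # sort alphabetically by word
for word, freq in freq_list:
    print(word, freq)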
The following uses Counter from collections, which I did not want. It also produces output in a format different from the one I stated above.
import re
from collections import Counter
words = re.findall(r'\w+', open('a.txt').read())
print(Counter(zip(words,words[1:])))
Given a string, extract all the K-length consecutive characters.
Input: test_str = 'geekforgeeeksss is bbbest forrr geeks', K = 3
Output: ['eee', 'sss', 'bbb', 'rrr']
Explanation: the K-length consecutive strings are extracted.
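One way to read the example above is "find every run of the same character that is exactly K characters long". The sketch below makes that assumption (it matches the sample output) and uses itertools.groupby to split the string into runs. The function name k_length_runs is made up for illustration.

```python
from itertools import groupby

def k_length_runs(text, k):
    """Return every run of one repeated character whose length is exactly k."""
    runs = []
    # groupby yields one group per stretch of identical adjacent characters
    for _char, group in groupby(text):
        run = "".join(group)
        if len(run) == k:
            runs.append(run)
    return runs

print(k_length_runs('geekforgeeeksss is bbbest forrr geeks', 3))
# ['eee', 'sss', 'bbb', 'rrr']
```

Note that the "ee" in "geek" is skipped because that run has length 2, not 3.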
Use the count() method to count words in a Python string. The count() method is a built-in Python string method. It takes three parameters (the substring, plus optional start and end positions) and returns the number of occurrences of the given substring.
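A small illustration of count() may help here, because it matches substrings rather than whole words. The sample text is invented for this sketch.

```python
text = "the then other"

# count() matches substrings, so "the" is also found inside "then" and "other"
substring_hits = text.count("the")

# Splitting first and counting list items matches whole words only
whole_word_hits = text.split().count("the")

# The optional start and end arguments restrict the search to text[start:end]
hits_after_start = text.count("the", 1)

print(substring_hits, whole_word_hits, hits_after_start)
# 3 1 2
```

For counting words (or pairs of words), splitting the string first is therefore usually the safer route.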
Solving this without zip is fairly simple. Just build tuples of each pair of words and track their count in a dict. There are just a few special cases to watch for - when the input string only has one word, and when you are at the end of the string.
Give this a shot:
def freq(input_string):
    freq = {}
    words = input_string.split()
    if len(words) == 1:
        return freq
    for idx, word in enumerate(words):
        if idx+1 < len(words):
            word_pair = (word, words[idx+1])
            if word_pair in freq:
                freq[word_pair] += 1
            else:
                freq[word_pair] = 1
    return freq
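To get the output format asked for in the question, a dict of pair counts like the one returned above can be formatted roughly as follows. This is only a sketch: the helper name format_pair_counts and the sample dict are made up for illustration, and it assumes Python 3.7+, where dicts keep insertion order.

```python
def format_pair_counts(pair_counts):
    """Return one line per pair, formatted like "['i', 'have'] 2"."""
    # Sort by descending count; the sort is stable, so ties keep insertion order
    ordered = sorted(pair_counts.items(), key=lambda item: item[1], reverse=True)
    return ["{} {}".format(list(pair), count) for pair, count in ordered]

# Hypothetical sample input: tuple keys mapped to counts
sample = {("have", "no"): 1, ("i", "have"): 2, ("no", "idea"): 1}
for line in format_pair_counts(sample):
    print(line)
# ['i', 'have'] 2
# ['have', 'no'] 1
# ['no', 'idea'] 1
```

Converting each tuple with list(pair) is what produces the square brackets shown in the question.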
You need to solve three problems:
1. Generate all the pairs of consecutive words (['i', 'have'], ['have', 'no'], ...);
2. Count the occurrences of each pair;
3. Sort the pairs from the most common to the least common.
The second problem can be easily solved by using a Counter. Counter objects also provide a most_common() method to solve the third problem.
The first problem can be solved in many ways. The most compact way is using zip:
>>> import re
>>> s = 'i have no idea how to write this script. i have an idea.'
>>> words = re.findall(r'\w+', s)
>>> pairs = zip(words, words[1:])
>>> list(pairs)
[('i', 'have'), ('have', 'no'), ('no', 'idea'), ...]
Putting everything together:
import collections
import re

def count_pairs(s):
    """
    Returns a mapping that links each pair of words
    to its number of occurrences.
    """
    words = re.findall(r'\w+', s.lower())
    pairs = zip(words, words[1:])
    return collections.Counter(pairs)

def print_freqs(s):
    """
    Prints the number of occurrences of word pairs
    from the most common to the least common.
    """
    cnt = count_pairs(s)
    for pair, count in cnt.most_common():
        print(list(pair), count)
EDIT: I realized just now that I accidentally read "with collections, counters, ..." instead of "without importing collections, ...". My bad, sorry.
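Since the question asks for a solution without collections, the same idea can be sketched with a plain dict: dict.get stands in for Counter, and sorted stands in for most_common(). The names count_pairs_plain and most_common_pairs are hypothetical and not part of the answer above.

```python
import re

def count_pairs_plain(s):
    """Map each pair of consecutive words to its count, using only a dict."""
    words = re.findall(r"\w+", s.lower())
    counts = {}
    for pair in zip(words, words[1:]):
        # get() returns 0 the first time a pair is seen
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def most_common_pairs(counts):
    """Return (pair, count) tuples sorted from most to least common."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

s = 'i have no idea how to write this script. i have an idea.'
for pair, count in most_common_pairs(count_pairs_plain(s)):
    print(list(pair), count)
```

Because the regex drops punctuation, this sketch also counts the pair ('script', 'i') that spans the sentence boundary. If that is not wanted, the text would need to be split into sentences first.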