TypeError: str object is not an iterator

I have a file consisting of words, one word on each line. The file looks like this:


I want to count the frequency of the pair of words which occur one after the other.

For example,

aaa,bob: 1

and so on. I have tried this

for line in content:
    print line, next(line);
    dic.update({[line,next(line)]: 1})

I got the error:

TypeError: str object is not an iterator

I then tried using an iterator:

for x in it:
    print x, next(x);

Got the same error again. Please help!

You just need to keep track of the previous line. A file object is its own iterator, so you don't need iter or readlines at all. Call next once at the very start to create a variable prev, then keep updating prev in the loop:

from collections import defaultdict

d = defaultdict(int)

with open("in.txt") as f:
    prev = next(f).strip()
    for line in map(str.strip,f): # python2 use itertools.imap
        d[prev, line] += 1
        prev = line

Which would give you:

defaultdict(<type 'int'>, {('aaa', 'bob'): 1, ('fff', 'err'): 2, ('err', 'ddd'): 1, ('bob', 'fff'): 1, ('ddd', 'fff'): 1})

line, like all strs, is an iterable, which means it has an __iter__ method. But next works with iterators, which have a __next__ method (in Python 2 it's a next method). When the interpreter executes next(line), it attempts to call line.__next__. Since line does not have a __next__ method, it raises TypeError: str object is not an iterator.
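
A minimal Python 3 sketch of that distinction. The string "aaa" is a made-up stand-in for one line of the file, and the variable names are only for illustration:

```python
line = "aaa"  # hypothetical stand-in for one line read from the file

# A str has __iter__ (it is iterable) but no __next__ (it is not an iterator).
str_is_iterable = hasattr(line, "__iter__")
str_is_iterator = hasattr(line, "__next__")

# Calling next() directly on the str therefore fails.
try:
    next(line)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# iter() wraps the str in an iterator, which does have __next__.
chars = iter(line)
chars_is_iterator = hasattr(chars, "__next__")

print(str_is_iterable, str_is_iterator, chars_is_iterator)
print(error_message)
```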

Since line is an iterable and has an __iter__ method, we can set it = iter(line). it is an iterator with a __next__ method, and next(it) returns the next character in line. But you are looking for the next line in the file, so try something like:

from collections import defaultdict

dic = defaultdict(int)
with open('file.txt') as f:
    content = f.readlines()
    for i in range(len(content) - 1):
        key = content[i].rstrip() + ',' + content[i+1].rstrip()
        dic[key] += 1

for k,v in dic.items():
    print(k, ':', v)

Output (file.txt as in OP)

err,ddd : 1
ddd,fff : 1
aaa,bob : 1
fff,err : 2
bob,fff : 1

from collections import Counter
with open(file, 'r') as f:
    content = [line.strip() for line in f]  # drop the trailing newline of each line
result = Counter(zip(content, content[1:]))  # pair each line with the one after it

That will be a dictionary whose keys are the line pairs (in order) and whose values are the number of times that pair occurred.
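
As a rough illustration of the zip-based pairing, here is a sketch that works on an in-memory list instead of a file. The word list is an assumption, chosen to be consistent with the counts quoted in this thread:

```python
from collections import Counter

# Hypothetical sample data, not read from any file.
words = ["aaa", "bob", "fff", "err", "ddd", "fff", "err"]

# zip(words, words[1:]) yields each word together with its successor.
pair_counts = Counter(zip(words, words[1:]))

# Look up one pair, and ask for the most frequent pair.
fff_err = pair_counts[("fff", "err")]
top_pair = pair_counts.most_common(1)

print(fff_err)
print(top_pair)
```

A pair that never occurs simply counts as 0, since Counter returns 0 for missing keys rather than raising KeyError.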

As others said, line is a string and thus cannot be passed to the next() function. Also, you can't use a list as a dictionary key, because lists are mutable and therefore not hashable. You can use a tuple instead. A simple solution:



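dic = {}  # content is assumed to be the list of lines read from the file
for i in range(len(content)-1):
    print(content[i], content[i+1])
    try:
        dic[(content[i], content[i+1])] += 1
    except KeyError:
        # first time this pair is seen
        dic[(content[i], content[i+1])] = 1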
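
To illustrate the point about keys, a small sketch with a made-up pair of words. It also shows dict.get as one possible alternative to the try/except, which is my suggestion rather than part of the answer above:

```python
pair = ["aaa", "bob"]  # hypothetical pair of adjacent words
dic = {}

# A list is unhashable, so using it as a key raises TypeError.
try:
    dic[pair] = 1
    list_key_ok = True
except TypeError:
    list_key_ok = False

# A tuple holding the same items is hashable and works as a key.
key = tuple(pair)
dic[key] = dic.get(key, 0) + 1  # dict.get supplies 0 for an unseen pair
dic[key] = dic.get(key, 0) + 1

print(list_key_ok, dic)
```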

Also notice that by using readlines() you also keep the '\n' of each line. You might want to strip it off first:

content = []
with open(file,'r') as f:
    for line in f:
        content.append(line.strip())  # strip() removes the trailing '\n'

You can use a deque of length 2 and a Counter:

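from collections import Counter, deque

lc = Counter()
d = deque(maxlen=2)  # holds the previous line and the current line
with open(fn) as f:
    for line in f:
        d.append(line)
        if len(d) == 2:  # skip the first line, which has no predecessor
            lc["{},{}".format(*[e.rstrip() for e in d])] += 1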

>>> lc
Counter({'fff,err': 2, 'ddd,fff': 1, 'bob,fff': 1, 'aaa,bob': 1, 'err,ddd': 1})
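
The deque approach depends on a bounded deque discarding its oldest item on append. A short sketch of that behaviour, using a hypothetical three-word list rather than the file:

```python
from collections import deque

window = deque(maxlen=2)
snapshots = []
for word in ["aaa", "bob", "fff"]:  # hypothetical words
    window.append(word)             # once full, the oldest item falls off the left
    snapshots.append(tuple(window))

print(snapshots)
```

After the first append the window holds a single item, which is why a pair should only be counted once the deque contains two entries.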

You can also use a regex with a capturing lookahead:

import re
with open(fn) as f:
    lc=Counter(m.group(1)+','+m.group(2) for m in re.finditer(r"(\w+)\n(?=(\w+))", f.read()))
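
The lookahead matters because adjacent pairs overlap: the second word of one pair is the first word of the next. A sketch comparing the two patterns on a hypothetical in-memory string:

```python
import re

text = "aaa\nbob\nfff\n"  # hypothetical file contents

# Without the lookahead, the second word is consumed by the match,
# so it cannot start the next pair.
consuming = [m.groups() for m in re.finditer(r"(\w+)\n(\w+)", text)]

# With the lookahead, the second word is captured but not consumed.
overlapping = [m.groups() for m in re.finditer(r"(\w+)\n(?=(\w+))", text)]

print(consuming)
print(overlapping)
```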
