TypeError: str object is not an iterator

I have a file consisting of words, one word on each line. The file looks like this:

aaa
bob
fff
err
ddd
fff
err

I want to count how often each pair of consecutive words occurs.

For example,

aaa,bob: 1
bob,fff: 1
fff,err: 2

and so on. I have tried this:

f=open(file,'r')
content=f.readlines()
f.close()
dic={}
it=iter(content)
for line in content:
    print line, next(line);
    dic.update({[line,next(line)]: 1})

I got the error:

TypeError: str object is not an iterator

I then tried using an iterator:

it=iter(content)
for x in it:
    print x, next(x);

Got the same error again. Please help!

rowana asked Sep 12 '16 21:09


5 Answers

You just need to keep track of the previous line. A file object is its own iterator, so you don't need iter or readlines at all. Call next once at the very start to create a variable prev, then keep updating prev in the loop:

from collections import defaultdict

d = defaultdict(int)

with open("in.txt") as f:
    prev = next(f).strip()
    for line in map(str.strip, f):  # on Python 2, use itertools.imap
        d[prev, line] += 1
        prev = line

Which would give you:

defaultdict(<type 'int'>, {('aaa', 'bob'): 1, ('fff', 'err'): 2, ('err', 'ddd'): 1, ('bob', 'fff'): 1, ('ddd', 'fff'): 1})
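
The same previous-line technique can be sketched as a reusable function. This is only an illustration under a few assumptions: the name count_pairs is hypothetical, the input is any iterable of lines (a list stands in for the file here), and an empty input is handled by passing a default to next, which the snippet above does not do.

```python
from collections import defaultdict


def count_pairs(lines):
    """Count consecutive line pairs in any iterable of lines (hypothetical helper)."""
    counts = defaultdict(int)
    it = iter(lines)
    # next() with a default avoids StopIteration on empty input.
    prev = next(it, None)
    if prev is None:
        return counts
    prev = prev.strip()
    for line in it:
        line = line.strip()
        counts[prev, line] += 1  # the key is the tuple (prev, line)
        prev = line
    return counts


# Stand-in for the file from the question, one word per line.
sample = ["aaa\n", "bob\n", "fff\n", "err\n", "ddd\n", "fff\n", "err\n"]
pairs = count_pairs(sample)

# Print in the "first,second: count" format the question asks for.
for (first, second), count in pairs.items():
    print("{},{}: {}".format(first, second, count))
```

Passing an open file object instead of the list should work the same way, since a file yields its lines one at a time.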

Padraic Cunningham answered Sep 30 '22 19:09


line, like all strs, is an iterable, which means it has an __iter__ method. But next works with iterators, which have a __next__ method (in Python 2 it's a next method). When the interpreter executes next(line), it attempts to call line.__next__. Since line does not have a __next__ method it raises TypeError: str object is not an iterator.
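
The difference can be checked directly. The following is a minimal Python 3 sketch with illustrative variable names (none of them come from the question): calling next on a string fails, while calling it on the result of iter succeeds.

```python
line = "aaa"

# A str is iterable but is not itself an iterator, so next() rejects it.
try:
    next(line)
    error = None
except TypeError as exc:
    error = str(exc)

# iter() returns an iterator over the characters of the string.
char_iter = iter(line)
first_char = next(char_iter)

print(error)       # the TypeError message
print(first_char)  # the first character of the string
```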

Since line is an iterable and has an __iter__ method, we can set it = iter(line). it is an iterator with a __next__ method, and next(it) returns the next character in line. But you are looking for the next line in the file, so try something like:

from collections import defaultdict

dic = defaultdict(int)
with open('file.txt') as f:
    content = f.readlines()
    for i in range(len(content) - 1):
        key = content[i].rstrip() + ',' + content[i+1].rstrip()
        dic[key] += 1

for k,v in dic.items():
    print(k,':',v)

Output (file.txt as in OP)

err,ddd : 1
ddd,fff : 1
aaa,bob : 1
fff,err : 2
bob,fff : 1

Craig Burgler answered Sep 30 '22 21:09


from collections import Counter
with open(file, 'r') as f:
    # strip the trailing newline so the keys are clean words
    content = [line.strip() for line in f]
# pair each line with the line that follows it
result = Counter(zip(content, content[1:]))

result is a Counter (a dict subclass) whose keys are tuples of consecutive lines (in order) and whose values are the number of times each pair occurred.
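
Because the result is a Counter, the most frequent pair can be looked up directly. Here is a small sketch, assuming the file contents from the question are held in a list (raw_lines is a hypothetical stand-in for the file):

```python
from collections import Counter

# Stand-in for the file contents from the question, one word per line.
raw_lines = ["aaa\n", "bob\n", "fff\n", "err\n", "ddd\n", "fff\n", "err\n"]

# Strip newlines first so the keys are clean words.
words = [line.strip() for line in raw_lines]

# zip(words, words[1:]) pairs each word with its successor.
pair_counts = Counter(zip(words, words[1:]))

# most_common(1) returns a list holding the single most frequent (pair, count).
top_pair, top_count = pair_counts.most_common(1)[0]
print("{},{}: {}".format(top_pair[0], top_pair[1], top_count))
```

For the sample data this reports the fff,err pair, which occurs twice.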

Tore Eschliman answered Sep 30 '22 20:09


As others said, line is a string and thus cannot be passed to next(). Also, you can't use a list as a dictionary key because lists are not hashable. You can use a tuple instead. A simple solution:

f=open(file,'r')
content=f.readlines()
f.close()

dic={}

for i in range(len(content)-1):
    print(content[i], content[i+1])
    try:
        dic[(content[i], content[i+1])] += 1
    except KeyError:
        dic[(content[i], content[i+1])] = 1

Also notice that by using readlines() you also keep the '\n' of each line. You might want to strip it off first:

content = []
with open(file,'r') as f:
    for line in f:
        content.append(line.strip('\n'))
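
The point about dictionary keys can be verified with a short sketch (the variable names are illustrative only): a list is rejected as a key, while a tuple built from the same items is accepted.

```python
pair = ["fff", "err"]
counts = {}

# A list is mutable and therefore unhashable, so it cannot be a dict key.
try:
    counts[pair] = 1
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

# A tuple of strings is hashable and works as a key.
counts[tuple(pair)] = 1

print(list_key_error)
print(counts)
```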

MaSdra answered Sep 30 '22 20:09


You can use a deque limited to two lines (maxlen=2) and a Counter:

from collections import Counter, deque

lc=Counter()
d=deque(maxlen=2)
with open(fn) as f:
    d.append(next(f))
    for line in f:
        d.append(line)
        lc+=Counter(["{},{}".format(*[e.rstrip() for e in d])])

>>> lc
Counter({'fff,err': 2, 'ddd,fff': 1, 'bob,fff': 1, 'aaa,bob': 1, 'err,ddd': 1})

You can also use a regex with a capturing lookahead (the lookahead captures the next word without consuming it, so overlapping pairs are found):

import re

with open(fn) as f:
    lc=Counter(m.group(1)+','+m.group(2) for m in re.finditer(r"(\w+)\n(?=(\w+))", f.read()))

dawg answered Sep 30 '22 19:09