I am trying to read a text file and remove all stop words from it. However, I get an IndexError (list index out of range) when I call b[i].pop(j).
If I use print(b[i][j]) instead, I don't get any error and the words are printed as output.
Can anyone spot the error?
import nltk
from nltk.corpus import stopwords
stop = stopwords.words('english')
fo = open("text.txt", "r")
# text.txt is just a text document
list = fo.read();
list = list.replace("\n","")
# removing newline character
b = list.split('.',list.count('.'))
# splitting list into lines
for i in range (len(b) - 1) :
    b[i] = b[i].split()
# splitting each line into words
for i in range (0,len(b)) :
    for j in range (0,len(b[i])) :
        if b[i][j] in stop :
            b[i].pop(j)
            # print(b[i][j])
#print(b)
# Close opened file
fo.close()
Output:
Traceback (most recent call last):
  File "prog.py", line 29, in <module>
    if b[i][j] in stop :
IndexError: list index out of range
Output after commenting out b[i].pop(j) and uncommenting print(b[i][j]):
is
that
the
from
the
the
the
can
the
and
and
the
is
and
can
be
into
is
a
or
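You are removing elements from the list while iterating over it by index. range(0, len(b[i])) is evaluated once, before the inner loop starts, so j still runs up to the original length even though every pop() makes the list shorter. As soon as j reaches an index that no longer exists, b[i][j] raises the IndexError. Each pop() also shifts the remaining words one position to the left, so the word that follows a removed stop word is never checked. print(b[i][j]) works because it does not change the list.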
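To see the mechanism in isolation, here is a minimal sketch. The word list and the two stop words are made up for illustration and are not taken from text.txt or from NLTK.

```python
# Hypothetical sample data, chosen only to reproduce the behaviour.
words = ["this", "is", "the", "example"]
stop_words = {"is", "the"}

error = None
try:
    # range() is evaluated once, so j runs over 0..3 even after the list shrinks.
    for j in range(len(words)):
        if words[j] in stop_words:
            words.pop(j)  # shifts every later word one position to the left
except IndexError as exc:
    error = exc

print(words)        # "the" is still there: it slid into index 1 and was skipped
print(repr(error))  # the lookup at j == 3 failed on a list of length 3
```

Only one word is removed before the loop fails, and the failure happens on the words[j] lookup, which matches the line shown in the traceback above.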
Instead, build a new list that contains only the elements you want to keep. For example:
result = []
for i in range (0,len(b)):
    templist = []
    for j in range (0,len(b[i])):
        if b[i][j] not in stop :
            templist.append(b[i][j])
    result.append(templist)
The same can be done with a list comprehension:
result = [[word for word in sentence if word not in stop] for sentence in b]
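For reference, a self-contained sketch of the whole approach follows. It assumes a short inline string in place of text.txt and a small hand-written stop-word set in place of stopwords.words('english'), so it runs without NLTK; the names text, stop_words, sentences and filtered are illustrative only. Lower-casing each word before the lookup is an extra assumption, made so that capitalised words such as "This" are matched as well.

```python
# Hypothetical stand-ins for the file contents and for NLTK's stop-word list.
text = "This is the first sentence. That is a second one."
stop_words = {"is", "the", "a", "that", "this"}

# Split into sentences on '.', drop empty pieces, then split each sentence into words.
sentences = [s.split() for s in text.split(".") if s.strip()]

# Build new lists instead of popping from the lists being iterated.
filtered = [[w for w in sentence if w.lower() not in stop_words]
            for sentence in sentences]

print(filtered)  # [['first', 'sentence'], ['second', 'one']]
```

With the real data, wrapping the NLTK list in set(...) once before the loop should make each membership test faster, since stopwords.words('english') returns a plain list.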