I get couple of grep:write errors when I run this code. What am I missing?
This is only part of it:
while d <= datetime.datetime(year, month, daysInMonth[month]):
day = d.strftime("%Y%m%d")
print day
results = [day]
first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, )
output1=first.communicate()[0]
d += delta
day = d.strftime("%Y%m%d")
second=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, )
output2=second.communicate()[0]
articleList = (output1.split('\n'))
articleList2 = (output2.split('\n'))
results.append( len(articleList)+len(articleList2))
w.writerow(tuple(results))
d += delta
When you do
A | B
in a shell, process A's output is piped into process B as input. If process B shuts down before reading all of process A's output (e.g. because it found what it was looking for, which is the function of the -l
option), then process A may complain that its output pipe was prematurely closed.
These errors are basically harmless, and you can work around them by redirecting stderr
in the subprocesses to /dev/null
.
A better approach, though, may simply be to use Python's powerful regex capabilities to read the files:
def fileContains(fn, pat):
with open(file) as f:
for line in f:
if re.search(pat, line):
return True
return False
first = []
for file in glob.glob(monthDir +"/"+day+"*.txt"):
if fileContains(file, 'Algeria|Bahrain') and fileContains(file, 'Protest|protesters'):
file.append(first)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With