Boolean search text file in Python

Question

I have a text file with 32 articles. Each article starts with the expression: <Number> of 32 DOCUMENTS, for example: 1 of 32 DOCUMENTS, 2 of 32 DOCUMENTS, etc. In order to find each article I have used the following code:

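import re

sections = []
current = []
with open("Aberdeen2005.txt") as f:
    for line in f:
        # A header such as "1 of 32 DOCUMENTS" starts a new article
        if re.search(r"(?i)\d+ of \d+ DOCUMENTS", line):
            # Skip the empty chunk that precedes the first header
            if current:
                sections.append("".join(current))
            current = [line]
        else:
            current.append(line)

# Store the last article, which has no header after it
if current:
    sections.append("".join(current))

print(len(sections))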

So now the articles are stored in the list sections, one string per article.

The next thing I want to do is split the articles into 2 groups. Articles containing the words (economy OR economic) AND (uncertainty OR uncertain) AND (tax OR policy) should be identified with the number 1.

Articles containing the words (economy OR economic) AND (uncertain OR uncertainty) AND (regulation OR spending) should be identified with the number 2. This is what I have tried so far:

for i in range(len(sections)):
    group1 = re.search(r"+[economic|economy].+[uncertainty|uncertain].+[tax|policy]", , sections[i])
    group2 = re.search(r"+[economic|economy].+[uncertainty|uncertain].+[regulation|spending]", , sections[i])

Nevertheless, it does not seem to work. Any ideas why?

glibdud · Accepted Answer

It's a bit wordy, but you can get away without using regular expressions here, for example:

# Take a lowercase copy for comparisons
s = sections[i].lower()
if (('economic' in s or 'economy' in s) and
    ('uncertainty' in s or 'uncertain' in s) and
    ('tax' in s or 'policy' in s)):
    do_stuff()
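
As for why the attempt in the question fails: square brackets define a character class, so [tax|policy] matches a single character from that set rather than either word; the leading + has nothing to repeat, which raises re.error; the doubled comma is a syntax error; and . does not match newlines unless re.DOTALL is set. The pattern would also require the words to appear in that exact order. If you want whole-word matching (so that "syntax" does not count as "tax"), here is a minimal sketch that applies the same AND-of-ORs check to every article. The names GROUP_TERMS, contains_any and groups_for are made up for illustration, and sample_sections stands in for your sections list.

```python
import re

# Hypothetical names; the word lists are taken from the question.
COMMON = [("economy", "economic"), ("uncertainty", "uncertain")]
GROUP_TERMS = {
    1: COMMON + [("tax", "policy")],
    2: COMMON + [("regulation", "spending")],
}

def contains_any(text, words):
    """True if any of the words occurs in text as a whole word, ignoring case."""
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def groups_for(text):
    """Return the group numbers whose AND-of-ORs condition the text satisfies."""
    return [
        number
        for number, clauses in GROUP_TERMS.items()
        if all(contains_any(text, words) for words in clauses)
    ]

# Stand-in articles for illustration only; use `sections` from the question instead.
sample_sections = [
    "1 of 3 DOCUMENTS\nThe economy faces uncertainty over tax reform.",
    "2 of 3 DOCUMENTS\nEconomic outlook uncertain as spending rises.",
    "3 of 3 DOCUMENTS\nA new syntax for the language was announced.",
]

labels = [groups_for(article) for article in sample_sections]
print(labels)  # [[1], [2], []]
```

An article can satisfy both conditions, which is why groups_for returns a list rather than a single number. Note that whole-word matching is stricter than the in checks above: "taxes" or "policies" would not count unless you add them to the word lists.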

Tags:

python

Economist_Ayahuasca
