Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Boolean search text file in Python

Tags:

python

I have a text file with 32 articles. Each article starts with the expression: <Number> of 32 DOCUMENTS, for example: 1 of 32 DOCUMENTS, 2 of 32 DOCUMENTS, etc. In order to find each article I have used the following code:

import re 
sections = [] 
current = []
with open("Aberdeen2005.txt") as f:
    for line in f:
        if re.search(r"(?i)\d+ of \d+ DOCUMENTS", line):        
           sections.append("".join(current))
           current = [line]
        else:
           current.append(line)

print(len(sections)) 

So now, articles are represented by the expression sections

The next thing I want to do, is to subgroup the articles in 2 groups. Those articles containing the words: economy OR economic AND uncertainty OR uncertain AND tax OR policy, identify them with the number 1.

Whereas those articles containing the following words: economy OR economic AND uncertain OR uncertainty AND regulation OR spending, identify them with the number 2. This is what I have tried so far:

for i in range(len(sections)):
group1 = re.search(r"+[economic|economy].+[uncertainty|uncertain].+[tax|policy]", , sections[i])
group2 = re.search(r"+[economic|economy].+[uncertainty|uncertain].+[regulation|spending]", , sections[i])

Nevertheless, it does not seem to work. Any ideas why?

like image 249
Economist_Ayahuasca Avatar asked Mar 31 '26 16:03

Economist_Ayahuasca


1 Answers

It's a bit wordy, but you can get away without using regular expressions here, for example:

# Take a lowercase copy for comparisons
s = sections[i].lower()
if (('economic' in s or 'economy' in s) and
    ('uncertainty' in s or 'uncertain' in s) and
    ('tax' in s or 'policy' in s)):
    do_stuff()
like image 108
glibdud Avatar answered Apr 03 '26 04:04

glibdud



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!