Date list in text

Question

I have a text document with 32 articles in it and I want to spot each article's date. I have observed that the date comes on the 5th row of each article. So far I have split the text into the 32 articles using:

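import re

sections = []
current = []
with open("Aberdeen2005.txt") as f:
    for line in f:
        # A "N of M DOCUMENTS" header marks the start of a new article.
        if re.search(r"(?i)\d+ of \d+ DOCUMENTS", line):
            if current:
                sections.append("".join(current))
            current = [line]
        else:
            current.append(line)

# Append the final article, which the loop itself never flushes.
if current:
    sections.append("".join(current))

print(len(sections))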

I would like to create a list that contains the date for each article, MONTH and YEAR only: [image]

As can be seen, the date comes in the format shown in the picture above, but sometimes the weekday (e.g. Thursday) is not included.

Any ideas?

Kind regards,

Andres

P.S. Here is another example, from the 16th document: [image]

l'L'l · Accepted Answer

Using a regex underneath the if statement, you could strip the weekday from the date:

regx = re.compile(r'(\w+\s\d{1,2},\s\d{4})\s\w{6,9}')
line = re.sub(regx, r"\1", line)

Example:

https://regex101.com/r/pJ0nZ8/1
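
To see what the substitution does, here is a minimal sketch in Python 3. The helper name strip_weekday and the sample strings are made up for illustration, and it assumes the date line looks like "Month D, YYYY" optionally followed by a weekday:

```python
import re

# Matches "Month D, YYYY Weekday" and captures only the "Month D, YYYY" part.
DATE_WITH_WEEKDAY = re.compile(r"(\w+\s\d{1,2},\s\d{4})\s\w{6,9}")

def strip_weekday(line):
    """Return the line with the weekday after the date removed, if present."""
    return DATE_WITH_WEEKDAY.sub(r"\1", line)

# Hypothetical sample lines; the real file may be laid out differently.
print(strip_weekday("May 12, 2005 Thursday"))  # May 12, 2005
print(strip_weekday("May 12, 2005"))           # May 12, 2005
```

Lines without a weekday do not match the pattern, so they pass through unchanged.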

linecache method:

Using the linecache module you can capture line 5 of the file specifically and write it to a file; if the date includes the weekday, the weekday is removed. It's possible to do a lot more with this functionality, although I'll leave the finer details up to you.

import linecache

w = 'Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'
l = linecache.getline("Aberdeen2005.txt",5)  # line 5 of the file (1-indexed)
m = [d in l for d in w]  # True for each weekday name found in the line
c = '2005','2016' # years (optional)

if any(y in l for y in c): # check for years (optional)

    if any(x in l for x in w):
        r = [i for i,v in enumerate(m,0) if v]  # indices of the weekdays found
        l = l.replace(' '+w[r[0]],'')  # drop the first weekday found

    with open("dates.txt", "a") as article_dates:
        article_dates.write(l)

linecache.clearcache()
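
To get MONTH and YEAR for every article rather than only line 5 of the file, one possible approach is to take the date row from each split section. This is a rough sketch, not a tested solution for your file: the function name month_and_year, the date_row parameter and the sample text are hypothetical, and it assumes the date sits on the 5th row of each section in the form "Month D, YYYY":

```python
import re

# Captures the month name and the four-digit year from "Month D, YYYY".
MONTH_YEAR = re.compile(r"([A-Za-z]+)\s+\d{1,2},\s+(\d{4})")

def month_and_year(section, date_row=5):
    """Return "Month YYYY" from the given row of an article, or None if absent."""
    lines = section.splitlines()
    if len(lines) < date_row:
        return None
    match = MONTH_YEAR.search(lines[date_row - 1])
    return f"{match.group(1)} {match.group(2)}" if match else None

# Hypothetical article text; the real layout may differ.
sample = "1 of 32 DOCUMENTS\n\nExample Newspaper\n\nMay 12, 2005 Thursday\n\nHeadline\n"
dates = [month_and_year(s) for s in [sample]]
print(dates)  # ['May 2005']
```

In practice you would pass the list of article strings produced by the splitting loop in the question instead of [sample]. Because only the month and year are captured, the weekday does not need to be stripped first.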

Tags:

python

Economist_Ayahuasca
