Use Regex to extract file path and save it in python

Question

I have a text file, file.txt, that holds many file paths:

C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin
...

Here is what I tried, using a regex to extract the date from each path:

import re

textfile = open('file.txt', 'r')
filetext = textfile.read()
textfile.close()

data = []

for line in filetext:
    matches = re.search("AS_[A-Z]{3}_(.{7})([0-9]{4})([0-9]{2})([0-9]{2})", line)
    data.append(line)

It does not give me what I want.

The output should look like this:

year    month
2007     05
2007     06
2007     07
2007     08

Then I want to save it as a list of lists:

[['2007', '5'], ['2007', '6'], ['2007', '7'], ['2007', '8']]

or as a pandas Series.

Is there a way to get this with a regex?

Amit Joki · Accepted Answer

You can simplify your regex to this:

r"(....)(..)..\.bin$"

Group 1 holds the year and group 2 holds the month. This assumes the format is the same throughout the file.

Here, . matches any character and \. matches a literal dot. $ anchors the match to the end of the string. So the pattern matches .bin at the end of the line, skips the two day digits, and captures only the year and month.
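A minimal sketch of how this pattern could plug into the loop from the question. The loop there has two problems: for line in filetext walks the string one character at a time, and it appends line instead of the captured groups. The function name extract_year_month and the sample text below are illustrative, not part of the answer, and the code assumes every path ends in YYYYMMDD.bin.

```python
import re

# The answer's pattern as a Python raw string (Python does not use /.../ delimiters).
# Assumes each path ends in YYYYMMDD.bin, as in file.txt above.
DATE_RE = re.compile(r"(....)(..)..\.bin$")


def extract_year_month(text):
    """Return [[year, month], ...] for each line of text ending in YYYYMMDD.bin."""
    data = []
    # splitlines() yields whole lines; iterating over a str directly yields characters
    for line in text.splitlines():
        match = DATE_RE.search(line.strip())
        if match:
            # group(1) is the year, group(2) the month; the day is matched but not kept
            data.append([match.group(1), match.group(2)])
    return data


# Hypothetical sample standing in for the contents of file.txt
sample = (
    "C:\\data\\AS\\WO\\AS_WOP_1PPPPPP20070506.bin\n"
    "C:\\data\\AS\\WO\\AS_WOP_1PPPPPP20070606.bin\n"
)
print(extract_year_month(sample))  # [['2007', '05'], ['2007', '06']]
```

With the real file you would call it as extract_year_month(open('file.txt').read()), ideally inside a with block. If you want the month without a leading zero, as in the question's list, append str(int(match.group(2))) instead.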

JAB · Answer

Try this using pandas:

import pandas as pd

df = pd.read_csv('yourfile.txt', header=None)
df.columns = ['paths']
# pandas string method extract takes a regex; each capture group becomes a column
df['paths'].str.extract(r'(\d{4})(\d{2})')

Output:

       0    1
0   2007    05
1   2007    06
2   2007    07
3   2007    08
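A small extension, sketched under the assumption that the paths look like the ones in the question: str.extract uses named groups such as (?P<year>...) as column names, and .values.tolist() turns the result into the list of lists the question asks for. The names paths, dates and as_lists are illustrative, and the Series is built inline here instead of being read from yourfile.txt so the example runs on its own.

```python
import pandas as pd

# Stand-in for pd.read_csv('yourfile.txt', header=None)[0]; paths inlined so it runs alone
paths = pd.Series([
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin",
])

# Named groups become column names. The first six consecutive digits are YYYYMM;
# the "1" in "1PPPPPP" is a single digit, so it cannot match.
dates = paths.str.extract(r"(?P<year>\d{4})(?P<month>\d{2})")

# Convert to the list of lists asked for in the question
as_lists = dates.values.tolist()
print(dates)
print(as_lists)
```

If you prefer months without the leading zero, dates['month'].astype(int) gives integers.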


Tags: python, regex, pandas

GeoCom

2 Answers

Amit Joki

JAB
