I have a text file, file.txt, which holds a lot of file paths:
C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin
...
What I did with Regex to extract the date from path:
import re
textfile = open('file.txt', 'r')
filetext = textfile.read()
textfile.close()
data = []
for line in filetext:
    matches = re.search("AS_[A-Z]{3}_(.{7})([0-9]{4})([0-9]{2})([0-9]{2})", line)
    data.append(line)
It does not give what I want.
My output should be like this:
year month
2007 05
2007 06
2007 07
2007 08
and then save it as list of lists:
[['2007', '5'], ['2007', '6'], ['2007', '7'], ['2007', '8']]
or save it as a Pandas series.
Is there any way to get what I want with regex?
You can simplify your regex to this:
/(....)(..)..\.bin$/
Group 1 will hold the year and Group 2 the month. I assume the format is consistent throughout the file.
Here, . matches any single character, \. matches a literal dot, and $ anchors the match at the end of the string. So I'm matching .bin at the end of the line, skipping the day, and capturing only the year and month.
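As a rough sketch, this is how that pattern could be applied in Python. The helper name extract_year_month and the two sample paths are only for illustration, and it assumes every path ends in an 8-digit date followed by .bin. Note that it loops over lines; looping over the string returned by read(), as in the question, visits one character at a time, so the regex never sees a whole path.

```python
import re

# Year (4 chars), month (2 chars), then 2 day chars right before ".bin" at end of line.
PATTERN = re.compile(r"(....)(..)..\.bin$")

def extract_year_month(lines):
    """Return [[year, month], ...] for every line that matches (hypothetical helper)."""
    result = []
    for line in lines:
        # strip() removes the trailing newline so that $ lines up with ".bin"
        match = PATTERN.search(line.strip())
        if match:
            result.append([match.group(1), match.group(2)])
    return result

# Sample data copied from the question; a real run would pass an open file instead.
sample = [
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin",
]
pairs = extract_year_month(sample)
print(pairs)  # [['2007', '05'], ['2007', '06']]
```

With the real file, something like "with open('file.txt') as f: pairs = extract_year_month(f)" should work, since a file object yields one line per iteration. The months keep their leading zero because they are captured as strings.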
Try this using pandas:
import pandas as pd
df = pd.read_csv('yourfile.txt', header=None)
df.columns = ['paths']
# the pandas string method extract takes a regex; each capture group becomes a column
df['paths'].str.extract(r'(\d{4})(\d{2})')
Output:
      0   1
0  2007  05
1  2007  06
2  2007  07
3  2007  08
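To get from that result to the list of lists asked for in the question, one possible follow-up is sketched below. It builds the Series from two sample paths instead of reading the file, and the column names year and month are my own choice, not something pandas assigns.

```python
import pandas as pd

# Sample paths from the question; in practice these would come from read_csv.
paths = pd.Series([
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin",
])

# First run of six digits: four for the year, two for the month.
parts = paths.str.extract(r"(\d{4})(\d{2})")
parts.columns = ["year", "month"]  # assumed names, chosen for readability

# One [year, month] list per row.
pairs = parts.values.tolist()
print(pairs)  # [['2007', '05'], ['2007', '06']]
```

The values stay strings, so the months keep their leading zeros; converting a column with astype(int) would drop them if numbers are preferred.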