I have a text file, file.txt, that holds many file paths:
C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin
C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin
...
This is what I tried, using a regex to extract the date from each path:
import re
textfile = open('file.txt', 'r')
filetext = textfile.read()
textfile.close()
data = []
for line in filetext:
    matches = re.search("AS_[A-Z]{3}_(.{7})([0-9]{4})([0-9]{2})([0-9]{2})", line)
    data.append(line)
It does not give me what I want.
My output should be like this:
year    month
2007     05
2007     06
2007     07
2007     08
I then want to save it as a list of lists:
[['2007', '5'], ['2007', '6'], ['2007', '7'], ['2007', '8']]
or save it as a pandas Series.
Is there any way to get what I want with a regex?
You can simplify your regex to this:
/(....)(..)..\.bin$/
Group 1 will hold the year and Group 2 the month. I assume the format is consistent throughout the file. The surrounding slashes are only regex delimiters; in Python, pass the pattern without them.
Here, . matches any character, \. matches a literal dot, and $ anchors the match at the end of the string.
So I'm matching .bin at the end of the line, skipping the two characters of the day, and capturing only the year and month.
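A minimal sketch of how this pattern might be applied in Python, assuming one path per line. The sample list below is copied from the question and stands in for the file, and the names lines, pattern and data are only illustrative. Note that iterating over the string returned by read() yields single characters rather than lines, so the sketch loops over whole lines instead.

```python
import re

# Sample paths copied from the question; in practice these would come from
# iterating over the open file object, which yields one line at a time.
lines = [
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin",
]

# Year (4 chars), month (2 chars), then a skipped 2-char day before ".bin".
pattern = re.compile(r"(....)(..)..\.bin$")

data = []
for line in lines:
    match = pattern.search(line.strip())  # strip() drops a trailing newline
    if match:
        # Keep only the year and month groups as a two-element list.
        data.append([match.group(1), match.group(2)])

print(data)
```

The months keep their leading zero here ('05'); if '5' is preferred, as in the list shown in the question, str(int(match.group(2))) would strip it.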
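Try this using pandas:
import pandas as pd

# read one path per line into a single column
df = pd.read_csv('yourfile.txt', header=None)
df.columns = ['paths']
# the pandas string method extract takes a regex and returns one column per capture group
df['paths'].str.extract(r'(\d{4})(\d{2})')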
Output:
       0    1
0   2007    05
1   2007    06
2   2007    07
3   2007    08