
Extract part of a string according to a pattern using a regular expression in Python

I have files whose names follow a specific format, which looks something like this:

test_0800_20180102_filepath.csv
anotherone_0800_20180101_hello.csv

The numbers in the middle represent timestamps, so I would like to extract that information. I know that the pattern will always be _time_date_, so essentially I want the part of the string that lies between the first and third underscores. I found some examples and somewhat similar problems, but I am new to Python and I am having trouble adapting them.

This is what I have implemented thus far:

datetime = re.search(r"\d+_(\d+)_", "test_0800_20180102_filepath.csv")

But the result I get is only the date part:

20180102

But what I actually need is:

0800_20180102

asked Jan 10 '18 09:01 by Nisfa



1 Answer

That's quite simple:

import re

your_string = "test_0800_20180102_filepath.csv"
match = re.search(r"_((\d+)_(\d+))_", your_string)

print(match.group(1))  # time_date >> 0800_20180102
print(match.group(2))  # time >> 0800
print(match.group(3))  # date >> 20180102

Note that for such tasks the grouping operator () inside the regexp is really helpful. It lets you access specific substrings of a larger pattern without having to match each one individually (which can sometimes be much more ambiguous than matching the larger pattern).

Groups are numbered from 1 to n, where n is the number of groups in your pattern; group 0 is the whole matched pattern. The numbers are assigned from left to right, in the order in which the opening parentheses appear in the pattern, so an outer group comes before the groups nested inside it.
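To illustrate the numbering, here is a minimal sketch that wraps the same pattern in a small helper. The names PATTERN and extract_time_date are hypothetical, chosen only for this example. The helper also checks for None, since re.search returns None when a filename does not match.

```python
import re

# Same pattern as in the answer; PATTERN and extract_time_date are
# hypothetical names used only for this illustration.
PATTERN = re.compile(r"_((\d+)_(\d+))_")


def extract_time_date(filename):
    """Return (time_date, time, date), or None if the name does not match."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    # Group 1 is the outer group; groups 2 and 3 are nested inside it.
    return match.group(1), match.group(2), match.group(3)


names = [
    "test_0800_20180102_filepath.csv",
    "anotherone_0800_20180101_hello.csv",
    "no_timestamp_here.csv",
]
results = [extract_time_date(name) for name in names]

# Group 0 is the whole match, including the surrounding underscores.
whole_match = PATTERN.search(names[0]).group(0)

for name, result in zip(names, results):
    print(name, "->", result)
```

The third name contains no digits between underscores, so the helper returns None for it instead of raising an AttributeError.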

On a side note, if you have control over the file naming, consider using Unix timestamps, so that a single number defines both date and time universally.

answered Oct 02 '22 16:10 by meow
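As a rough sketch of that conversion, the two extracted parts can be parsed with datetime.strptime and turned into a Unix timestamp. This assumes the time is HHMM, the date is YYYYMMDD, and both are in UTC; none of that is stated in the question. The function name to_unix_timestamp is hypothetical.

```python
from datetime import datetime, timezone


def to_unix_timestamp(time_part, date_part):
    """Convert e.g. ("0800", "20180102") to seconds since the Unix epoch.

    Assumptions (not confirmed by the question): time_part is HHMM,
    date_part is YYYYMMDD, and both are expressed in UTC.
    """
    parsed = datetime.strptime(date_part + time_part, "%Y%m%d%H%M")
    # Attach UTC explicitly so the result does not depend on the local timezone.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


timestamp = to_unix_timestamp("0800", "20180102")
print(timestamp)  # 1514880000, i.e. 2018-01-02 08:00 UTC
```

If the files are actually stamped in local time, the timezone passed to replace() would have to change accordingly.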