regex for finding file paths

Tags:

python

regex

I used the regex (\/.*\.[\w:]+) to find all file paths and directories. But in a line such as "file path /log/file.txt some lines /log/var/file2.txt", which contains two paths, it does not match the paths individually; it matches everything from the first path to the end of the line as a single match. How can I solve this?

Sriram asked May 31 '18 06:05


2 Answers

Use the regex (\/.*?\.[\w:]+) to make the match non-greedy. To find multiple matches in the same line, you can use re.findall().

Update: Using this code and the example provided, I get:

import re
print(re.findall(r'(\/.*?\.[\w:]+)', "file path /log/file.txt some lines /log/var/file2.txt"))
# ['/log/file.txt', '/log/var/file2.txt']
Jonas answered Oct 27 '22 19:10


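Your regex (\/.*\.[\w:]+) uses .*, which is greedy: it first consumes the rest of the line and then backtracks only as far as the last dot, so \.[\w:]+ ends up matching the .txt at the end of file2.txt and the whole span becomes one match. You could use the non-greedy .*? instead.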
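To see the difference side by side, here is a minimal sketch that runs both patterns on the line from the question (the variable names are only for illustration):

```python
import re

line = "file path /log/file.txt some lines /log/var/file2.txt"

# Greedy: .* runs to the end of the line, then backtracks to the last dot.
greedy = re.findall(r'(\/.*\.[\w:]+)', line)

# Non-greedy: .*? stops at the first dot that lets the rest of the pattern match.
lazy = re.findall(r'(\/.*?\.[\w:]+)', line)

print(greedy)  # ['/log/file.txt some lines /log/var/file2.txt']
print(lazy)    # ['/log/file.txt', '/log/var/file2.txt']
```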

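But the non-greedy version would also match a malformed path such as /log////var////.txt, because .*? accepts any characters, including repeated slashes.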
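A quick check of that over-match, as a minimal sketch (the sample sentence and the name lazy_path are made up for illustration):

```python
import re

# Non-greedy pattern from the first suggestion, without the capture group.
lazy_path = re.compile(r'\/.*?\.[\w:]+')

matches = lazy_path.findall("or /log////var////.txt")
print(matches)  # ['/log////var////.txt']
```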

As an alternative, you could use a repeating non-greedy group that matches the directory structure, (?:/[^/]+)+?, followed by a part that matches the file name, /\w+\.\w+. The full pattern is:

(?:/[^/]+)+?/\w+\.\w+

import re
s = "file path /log/file.txt some lines /log/var/file2.txt or /log////var////.txt"
print(re.findall(r'(?:/[^/]+)+?/\w+\.\w+', s))

That would result in:

['/log/file.txt', '/log/var/file2.txt']
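If you want to reuse the pattern, it could be wrapped in a small helper. This is only a sketch: PATH_RE and find_paths are hypothetical names, not part of any library. Note that, as written, the pattern needs at least one directory before the file name, so a bare /file.txt is not matched.

```python
import re

# Pattern from this answer; PATH_RE and find_paths are illustrative names.
PATH_RE = re.compile(r'(?:/[^/]+)+?/\w+\.\w+')

def find_paths(text):
    """Return every substring of text that matches PATH_RE."""
    return PATH_RE.findall(text)

print(find_paths("/log/file.txt and /log/var/file2.txt"))
# ['/log/file.txt', '/log/var/file2.txt']
print(find_paths("/log////var////.txt"))  # [] (repeated slashes are rejected)
print(find_paths("/file.txt"))            # [] (no directory component)
```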

The fourth bird answered Oct 27 '22 17:10