
Can I create list from regular expressions?

Tags:

python

regex

I'm making a crawler. The user can specify a regular expression string that describes the data to download.

When the user enters:

http://xxx/abc[x-z]/image(9|10|11).png

I want to download these URLs:

http://xxx/abcx/image9.png
http://xxx/abcy/image9.png
http://xxx/abcz/image9.png
http://xxx/abcx/image10.png
http://xxx/abcy/image10.png
http://xxx/abcz/image10.png
http://xxx/abcx/image11.png
http://xxx/abcy/image11.png
http://xxx/abcz/image11.png

Can I generate the list above from that regular expression string? Or can I iterate over each matching string in a for-in loop?

Maiko Ohkawa asked Nov 20 '15 13:11


1 Answer

If you want to take a user's regex as input and generate the list of strings it matches, you can use the sre_yield library.

However, be aware that enumerating every string a regex can match gets out of hand very quickly. Make sure your users understand how wildcard characters and open-ended or repeating groups multiply the number of possible matching strings.
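One way to guard against a runaway expansion is to cap how many strings you are willing to collect. The sketch below uses only the standard library and works on any iterable of strings; the helper name take_capped and the default limit of 1000 are illustrative assumptions, not part of sre_yield.

```python
from itertools import islice


def take_capped(strings, limit=1000):
    """Collect at most `limit` items; raise if the iterable holds more.

    `take_capped` and `limit` are hypothetical names used for illustration.
    """
    # Pull one item beyond the limit so an oversized expansion is detected
    # without walking the whole (possibly enormous) sequence.
    items = list(islice(strings, limit + 1))
    if len(items) > limit:
        raise ValueError(f"pattern expands to more than {limit} strings")
    return items


# Stand-in for a pattern expansion; any iterable of strings works here.
demo = (f"http://xxx/image{n}.png" for n in range(5))
print(take_capped(demo, limit=10))
```

Rejecting the request once the cap is exceeded, rather than silently truncating, tells the user their pattern was broader than intended.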

As an example, your regex string http://xxx/abc[x-z]/image(9|10|11).png does not escape the ., which is a wildcard that matches any character, so it will generate many unexpected strings. Instead, escape it as \. as seen in the example below:

>>> import sre_yield
>>> links = []
>>> for each in sre_yield.AllStrings(r'http://xxx/abc[x-z]/image(9|10|11)\.png'):
...     links.append(each)
...

Or, more simply:

>>> links = list(sre_yield.AllStrings(r'http://xxx/abc[x-z]/image(9|10|11)\.png'))
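To see why the escaped dot matters, here is a small check with the standard re module, using only the file-name part of the pattern. The test string image9xpng is made up for illustration.

```python
import re

unescaped = re.compile(r"image(9|10|11).png")   # "." matches any character
escaped = re.compile(r"image(9|10|11)\.png")    # "\." matches a literal dot

# The intended file name matches both patterns.
print(bool(unescaped.fullmatch("image9.png")))   # True
print(bool(escaped.fullmatch("image9.png")))     # True

# A name with another character in place of the dot matches only the
# unescaped pattern, which is where the unexpected strings come from.
print(bool(unescaped.fullmatch("image9xpng")))   # True
print(bool(escaped.fullmatch("image9xpng")))     # False
```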

The result is:

>>> links
['http://xxx/abcx/image9.png', 'http://xxx/abcy/image9.png',
'http://xxx/abcz/image9.png', 'http://xxx/abcx/image10.png',
'http://xxx/abcy/image10.png', 'http://xxx/abcz/image10.png',
'http://xxx/abcx/image11.png', 'http://xxx/abcy/image11.png',
'http://xxx/abcz/image11.png']
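If the URL shape is fixed and only the alternatives vary, the same nine links can also be built without a regex library. This is a different technique from sre_yield, sketched here with itertools.product; the letters and numbers lists are written out by hand as stand-ins for [x-z] and (9|10|11).

```python
from itertools import product

letters = ["x", "y", "z"]      # stands in for [x-z]
numbers = ["9", "10", "11"]    # stands in for (9|10|11)

# product() varies its last argument fastest, so numbers come first to
# reproduce the ordering of the list shown above.
links = [
    f"http://xxx/abc{letter}/image{number}.png"
    for number, letter in product(numbers, letters)
]

for link in links:
    print(link)
```

This only covers patterns you can spell out in advance; for arbitrary user-supplied regexes a library that expands the pattern is still needed.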
MrAlexBailey answered Sep 19 '22 01:09