
Python regular expression across multiple lines

Tags:

python

regex

I'm gathering some info from some Cisco devices using Python and pexpect, and have had a lot of success using REs to extract pesky little items, but I'm afraid I've hit a wall on this one. Some switches are stacked together; I have identified this in the script and use a separate routine to parse the data. If the switch is stacked, you see the following (extracted from the sho ver output):

Top Assembly Part Number        : 800-25858-06
Top Assembly Revision Number    : A0
Version ID                      : V08
CLEI Code Number                : COMDE10BRA
Hardware Board Revision Number  : 0x01


Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M  
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02 
---------
Switch Uptime                   : 11 weeks, 2 days, 16 hours, 27 minutes
Base ethernet MAC Address       : 00:26:52:96:2A:80
Motherboard assembly number     : 73-9675-15

When I encounter this, I need to extract the switch number and model for each row in the table of 4 (the SW columns can be ignored, but there can be between 1 and 9 switches). It's the multiple-line aspect that has got me, as I've been OK with the rest. Any ideas, please?

OK, apologies. My regex simply started looking at the last group of dashes until... then I couldn't work out where to go!
-{10}\s-{10}(.+)Switch

The model will change and the number of switches will change, I need to capture the 4 lines in this example which are

*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M  
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M

But each switch could be a different model, and there could be between 1 and 9. For this example, ideally I'd like to get

*,1,WS-C3750-48P
,2,WS-C3750-48P
,3,WS-C3750-48P
,4,WS-C3750-48P  

(The asterisk marks the master.)
Even just getting those lines would set me on the right track.

user225882 asked Dec 09 '09 00:12



1 Answer

To have . match any character, including a newline, compile your RE with re.DOTALL among the options (remember, if you have multiple options, use |, the bit-or operator, between them, in order to combine them).
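As a minimal sketch of that approach, the snippet below combines re.DOTALL and re.MULTILINE to grab the whole block of table rows in one match. The sample text is an abbreviated copy of the output in the question; the names output, pattern and table_rows, and the exact pattern itself, are illustrative assumptions rather than anything from the original script.

```python
import re

# Abbreviated copy of the "sho ver" output from the question (assumed sample).
output = """\
Hardware Board Revision Number  : 0x01


Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02
---------
Switch Uptime                   : 11 weeks, 2 days, 16 hours, 27 minutes
"""

# re.DOTALL lets "." cross newlines, so (.+?) can span all the table rows.
# re.MULTILINE lets "^" match at the start of the "Switch 02" line.
# The two flags are combined with the bitwise-or operator "|".
pattern = re.compile(
    r"-{10} +-{10} *\n(.+?)\n\s*^Switch \d+",
    re.DOTALL | re.MULTILINE,
)

match = pattern.search(output)
table_rows = match.group(1).splitlines() if match else []

for row in table_rows:
    print(row)
```

The lazy (.+?) stops at the first "Switch NN" heading that follows the table, so the number of rows does not need to be known in advance.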

In this case I'm not sure you actually do need this. Why not something like the following, where output is the string holding the sho ver text:

re.findall(r'(\d+)\s+\d+\s+(WS-\S+)', output)

assuming, for example, that the way you identify a "model" is that it starts with WS-? The fact that there will be newlines between one result of findall and the next is not a problem here. Can you explain exactly how you identify a "model" and why "multiline" is an issue? Maybe you want re.MULTILINE to make ^ match at each start-of-line, to grab your data with some reference to the start of the lines...?
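To make the re.MULTILINE idea concrete, here is a hedged sketch that anchors each match at the start of a line so the optional master asterisk can be captured too. It assumes, as above, that every model name starts with WS-; the variable names and the abbreviated sample text are illustrative only.

```python
import re

# Abbreviated copy of the "sho ver" output from the question (assumed sample).
output = """\
Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02
---------
Switch Uptime                   : 11 weeks, 2 days, 16 hours, 27 minutes
"""

# With re.MULTILINE, "^" matches at the start of every line.
# Groups: optional "*" (master flag), switch number, model.
# The port count (\d+ between number and model) is matched but not captured.
rows = re.findall(r"^(\*?)\s+(\d+)\s+\d+\s+(WS-\S+)", output, re.MULTILINE)

for master, number, model in rows:
    print(",".join((master, number, model)))
```

For the sample above this prints one comma-separated line per switch, in the shape asked for in the question (for example *,1,WS-C3750-48P for the master). If some models do not start with WS-, the last group would need loosening, for instance to (\S+).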

Alex Martelli answered Sep 18 '22 17:09
