
How to use a Python regular expression to process ZooKeeper log files?

I've got ZooKeeper logs like the following:

2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
        at java.lang.Thread.run(Thread.java:745)
2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002
2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded

I am trying to get the following results:

log entry 1:
2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
        at java.lang.Thread.run(Thread.java:745)

log entry 2:
2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002

log entry 3:
2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded

I tried using the following regular expression pattern:

import re

content = "2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n \
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n \
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n \
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n \
        at java.lang.Thread.run(Thread.java:745)\n \
2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n \
2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n \
"

pattern = re.compile(r"(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}.*)+", re.DOTALL | re.MULTILINE)

match = re.match(pattern, content)
for f in match.groups():
    print(f,"\nEND")

but it matched the whole content:

2019-09-25 11:16:39,253 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
 EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
         at java.lang.Thread.run(Thread.java:745)
 2019-09-25 11:16:39,260 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002
 2019-09-25 11:16:40,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded

END

Does anyone know how to fix this? Any help would be much appreciated!

Zhi He asked Sep 13 '25 10:09


1 Answer

You can try the following regex:

\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}(?:(?!\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})[\s\S])*

Click for Demo

Explanation:

  • \d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3} - matches a timestamp of pattern XXXX-XX-XX XX:XX:XX,XXX where X is a digit
  • (?:(?!\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})[\s\S])* - matches 0+ characters (newlines included), one at a time, as long as the current position is not the start of another timestamp of the format described in the first bullet. The match therefore stops right before the next log entry.
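
The two parts explained above can be put together in a short Python sketch. This is only an illustration under a few assumptions: the log text is already held in a string, and the names TIMESTAMP, ENTRY_RE, split_entries and sample are made up for this example and do not come from the answer.

```python
import re

# Timestamp such as "2019-09-25 11:16:39,253" (first bullet above).
TIMESTAMP = r"\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}"

# A timestamp, then any characters (newlines included) up to, but not
# including, the next timestamp (second bullet above).
ENTRY_RE = re.compile(TIMESTAMP + r"(?:(?!" + TIMESTAMP + r")[\s\S])*")

# Shortened sample in the same shape as the logs in the question.
sample = (
    "2019-09-25 11:16:39,253 [myid:] - WARN  - caught end of stream exception\n"
    "EndOfStreamException: Unable to read additional data from client\n"
    "        at java.lang.Thread.run(Thread.java:745)\n"
    "2019-09-25 11:16:39,260 [myid:] - INFO  - Closed socket connection for client\n"
    "2019-09-25 11:16:40,000 [myid:] - INFO  - Expiring session 0x36b63c29fbac528\n"
)


def split_entries(text):
    """Return one string per log entry, with trailing whitespace removed."""
    return [m.group(0).rstrip() for m in ENTRY_RE.finditer(text)]


entries = split_entries(sample)
for number, entry in enumerate(entries, start=1):
    print(f"log entry {number}:\n{entry}\n")
```

The attempt in the question returned everything as one match because, with re.DOTALL, the greedy .* also consumes newlines and runs to the end of the input, and re.match only ever returns a single match. Using finditer (or findall) with a pattern that stops before the next timestamp yields one match per entry. Note that this pattern is not anchored to the start of a line, so a timestamp in the middle of a message would also start a new entry.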

You can find the working Python code here.
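
For large log files it may be preferable not to load the whole content into one string. The sketch below is a different technique from the regex above, not the answerer's code: it reads line by line and starts a new entry whenever a line begins with a timestamp. The names TIMESTAMP_RE, iter_entries and demo are hypothetical, and it assumes every entry starts with a timestamp at the beginning of a line.

```python
import re

# Same timestamp shape as in the answer; re.match only matches at the
# start of the line, so timestamps inside a message are ignored.
TIMESTAMP_RE = re.compile(r"\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}")


def iter_entries(lines):
    """Yield log entries as lists of lines from any iterable of lines."""
    current = []
    for line in lines:
        # A line beginning with a timestamp closes the previous entry.
        if TIMESTAMP_RE.match(line) and current:
            yield current
            current = []
        current.append(line.rstrip("\n"))
    # Emit whatever is left once the input is exhausted.
    if current:
        yield current


# Shortened demo input; an open file object could be passed instead.
demo = [
    "2019-09-25 11:16:39,253 [myid:] - WARN  - caught end of stream exception\n",
    "EndOfStreamException: Unable to read additional data from client\n",
    "        at java.lang.Thread.run(Thread.java:745)\n",
    "2019-09-25 11:16:39,260 [myid:] - INFO  - Closed socket connection for client\n",
]

grouped = list(iter_entries(demo))
for number, entry in enumerate(grouped, start=1):
    print(f"log entry {number}:")
    print("\n".join(entry))
```

Because iter_entries is a generator, only one entry is held in memory at a time. Any lines that appear before the first timestamp are grouped into the first yielded entry.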

Gurmanjot Singh answered Sep 15 '25 01:09