How can I make a regular expression which ignores repeated strings, but checks for future valid strings?

Tags:

regex

The Issue At Hand

I have a CAN log file which contains a series of messages in the following format. I've identified each string in the log file by naming it 'String n:' followed by the actual content of the file.

String 1: 01 3E 55 55 55 55 55 55
String 2: 01 7E 00 00 00 00 00 00
String 3: 21 51 00 00 66 63 51 00
String 4: 22 00 00 00 00 37 41 31
String 5: 30 00 00 55 55 55 55 55

There is more content on each log line, but this regex will only be run once I've extracted just this portion of each line from the original log file contents. I've provided a sample of a raw log line below, just in case that somehow helps anyone figure this out more easily.

Sample Line: 2023-07-07 05:07:48.896 Tx 7e0 01 3E 55 55 55 55 55 55

I'd like to build a regular expression that returns only the pairs of characters that come before the point where the rest of the string is all 00 or all 55. These are the results I expect for the 5 input strings, but I can't seem to build a regular expression that produces them.

String 1: 01 3E
String 2: 01 7E
String 3: 21 51 00 00 66 63 51
String 4: 22 00 00 00 00 37 41 31
String 5: 30

Can someone help me build this regex correctly?


What I've Tried

I've tried using lookahead patterns, but no matter how I configure the lookaheads, I struggle to get the right characters back. I'm always either dropping one pair of characters (the 3E in string 1, or the 7E in string 2), or I'm not getting any match at all (string 5 gives me back nothing). The regex I've been experimenting with is below, along with what it returns for each string.

Regular Expression: ([0-9A-F]{2,} (?!55|00))+
String 1: Returns 01
String 2: Returns 01
String 3: Returns 21 00 66 63 (No idea how to fix this issue)
String 4: Returns 00 37 41 (Again, no idea how to fix this issue)
String 5: Returns null (Why doesn't it even see the 30?)

zwalsh57 asked Sep 17 '25 13:09


1 Answer

You can match everything up to the point where only space-separated 00 or 55 pairs remain until the end of the line with

^.*?(?=(?: (?:00|55))*$)

Details:

  • ^ - start of the string (or line)
  • .*? - zero or more chars other than line break chars, as few as possible
  • (?=(?: (?:00|55))*$) - a positive lookahead that matches a location that is immediately followed by zero or more repetitions of a space + 00 or 55, till the end of the string/line.
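
As a rough illustration of the pattern above, here is a minimal Python sketch that runs it against the five strings from the question. The question does not say which language or regex engine is in use, so Python's `re` module is an assumption, and the names `TRAILING_FILL`, `strip_fill` and `samples` are made up for this example.

```python
import re

# Pattern from the answer: lazily consume from the start of the string until
# only " 00" / " 55" pairs remain before the end of the string.
TRAILING_FILL = re.compile(r"^.*?(?=(?: (?:00|55))*$)")

# The five payload strings from the question.
samples = [
    "01 3E 55 55 55 55 55 55",
    "01 7E 00 00 00 00 00 00",
    "21 51 00 00 66 63 51 00",
    "22 00 00 00 00 37 41 31",
    "30 00 00 55 55 55 55 55",
]


def strip_fill(payload: str) -> str:
    """Return the part of payload that precedes the trailing 00/55 pairs."""
    match = TRAILING_FILL.search(payload)
    return match.group(0) if match else ""


results = [strip_fill(s) for s in samples]
for original, stripped in zip(samples, results):
    print(f"{original} -> {stripped}")
```

Because a pair only counts as filler when a space precedes it, a payload made up entirely of filler (for example 00 00) still yields its first pair rather than an empty string.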

UPDATE

To match these texts inside larger strings, you can use

(?<!\S)[a-fA-F0-9]{2}(?: [a-fA-F0-9]{2})*?(?=(?: (?:00|55))*$)

Details:

  • (?<!\S) - left-hand whitespace boundary
  • [a-fA-F0-9]{2} - two hex chars
  • (?: [a-fA-F0-9]{2})*? - zero or more, but as few as possible, occurrences of a space + two hex chars
  • (?=(?: (?:00|55))*$) - a positive lookahead that matches a position immediately followed by zero or more repetitions of a space and then either 00 or 55, till the end of the string.

This works here because you extract only a single match from each input string.
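
For the larger-string case, a similar sketch can be run on the raw sample line from the question. Again, Python's `re` module is an assumption, and `PAYLOAD`, `line` and `payload` are names invented for the example.

```python
import re

# Pattern from the UPDATE: a run of space-separated hex pairs that starts at a
# whitespace boundary and stops before the trailing " 00" / " 55" pairs.
PAYLOAD = re.compile(
    r"(?<!\S)[a-fA-F0-9]{2}(?: [a-fA-F0-9]{2})*?(?=(?: (?:00|55))*$)"
)

# Raw sample log line from the question.
line = "2023-07-07 05:07:48.896 Tx 7e0 01 3E 55 55 55 55 55 55"

# search() returns the first (leftmost) match only, which is all that is
# needed when each line yields a single result.
match = PAYLOAD.search(line)
payload = match.group(0) if match else None
print(payload)
```

In this sample the timestamp and the 7e0 identifier are skipped because none of those tokens is a standalone two-character hex pair followed by a space. A log whose identifier field happened to be exactly two hex characters would presumably be picked up as part of the match, so the already-extracted payload is the safer input when that is possible.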

Wiktor Stribiżew answered Sep 20 '25 07:09