
Regexp for finding a substring (vehicle number plate) in a long string in MATLAB?

Tags:

regex

matlab

I have a long string like "sdnak hsd fds fnsdf APsdf09sdf BN fddsdalf 7886sd f". From this string I have to extract "AP09BN7886", which is a vehicle's licence plate number in India. I think the easiest way is probably a regular expression. Can anybody tell me the regular expression to find this?

Arpit asked Dec 30 '25 13:12


2 Answers

As I understand it, you just need to remove the lower-case letters and spaces. The most efficient solution is probably not a regular expression but the following code, which uses logical indexing:

s = 'sdnak hsd fds fnsdf APsdf09sdf BN fddsdalf 7886sd f';
% Delete characters that change under upper() (lower-case letters) and spaces (ASCII 32).
s(s~=upper(s) | s==32) = [];
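
Since the question asks for a regular expression, here is a minimal sketch of a comparable one-liner with regexprep. It is slightly stricter than the indexing version above: it assumes the plate consists only of uppercase Latin letters and digits, so it also drops punctuation. The variable name plate is only illustrative.

```matlab
s = 'sdnak hsd fds fnsdf APsdf09sdf BN fddsdalf 7886sd f';
% Delete every character that is not an uppercase letter or a digit.
plate = regexprep(s, '[^A-Z0-9]', '');
disp(plate)   % AP09BN7886
```

Like the indexing version, this keeps any stray uppercase letters or digits that happen to appear elsewhere in the string.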

Best,

Ratbert answered Jan 01 '26 09:01


As I understand the format, Indian license plates are:

  1. Two uppercase Latin letters representing the Indian state, so that is [A-Z]{2}
  2. Two digits representing the sequential number within that state, so that is \d{2}
  3. Four digits, prefixed by letters once the digits run out, so that is [A-Z]*\d{4}

(Number 1 and number 2 combined are the RTO by state and district)
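
As a rough sketch, the three parts above can be joined into one anchored pattern for checking a plate that has already been cleaned up. The names platePattern and candidates are made up for illustration, and the pattern covers only the simplified format described in this answer.

```matlab
% Pattern for an already-cleaned plate, following the three parts listed above.
platePattern = '^[A-Z]{2}\d{2}[A-Z]*\d{4}$';
candidates = {'AP09BN7886', 'AP097886', 'ap09bn7886', 'AP9BN7886'};
for k = 1:numel(candidates)
    % With 'once', regexp returns the start index of the match, or [] if none.
    isPlate = ~isempty(regexp(candidates{k}, platePattern, 'once'));
    fprintf('%s -> %d\n', candidates{k}, isPlate);
end
```

The first two candidates match; the lower-case one and the one with a single digit after the state letters do not.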

You are trying to gather the components of that format from a long string, and you have not given enough detail about the other parts of the string to completely eliminate ambiguity. (For example: Are the other letters all lower case or non-Latin characters? Is there only one license plate per string? Are there only two letters prefixing the four digits?) Since this is not known, you can represent the 'sea' of characters between match groups with .*?, a lazy match of any characters.

The variable number of letters before the four-digit number is particularly ambiguous, since there could be unrelated uppercase Latin letters in the string.

You can use the following regex to find the example, but I would not claim that it will find every combination in all the strings out there:

([A-Z]{2}).*?(\d{2}).*?([A-Z]{2}).*?(\d{4})

Then combine the four capture groups.
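
In MATLAB, this could look roughly like the sketch below, using regexp with the 'tokens' and 'once' options. It assumes a single plate per string, and the variable names are only illustrative.

```matlab
s = 'sdnak hsd fds fnsdf APsdf09sdf BN fddsdalf 7886sd f';
pattern = '([A-Z]{2}).*?(\d{2}).*?([A-Z]{2}).*?(\d{4})';
% 'tokens' returns the capture groups; 'once' stops after the first match.
tok = regexp(s, pattern, 'tokens', 'once');
if isempty(tok)
    plate = '';          % no plate found
else
    plate = [tok{:}];    % concatenate the four capture groups
end
disp(plate)   % AP09BN7886
```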


If you want to be more specific, use alternation for the first two letters, based on the two-letter codes for the states of India.

eg:

(AS|NL|MH  etc).*?(\d{2}).*?([A-Z]{2}).*?(\d{4})
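
One way to build that alternation in MATLAB is with strjoin, as in this sketch. The stateCodes list here is deliberately partial and only illustrative (AP is added so that the example string matches); a real list would need every code.

```matlab
% Partial, illustrative list of state codes; a real list would be much longer.
stateCodes = {'AP', 'AS', 'NL', 'MH'};
statePart = ['(' strjoin(stateCodes, '|') ')'];   % '(AP|AS|NL|MH)'
pattern = [statePart '.*?(\d{2}).*?([A-Z]{2}).*?(\d{4})'];

s = 'sdnak hsd fds fnsdf APsdf09sdf BN fddsdalf 7886sd f';
tok = regexp(s, pattern, 'tokens', 'once');
plate = [tok{:}];    % empty if nothing matched
disp(plate)   % AP09BN7886
```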

Or, use a more comprehensive match by RTO

If you can further refine what the 'other' text between the capture groups is, you can use a negated character class or an anchor instead of .* to be more accurate:

# pseudo regex...
# XXX is (match or skip or anchor the in between stuff...)

(AS|NL|MH  etc)XXX(\d{2})XXX([A-Z]*)XXX(\d{4})
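
As one possible reading of XXX, the sketch below assumes the filler between groups never contains uppercase letters or digits, so a negated character class can stand in for it. That assumption holds for the example string but is not guaranteed for other data. The generic [A-Z]{2} is used for the state part to keep the example short.

```matlab
s = 'sdnak hsd fds fnsdf APsdf09sdf BN fddsdalf 7886sd f';
% Assumption: filler between the groups has no uppercase letters or digits.
gap = '[^A-Z0-9]*';
pattern = ['([A-Z]{2})' gap '(\d{2})' gap '([A-Z]*)' gap '(\d{4})'];
tok = regexp(s, pattern, 'tokens', 'once');
plate = [tok{:}];    % empty if nothing matched
disp(plate)   % AP09BN7886
```

Unlike .*?, this version fails to match rather than skipping over unrelated uppercase letters or digits that appear between the groups.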

Then you need to think about exceptions (special Delhi region codes, diplomatic plates, vanity plates, military plates, oh boy...)

dawg answered Jan 01 '26 09:01


