I'm looking for a regular expression to catch all digits in the first 7 characters in a string.
This string has 12 characters:
A12B345CD678
I would like to remove A and B only, since they are within the first 7 chars (A12B345), and get
12345CD678
So, the CD678 should not be touched. My current solution in R:
library(stringr)  # provides str_extract_all()
paste(paste(str_extract_all(substr("A12B345CD678",1,7), "[0-9]+")[[1]],collapse=""),substr("A12B345CD678",8,nchar("A12B345CD678")),sep="")
It seems too complicated: I split the string at position 7, extract all digits from the first 7 characters, and paste them back onto the rest of the string.
I'm looking for a more general answer, ideally a single regular expression.
Any help appreciated.
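For comparison, the split-and-rejoin idea described in the question can be written more compactly in base R, without stringr. This is only a sketch: the function name keep_digits_in_prefix and its n argument are made up for illustration and do not come from the original post.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# remove every non-digit from the first n characters, leave the rest untouched.
keep_digits_in_prefix <- function(x, n = 7) {
  first <- substr(x, 1, n)              # the first n characters
  rest  <- substr(x, n + 1, nchar(x))   # everything after them ("" if x is short)
  paste0(gsub("[^0-9]", "", first), rest)
}

keep_digits_in_prefix("A12B345CD678")
## => [1] "12345CD678"
```

It does the same three steps as the one-liner in the question (split, clean, rejoin), but deleting non-digits with gsub avoids having to collapse the pieces returned by str_extract_all.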
You can use the well-known SKIP-FAIL regex trick: a lookbehind lets the pattern match and skip the rest of the string from the 8th character onwards, so that non-digit characters are only matched (and removed) within the first 7:
s <- "A12B345CD678"
gsub("(?<=.{7}).*$(*SKIP)(*F)|\\D", "", s, perl = TRUE)
## => [1] "12345CD678"
See IDEONE demo
The perl=TRUE argument is required for this regex to work, because lookbehinds and the (*SKIP)(*F) verbs are PCRE features. The regex breakdown:
(?<=.{7}).*$(*SKIP)(*F) - matches any characters but a newline (add (?s) at the beginning if you have newline symbols in the input), as many as possible (.*), up to the end ($; \\z might also be required to remove final newlines), but only if preceded by 7 characters (this is set by the lookbehind (?<=.{7})). The (*SKIP)(*F) verbs make the engine omit the whole matched text and advance the regex index to the position at the end of that text.
| - or...
\\D - a non-digit character.
See the regex demo.
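Since the question asks for a general answer, the prefix length can be spliced into the same pattern. The following is a minimal sketch, assuming n is a small positive whole number; the function name strip_prefix_nondigits is hypothetical and only used here for illustration.

```r
# Hypothetical wrapper around the SKIP-FAIL pattern above (the name is an assumption).
# Removes non-digit characters from the first n characters of each element of x.
strip_prefix_nondigits <- function(x, n = 7) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  # Same regex as above, with the lookbehind length taken from n
  pattern <- paste0("(?<=.{", as.integer(n), "}).*$(*SKIP)(*F)|\\D")
  gsub(pattern, "", x, perl = TRUE)
}

strip_prefix_nondigits("A12B345CD678")
## => [1] "12345CD678"
strip_prefix_nondigits("A12B345CD678", n = 3)
## => [1] "12B345CD678"
```

Because gsub is vectorized, the same call should also work on a character vector. Strings shorter than n characters never satisfy the lookbehind, so all of their non-digits are removed.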