
Remove any non-digit only in first N characters

Tags:

regex

r

I'm looking for a regular expression that keeps only the digits within the first 7 characters of a string and leaves the rest of the string unchanged.

This string has 12 characters:

A12B345CD678

I would like to remove only A and B, since they fall within the first 7 characters (A12B345), and get

12345CD678

So, the CD678 should not be touched. My current solution in R:

library(stringr)
s <- "A12B345CD678"
paste(paste(str_extract_all(substr(s, 1, 7), "[0-9]+")[[1]], collapse = ""), substr(s, 8, nchar(s)), sep = "")

This seems too complicated. I split the string after the 7th character, extract all digits from the first 7 characters, and paste them back together with the rest of the string. I'm looking for a more general answer, ideally a single regular expression.
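
For reference, the split-and-rejoin idea can also be sketched in base R alone, without stringr. This is only an illustration of the approach described here: the function name keep_digits_split and the parameter n are hypothetical, not part of any package.

```r
# Hypothetical helper (name and parameter are illustrative assumptions).
# Removes non-digits from the first n characters and keeps the rest as-is.
keep_digits_split <- function(x, n = 7) {
  prefix <- substr(x, 1, n)              # first n characters
  rest   <- substr(x, n + 1, nchar(x))   # everything after them ("" if none)
  paste0(gsub("[^0-9]", "", prefix), rest)
}

keep_digits_split("A12B345CD678")
## => [1] "12345CD678"
```

Because substr, nchar, gsub and paste0 are all vectorized, the same call should also work on a character vector of inputs.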

Any help appreciated.

Sebastian asked Feb 15 '16 13:02



1 Answer

You can use the well-known (*SKIP)(*F) regex trick: match and discard the rest of the string starting at the 8th character (located with a lookbehind), so that non-digit characters are only matched within the first 7:

s <- "A12B345CD678"
gsub("(?<=.{7}).*$(*SKIP)(*F)|\\D", "", s, perl=TRUE)
## => [1] "12345CD678"

See IDEONE demo

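perl=TRUE is required for this regex to work, since lookbehinds and the (*SKIP)(*F) verbs are PCRE features that R's default regex engine does not support. The regex breakdown: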

  • (?<=.{7}).*$(*SKIP)(*F) - the lookbehind (?<=.{7}) only succeeds at positions preceded by at least 7 characters; from there, .* matches as many characters other than a newline as possible (add (?s) at the beginning if the input contains newlines), up to the end of the string ($; \\z might be required instead if the input ends with a newline). The (*SKIP)(*F) verbs then make the engine discard the whole matched text and resume searching at the position where that text ends.
  • | - or...
  • \\D - a non-digit character.

See the regex demo.
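
Since the question asks about the first N characters in general, the same pattern can be built for an arbitrary prefix length. The following is a minimal sketch, assuming n is a whole number; the function name strip_prefix_nondigits and its n argument are made up for illustration.

```r
# Hypothetical wrapper (name and argument are illustrative assumptions).
# Builds the (*SKIP)(*F) pattern for a prefix of length n and applies it.
strip_prefix_nondigits <- function(x, n = 7) {
  # sprintf() fills the prefix length into the lookbehind quantifier
  pattern <- sprintf("(?<=.{%d}).*$(*SKIP)(*F)|\\D", n)
  gsub(pattern, "", x, perl = TRUE)
}

strip_prefix_nondigits("A12B345CD678")
## => [1] "12345CD678"
strip_prefix_nondigits("A12B345CD678", n = 3)
## => [1] "12B345CD678"
```

With n = 3 only the leading A is removed, because B already lies outside the first three characters.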

Wiktor Stribiżew answered Sep 28 '22 08:09