
Split a string between a word and a number

Tags: regex, r

I have some text like the following:

foo_text <- c(
  "73000 PARIS   74000 LYON",
  "75 000 MARSEILLE 68483 LILLE",
  "60  MARSEILLE 68483 LILLE"
)

I'd like to separate each element in two after the first word. Expected output:

"73000 PARIS" "74000 LYON" "75000 MARSEILLE" "68483 LILLE" "60 MARSEILLE" "68483 LILLE"

Note that the number of spaces between two elements in the original text is not always the same (e.g. there are more spaces between PARIS and 74000 than between MARSEILLE and 68483). Also, the first number sometimes contains a space (e.g. 75 000) and sometimes does not (e.g. 73000).

I tried to adapt this answer but without success:

(delimitedString = gsub( "^([a-z]+) (.*) ([a-z]+)$", "\\1,\\2", foo_text))

Any idea how to do that?

bretauv asked Dec 30 '25 17:12


1 Answer

We can try using strsplit here as follows:

foo_text <- c(
    "73000 PARIS   74000 LYON",
    "75 000 MARSEILLE 68483 LILLE",
    "60  MARSEILLE 68483 LILLE"
)
output <- unlist(strsplit(foo_text, "(?<=[A-Z])\\s+(?=\\d)", perl=TRUE))
output

[1] "73000 PARIS"      "74000 LYON"       "75 000 MARSEILLE" "68483 LILLE"
[5] "60  MARSEILLE"    "68483 LILLE"
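
Note that this output still differs from the expected output in the question in two places: "75 000 MARSEILLE" keeps the space inside the number, and "60  MARSEILLE" keeps two spaces. One possible follow-up is to clean each piece with two gsub calls. This is only a sketch: it assumes that whitespace between two digits is always part of a single number, and the variable name cleaned is illustrative.

```r
foo_text <- c(
    "73000 PARIS   74000 LYON",
    "75 000 MARSEILLE 68483 LILLE",
    "60  MARSEILLE 68483 LILLE"
)

# Split as above: whitespace preceded by an uppercase letter and followed by a digit
output <- unlist(strsplit(foo_text, "(?<=[A-Z])\\s+(?=\\d)", perl = TRUE))

# Assumption: whitespace between two digits belongs to one number, so drop it
cleaned <- gsub("(?<=\\d)\\s+(?=\\d)", "", output, perl = TRUE)

# Collapse any remaining run of whitespace to a single space
cleaned <- gsub("\\s+", " ", cleaned)
cleaned
```

With the sample data, cleaned should hold the six strings listed as the expected output in the question. If a number and a following number can ever appear side by side without a word between them, the first gsub would merge them, so the assumption needs checking against the real data.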

The regex pattern passed to strsplit says to split when:

(?<=[A-Z])  what precedes is an uppercase letter
\\s+        split (and consume) on one or more whitespace characters
(?=\\d)     what follows is a digit
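
To check that the two lookarounds only assert context and that \\s+ is the only part consumed, the matched delimiter can be inspected directly. A small sketch using base R's gregexpr and regmatches on the first element (the names x, m and delimiter are illustrative):

```r
x <- "73000 PARIS   74000 LYON"

# Locate every match of the split pattern in x
m <- gregexpr("(?<=[A-Z])\\s+(?=\\d)", x, perl = TRUE)

# The matched text is only the whitespace run; "S" and "7" are not part of it
delimiter <- regmatches(x, m)[[1]]
nchar(delimiter)

# The space between "73000" and "PARIS" is not matched, since a digit precedes it
length(delimiter)
```

Because the letter and the digit are never part of the match, strsplit keeps them in the resulting pieces and removes only the whitespace.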
Tim Biegeleisen answered Jan 02 '26 09:01


