 

Remove a pattern before digits and keep those digits

Tags:

regex

r

I have two strings:

text = "Math\n      \n      \n        600 rubles / 45 min."
text2 = "Math\n      \n      \n        in a group"

I want to replace the "\n      \n      \n        " run with a single space (" "), but only if digits follow it. As a result, I want to have:

"Math 600 rubles / 45 min."
"Math\n      \n      \n        in a group"

I tried gsub("\n \n \n [\\d]", " ", text), but it replaces the first digit too.
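To see why the digit disappears, here is a minimal sketch. It assumes a hypothetical variant of the attempted pattern that uses " +" (one or more spaces) instead of single spaces, so that it actually matches the sample string; the replacement is the same as in the question.

```r
text <- "Math\n      \n      \n        600 rubles / 45 min."

# Hypothetical variant of the attempt: " +" replaces the single spaces so the
# pattern matches the sample string at all.
attempt <- gsub("\n +\n +\n +\\d", " ", text)

# \\d is part of the match, so the first digit is replaced along with the
# whitespace and never comes back.
print(attempt)
#> [1] "Math 00 rubles / 45 min."
```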

asked Jan 30 '23 03:01 by nurma_a



2 Answers

You may use a pattern that matches 3 occurrences of \n followed by 6 or more spaces, captures the digit that comes next, and then replace the match with a backreference to Group 1:

gsub("(?:\n {6,}){3}(\\d)", " \\1", text)


Details

  • (?:\n {6,}){3} - 3 consecutive occurrences of:
    • \n - a newline
    • " {6,}" (a literal space followed by {6,}) - 6 or more spaces
  • (\\d) - Group 1 (referenced as \\1 in the replacement string): any digit.
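A minimal usage sketch applying this pattern to both sample strings in one call. It passes perl = TRUE as a precaution so that the (?:...) group is handled by PCRE; the answer's own call uses R's default engine.

```r
text  <- "Math\n      \n      \n        600 rubles / 45 min."
text2 <- "Math\n      \n      \n        in a group"

# Three runs of "newline + 6 or more spaces", then one digit captured as Group 1
pattern <- "(?:\n {6,}){3}(\\d)"

# gsub() is vectorised over its input, so both strings go through one call;
# " \\1" writes a space followed by the captured digit back in.
result <- gsub(pattern, " \\1", c(text, text2), perl = TRUE)
print(result)
```

The first string becomes "Math 600 rubles / 45 min."; the second has no digit after the whitespace, so it is returned unchanged.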
answered Feb 01 '23 09:02 by Wiktor Stribiżew



I came up with the following pattern:

gsub("\\n[[:blank:]]*\\n[[:blank:]]*\\n[[:blank:]]*(\\d+)", " \\1", text)

This pattern matches three newlines in succession, followed by a number. It allows any number of blank characters (spaces or tabs) after each newline. This makes the match flexible and helps avoid a misfire from counting spaces incorrectly, or from new incoming data not behaving as you expect.
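A small sketch of the flexibility argument, using a hypothetical input with a tab and uneven spacing (not taken from the question). The blank-tolerant pattern still matches, while the fixed " {6,}" pattern from the other answer (run here with perl = TRUE) leaves the string untouched.

```r
# Hypothetical input: a tab and fewer spaces than in the original data
messy <- "Math\n\t\n  \n 600 rubles / 45 min."

blank_pattern <- "\\n[[:blank:]]*\\n[[:blank:]]*\\n[[:blank:]]*(\\d+)"
fixed_pattern <- "(?:\n {6,}){3}(\\d)"

# [[:blank:]]* accepts any mix of spaces and tabs, including none
flexible <- gsub(blank_pattern, " \\1", messy)

# " {6,}" demands at least six spaces after every newline, so nothing matches
strict <- gsub(fixed_pattern, " \\1", messy, perl = TRUE)

print(flexible)
#> [1] "Math 600 rubles / 45 min."
print(identical(strict, messy))
#> [1] TRUE
```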

The main problem I see with your current call to gsub is that you are using fixed-width spaces between the newlines. Also, the digit matched by [\\d] is not captured, so it cannot be referenced in the replacement. Hence, you are consuming that digit, but it won't show up in the result.
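For comparison, here is a sketch of a different technique that neither answer uses: a lookahead, (?=\\d), checks that a digit follows without consuming it, so no backreference is needed. In R, lookaheads require the PCRE engine, hence perl = TRUE.

```r
text  <- "Math\n      \n      \n        600 rubles / 45 min."
text2 <- "Math\n      \n      \n        in a group"

# Three "newline + optional blanks" runs, but only where a digit comes next;
# the digit itself stays outside the match and is therefore kept.
lookahead <- "(?:\n[[:blank:]]*){3}(?=\\d)"

result <- gsub(lookahead, " ", c(text, text2), perl = TRUE)
print(result[1])
#> [1] "Math 600 rubles / 45 min."
```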


answered Feb 01 '23 07:02 by Tim Biegeleisen
