Regex to extract numbers and trailing letter or white space

Tags: regex, r

I'm trying to extract data from strings that always have the same format (scraped from social sites with no API support).

Example strings:

53.2k Followers, 11 Following, 1,396 Posts
5m Followers, 83 Following, 1.1m Posts

I'm currently using the regular expression "[0-9]{1,5}([,.][0-9]{1,4})?" to get the numeric sections, preserving the comma and dot separators.

It yields results like:

53.2, 11, 1,396 
5, 83, 1.1

I need a regular expression that also grabs the character after each numeric section, even if it is white space, i.e.:

53.2k, 11 , 1,396
5m, 83 , 1.1m

Any help is greatly appreciated.

R code for reproduction:

  library(stringr)

  # Example strings in the scraped format
  string1 <- "536.2k Followers, 83 Following, 1,396 Posts"
  string2 <- "5m Followers, 83 Following, 1.1m Posts"

  # Current pattern: matches the numeric part only, dropping any trailing letter
  pattern <- "[0-9]{1,5}([,.][0-9]{1,4})?"
  info <- str_extract_all(string1, pattern)
  info2 <- str_extract_all(string2, pattern)

  info
  info2


asked Mar 18 '19 03:03

Permafrost

1 Answer

I would suggest the following regex pattern:

[0-9]{1,3}(?:,[0-9]{3})*(?:\\.[0-9]+)?[A-Za-z]*

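This pattern captures each number together with any trailing unit letter such as k or m; it does not capture trailing white space. The backslash is doubled because the pattern is written as an R string. Here is an explanation: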

[0-9]{1,3}      match 1 to 3 initial digits
(?:,[0-9]{3})*  followed by zero or more thousands groups
(?:\\.[0-9]+)?  followed by an optional decimal component
[A-Za-z]*       followed by an optional text unit
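
As a quick sanity check of the breakdown above, the pattern can be assembled from its four pieces and tested against individual tokens. This is only a sketch: the variable names and the sample tokens are my own choices (the tokens are taken from the question's example strings), and perl = TRUE is an assumption so that the non-capturing groups are handled by PCRE.

```r
# Assemble the pattern from the four pieces described above.
digits    <- "[0-9]{1,3}"        # 1 to 3 leading digits
thousands <- "(?:,[0-9]{3})*"    # zero or more ",ddd" thousands groups
decimal   <- "(?:\\.[0-9]+)?"    # optional decimal component
unit      <- "[A-Za-z]*"         # optional text unit such as k or m
pattern   <- paste0(digits, thousands, decimal, unit)

# Sample tokens (illustrative, taken from the question's example strings).
tokens <- c("53.2k", "11", "1,396", "5m", "1.1m")

# Anchor with ^ and $ so that each token has to match in full.
full_match <- grepl(paste0("^", pattern, "$"), tokens, perl = TRUE)
full_match
# [1] TRUE TRUE TRUE TRUE TRUE
```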

I tend to lean towards base R solutions whenever possible, and here is one using gregexpr and regmatches:

txt <- "53.2k Followers, 11 Following, 1,396 Posts"
m <- gregexpr("[0-9]{1,3}(?:,[0-9]{3})*(?:\\.[0-9]+)?[A-Za-z]*", txt)
regmatches(txt, m)

[[1]]
[1] "53.2k" "11"    "1,396"
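
The same call works on a vector, so both example strings from the question can be processed at once. The sketch below also includes a variant for the case where the single character after each number is wanted even when it is white space, as the question asks. The names strings, pattern_ws and matches_ws are illustrative, and the variant pattern is my own assumption about what "grab the character after" should mean; it uses perl = TRUE so that \\s is accepted inside the character class.

```r
strings <- c("53.2k Followers, 11 Following, 1,396 Posts",
             "5m Followers, 83 Following, 1.1m Posts")

# Pattern from the answer: number plus any trailing unit letters.
pattern <- "[0-9]{1,3}(?:,[0-9]{3})*(?:\\.[0-9]+)?[A-Za-z]*"
matches <- regmatches(strings, gregexpr(pattern, strings, perl = TRUE))
matches[[1]]  # "53.2k" "11" "1,396"
matches[[2]]  # "5m" "83" "1.1m"

# Variant (assumption): take at most one following character,
# either a letter or a white-space character.
pattern_ws <- "[0-9]{1,3}(?:,[0-9]{3})*(?:\\.[0-9]+)?[A-Za-z\\s]?"
matches_ws <- regmatches(strings, gregexpr(pattern_ws, strings, perl = TRUE))
matches_ws[[1]]  # "53.2k" "11 " "1,396 "
matches_ws[[2]]  # "5m" "83 " "1.1m"
```

Note that the variant keeps the space after "1,396" as well, which differs slightly from the output listed in the question; trimws() can remove it afterwards if it is not wanted.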

answered Sep 22 '22 15:09

Tim Biegeleisen
