
Extract all numbers from a single string in R


Let's imagine you have a string:

strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)" 

Is there a function that extracts the numbers into a vector, giving the following result?

result <- c(0, 3000, -500, 0, 2.25, -1200)

i.e.

result[3] = -500 

Notice that the numbers are in accounting form, so negative numbers appear in parentheses. You can also assume that only numbers appear to the right of the first occurrence of a number. I am not that good with regular expressions, so I would appreciate help if one is required. I also don't want to assume the string is always the same, so I am looking to strip out all words (and any special characters) before the first number.

Bertie asked Oct 04 '12 12:10



2 Answers

library(stringr)
x <- str_extract_all(strLine, "\\(?[0-9,.]+\\)?")[[1]]
x
# [1] "0"       "3,000"   "(500)"   "0"       "2.25"    "(1,200)"

Change the parens to negatives:

x <- gsub("\\((.+)\\)", "-\\1", x)
x
# [1] "0"      "3,000"  "-500"   "0"      "2.25"   "-1,200"

Then finish with as.numeric() or taRifx::destring() (the next version of destring will support negatives by default, so the keep option won't be necessary):

library(taRifx)
destring(x, keep = "0-9.-")
# [1]    0 3000  -500    0    2.25 -1200

Or, stripping the commas first:

as.numeric(gsub(",", "", x))
# [1]     0.00  3000.00  -500.00     0.00     2.25 -1200.00
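
One caveat with the character class [0-9,.]+ is that it also matches a comma or full stop standing alone in the leading text, and those pieces then turn into NA. Below is a minimal sketch of a stricter pattern, assuming every number starts with a digit and uses commas only as thousands separators. The input strLine2 and the helpers extract() and to_number() are made up for illustration; base R's regmatches() is used so the snippet runs without extra packages, and the same pattern string should also work with str_extract_all().

```r
# Hypothetical input: same layout as the question, but with punctuation in the prose.
strLine2 <- "Fees, refunds etc. were as follows: 0 3,000 (500) 0 2.25 (1,200)"

# Loose pattern from the answer: any run of digits, commas and dots.
loose <- "\\(?[0-9,.]+\\)?"
# Stricter sketch: must start with a digit, with an optional decimal part.
strict <- "\\(?[0-9][0-9,]*(\\.[0-9]+)?\\)?"

# Illustrative helper: all matches of `pattern` in a single string `s`.
extract <- function(pattern, s) regmatches(s, gregexpr(pattern, s))[[1]]

loose_hits  <- extract(loose, strLine2)   # also picks up the stray "," and "."
strict_hits <- extract(strict, strLine2)  # number-like pieces only

# Same clean-up as above: parentheses become a minus sign, commas are dropped.
to_number <- function(x) as.numeric(gsub(",", "", gsub("\\((.+)\\)", "-\\1", x)))
result <- to_number(strict_hits)
```

With the loose pattern the two stray punctuation marks are returned alongside the six numbers, whereas the stricter pattern returns only the six number strings.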
Ari B. Friedman answered Sep 22 '22 00:09


Here's the base R way, for the sake of completeness...

x <- unlist(regmatches(strLine, gregexpr('\\(?[0-9,.]+', strLine)))
x <- as.numeric(gsub('\\(', '-', gsub(',', '', x)))
x
# [1]     0.00  3000.00  -500.00     0.00     2.25 -1200.00
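
Since the question says only numbers appear to the right of the first number, another option is to cut the string at that point and split on whitespace instead of extracting matches. This is only a sketch under that assumption; parse_accounting() is a made-up name, not a function from base R or any package.

```r
strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"

# Hypothetical helper: assumes everything after the first number is
# whitespace-separated numbers in accounting form.
parse_accounting <- function(s) {
  # Position of the first digit, or of a "(" directly in front of it.
  start <- as.integer(regexpr("\\(?[0-9]", s))
  if (start == -1L) return(numeric(0))  # no number found at all

  # Keep only the numeric tail and split it into tokens.
  tokens <- strsplit(substring(s, start), "[[:space:]]+")[[1]]

  # Tokens wrapped in parentheses are negative in accounting notation.
  negative <- grepl("^\\(.*\\)$", tokens)
  values <- as.numeric(gsub("[(),]", "", tokens))
  ifelse(negative, -values, values)
}

result <- parse_accounting(strLine)
```

Here result[3] is -500, as asked for in the question, and a string without any digits gives an empty numeric vector. The "(on your account)" part is skipped because its opening parenthesis is not followed by a digit.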
Matthew Plourde answered Sep 26 '22 00:09