I have a lot of strings, each of which tends to have the following format: Ab_Cd-001234.txt
I want to replace it with 001234. How can I achieve it in R?
In this method for extracting numbers from a character string vector, you call gsub(), one of R's built-in functions, passing it a pattern that matches the first occurrence of a number in the given strings together with the vector of strings as arguments; in return, this ...
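A minimal sketch of that gsub() approach, assuming every string contains at least one number (x is a made-up input vector). The pattern skips any leading non-digits, captures the first run of digits, and throws away the rest of the string:

```r
# Hypothetical input vector in the question's format
x <- c("Ab_Cd-001234.txt", "Ab_Cd-098765.txt")

# ^[^0-9]*   : any leading non-digit characters
# ([0-9]+)   : the first run of digits, captured as group 1
# .*$        : everything after it
# "\\1" replaces the whole match with just the captured digits
first_num <- gsub("^[^0-9]*([0-9]+).*$", "\\1", x)
print(first_num)
# [1] "001234" "098765"
```

Note that a string with no digits does not match the pattern at all, so gsub() returns it unchanged rather than as an empty string.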
The following example shows how you can use the replaceAll() method to extract all digits from a string in Java:

    // string contains numbers
    String str = "The price of the book is $49";
    // extract digits only from the string
    String numberOnly = str.replaceAll("[^0-9]", "");
    // print the digits
    System.out.println(numberOnly);
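The same "delete every non-digit" idea carries over to R directly, since gsub() replaces all matches just like Java's replaceAll(). A sketch (s is just an example variable):

```r
s <- "The price of the book is $49"
# "[^0-9]" matches any character that is NOT a digit;
# gsub() replaces every such character with "", leaving only digits
number_only <- gsub("[^0-9]", "", s)
print(number_only)
# [1] "49"

# Applied to the file name from the question
file_digits <- gsub("[^0-9]", "", "Ab_Cd-001234.txt")
print(file_digits)
# [1] "001234"
```

One caveat: if a string holds several separate numbers, this glues them together (for example "v2_Ab-000017.txt" becomes "2000017"), so it only suits names with a single number in them.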
The stringr package has lots of handy shortcuts for this kind of work:
    # input data following @agstudy
    data <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
    # load library
    library(stringr)
    # prepare regular expression
    regexp <- "[[:digit:]]+"
    # process string
    str_extract(data, regexp)

Which gives the desired result:

    [1] "001234" "001234"
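str_extract() returns only the first match in each string. If a name might contain more than one number, stringr's str_extract_all() returns all of them; a base R sketch of the same idea, without the stringr dependency, pairs gregexpr() with regmatches() (the second input is made up to show two numbers):

```r
# Hypothetical inputs; the second one contains two separate numbers
x <- c("Ab_Cd-001234.txt", "v2_Ab-000017.txt")

# gregexpr() finds EVERY run of digits in each string;
# regmatches() returns a list with one character vector per input
all_nums <- regmatches(x, gregexpr("[[:digit:]]+", x))
print(all_nums)
# [[1]]
# [1] "001234"
#
# [[2]]
# [1] "2"      "000017"
```

Keep the results as character if the leading zeros matter: as.numeric("001234") gives 1234.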
To explain the regexp a little:

- [[:digit:]] matches any single digit, 0 to 9
- + means the preceding item (in this case, a digit) will be matched one or more times
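To see what the + contributes, here is a small base R comparison of the pattern with and without it, using regexpr() to take the first match in the question's file name:

```r
x <- "Ab_Cd-001234.txt"

# Without "+": the pattern matches exactly one digit, so only the first "0" comes back
one_digit <- regmatches(x, regexpr("[[:digit:]]", x))
print(one_digit)
# [1] "0"

# With "+": the match extends over the whole run of consecutive digits
digit_run <- regmatches(x, regexpr("[[:digit:]]+", x))
print(digit_run)
# [1] "001234"
```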
This page is also very useful for this kind of string processing: http://en.wikibooks.org/wiki/R_Programming/Text_Processing