
Extract substring and numbers from a string in R

Tags: regex, r

I have several strings; here are some examples.

rfoutputtablep7q10000t20000c100
rfoutputtablep7q1000t20000c100
svmLinear2outputtablep7q20000t20000c100
svmLinear2outputtablep7q5000t20000c100

I want to build a data frame with the columns algorithm, p, q, t, and c, filled with the values extracted from these strings. The text before "outputtable" is the algorithm, the number after "p" is the value of p, the number after "q" is the value of q, and so on.

How can this data frame be created?

Jack Arnestad asked Nov 21 '17 16:11




1 Answer

Using base R only.

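# Split each string at "outputtable" and at the letters p, q, t, c (y is defined under DATA below).
res <- do.call(rbind, strsplit(y, 'outputtable|p|q|t|c'))
# Drop column 2, the empty piece between "outputtable" and "p".
res <- as.data.frame(res[, -2])
# Convert every column except the algorithm name to numeric.
res[-1] <- lapply(res[-1], function(x) as.numeric(as.character(x)))
names(res) <- c("algorithm", "p", "q", "t", "c")
res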
#   algorithm p     q     t   c
#1         rf 7 10000 20000 100
#2         rf 7  1000 20000 100
#3 svmLinear2 7 20000 20000 100
#4 svmLinear2 7  5000 20000 100
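
To see why the second column is dropped, it may help to look at how a single string is split. This is only a small sketch: `s`, `parts`, and `bad` are illustrative names, and the "rpart" string is a hypothetical input that does not appear in the question. It suggests that the split relies on the algorithm name containing none of the letters p, q, t, or c.

```r
# One of the example strings from the question.
s <- "rfoutputtablep7q10000t20000c100"
parts <- strsplit(s, "outputtable|p|q|t|c")[[1]]
parts
# "outputtable" and the "p" right after it are adjacent delimiters, so the
# second piece is the empty string "" that res[, -2] removes.
length(parts)  # 6 pieces: algorithm, "", p, q, t, c

# Hypothetical algorithm name (not in the question) containing "p" and "t".
bad <- strsplit("rpartoutputtablep7q10t20c100", "outputtable|p|q|t|c")[[1]]
length(bad)    # more than 6, so the pieces no longer map onto the five columns
```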

DATA.

y <- scan(text = '"rfoutputtablep7q10000t20000c100"
"rfoutputtablep7q1000t20000c100"
"svmLinear2outputtablep7q20000t20000c100"
"svmLinear2outputtablep7q5000t20000c100"',
what = character())
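
If algorithm names might contain the letters p, q, t, or c, one possible alternative is to match the whole string against a single anchored pattern with one capture group per column. The sketch below assumes R >= 3.4.0, where `utils::strcapture` is available; `pattern`, `proto`, and `res2` are names chosen here for illustration, and the pattern assumes every string follows the layout shown in the question.

```r
y <- c("rfoutputtablep7q10000t20000c100",
       "rfoutputtablep7q1000t20000c100",
       "svmLinear2outputtablep7q20000t20000c100",
       "svmLinear2outputtablep7q5000t20000c100")

# One capture group per target column, anchored to the whole string.
pattern <- "^(.+)outputtablep([0-9]+)q([0-9]+)t([0-9]+)c([0-9]+)$"

# Empty prototype that supplies the column names and types of the result.
proto <- data.frame(algorithm = character(), p = numeric(), q = numeric(),
                    t = numeric(), c = numeric(), stringsAsFactors = FALSE)

# Strings that do not match the pattern produce a row of NA values.
res2 <- strcapture(pattern, y, proto)
res2
```

Because the letters only act as delimiters at fixed positions in the pattern, a name such as "rpart" would be captured whole in the algorithm column.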
Rui Barradas answered Nov 05 '22 00:11