I have a dataframe that contains 3 columns. The data looks like this:
V1 V2 V3
Auto = Chevy Engine = V6 Trans = Auto
Auto = Chevy Engine = V8 Trans = Manual
Auto = Chevy Engine = V10 Trans = Manual
I want the dataframe to look like this:
Auto Engine Trans
Chevy V6 Auto
Chevy V8 Manual
Chevy V10 Manual
In other words, I want to keep the last word after the "=" as the cell value and use the word before the "=" (which is the same in every row of a column) as the column header. Alternatively, a way to just keep the last word after the "=" and replace the column's contents, without adding new columns, would also work.
Can this be done in R? Many thanks!
To extract a substring from a column in R, you can use functions such as base R's substr(), or str_sub() and str_extract() from the stringr package.
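For example, on a plain character vector (the vector `x` here is made up for illustration), base R's `sub()` covers the regex route and `substr()` combined with `regexpr()` covers the position route. This is a sketch that assumes every value contains exactly one `"= "`:

```r
x <- c("Auto = Chevy", "Engine = V10", "Trans = Manual")

# Regex route: delete everything up to and including "= "
# (stringr::str_extract(x, "\\w+$") would pull out the same words)
after_eq <- sub(".*= ", "", x)

# Position route: locate "= ", then take the substring starting two characters later
start <- as.integer(regexpr("= ", x, fixed = TRUE)) + 2L
after_eq_substr <- substr(x, start, nchar(x))

print(after_eq)
```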
To pick out one or more columns, use the select() function from dplyr. select() takes a data frame as its first input (its first 'argument', in R terms), followed by the names of the columns you want to keep, separated by commas.
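In base R, without loading dplyr, the same column pick is a bracket subset by name. A small sketch using a two-row version of the question's data (made up here for illustration):

```r
df <- data.frame(
  V1 = c("Auto = Chevy", "Auto = Chevy"),
  V2 = c("Engine = V6", "Engine = V8"),
  V3 = c("Trans = Auto", "Trans = Manual"),
  stringsAsFactors = FALSE
)

# Keeps columns V1 and V3, like dplyr::select(df, V1, V3)
picked <- df[, c("V1", "V3")]

# For a single column, single-bracket indexing by name keeps a data frame;
# df[, "V1"] would drop it to a plain character vector
one_col <- df["V1"]
```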
Well, if you don't mind just using old-style (pre-Hadley) R, here's a solution:
> x <- as.data.frame(list(c('Auto = Chevy', 'Auto = Chevy', 'Auto = Chevy'),
+ c('Engine = V6', 'Engine = V8', 'Engine = V10'),
+ c('Trans = Auto', 'Trans = Manual', 'Trans = Manual')),
+ stringsAsFactors=FALSE)
> values <- lapply(x, gsub, pattern='.*= ', replacement='')
> new.names <- lapply(x, gsub, pattern=' =.*', replacement='')
> new.names <- lapply(new.names, unique)
> names(values) <- new.names
> new.frame <- as.data.frame(values, stringsAsFactors = FALSE)
> new.frame
Auto Engine Trans
1 Chevy V6 Auto
2 Chevy V8 Manual
3 Chevy V10 Manual
It won't work for a data frame with many columns, but it will work for a narrow data frame with many rows.
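The steps above can be packaged into one function so they can be reused on other data frames. This is a minimal sketch: the name `kv_to_columns` is hypothetical, and it takes each column's header from the key in that column's first row, so it assumes the key is the same all the way down every column.

```r
# Hypothetical helper wrapping the lapply()/gsub() steps above
kv_to_columns <- function(df) {
  # Cell values: everything after "= "
  values <- lapply(df, sub, pattern = ".*= ", replacement = "")
  # Column names: the text before " =" in the first row of each column
  keys <- vapply(df, function(col) sub(" =.*", "", col[1]), character(1))
  names(values) <- keys
  as.data.frame(values, stringsAsFactors = FALSE)
}

x <- data.frame(
  V1 = c("Auto = Chevy", "Auto = Chevy", "Auto = Chevy"),
  V2 = c("Engine = V6", "Engine = V8", "Engine = V10"),
  V3 = c("Trans = Auto", "Trans = Manual", "Trans = Manual"),
  stringsAsFactors = FALSE
)
new.frame <- kv_to_columns(x)
print(new.frame)
```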
Or, we could avoid the stringr crutch and use a highly optimized function for exactly this use case in stringi (most stringr functions wrap stringi functions):
library(stringi)
library(dplyr)
read.table(text='V1,V2,V3
"Auto = Chevy","Engine = V6","Trans = Auto"
"Auto = Chevy","Engine = V8","Trans = Manual"
"Auto = Chevy","Engine = V10","Trans = Manual"',
sep=",", header=TRUE, stringsAsFactors=FALSE) -> df
# Replace every column with its last word (funs() is deprecated in dplyr, so pass the function directly)
mutate_all(df, stri_extract_last_words)
## V1 V2 V3
## 1 Chevy V6 Auto
## 2 Chevy V8 Manual
## 3 Chevy V10 Manual
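If adding stringi isn't an option, a base R approximation is a regex that keeps the last run of word characters in each cell. This is a sketch, not a drop-in replacement: `\\w` is not the same as the ICU word-boundary rules that stri_extract_last_words() follows, so results can differ on punctuation or non-ASCII text. The helper name `last_word` is made up for illustration.

```r
df <- read.table(text = 'V1,V2,V3
"Auto = Chevy","Engine = V6","Trans = Auto"
"Auto = Chevy","Engine = V8","Trans = Manual"
"Auto = Chevy","Engine = V10","Trans = Manual"',
  sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: keep the last run of word characters in each string
last_word <- function(x) sub(".*\\b(\\w+)\\W*$", "\\1", x, perl = TRUE)

# df[] <- ... replaces the column contents in place,
# keeping the data.frame class and the original column names
df[] <- lapply(df, last_word)
print(df)
```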
A more representative tidyverse version that also handles the "column name" requirement (note that it could break your R script if the columns aren't structured as you imagine):
library(stringi)
library(dplyr)
library(purrr)
read.table(text='V1,V2,V3
"Auto = Chevy","Engine = V6","Trans = Auto"
"Auto = Chevy","Engine = V8","Trans = Manual"
"Auto = Chevy","Engine = V10","Trans = Manual"',
sep=",", header=TRUE, stringsAsFactors=FALSE) -> df
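# Values: the last word of every cell
mutate_all(df, stri_extract_last_words) %>%
  # Names: the first word of each column, collapsed to one row by distinct()
  setNames(mutate_all(df, stri_extract_first_words) %>%
             distinct() %>%
             flatten_chr())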
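That version builds the names from distinct() on the first words, which gives a single row only when each column holds a single key; with mixed keys, distinct() returns several rows and the renaming breaks. A small guard can fail loudly before that happens. This is a base R sketch; `check_single_key` is a hypothetical name, and it treats the text before " =" as the key:

```r
# Hypothetical guard: stop if any column mixes keys
check_single_key <- function(df) {
  # Unique keys (text before " =") found in each column
  keys <- lapply(df, function(col) unique(sub(" =.*", "", col)))
  bad <- lengths(keys) != 1
  if (any(bad)) {
    stop("columns with more than one key: ",
         paste(names(df)[bad], collapse = ", "))
  }
  # One key per column, named by the original column name
  invisible(vapply(keys, `[`, character(1), 1))
}

good <- data.frame(V1 = c("Auto = Chevy", "Auto = Chevy"),
                   V2 = c("Engine = V6", "Engine = V8"),
                   stringsAsFactors = FALSE)
mixed <- data.frame(V1 = c("Auto = Chevy", "Auto = Chevy"),
                    V2 = c("Engine = V6", "Trans = Manual"),
                    stringsAsFactors = FALSE)

keys <- check_single_key(good)
result <- tryCatch(check_single_key(mixed), error = conditionMessage)
```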
Even more tidyverse and stringi, with heavily assumed requirements that could also break your R script if the columns aren't as you imagine:
library(stringi)
library(tidyverse)
library(purrrlyr)  # by_row() was moved out of purrr into purrrlyr
read.table(text='V1,V2,V3
"Auto = Chevy","Engine = V6","Trans = Auto"
"Auto = Chevy","Engine = V8","Trans = Manual"
"Auto = Chevy","Engine = V10","Trans = Manual"',
sep=",", header=TRUE, stringsAsFactors=FALSE) -> df
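by_row(df, function(x) {
  # Match each "key = value" cell: column 1 is the full match, 2 the key, 3 the value
  map(x, stri_match_all_regex, "(.*) = (.*)") %>%
    map(1) %>%
    # Turn each match into a value named by its key
    map(~setNames(.[,3], .[,2])) %>%
    flatten_df()
}) %>%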
select(.out) %>%
unnest(cols = .out)
## # A tibble: 3 × 3
## Auto Engine Trans
## <chr> <chr> <chr>
## 1 Chevy V6 Auto
## 2 Chevy V8 Manual
## 3 Chevy V10 Manual
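Because that version parses every cell on its own, it does not need the keys to line up by column. The same row-by-row idea can be sketched in base R with regexec() and regmatches(); `parse_rows` is a hypothetical name, the pattern assumes keys contain no "=", and a key missing from a row becomes NA:

```r
# Hypothetical helper: parse "key = value" cells row by row
parse_rows <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    cells <- unlist(df[i, ], use.names = FALSE)
    # Each match is c(full match, key, value)
    m <- regmatches(cells, regexec("^([^=]*) = (.*)$", cells))
    keys <- vapply(m, `[`, character(1), 2)
    vals <- vapply(m, `[`, character(1), 3)
    setNames(vals, keys)
  })
  # Columns in order of first appearance; reorder every row to match
  all_keys <- unique(unlist(lapply(rows, names)))
  out <- do.call(rbind, lapply(rows, function(r) r[all_keys]))
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  names(out) <- all_keys
  rownames(out) <- NULL
  out
}

# Keys deliberately swapped between columns on the second row
df <- data.frame(V1 = c("Auto = Chevy", "Engine = V8"),
                 V2 = c("Engine = V6", "Auto = Chevy"),
                 stringsAsFactors = FALSE)
out <- parse_rows(df)
print(out)
```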