Extracting last word from many data frame columns (R)

Tags:

r

I have a data frame that contains 3 columns. The data looks like this:

V1                V2               V3
Auto = Chevy      Engine = V6      Trans = Auto
Auto = Chevy      Engine = V8      Trans = Manual
Auto = Chevy      Engine = V10     Trans = Manual

I want the dataframe to look like this:

Auto       Engine  Trans
Chevy      V6      Auto
Chevy      V8      Manual
Chevy      V10     Manual

In other words, I want to retrieve the string after the "=" in each cell and use the first word in each column as that column's header. Alternatively, I'd like a way to retrieve just the last word after the "=" and overwrite the column with it, without adding new columns.

Can this be done in R? Many thanks!

asked Jan 21 '17 03:01

Fishing101

2 Answers

Well, if you don't mind just using old-style (pre-Hadley) R, here's a solution:

> x <- as.data.frame(list(c('Auto = Chevy', 'Auto = Chevy', 'Auto = Chevy'),
+ c('Engine = V6', 'Engine = V8', 'Engine = V10'),
+ c('Trans = Auto', 'Trans = Manual', 'Trans = Manual')),
+ stringsAsFactors=FALSE)
> values <- lapply(x, gsub, pattern='.*= ', replacement='')
> new.names <- lapply(x, gsub, pattern=' =.*', replacement='')
> new.names <- lapply(new.names, unique)
> names(values) <- new.names
> new.frame <- as.data.frame(values, stringsAsFactors = FALSE)
> new.frame
   Auto Engine  Trans
1 Chevy     V6   Auto
2 Chevy     V8 Manual
3 Chevy    V10 Manual

It won't work for a data frame with many columns, but it will work for a narrow data frame with many rows.
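The same idea can be wrapped in a function that also checks the assumption the code above relies on, namely that every column carries a single key. This is only a minimal sketch: `split_key_value` is a hypothetical name, not something from base R, and the input is assumed to be a data frame of "key = value" strings.

```r
# Hypothetical helper: split "key = value" cells and use the single key
# found in each column as that column's name.
split_key_value <- function(df) {
  keys   <- lapply(df, function(col) unique(sub("\\s*=.*$", "", col)))
  values <- lapply(df, function(col) sub("^.*=\\s*", "", col))

  # Fail loudly if any column mixes several keys.
  n_keys <- lengths(keys)
  if (any(n_keys != 1)) {
    stop("columns with more than one key: ",
         paste(names(df)[n_keys != 1], collapse = ", "))
  }

  names(values) <- unlist(keys)
  as.data.frame(values, stringsAsFactors = FALSE)
}

x <- data.frame(
  V1 = c("Auto = Chevy", "Auto = Chevy", "Auto = Chevy"),
  V2 = c("Engine = V6", "Engine = V8", "Engine = V10"),
  V3 = c("Trans = Auto", "Trans = Manual", "Trans = Manual"),
  stringsAsFactors = FALSE
)

new.frame <- split_key_value(x)
```

Stopping on mixed keys is a design choice; silently taking the first key would also run, but it could mislabel a column.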

answered Oct 21 '22 08:10

JWLM

Or we could skip the stringr crutch and use a stringi function that is highly optimized for exactly this use case (most stringr functions wrap stringi functions):

library(stringi)
library(dplyr)

read.table(text='V1,V2,V3
"Auto = Chevy","Engine = V6","Trans = Auto"
"Auto = Chevy","Engine = V8","Trans = Manual"
"Auto = Chevy","Engine = V10","Trans = Manual"',
sep=",", header=TRUE, stringsAsFactors=FALSE) -> df

mutate_all(df, stri_extract_last_words) # funs() is deprecated; pass the function directly
##      V1  V2     V3
## 1 Chevy  V6   Auto
## 2 Chevy  V8 Manual
## 3 Chevy V10 Manual
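The scoped verb `mutate_all()` is superseded in newer dplyr releases. Assuming dplyr 1.0.0 or later is installed, the same step might be written with `across()`; this is a sketch of an equivalent call, not part of the original answer, and the data frame is rebuilt inline so the block runs on its own.

```r
library(stringi)
library(dplyr)

df <- data.frame(
  V1 = c("Auto = Chevy", "Auto = Chevy", "Auto = Chevy"),
  V2 = c("Engine = V6", "Engine = V8", "Engine = V10"),
  V3 = c("Trans = Auto", "Trans = Manual", "Trans = Manual"),
  stringsAsFactors = FALSE
)

# Apply stri_extract_last_words() to every column, keeping the column names.
out <- mutate(df, across(everything(), stri_extract_last_words))
```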

A more representative tidyverse version that also handles the "column name" requirement, although it could actually break your R script if the columns aren't as you imagine:

library(stringi)
library(dplyr)
library(purrr)

read.table(text='V1,V2,V3
"Auto = Chevy","Engine = V6","Trans = Auto"
"Auto = Chevy","Engine = V8","Trans = Manual"
"Auto = Chevy","Engine = V10","Trans = Manual"',
sep=",", header=TRUE, stringsAsFactors=FALSE) -> df

mutate_all(df, stri_extract_last_words) %>%
  setNames(mutate_all(df, stri_extract_first_words) %>%
             distinct() %>%
             flatten_chr())

More tidyverse and stringi, this time parsing each row. It still rests on heavily assumed requirements and could actually break your R script if the columns aren't as you imagine:

library(stringi)
library(tidyverse)
library(purrrlyr) # by_row() was moved out of purrr into the purrrlyr package

read.table(text='V1,V2,V3
"Auto = Chevy","Engine = V6","Trans = Auto"
"Auto = Chevy","Engine = V8","Trans = Manual"
"Auto = Chevy","Engine = V10","Trans = Manual"',
sep=",", header=TRUE, stringsAsFactors=FALSE) -> df

by_row(df, function(x) {
  map(x, stri_match_all_regex, "(.*) = (.*)") %>%
    map(1) %>%
    map(~setNames(.[,3], .[,2])) %>%
    flatten_df()
}) %>%
  select(.out) %>%
  unnest()
## # A tibble: 3 × 3
##    Auto Engine  Trans
##   <chr>  <chr>  <chr>
## 1 Chevy     V6   Auto
## 2 Chevy     V8 Manual
## 3 Chevy    V10 Manual
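For readers who don't have `by_row()` available, the row-wise parsing can be approximated with base R plus stringi. This is a rough sketch rather than a drop-in replacement: `parse_row` is a hypothetical helper name, and it still assumes every row yields the same set of keys, because `rbind()` matches columns by name.

```r
library(stringi)

df <- data.frame(
  V1 = c("Auto = Chevy", "Auto = Chevy", "Auto = Chevy"),
  V2 = c("Engine = V6", "Engine = V8", "Engine = V10"),
  V3 = c("Trans = Auto", "Trans = Manual", "Trans = Manual"),
  stringsAsFactors = FALSE
)

# Parse one row of "key = value" cells into a one-row data frame
# whose column names are the keys found in that row.
parse_row <- function(cells) {
  # Matrix columns: 1 = full match, 2 = key, 3 = value.
  parts <- stri_match_first_regex(cells, "^(.*?)\\s*=\\s*(.*)$")
  as.data.frame(as.list(setNames(parts[, 3], parts[, 2])),
                stringsAsFactors = FALSE)
}

rows   <- lapply(seq_len(nrow(df)), function(i) parse_row(unlist(df[i, ])))
result <- do.call(rbind, rows)
```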

answered Oct 21 '22 08:10

hrbrmstr
