Split column name and convert data from wide to long format in R

I have a large dataset that I need to convert to long format from wide format. That should be simple enough and there are lots of examples of how to do that on this forum. However, in this case, I also need to split the column headers used in the wide format and create a column for each of them in the long format.

Example dataset

 data <- data.frame("East2010"=1:3, "West2010"=4:6, "East2011"=7:9, "West2011"=5:7)
 data
   East2010 West2010 East2011 West2011
 1        1        4        7        5
 2        2        5        8        6
 3        3        6        9        7

What I want is something like this

 Site   Year   Response
 East   2010   1
 East   2010   2
 East   2010   3
 West   2010   4
 West   2010   5
 West   2010   6
 East   2011   7
 East   2011   8
 East   2011   9
 West   2011   5
 West   2011   6
 West   2011   7

I have looked at a lot of examples on this forum that melt data to convert it to long format, and others that split a column at a delimiter, but I have not been able to make the two work together.

asked Aug 12 '14 18:08

user3799924

3 Answers

Here's the "modern day" :-) approach for this:

library(dplyr)
library(tidyr)
data %>%
  gather(var, Response, East2010:West2011) %>%  ## Makes wide data long
  separate(var, c("Site", "Year"), sep = -4)    ## Splits up a column
#    Site Year Response
# 1  East 2010        1
# 2  East 2010        2
# 3  East 2010        3
# 4  West 2010        4
# 5  West 2010        5
# 6  West 2010        6
# 7  East 2011        7
# 8  East 2011        8
# 9  East 2011        9
# 10 West 2011        5
# 11 West 2011        6
# 12 West 2011        7

The sep = -4 above says to count four characters back from the end of the string and split there, so the four-digit year ends up in its own column. Thus, if you had "North2010" as a possible name, this would still work. (Older tidyr releases counted negative positions differently and needed sep = -5 here, so check the result against your installed version.)
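To see why counting from the end copes with site names of different lengths, here is a minimal base R sketch of the same idea. It assumes the year is always the last four characters; "North2010" is a hypothetical name that is not in the example data.

```r
# Column names from the question, plus a longer hypothetical one
nms <- c("East2010", "West2011", "North2010")

n <- nchar(nms)                 # length of each name
site <- substr(nms, 1, n - 4)   # everything before the last four characters
year <- substr(nms, n - 3, n)   # the last four characters

data.frame(name = nms, Site = site, Year = year)
```

Because the split point is measured from the right, the length of the site name does not matter, but a two-digit or five-digit year would break it.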

That said, it is more reliable to use a regular expression like @David's, which is also possible within separate:

data %>%
  gather(var, Response, East2010:West2011) %>%
  separate(var, c("Site", "Year"), 
           sep = "(?<=[[:alpha:]])(?=[[:digit:]])", 
           perl = TRUE)
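In newer tidyr releases (1.0.0 and later) gather() is superseded by pivot_longer(), which can reshape and split the names in one step. The following is a sketch under that assumption, not part of the original answer; the pattern assumes every column name is letters followed by digits.

```r
library(tidyr)

# Same example data as in the question
data <- data.frame(East2010 = 1:3, West2010 = 4:6,
                   East2011 = 7:9, West2011 = 5:7)

long <- pivot_longer(
  data,
  cols = everything(),                               # reshape every column
  names_to = c("Site", "Year"),                      # one new column per capture group
  names_pattern = "^([[:alpha:]]+)([[:digit:]]+)$",  # letters, then digits
  values_to = "Response"
)

# pivot_longer() walks the data row by row, so sort to group by year and site
long <- long[order(long$Year, long$Site), ]
long
```

Year comes back as character here; wrap it in as.integer() if you need a number.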

answered Oct 20 '22 23:10

A5C1D2H2I1M1N2O1R2T1

Or, in case the column width is not always constant, you can split with a regular expression. Here I use a "lookbehind" and a "lookahead" to split at the point where a letter is followed by a digit.

library(reshape2)
data <- melt(data)
temp <- strsplit(as.character(data$variable), "(?<=[[:alpha:]])(?=[[:digit:]])", perl = TRUE)
transform(data, Site = sapply(temp, "[", 1), Year = sapply(temp, "[", 2))

#   variable value Site Year
#1  East2010     1 East 2010
#2  East2010     2 East 2010
#3  East2010     3 East 2010
#4  West2010     4 West 2010
#5  West2010     5 West 2010
#6  West2010     6 West 2010
#7  East2011     7 East 2011
#8  East2011     8 East 2011
#9  East2011     9 East 2011
#10 West2011     5 West 2011
#11 West2011     6 West 2011
#12 West2011     7 West 2011
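If you would rather avoid extra packages and lookarounds, the same split can be sketched in base R with stack() and capture groups. This is an alternative illustration, not the code from the answer above, and it assumes every column name is letters followed by digits.

```r
# Same example data as in the question
data <- data.frame(East2010 = 1:3, West2010 = 4:6,
                   East2011 = 7:9, West2011 = 5:7)

long <- stack(data)              # columns: values, ind (the old column names)
nm <- as.character(long$ind)

# Group 1 captures the letters, group 2 the digits
pattern <- "^([[:alpha:]]+)([[:digit:]]+)$"

result <- data.frame(
  Site = sub(pattern, "\\1", nm),
  Year = as.integer(sub(pattern, "\\2", nm)),
  Response = long$values,
  stringsAsFactors = FALSE
)
result
```

stack() goes through the columns in order, so the rows come out in the order shown in the question.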

answered Oct 20 '22 23:10

David Arenburg

Something along these lines would work:

library("plyr")
library("reshape2")
m.data <- melt(data)
m.data <- mutate(m.data, Site=substr(variable, 1,4), 
    Year=substr(variable, 5,8))

Which would result in:

> m.data
   variable value Site Year
1  East2010     1 East 2010
2  East2010     2 East 2010
3  East2010     3 East 2010
4  West2010     4 West 2010
5  West2010     5 West 2010
6  West2010     6 West 2010
7  East2011     7 East 2011
8  East2011     8 East 2011
9  East2011     9 East 2011
10 West2011     5 West 2011
11 West2011     6 West 2011
12 West2011     7 West 2011

answered Oct 20 '22 22:10

landroni
