I have a large dataset that I need to convert to long format from wide format. That should be simple enough and there are lots of examples of how to do that on this forum. However, in this case, I also need to split the column headers used in the wide format and create a column for each of them in the long format.
Example dataset
data <- data.frame("East2010"=1:3, "West2010"=4:6, "East2011"=7:9, "West2011"=5:7)
data
East.2010 West.2010 East.2011 West.2011
1 1 4 7 5
2 2 5 8 6
3 3 6 9 7
What I want is something like this
Site Year Response
East 2010 1
East 2010 2
East 2010 3
West 2010 4
West 2010 5
West 2010 6
East 2011 7
East 2011 8
East 2011 9
West 2011 5
West 2011 6
West 2011 7
I have looked a lot of examples on this forum that will melt data to convert to long format and others that do a column split at a delimiter, but I have not been able to make the two work together.
The easiest way to reshape data between these formats is to use the following two functions from the tidyr package in R: pivot_longer(): Reshapes a data frame from wide to long format. pivot_wider(): Reshapes a data frame from long to wide format.
To split a column into multiple columns in the R Language, we use the separator() function of the dplyr package library. The separate() function separates a character column into multiple columns with a regular expression or numeric locations.
A dataset can be written in two different formats: wide and long. A wide format contains values that do not repeat in the first column. A long format contains values that do repeat in the first column.
When there are multiple measurements of the same subject, across time or using different tools, the data is often described as being in "wide" format if there is one observation row per subject with each measurement present as a different variable and "long" format if there is one observation row per measurement (thus, ...
Here's the "modern day" :-) approach for this:
library(dplyr)
library(tidyr)
data %>%
gather(var, Response, East2010:West2011) %>% ## Makes wide data long
separate(var, c("Site", "Year"), sep = -5) ## Splits up a column
# Site Year Response
# 1 East 2010 1
# 2 East 2010 2
# 3 East 2010 3
# 4 West 2010 4
# 5 West 2010 5
# 6 West 2010 6
# 7 East 2011 7
# 8 East 2011 8
# 9 East 2011 9
# 10 West 2011 5
# 11 West 2011 6
# 12 West 2011 7
The sep = -5
from above says to go from the end of the string backwards five characters and split there. Thus, if you had "North2010" as a possible name, this would still work.
That said, it is more reliable to use a regular expression like @David's, which is also possible within separate
:
data %>%
gather(var, Response, East2010:West2011) %>%
separate(var, c("Site", "Year"),
sep = "(?<=[[:alpha:]])(?=[[:digit:]])",
perl = TRUE)
Or (in case the column width is not always constant). Here I use a "lookahead" and "lookbehind" in order to separate characters from digits.
library(reshape2)
data <- melt(data)
temp <- strsplit(as.character(data$variable), "(?<=[[:alpha:]])(?=[[:digit:]])", perl = TRUE)
transform(data, Site = sapply(temp, "[", 1), Year = sapply(temp, "[", 2))
# variable value Site Year
#1 East2010 1 East 2010
#2 East2010 2 East 2010
#3 East2010 3 East 2010
#4 West2010 4 West 2010
#5 West2010 5 West 2010
#6 West2010 6 West 2010
#7 East2011 7 East 2011
#8 East2011 8 East 2011
#9 East2011 9 East 2011
#10 West2011 5 West 2011
#11 West2011 6 West 2011
#12 West2011 7 West 2011
Something along these lines would work:
library("plyr")
library("reshape2")
m.data <- melt(data)
m.data <- mutate(m.data, Site=substr(variable, 1,4),
Year=substr(variable, 5,8))
Which would result in:
> m.data
variable value Site Year
1 East2010 1 East 2010
2 East2010 2 East 2010
3 East2010 3 East 2010
4 West2010 4 West 2010
5 West2010 5 West 2010
6 West2010 6 West 2010
7 East2011 7 East 2011
8 East2011 8 East 2011
9 East2011 9 East 2011
10 West2011 5 West 2011
11 West2011 6 West 2011
12 West2011 7 West 2011
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With