
reshape wide to long with character suffixes instead of numeric suffixes

Tags: r, reshape

Inspired by a comment from @gsk3 on a question about reshaping data, I started doing a little bit of experimentation with reshaping data where the variable names have character suffixes instead of numeric suffixes.

As an example, I'll load the dadmomw dataset from one of the UCLA ATS Stata learning webpages (see "Example 4" on the webpage).

Here's what the dataset looks like:

library(foreign)
dadmom <- read.dta("https://stats.idre.ucla.edu/stat/stata/modules/dadmomw.dta")
dadmom
#   famid named  incd namem  incm
# 1     1  Bill 30000  Bess 15000
# 2     2   Art 22000   Amy 18000
# 3     3  Paul 25000   Pat 50000
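
In case the download is not reachable, the same three rows can be typed in by hand. This is only a sketch for reproducibility: the values are copied from the printout above, and the column types (character names, numeric incomes) are an assumption rather than something read from the original file.

```r
# Hand-typed copy of the printout above; not read from the original .dta file
dadmom <- data.frame(
  famid = 1:3,
  named = c("Bill", "Art", "Paul"),
  incd  = c(30000, 22000, 25000),
  namem = c("Bess", "Amy", "Pat"),
  incm  = c(15000, 18000, 50000),
  stringsAsFactors = FALSE  # keep the names as character, not factor
)
dadmom
```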

When trying to reshape from this wide format to long, I run into a problem. Here's what I do to reshape the data.

reshape(dadmom, direction="long", idvar=1, varying=2:5, 
        sep="", v.names=c("name", "inc"), timevar="dadmom",
        times=c("d", "m"))
#     famid dadmom  name  inc
# 1.d     1      d 30000 Bill
# 2.d     2      d 22000  Art
# 3.d     3      d 25000 Paul
# 1.m     1      m 15000 Bess
# 2.m     2      m 18000  Amy
# 3.m     3      m 50000  Pat

Note the swapped column names for "name" and "inc"; changing v.names to c("inc", "name") doesn't solve the problem.

reshape seems very picky about wanting the columns to be named in a fairly standard way. For example, I can reshape the data correctly (and easily) if I first rename the columns:

dadmom2 <- dadmom # Just so we can continue experimenting with the original data
# Change the names of the last four variables to include a "."
names(dadmom2)[2:5] <- gsub("(d$|m$)", "\\.\\1", names(dadmom2)[2:5])
reshape(dadmom2, direction="long", idvar=1, varying=2:5, 
        timevar="dadmom")
#     famid dadmom name   inc
# 1.d     1      d Bill 30000
# 2.d     2      d  Art 22000
# 3.d     3      d Paul 25000
# 1.m     1      m Bess 15000
# 2.m     2      m  Amy 18000
# 3.m     3      m  Pat 50000

My questions are:

1. Why is R swapping the columns in the example I've provided?
2. Can I get to this result with base R reshape without changing the variable names before reshaping?
3. Are there other approaches that could be considered instead of reshape?


asked May 06 '12 07:05

A5C1D2H2I1M1N2O1R2T1

1 Answer

This works (it tells varying which columns go together):

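# Each list element names the wide columns that stack into one long variable:
# columns 2 and 4 (named, namem) become name; 3 and 5 (incd, incm) become inc
reshape(dadmom, direction="long", varying=list(c(2, 4), c(3, 5)),
        sep="", v.names=c("name", "inc"), timevar="dadmom",
        times=c("d", "m"))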

You actually have two sets of repeated measures here: name and inc are each recorded for both dad and mom. With more than one set, varying should be a list with one vector per long-format variable, so that reshape knows which wide columns get stacked together. Here columns 2 and 4 form name, and columns 3 and 5 form inc.
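
As for why the vector form swaps the columns, one plausible reading is that reshape groups a plain vector of varying columns with split() when v.names is supplied. That is an assumption about reshape's internals, not something stated in this answer. What is certain is that split() returns its groups sorted by label, so "inc" would come before "name" whichever order v.names uses. The sketch below shows that ordering and then the list form written with column names instead of positions (the data frame is a hand-typed copy of the question's printout).

```r
cols    <- c("named", "incd", "namem", "incm")  # wide columns, in data order
v_names <- c("name", "inc")

# split() orders its groups by sorted label, not by first appearance
groups <- split(cols, rep(v_names, 2))
names(groups)
# "inc" "name"

# Hand-typed copy of the question's data
dadmom <- data.frame(
  famid = 1:3,
  named = c("Bill", "Art", "Paul"),
  incd  = c(30000, 22000, 25000),
  namem = c("Bess", "Amy", "Pat"),
  incm  = c(15000, 18000, 50000),
  stringsAsFactors = FALSE
)

# One vector per long-format variable, in the same order as v.names
long <- reshape(dadmom, direction = "long", idvar = "famid",
                varying = list(c("named", "namem"), c("incd", "incm")),
                v.names = c("name", "inc"),
                timevar = "dadmom", times = c("d", "m"))
long
```

Using names rather than positions keeps the call correct if the column order of the data frame ever changes.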

So the two approaches to this problem are to provide a list, as I did, or to rename the columns the way the R beast likes them, as you did.
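
A possible further route, offered as a hedged sketch rather than a tested recommendation, is reshape's split argument, which controls how variable names are cut into a stem and a time value when v.names is not given. My understanding is that with include = TRUE the first matched character stays with the stem and the rest becomes the time value. The regular expression below is my own choice for this dataset and is not from the original post.

```r
# Hand-typed copy of the question's data
dadmom <- data.frame(
  famid = 1:3,
  named = c("Bill", "Art", "Paul"),
  incd  = c(30000, 22000, 25000),
  namem = c("Bess", "Amy", "Pat"),
  incm  = c(15000, 18000, 50000),
  stringsAsFactors = FALSE
)

# "[a-z][dm]$" matches the last stem letter plus the d/m suffix, so the
# names are expected to be cut as name|d, inc|d, name|m, inc|m
# (assumption: this relies on how include = TRUE places the cut)
long <- reshape(dadmom, direction = "long", idvar = "famid",
                varying = 2:5, timevar = "dadmom",
                split = list(regexp = "[a-z][dm]$", include = TRUE))
long
```

If this behaves as assumed, the result has the same columns as the rename-first approach, without touching the original names.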

See my recent blog posts on base reshape for more on this (the second one in particular deals with this case):

reshape (part I)

reshape (part II)


answered Oct 12 '22 05:10

Tyler Rinker
