Reshape R data with user entries in rows, collapsing for each user

Tags: r, reshape

Pardon my newness to the R world, and thank you kindly in advance for your help.

I would like to analyze the data from an experiment.

The data comes in long format and needs to be reshaped into wide format, but I cannot figure out exactly how to do it. Most of the examples for melt/cast and reshape deal with much simpler data frames.

Each time a subject answers a question in the experiment, their user ID, location, age, and gender are recorded in a single row, and the item and response are entered next to those variables. Subjects may answer any number of questions, and they may answer different items (it is quite complicated, but it must be this way).

The raw data looks something like this:

User_id, location, age, gender, Item, Resp
1, CA, 22, M, A, 1 
1, CA, 22, M, B, -1 
1, CA, 22, M, C, -1 
1, CA, 22, M, D, 1 
1, CA, 22, M, E, -1
2, MD, 27, F, A, -1 
2, MD, 27, F, B, 1 
2, MD, 27, F, C, 1 
2, MD, 27, F, E, 1 
2, MD, 27, F, G, -1 
2, MD, 27, F, H, -1

I would like to reshape this data to have each user be on a single row, to look like this:

User_id, location, age, gender, A, B, C, D, E, F, G, H
1, CA, 22, M, 1, -1, -1, 1, -1, 0, 0, 0
2, MD, 27, F, -1, 1, 1, 0, 1, 0, -1, -1

I think this is just a matter of finding the right reshape call, but I've been at it for a couple of hours and I can't quite get it to look the way I want. Most of the examples do not have the repeated demographic data and so can be rotated more simply. Very sorry if I have overlooked something simple.

asked Aug 17 '15 22:08

GFoMoFo

3 Answers

Using data.table you can do:

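library(data.table)
dt <- as.data.table(df)  # assuming `df` is the long-format data frame from the question
# Add `+ gender` to the left-hand side of the formula to keep that column too.
dcast(dt, User_id + location + age ~ Item, value.var = "Resp", fill = 0L)
#    User_id location age  A  B  C  D  E  G  H
# 1:       1       CA  22  1 -1 -1  1 -1  0  0
# 2:       2       MD  27 -1  1  1  0  1 -1 -1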

answered Sep 29 '22 14:09

MichaelChirico

There’s a package called tidyr that makes melting and reshaping data frames much easier. In your case, you can use tidyr::spread straightforwardly:

library(tidyr)
result = spread(df, Item, Resp)

This will however fill missing entries with NA:

  User_id location age gender  A  B  C  D  E  G  H
1       1       CA  22      M  1 -1 -1  1 -1 NA NA
2       2       MD  27      F -1  1  1 NA  1 -1 -1

You can fix this by replacing them:

result[is.na(result)] = 0
result
#   User_id location age gender  A  B  C  D  E  G  H
# 1       1       CA  22      M  1 -1 -1  1 -1  0  0
# 2       2       MD  27      F -1  1  1  0  1 -1 -1

… or by using the fill argument:

result = spread(df, Item, Resp, fill = 0)

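For completeness’ sake, the other way round (i.e. going back to long format) works via gather (this is usually known as “melting”):

gather(result, Item, Resp, A : H)

The last argument here tells gather which columns to gather; it supports the concise range syntax. Note that the zero-filled entries come back as rows of their own, so this gives 14 rows rather than the original 11. To drop the unanswered items, gather the NA-filled version with na.rm = TRUE.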
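In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer. Here is a minimal sketch of the same reshape with the newer functions, assuming a reasonably recent tidyr is installed. The small df below is a hand-typed stand-in for the question's data (without the leading spaces that the dput version carries), and the names wide and long are just illustrative:

```r
library(tidyr)

# Hand-typed stand-in for the question's long-format data.
df <- data.frame(
  User_id  = c(rep(1L, 5), rep(2L, 6)),
  location = c(rep("CA", 5), rep("MD", 6)),
  age      = c(rep(22L, 5), rep(27L, 6)),
  gender   = c(rep("M", 5), rep("F", 6)),
  Item     = c("A", "B", "C", "D", "E", "A", "B", "C", "E", "G", "H"),
  Resp     = c(1, -1, -1, 1, -1, -1, 1, 1, 1, -1, -1)
)

# One row per user; items a user did not answer are filled with 0.
wide <- pivot_wider(
  df,
  names_from  = Item,
  values_from = Resp,
  values_fill = list(Resp = 0)
)

# Back to long format; this keeps the zero-filled entries as rows too.
long <- pivot_longer(wide, cols = A:H, names_to = "Item", values_to = "Resp")
```

All columns not named in names_from or values_from (here User_id, location, age and gender) are treated as identifiers, so the repeated demographic data collapses to one row per user without being listed explicitly.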

answered Sep 29 '22 15:09

Konrad Rudolph

Here's the always elegant stats::reshape version:

(newdf <- reshape(df, direction = "wide", timevar = "Item", idvar = names(df)[1:4]))
#   User_id location age gender Resp. A Resp. B Resp. C Resp. D Resp. E Resp. G Resp. H
# 1       1       CA  22      M       1      -1      -1       1      -1      NA      NA
# 6       2       MD  27      F      -1       1       1      NA       1      -1      -1

Missing values get filled with NA in reshape(), and the names are not what we want. So we'll need to do a bit more work. Here we can change the names and replace the NAs with zero in the same line to arrive at your desired result.

replace(setNames(newdf, sub(".* ", "", names(newdf))), is.na(newdf), 0)
#   User_id location age gender  A  B  C D  E  G  H
# 1       1       CA  22      M  1 -1 -1 1 -1  0  0
# 6       2       MD  27      F -1  1  1 0  1 -1 -1

Of course, the code would definitely be more legible if we broke this up into two separate lines. Also, note that there is no F in Item in your original data, hence the difference in output from yours.
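As a sketch of that two-step version: the snippet below rebuilds a small stand-in for df by hand so that it runs on its own. This stand-in has no leading spaces in its labels, so reshape() generates names like Resp.A rather than Resp. A, and the pattern passed to sub() is adjusted accordingly (that pattern is an assumption tied to this stand-in, not to the dput data below).

```r
# Hand-typed stand-in for the data; labels have no leading spaces here.
df <- data.frame(
  User_id  = c(rep(1L, 5), rep(2L, 6)),
  location = c(rep("CA", 5), rep("MD", 6)),
  age      = c(rep(22L, 5), rep(27L, 6)),
  gender   = c(rep("M", 5), rep("F", 6)),
  Item     = c("A", "B", "C", "D", "E", "A", "B", "C", "E", "G", "H"),
  Resp     = c(1, -1, -1, 1, -1, -1, 1, 1, 1, -1, -1)
)

# Long -> wide; items a user did not answer come out as NA.
newdf <- reshape(df, direction = "wide", timevar = "Item",
                 idvar = c("User_id", "location", "age", "gender"))

# Step 1: drop the "Resp." prefix that reshape() adds to the item columns.
names(newdf) <- sub("^Resp\\.", "", names(newdf))

# Step 2: replace the NAs of unanswered items with 0.
newdf[is.na(newdf)] <- 0
```

Splitting the work this way also makes it easy to skip the second step when NA is the more honest marker for "not answered".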

Data:

df <- structure(list(User_id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L), location = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c(" CA", " MD"), class = "factor"), age = c(22L, 
22L, 22L, 22L, 22L, 27L, 27L, 27L, 27L, 27L, 27L), gender = structure(c(2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c(" F", " M"
), class = "factor"), Item = structure(c(1L, 2L, 3L, 4L, 5L, 
1L, 2L, 3L, 5L, 6L, 7L), .Label = c(" A", " B", " C", " D", " E", 
" G", " H"), class = "factor"), Resp = c(1, -1, -1, 1, -1, -1, 
1, 1, 1, -1, -1)), .Names = c("User_id", "location", "age", "gender", 
"Item", "Resp"), class = "data.frame", row.names = c(NA, -11L
))

answered Sep 29 '22 14:09

Rich Scriven
