Reshape data frame with different column lengths into two columns replicating column ID

Question

I have the following data frame, with different row lengths:

myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                             c("Walter","NA","NA","NA","NA"),
                             c("Walter","Jesse","NA","NA","NA"),
                             c("Gus","Tuco","Mike","NA","NA"), 
                             c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))   
data.frame(ID,myvar)

ID     V1    V2   V3   V4    V5
 1 Walter    NA   NA   NA    NA
 2 Walter    NA   NA   NA    NA
 3 Walter Jesse   NA   NA    NA
 4    Gus  Tuco Mike   NA    NA
 5    Gus  Mike Hank Saul Flynn

My goal is to switch this data frame into a two column data frame. The first column would be the ID and the other one would be the character name. Note that the ID must be correspondent to the row the character were originally placed. I'm expecting the following result:

ID      V
1  Walter    
2  Walter
3  Walter
3  Jesse
4  Gus
4  Tuco
4  Mike
5  Gus
5  Mike
5  Hank
5  Saul
5  Flynn

I've tried dcast {reshape2} but it doesn't returned what I need. It is noteworthy that my original data frame is quite big. Any tips? Cheers.

akrun · Accepted Answer

You could use unlist

 res <- subset(data.frame(ID,value=unlist(myvar[-1], 
                              use.names=FALSE)), value!='NA')
 res
 #   ID  value
 #1   1 Walter
 #2   2 Walter
 #3   3 Walter
 #4   4    Gus
 #5   5    Gus
 #6   3  Jesse
 #7   4   Tuco
 #8   5   Mike
 #9   4   Mike
 #10  5   Hank
 #11  5   Saul
 #12  5  Flynn

NOTE: The NAs are 'character' elements in the dataset, it is better to create it without quotes so that it will be real NAs and we can remove it by na.omit, is.na, complete.cases etc.

data

myvar <- data.frame(ID,myvar)

rawr · Answer

myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                             c("Walter","NA","NA","NA","NA"),
                             c("Walter","Jesse","NA","NA","NA"),
                             c("Gus","Tuco","Mike","NA","NA"), 
                             c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))   
df <- data.frame(ID, myvar)

Using base reshape. (I'm converting your "NA" character strings to NA which you may not have to do, this is just due to how you created this example)

df[df == 'NA'] <- NA
na.omit(reshape(df, direction = 'long', varying = list(2:6))[, c('ID','V1')])

#     ID     V1
# 1.1  1 Walter
# 2.1  2 Walter
# 3.1  3 Walter
# 4.1  4    Gus
# 5.1  5    Gus
# 3.2  3  Jesse
# 4.2  4   Tuco
# 5.2  5   Mike
# 4.3  4   Mike
# 5.3  5   Hank
# 5.4  5   Saul
# 5.5  5  Flynn

or using reshape2

library('reshape2')
## na.omit(melt(df, id.vars = 'ID')[, c('ID','value')])

## or better yet as ananda suggests:
melt(df, id.vars = 'ID', na.rm = TRUE)[, c('ID','value')]

#    ID  value
# 1   1 Walter
# 2   2 Walter
# 3   3 Walter
# 4   4    Gus
# 5   5    Gus
# 8   3  Jesse
# 9   4   Tuco
# 10  5   Mike
# 14  4   Mike
# 15  5   Hank
# 20  5   Saul
# 25  5  Flynn

you get warnings that the factor levels over the columns are not the same but that's fine.

thelatemail · Answer

Fix your "NA" so that they are actually NA first:

mydf[mydf == "NA"] <- NA

Using some subsetting to do it all in one fell swoop:

data.frame(ID=mydf$ID[row(mydf[-1])[!is.na(mydf[-1])]], V=mydf[-1][!is.na(mydf[-1])])

#   ID      V
#1   1 Walter
#2   2 Walter
#3   3 Walter
#4   4    Gus
#5   5    Gus
#6   3  Jesse
#7   4   Tuco
#8   5   Mike
#9   4   Mike
#10  5   Hank
#11  5   Saul
#12  5  Flynn

Or much more readable in base R:

sel <- which(!is.na(mydf[-1]), arr.ind=TRUE)
data.frame(ID=mydf$ID[sel[,1]], V=mydf[-1][sel])

Alex · Answer

Using tidyr

library("tidyr")

myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                             c("Walter","NA","NA","NA","NA"),
                             c("Walter","Jesse","NA","NA","NA"),
                             c("Gus","Tuco","Mike","NA","NA"), 
                             c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))   

myvar <- data.frame(ID,myvar)

myvar %>% 
    gather(ID, Name, V1:V5 ) %>%
    select(ID, value) %>%
    filter(value != "NA")

If your NAs are coded as NA instead of "NA", then we can in fact use the na.rm = TRUE option in gather. E.g.:

myvar[myvar == "NA"] <- NA
myvar %>% 
    gather(ID, Name, V1:V5, na.rm = TRUE ) %>%
    select(ID, value)

gives

   ID  value
1   1 Walter
2   2 Walter
3   3 Walter
4   4    Gus
5   5    Gus
6   3  Jesse
7   4   Tuco
8   5   Mike
9   4   Mike
10  5   Hank
11  5   Saul
12  5  Flynn

Reshape data frame with different column lengths into two columns replicating column ID

Tags:

r

multiple-columns

reshape

ALS.Meyer

4 Answers

data

akrun

rawr

thelatemail

Alex

Recent Activity

Donate For Us

Reshape data frame with different column lengths into two columns replicating column ID

Tags:

r

multiple-columns

reshape

ALS.Meyer

4 Answers

data

akrun

rawr

thelatemail

Alex

Related questions

Recent Activity

Donate For Us