I have the following data frame, with different row lengths:
myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
c("Walter","NA","NA","NA","NA"),
c("Walter","Jesse","NA","NA","NA"),
c("Gus","Tuco","Mike","NA","NA"),
c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))
data.frame(ID,myvar)
ID V1 V2 V3 V4 V5
1 Walter NA NA NA NA
2 Walter NA NA NA NA
3 Walter Jesse NA NA NA
4 Gus Tuco Mike NA NA
5 Gus Mike Hank Saul Flynn
My goal is to switch this data frame into a two column data frame. The first column would be the ID and the other one would be the character name. Note that the ID must be correspondent to the row the character were originally placed. I'm expecting the following result:
ID V
1 Walter
2 Walter
3 Walter
3 Jesse
4 Gus
4 Tuco
4 Mike
5 Gus
5 Mike
5 Hank
5 Saul
5 Flynn
I've tried dcast {reshape2} but it doesn't returned what I need. It is noteworthy that my original data frame is quite big. Any tips? Cheers.
You could use unlist
res <- subset(data.frame(ID,value=unlist(myvar[-1],
use.names=FALSE)), value!='NA')
res
# ID value
#1 1 Walter
#2 2 Walter
#3 3 Walter
#4 4 Gus
#5 5 Gus
#6 3 Jesse
#7 4 Tuco
#8 5 Mike
#9 4 Mike
#10 5 Hank
#11 5 Saul
#12 5 Flynn
NOTE: The NAs
are 'character' elements in the dataset, it is better to create it without quotes so that it will be real NAs and we can remove it by na.omit
, is.na
, complete.cases
etc.
myvar <- data.frame(ID,myvar)
myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
c("Walter","NA","NA","NA","NA"),
c("Walter","Jesse","NA","NA","NA"),
c("Gus","Tuco","Mike","NA","NA"),
c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))
df <- data.frame(ID, myvar)
Using base reshape. (I'm converting your "NA"
character strings to NA
which you may not have to do, this is just due to how you created this example)
df[df == 'NA'] <- NA
na.omit(reshape(df, direction = 'long', varying = list(2:6))[, c('ID','V1')])
# ID V1
# 1.1 1 Walter
# 2.1 2 Walter
# 3.1 3 Walter
# 4.1 4 Gus
# 5.1 5 Gus
# 3.2 3 Jesse
# 4.2 4 Tuco
# 5.2 5 Mike
# 4.3 4 Mike
# 5.3 5 Hank
# 5.4 5 Saul
# 5.5 5 Flynn
or using reshape2
library('reshape2')
## na.omit(melt(df, id.vars = 'ID')[, c('ID','value')])
## or better yet as ananda suggests:
melt(df, id.vars = 'ID', na.rm = TRUE)[, c('ID','value')]
# ID value
# 1 1 Walter
# 2 2 Walter
# 3 3 Walter
# 4 4 Gus
# 5 5 Gus
# 8 3 Jesse
# 9 4 Tuco
# 10 5 Mike
# 14 4 Mike
# 15 5 Hank
# 20 5 Saul
# 25 5 Flynn
you get warnings that the factor levels over the columns are not the same but that's fine.
Fix your "NA"
so that they are actually NA
first:
mydf[mydf == "NA"] <- NA
Using some subsetting to do it all in one fell swoop:
data.frame(ID=mydf$ID[row(mydf[-1])[!is.na(mydf[-1])]], V=mydf[-1][!is.na(mydf[-1])])
# ID V
#1 1 Walter
#2 2 Walter
#3 3 Walter
#4 4 Gus
#5 5 Gus
#6 3 Jesse
#7 4 Tuco
#8 5 Mike
#9 4 Mike
#10 5 Hank
#11 5 Saul
#12 5 Flynn
Or much more readable in base R:
sel <- which(!is.na(mydf[-1]), arr.ind=TRUE)
data.frame(ID=mydf$ID[sel[,1]], V=mydf[-1][sel])
Using tidyr
library("tidyr")
myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
c("Walter","NA","NA","NA","NA"),
c("Walter","Jesse","NA","NA","NA"),
c("Gus","Tuco","Mike","NA","NA"),
c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))
myvar <- data.frame(ID,myvar)
myvar %>%
gather(ID, Name, V1:V5 ) %>%
select(ID, value) %>%
filter(value != "NA")
If your NAs are coded as NA
instead of "NA"
, then we can in fact use the na.rm = TRUE
option in gather
. E.g.:
myvar[myvar == "NA"] <- NA
myvar %>%
gather(ID, Name, V1:V5, na.rm = TRUE ) %>%
select(ID, value)
gives
ID value
1 1 Walter
2 2 Walter
3 3 Walter
4 4 Gus
5 5 Gus
6 3 Jesse
7 4 Tuco
8 5 Mike
9 4 Mike
10 5 Hank
11 5 Saul
12 5 Flynn
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With