I have a 114-row by 16-column data frame where the rows are individuals and each cell is either that individual's name or NA. For example, the first 3 rows look like this:
  name name.1 name.2 name.3 name.4 name.5  name.6 name.7   name.8 name.9  name.10 name.11  name.12 name.13  name.14 name.15
1 <NA>   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>   <NA>     <NA>   <NA> Aanestad    <NA> Aanestad    <NA> Aanestad    <NA>
2 <NA>   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>   <NA> Ackerman   <NA> Ackerman    <NA> Ackerman    <NA> Ackerman    <NA>
3 <NA>   <NA>   <NA>   <NA>   <NA>   <NA> Alarcon   <NA>  Alarcon   <NA>  Alarcon    <NA>  Alarcon    <NA>     <NA>    <NA>
I want to generate a list (if multiple unique names per row) or vector (if only one unique name per row) of all the unique names, with length 114.
When I try apply(x,1,unique)
I get a 2xNcol array where sometimes the first row cell is NA and sometimes the second row cell is NA.
     [,1]       [,2]       [,3]      [,4]     [,5]      [,6]      [,7]    [,8]   [,9]
[1,] NA         NA         NA        NA       "Alquist" NA        "Ayala" NA     NA
[2,] "Aanestad" "Ackerman" "Alarcon" "Alpert" NA        "Ashburn" NA      "Baca" "Battin"
When what I'd like is just:
Aanestad
Ackerman
Alarcon
...
I can't seem to figure out how to apply unique() while ignoring NA. na.rm, na.omit etc don't seem to work. I feel like I'm missing something real simple ...
Thanks!
First, to exclude missing values from mathematical operations, use the na.rm = TRUE argument. If you do not exclude these values, most summary functions such as mean() and sum() will return NA. You may also want to subset your data to keep only complete observations, that is, the rows that contain no missing data.
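As a rough illustration of both points, here is a minimal sketch. The vector scores and the data frame df are made-up examples, not data from the question:

```r
# Hypothetical numeric vector with one missing value
scores <- c(4, NA, 6)

mean(scores)                # NA: the missing value propagates
mean(scores, na.rm = TRUE)  # 5: the NA is dropped before averaging

# Hypothetical data frame; complete.cases() flags rows with no NA
df <- data.frame(id = 1:3, score = scores)
complete_rows <- df[complete.cases(df), ]  # keeps rows 1 and 3
```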
When you import a dataset from another statistical application, missing values might be coded with a number, for example 99. To let R know that this is a missing value, you need to recode it as NA. Another useful function for dealing with missing values is na.omit(), which deletes incomplete observations.
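One way the recoding might look, assuming a hypothetical data frame survey in which 99 is used only as the missing-value code and never as a real value:

```r
# Hypothetical imported data where 99 is the code for "missing"
survey <- data.frame(age    = c(34, 99, 51),
                     income = c(99, 42000, 38000))

# Replace every 99 with NA (assumes 99 is never a legitimate value)
survey[survey == 99] <- NA
```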
To remove all rows containing NA, use the na.omit() function. For example, if a data frame called df contains some NA values, the command na.omit(df) removes every row that contains at least one NA.
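A small sketch of that behaviour, using a made-up data frame df:

```r
# Hypothetical data frame with a missing name in row 2
df <- data.frame(id = 1:3,
                 name = c("Aanestad", NA, "Alarcon"),
                 stringsAsFactors = FALSE)

# na.omit() drops every row that has at least one NA
clean <- na.omit(df)
nrow(clean)  # 2
```

Note that na.omit() works on whole rows of a data frame, so on data like the question's, where the rows shown all contain at least one NA, it would drop those rows entirely rather than just the NA cells.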
unique() does not appear to have an na.rm argument, but you can remove the missing values yourself before calling it:
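# Example matrix with exactly one distinct non-missing value per row
A <- matrix(c(NA, "A", "A",
              "B", NA, NA,
              NA, NA, "C"), nrow = 3, byrow = TRUE)
# Drop the NAs in each row before taking the unique values
apply(A, 1, function(x) unique(x[!is.na(x)]))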
gives
[1] "A" "B" "C"
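The same idea should carry over to a data frame like the one in the question. Below is a minimal sketch, assuming a small hypothetical stand-in x (3 rows, 3 columns) rather than the real 114 x 16 data; the helper name unique_non_na is made up for illustration:

```r
# Hypothetical 3 x 3 stand-in for the 114 x 16 data frame in the question
x <- data.frame(
  name   = c(NA, "Ackerman", NA),
  name.1 = c("Aanestad", NA, "Alarcon"),
  name.2 = c("Aanestad", "Ackerman", "Alarcon"),
  stringsAsFactors = FALSE
)

# Unique values of a vector after dropping NAs
unique_non_na <- function(v) unique(v[!is.na(v)])

# apply() simplifies to a character vector here because every row
# yields exactly one name; it returns a list if the lengths differ
res_vec <- apply(x, 1, unique_non_na)

# lapply() over the rows of the matrix always returns a list
m <- as.matrix(x)
res_list <- lapply(seq_len(nrow(m)), function(i) unique_non_na(m[i, ]))
```

apply() gives a plain vector only when every row yields the same number of names. The lapply() variant always returns a list, which may be easier to handle if some rows turn out to contain more than one distinct name.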