 

Replace NAs in R: works on a practice dataset but gives warnings on the actual data

I have a dataset in R that looks like, and has been reshaped in the same way as, the following example. The aim is to turn NA values into something else (e.g. "FALSE" or "0") that can then be used to create a new column.

ortho.test <- data.frame(rep("a", 10))
colnames(ortho.test) <- "ODB6"
ortho.test$FBGN <- c("FBgn0132258", "FBgn0131535", "FBgn0138769", "FBgn01561235", "FBgn0316645", "FBgn874916", "FBgn5758641", "FBgn5279946", "FBgn67543154", "FBgn2451645")
ortho.test$Species <- c("DROME", "DROSI", "DROSE", "DROAN", "DROYA", "DROPS", "DROPE", "DROVI", "DROGR", "DROWI")

ortho <- reshape(ortho.test, direction = "wide", idvar = "ODB6", timevar = "Species")
ortho$FBGN.DROME <- NA
is.na(ortho)

This returns a logical matrix showing that every column except FBGN.DROME is FALSE. Here is the str() output:

> str(ortho)
'data.frame':   1 obs. of  11 variables:
 $ ODB6      : Factor w/ 1 level "a": 1
 $ FBGN.DROME: logi NA
 $ FBGN.DROSI: chr "FBgn0131535"
 $ FBGN.DROSE: chr "FBgn0138769"
 $ FBGN.DROAN: chr "FBgn01561235"
 $ FBGN.DROYA: chr "FBgn0316645"
 $ FBGN.DROPS: chr "FBgn874916"
 $ FBGN.DROPE: chr "FBgn5758641"
 $ FBGN.DROVI: chr "FBgn5279946"
 $ FBGN.DROGR: chr "FBgn67543154"
 $ FBGN.DROWI: chr "FBgn2451645"
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: NULL
  ..$ timevar: chr "Species"
  ..$ idvar  : chr "ODB6"
  ..$ times  : chr  "DROME" "DROSI" "DROSE" "DROAN" ...
  ..$ varying: chr [1, 1:10] "FBGN.DROME" "FBGN.DROSI" "FBGN.DROSE" "FBGN.DROAN" ...

I then change the NA to 0:

ortho[is.na(ortho)]<-0
is.na(ortho)

This returns a matrix in which every value is now FALSE. That is a success, because I can now use ifelse() to create a column showing which rows have no 0s or FALSEs (or whatever label I use to replace the NAs) in any column.
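For reference, the indicator column described above can also be computed straight from is.na(), before any NAs are replaced. A minimal sketch, assuming a small made-up data frame with factor columns (the IDs and the column name `all.present` are hypothetical):

```r
# Made-up stand-in for the reshaped table, using factor columns like the real data
ortho <- data.frame(
  ODB6       = c("EOG1", "EOG2", "EOG3"),
  FBGN.DROME = c("FBgn01", NA, "FBgn03"),
  FBGN.DROSI = c("FBgn11", "FBgn12", NA),
  stringsAsFactors = TRUE
)

# TRUE where a row has no NA in any column. is.na() works on factor,
# character and logical columns alike, so nothing has to be replaced first.
no_na <- rowSums(is.na(ortho)) == 0

# complete.cases() is the built-in shortcut for the same row-wise test
stopifnot(all(no_na == complete.cases(ortho)))

ortho$all.present <- complete.cases(ortho)
print(ortho)
```

Because nothing is written into the factor columns, this route never touches their levels.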

However, when I apply this to the full data frame, the NAs are not converted and I get the following warnings:

> ortho[is.na(ortho)]<-0
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(62938L,  ... :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(67667L,  ... :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(122384L,  ... :
  invalid factor level, NAs generated
4: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(136498L,  ... :
  invalid factor level, NAs generated
5: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(84764L,  ... :
  invalid factor level, NAs generated
6: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(162734L,  ... :
  invalid factor level, NAs generated
7: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(33586L,  ... :
  invalid factor level, NAs generated
8: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(38959L,  ... :
  invalid factor level, NAs generated
9: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(149363L,  ... :
  invalid factor level, NAs generated
10: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(846L,  ... :
  invalid factor level, NAs generated
11: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(98228L,  ... :
  invalid factor level, NAs generated
12: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(110267L,  ... :
  invalid factor level, NAs generated

This is the str() output for the full data frame:

> str(ortho)
'data.frame':   17217 obs. of  13 variables:
 $ ODB6      : Factor w/ 17217 levels "EOG60023J","EOG60023K",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ FBGN.DROGR: Factor w/ 164289 levels "FBgn0000008",..: 62938 54687 54705 56261 52591 58895 52161 52477 59180 53404 ...
 $ FBGN.DROMO: Factor w/ 164289 levels "FBgn0000008",..: 67667 65117 65951 66506 68291 71722 73134 68667 72523 76080 ...
 $ FBGN.DROVI: Factor w/ 164289 levels "FBgn0000008",..: 122384 121133 120018 121674 NA 125620 123754 123969 127130 130755 ...
 $ FBGN.DROWI: Factor w/ 164289 levels "FBgn0000008",..: 136498 136809 139642 137108 NA 141689 136363 137237 135869 132801 ...
 $ FBGN.DROPE: Factor w/ 164289 levels "FBgn0000008",..: 84764 78121 81229 80829 85509 82276 79001 80267 77133 87679 ...
 $ FBGN.DROPS: Factor w/ 164289 levels "FBgn0000008",..: 162734 158625 162203 158653 158028 22427 158179 13830 19898 160874 ...
 $ FBGN.DROAN: Factor w/ 164289 levels "FBgn0000008",..: 33586 35261 35694 23649 33601 25796 33808 33861 25917 29992 ...
 $ FBGN.DROER: Factor w/ 164289 levels "FBgn0000008",..: 38959 41203 40738 39865 38807 46087 38821 44982 47952 38091 ...
 $ FBGN.DROYA: Factor w/ 164289 levels "FBgn0000008",..: 149363 153417 153106 152243 149654 147146 149664 149482 147635 144838 ...
 $ FBGN.DROME: Factor w/ 164289 levels "FBgn0000008",..: 846 7219 6958 162946 525 1892 125 3510 163839 10111 ...
 $ FBGN.DROSE: Factor w/ 164289 levels "FBgn0000008",..: 98228 94438 94153 102953 98068 95380 98082 92553 93497 95950 ...
 $ FBGN.DROSI: Factor w/ 164289 levels "FBgn0000008",..: 110267 108223 107983 107246 110164 117494 116973 110504 106459 NA ...
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: NULL
  ..$ timevar: chr "Species"
  ..$ idvar  : chr "ODB6"
  ..$ times  : Factor w/ 12 levels "DROAN","DROER",..: 3 5 10 11 6 7 1 2 12 4 ...
  ..$ varying: chr [1, 1:12] "FBGN.DROGR" "FBGN.DROMO" "FBGN.DROVI" "FBGN.DROWI" ...

Could you help me get the main data frame to behave like the test one did? (PS: I know I may get a "this is a duplicate, read the help pages and search properly" response, but I have searched, which is how I found out how to replace NAs, and I haven't found anything with this same issue.)

asked Mar 11 '13 at 14:03 by rg255




1 Answer

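You have a factor problem. If you look at the str() output for your real data set, you'll notice that the columns are all

Factor w/ 164289 levels .....

A factor can only hold values that are among its existing levels. Assigning anything else, such as 0, stores NA instead and triggers the "invalid factor level" warning, so your NAs stay NA.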

For example,

R> x = factor(c("A", "B"))
R> x[x=="A"] = 0
Warning message:
In `[<-.factor`(`*tmp*`, x == "A", value = 0) :
  invalid factor level, NAs generated
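This also explains why the practice data frame behaved: its FBGN columns were chr and the blanked-out FBGN.DROME column was logi, not factors, and the one factor column (ODB6) had no NAs, so nothing was written into it. A small sketch contrasting the three column types (the variable names `f`, `s` and `l` are just for illustration):

```r
# Factor: 0 is not one of the existing levels ("A", "B"),
# so R stores NA instead and warns about an invalid factor level
f <- factor(c("A", "B"))
f[f == "A"] <- 0
print(f)            # first element is now NA; levels are unchanged

# Character: any string can be stored; the number 0 is coerced to "0"
s <- c("A", "B")
s[s == "A"] <- 0
print(s)            # "0" "B"

# Logical (like the all-NA FBGN.DROME column): the vector becomes numeric
l <- NA
l[is.na(l)] <- 0
print(l)            # 0
```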

You need to add 0 as a level. So something like:

x = factor(x, levels=c(levels(x), 0))   # add "0" to the allowed levels
x[is.na(x)] = 0                         # 0 is now a valid value

should do the trick. However, a better tactic would be to change how you read in the data. For example,

read.table(filename, stringsAsFactors=FALSE)
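A sketch of that, assuming the data come from a whitespace-delimited text file; inline text with made-up IDs stands in for the real file here. The second part shows one way to fix a data frame that was already read with factors: converting each factor column to character gives the same column types that stringsAsFactors=FALSE would have produced.

```r
# Stand-in for the real file (made-up IDs). read.table() turns the
# literal token NA into a missing value by default (na.strings = "NA").
txt <- "ODB6 FBGN.DROME FBGN.DROSI
EOG1 FBgn01 FBgn11
EOG2 NA FBgn12"

# Option 1: keep strings as character when reading
ortho <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
ortho[is.na(ortho)] <- "0"   # no warning: the columns are character

# Option 2: data already loaded as factors; convert them afterwards.
# ortho2[] <- ... replaces the columns while keeping the data.frame structure.
ortho2 <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
ortho2[] <- lapply(ortho2, function(col) if (is.factor(col)) as.character(col) else col)
ortho2[is.na(ortho2)] <- "0"

str(ortho2)
```

Note that since R 4.0.0, stringsAsFactors defaults to FALSE in read.table() and data.frame(), so on current versions Option 1 is the default behaviour.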
answered Nov 15 '22 at 19:11 by csgillespie