 

Read null values from file

Tags:

r

I need to read a data frame from a file containing NULL values. Here's an example file:

charCol floatCol intCol
a       1.5      10
b       NULL     3
c       3.9      NULL
d       -3.4     4

I read this file into a data frame:

> df <- read.table('example.dat', header=TRUE)

But "NULL" entries are not interpreted by R as NULL:

> is.null(df$floatCol[2])
[1] FALSE

How should I format my input file so that R properly treats such entries as NULL?

Leo asked Oct 27 '10 11:10


2 Answers

Always, always, always run summary(thing) when something looks unexpected.

> summary(df)
 charCol floatCol  intCol 
 a:1     1.5 :1   10  :1  
 b:1     -3.4:1   3   :1  
 c:1     3.9 :1   4   :1  
 d:1     NULL:1   NULL:1  

That looks a bit weird. Drill down:

> summary(df$floatCol)
 1.5 -3.4  3.9 NULL 
   1    1    1    1 

What the heck is it?

> class(df$floatCol)
[1] "factor"

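The presence of an invalid numeric format (the string 'NULL') has caused R to go "oh, I guess these aren't numbers, I'll read them in as character strings and make a factor (categorical variable) for you". (In R 4.0.0 and later, stringsAsFactors defaults to FALSE, so the column comes back as "character" rather than "factor", but the underlying problem is the same: it is not numeric.)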
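If the data have already been read in with the 'NULL' strings left in, the column can be repaired after the fact. A minimal sketch, assuming the column is either a factor (older R) or character (R 4.0.0+); the helper name null_to_na_numeric is hypothetical. Going through as.character() first matters: calling as.numeric() directly on a factor returns its internal level codes, not the values.

```r
# Hypothetical helper: convert a factor or character column that uses a
# marker string such as "NULL" for missing values into a numeric vector.
null_to_na_numeric <- function(x, marker = "NULL") {
  x <- as.character(x)      # drop factor levels; as.numeric(factor) gives codes
  x[x %in% marker] <- NA    # %in% is FALSE for existing NAs and allows several markers
  as.numeric(x)             # the remaining entries should now parse as numbers
}

# The column as read.table() would produce it before R 4.0.0 (a factor)
floatCol <- factor(c("1.5", "NULL", "3.9", "-3.4"))
null_to_na_numeric(floatCol)
# [1]  1.5   NA  3.9 -3.4
```

Any entry that still fails to parse after this will produce a coercion warning from as.numeric(), which is a useful hint that the file contains another unexpected token.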

The solution that has just been posted uses na.strings = "NULL", but remember that NA isn't the same as NULL in R. NA is a marker for missing data; NULL is a genuine non-value (an empty object with length 0). Compare:

> c(1,2,3,NULL,4)
[1] 1 2 3 4
> c(1,2,3,NA,4)
[1]  1  2  3 NA  4

Once you've read it in correctly, the appropriate test is usually is.na(foo).
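A short sketch of that difference in practice, assuming the example file is read with na.strings = "NULL" (the data are inlined here through read.table()'s text argument so the snippet is self-contained). Extracting one element of a numeric column always yields a length-one vector, so is.null() on it is FALSE even when the value is missing; is.na() is what picks it up.

```r
df <- read.table(text = "charCol floatCol intCol
a       1.5      10
b       NULL     3
c       3.9      NULL
d       -3.4     4", header = TRUE, na.strings = "NULL")

is.null(df$floatCol[2])   # FALSE: an element of a vector is never NULL
is.na(df$floatCol[2])     # TRUE: this is the missing value

# Columns now have proper numeric types despite the missing entries
sapply(df, class)

# Positions of the missing values in each numeric column
which(is.na(df$floatCol))
which(is.na(df$intCol))
```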

Spacedman answered Sep 23 '22 17:09


Try this:

> Lines <- "charCol floatCol intCol
+ a       1.5      10
+ b       NULL     3
+ c       3.9      NULL
+ d       -3.4     4"
> 
> # DF <- read.table("myfile", header = TRUE, na.strings = "NULL")
> DF <- read.table(textConnection(Lines), header = TRUE, na.strings = "NULL")
> DF
  charCol floatCol intCol
1       a      1.5     10
2       b       NA      3
3       c      3.9     NA
4       d     -3.4      4
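
A variant of the same idea, offered as a sketch rather than part of the original answer: read.table() also accepts the data directly through its text argument, and colClasses can pin the column types up front. With the types fixed, an unexpected token (say "N/A") makes read.table() stop with a scan() error instead of silently giving a character or factor column. Note that supplying na.strings replaces the default "NA", so it is listed again here in case the file also contains literal NA values.

```r
Lines <- "charCol floatCol intCol
a       1.5      10
b       NULL     3
c       3.9      NULL
d       -3.4     4"

# Treat both "NULL" and "NA" as missing, and enforce the expected column types
DF <- read.table(text = Lines, header = TRUE,
                 na.strings = c("NULL", "NA"),
                 colClasses = c("character", "numeric", "integer"))
str(DF)
```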

G. Grothendieck answered Sep 25 '22 17:09