I need to read a data frame from a file containing NULL values. Here's an example file:
charCol floatCol intCol
a 1.5 10
b NULL 3
c 3.9 NULL
d -3.4 4
I read this file into a data frame:
> df <- read.table('example.dat', header=TRUE)
But "NULL" entries are not interpreted by R as NULL:
> is.null(df$floatCol[2])
[1] FALSE
How should I format my input file so that R properly treats such entries as NULL?
Always, always, always run summary(thing) when something looks unexpected.
> summary(df)
charCol floatCol intCol
a:1 1.5 :1 10 :1
b:1 -3.4:1 3 :1
c:1 3.9 :1 4 :1
d:1 NULL:1 NULL:1
That looks a bit weird. Drill down:
> summary(df$floatCol)
1.5 -3.4 3.9 NULL
1 1 1 1
What the heck is it?
> class(df$floatCol)
[1] "factor"
The presence of a value that isn't a valid number (the string 'NULL') has caused R to go "oh, I guess these aren't numbers, I'll read them as character strings and make a factor (categorical variable) for you". (Since R 4.0.0, stringsAsFactors defaults to FALSE, so newer versions leave the column as character rather than factor; either way it is no longer numeric.)
The fix is to pass na.strings = "NULL" to read.table(), but remember that NA isn't the same as NULL in R: NA is a marker for missing data, while NULL is a genuine non-value. Compare:
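If the file has already been read without telling R about the "NULL" marker, the affected columns can still be converted afterwards. This is a minimal sketch, assuming the column contains only number-like strings plus the literal "NULL"; the inline `text =` data is a stand-in for 'example.dat'. Going through as.character() first matters for factors, because as.numeric() on a factor returns the internal level codes, not the printed values.

```r
# Inline copy of the example file (stand-in for 'example.dat'), read WITHOUT na.strings
df <- read.table(text = "charCol floatCol intCol
a 1.5 10
b NULL 3
c 3.9 NULL
d -3.4 4", header = TRUE)

# as.character() first: as.numeric() on a factor would give level codes, not values.
# The string "NULL" cannot be parsed as a number, so it becomes NA;
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
df$floatCol <- suppressWarnings(as.numeric(as.character(df$floatCol)))
df$intCol   <- suppressWarnings(as.integer(as.character(df$intCol)))

str(df)  # floatCol is now num, intCol is int, with NA where "NULL" was
```

Reading the file with na.strings is still the cleaner route, since it also avoids the silent conversion of any genuinely malformed entries to NA.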
> c(1,2,3,NULL,4)
[1] 1 2 3 4
> c(1,2,3,NA,4)
[1] 1 2 3 NA 4
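This difference also explains why the is.null() test in the question could never succeed: NULL has length zero and cannot occupy a slot in a vector or data-frame column, while NA is a length-one placeholder that can. A short check:

```r
length(NULL)  # 0: NULL is the absence of any value
length(NA)    # 1: NA is a value that means "missing"

x <- c(1.5, NA, 3.9)
is.null(x[2])  # FALSE: a single element of a vector is never NULL
is.na(x[2])    # TRUE: the missing slot is detected by is.na()
```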
Once you've read it in correctly, the appropriate test is usually is.na(foo).
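As a sketch of what "read in correctly" looks like, here is the example file read with na.strings = "NULL" and then checked with is.na(). The `text =` argument of read.table() is used here as a stand-in for the real file; it works like the textConnection() approach.

```r
# Inline copy of the example file, with "NULL" declared as the missing-value marker
df <- read.table(text = "charCol floatCol intCol
a 1.5 10
b NULL 3
c 3.9 NULL
d -3.4 4", header = TRUE, na.strings = "NULL")

class(df$floatCol)       # "numeric": the NULL entry no longer forces a non-numeric column
is.na(df$floatCol)       # FALSE  TRUE FALSE FALSE
which(is.na(df$intCol))  # 3: row index of the missing intCol value
colSums(is.na(df))       # count of missing values per column: 0, 1, 1
```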
Try this:
> Lines <- "charCol floatCol intCol
+ a 1.5 10
+ b NULL 3
+ c 3.9 NULL
+ d -3.4 4"
>
> # DF <- read.table("myfile", header = TRUE, na.strings = "NULL")
> DF <- read.table(textConnection(Lines), header = TRUE, na.strings = "NULL")
> DF
charCol floatCol intCol
1 a 1.5 10
2 b NA 3
3 c 3.9 NA
4 d -3.4 4