Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using data.table's fread with varying lengths for blank missing values

Tags:

r

data.table

I have a dataset with many missing values. Some of the missing values are NAs, some are Nulls, and others have varying lengths of blank spaces. I would like to utilize the fread function in R to be able to read all these values as missing.

Here is an example:

#Find fake data
iris <- data.table(iris)[1:5]

#Add missing values non-uniformly
iris[1,Species:='         ']
iris[2,Species:=' ']
iris[3,Species:='NULL']

#Write to csv and read back in using fread
write.csv(iris,file="iris.csv")
fread("iris.csv",na.strings=c("NULL"," "))

   V1 Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1:  1          5.1         3.5          1.4         0.2          
2:  2          4.9         3.0          1.4         0.2        NA
3:  3          4.7         3.2          1.3         0.2        NA
4:  4          4.6         3.1          1.5         0.2    setosa
5:  5          5.0         3.6          1.4         0.2    setosa

From the above example, we see that I am unable to account for the first missing value since there are many blank spaces. Any one know of a way to account for this?

like image 885
Mike.Gahan Avatar asked Oct 01 '22 14:10

Mike.Gahan


1 Answers

Thanks so much for the wonderful answer from @eddi.

fread("sed 's/ *//g' iris.csv",na.strings=c("",NA,"NULL"))
like image 184
Mike.Gahan Avatar answered Oct 10 '22 20:10

Mike.Gahan