I have a dataset with many missing values. Some of the missing values are NAs, some are Nulls, and others have varying lengths of blank spaces. I would like to utilize the fread
function in R
to be able to read all these values as missing.
Here is an example:
#Find fake data
iris <- data.table(iris)[1:5]
#Add missing values non-uniformly
iris[1,Species:=' ']
iris[2,Species:=' ']
iris[3,Species:='NULL']
#Write to csv and read back in using fread
write.csv(iris,file="iris.csv")
fread("iris.csv",na.strings=c("NULL"," "))
V1 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: 1 5.1 3.5 1.4 0.2
2: 2 4.9 3.0 1.4 0.2 NA
3: 3 4.7 3.2 1.3 0.2 NA
4: 4 4.6 3.1 1.5 0.2 setosa
5: 5 5.0 3.6 1.4 0.2 setosa
From the above example, we see that I am unable to account for the first missing value since there are many blank spaces. Any one know of a way to account for this?
Thanks so much for the wonderful answer from @eddi.
fread("sed 's/ *//g' iris.csv",na.strings=c("",NA,"NULL"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With