Apologies if this has an obvious answer, but I can't find anything on the net.
I often get very large datasets in which missing values are left blank, e.g. (shortened):
#Some description of the dataset
#cover x number of lines
31   3213 313   64    63
31   3213 313   64    63
31   3213 313   64    63
31   3213 313   64    63
31   3213 313   64    63
12   178        190   865
532  31   6164  68
614       131   864   808
I would like to replace all the blanks with, for example, -999. If I use read.table like this:
dat = read.table('file.txt',skip=2)
I get the error message
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 6 did not have 5 elements
I could open the file as a data frame and do
dat = data.frame('file.txt',skip=2)
is.na(dat) = which(dat == '')
but I don't know whether it would work, because I don't know how to skip the top 2 lines when creating a data frame (i.e. the equivalent of "skip"), and I couldn't find the answer anywhere. Could anyone help?
Thanks.
If you know the width of each column, you can use read.fwf, which reads fixed-width fields and turns blank ones into NA, e.g.:
> dat <- read.fwf('temp.txt', skip=2, widths=c(5,5,6,6,6))
> dat
V1 V2 V3 V4 V5
1 31 3213 313 64 63
2 31 3213 313 64 63
3 31 3213 313 64 63
4 31 3213 313 64 63
5 31 3213 313 64 63
6 12 178 NA 190 865
7 532 31 6164 68 NA
8 614 NA 131 864 808
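To try this without the original file, here is a minimal sketch that first writes the sample rows to a temporary file. The layout (each field left-justified in widths 5, 5, 6, 6, 6) is an assumption chosen to match the `widths` argument above, and the names `vals`, `cells`, `body` and `path` are illustrative only; the real file may be laid out differently.

```r
# Sample values from the question; NA marks a field that is blank in the file.
vals <- rbind(
  c( 31, 3213,  313,  64,  63),
  c( 31, 3213,  313,  64,  63),
  c( 31, 3213,  313,  64,  63),
  c( 31, 3213,  313,  64,  63),
  c( 31, 3213,  313,  64,  63),
  c( 12,  178,   NA, 190, 865),
  c(532,   31, 6164,  68,  NA),
  c(614,   NA,  131, 864, 808)
)

# Turn NA into an empty string so the field is written as blanks.
cells <- ifelse(is.na(vals), "", as.character(vals))

# Assumed layout: left-justified fields of widths 5, 5, 6, 6, 6.
body <- sprintf("%-5s%-5s%-6s%-6s%-6s",
                cells[, 1], cells[, 2], cells[, 3], cells[, 4], cells[, 5])

# Write two header lines plus the data to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("#Some description of the dataset",
             "#cover x number of lines",
             body), path)

# Read it back, skipping the two header lines; blank fields become NA.
dat <- read.fwf(path, widths = c(5, 5, 6, 6, 6), skip = 2)
print(dat)
```

If the widths do not match the file, values are split in the wrong places, so it is worth checking the result against a few raw lines.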
Although it's easy to replace NA values with any value you want, that is usually a bad idea, because R has many good ways of dealing with NA values.
For example, to take the mean of column two, use:
> mean(dat$V2, na.rm=TRUE)
[1] 2324.857
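If a sentinel such as -999 is still required (as the question asks), a minimal sketch of the replacement follows. The data frame is rebuilt by hand from the read.fwf output above so that the snippet runs on its own, and `sentinel` is just an illustrative name. Comparing the two means shows why the sentinel is risky: it is treated as a real measurement.

```r
# Data frame rebuilt by hand from the read.fwf output shown above.
dat <- data.frame(
  V1 = c(31, 31, 31, 31, 31, 12, 532, 614),
  V2 = c(3213, 3213, 3213, 3213, 3213, 178, 31, NA),
  V3 = c(313, 313, 313, 313, 313, NA, 6164, 131),
  V4 = c(64, 64, 64, 64, 64, 190, 68, 864),
  V5 = c(63, 63, 63, 63, 63, 865, NA, 808)
)

# Replace every NA in a copy with the sentinel value.
sentinel <- dat
sentinel[is.na(sentinel)] <- -999

# The NA-aware mean ignores the missing value ...
mean_na_rm <- mean(dat$V2, na.rm = TRUE)
# ... while the sentinel is averaged in as if it were data.
mean_sentinel <- mean(sentinel$V2)

print(c(na.rm = mean_na_rm, sentinel = mean_sentinel))
```

Keeping the NA values in `dat` and writing a sentinel only on export (if a downstream tool needs it) avoids this kind of silent distortion.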
R has other functions to deal with missing data. For example, you can use na.omit() to completely remove rows with missing data.
> na.omit(dat)
V1 V2 V3 V4 V5
1 31 3213 313 64 63
2 31 3213 313 64 63
3 31 3213 313 64 63
4 31 3213 313 64 63
5 31 3213 313 64 63
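Before dropping rows, it can help to see where the gaps are. The following sketch uses base R only; the data frame is again rebuilt by hand from the output above, and the names `n_missing`, `keep` and `col_means` are illustrative.

```r
# Data frame rebuilt by hand from the read.fwf output shown above.
dat <- data.frame(
  V1 = c(31, 31, 31, 31, 31, 12, 532, 614),
  V2 = c(3213, 3213, 3213, 3213, 3213, 178, 31, NA),
  V3 = c(313, 313, 313, 313, 313, NA, 6164, 131),
  V4 = c(64, 64, 64, 64, 64, 190, 68, 864),
  V5 = c(63, 63, 63, 63, 63, 865, NA, 808)
)

# Number of missing values in each column.
n_missing <- colSums(is.na(dat))
print(n_missing)

# Logical vector: TRUE for rows without any NA (the rows na.omit() keeps).
keep <- complete.cases(dat)
print(dat[keep, ])

# Column means that skip NA instead of discarding whole rows.
col_means <- colMeans(dat, na.rm = TRUE)
print(col_means)
```

na.omit() discards three of the eight rows here, whereas the na.rm = TRUE approach keeps every observed value, so which one fits depends on the analysis.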