
Replace blanks in datasets with value in r

Tags:

r

Apologies as I thought there would be a very obvious answer but I can't find anything on the net...

I often get very large datasets where missing values are left blank, for example (shortened):

#Some description of the dataset
#cover x number of lines
31   3213 313   64    63
31   3213 313   64    63
31   3213 313   64    63
31   3213 313   64    63
31   3213 313   64    63
12   178        190   865
532  31   6164  68
614       131   864   808

I would like to replace all the blanks with, for example, -999. If I use read.table like this

dat = read.table('file.txt',skip=2)

I get the error message

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 6 did not have 5 elements

I could open the file as a data frame and do

dat = data.frame('file.txt',skip=2)
is.na(dat) = which(dat == '')

but I don't know if it would work, because I don't know how to skip the top 2 lines when creating a data frame (i.e. the equivalent of "skip"), and I couldn't find the answer anywhere. Could anyone help?

Thanks.

asked Dec 09 '22 18:12 by SnowFrog


1 Answer

If you know the width of each column, you can use read.fwf, which reads fixed-width files and accepts skip just like read.table. Blank fields come back as NA, e.g.

> dat <- read.fwf('temp.txt', skip=2, widths=c(5,5,6,6,6))
> dat
   V1   V2   V3  V4  V5
1  31 3213  313  64  63
2  31 3213  313  64  63
3  31 3213  313  64  63
4  31 3213  313  64  63
5  31 3213  313  64  63
6  12  178   NA 190 865
7 532   31 6164  68  NA
8 614   NA  131 864 808
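
To try this without the original file, here is a minimal sketch that first writes a shortened copy of the sample to a temporary file. The temporary file and the name sample_lines are my own additions for illustration, and the widths are read off the sample in the question, so check them against your real data.

```r
# Shortened copy of the sample from the question:
# two description lines followed by four data rows.
sample_lines <- c(
  "#Some description of the dataset",
  "#cover x number of lines",
  "31   3213 313   64    63",
  "12   178        190   865",
  "532  31   6164  68",
  "614       131   864   808"
)

# Hypothetical temporary file standing in for 'file.txt'.
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# skip = 2 drops the two description lines;
# widths gives the number of characters in each column.
dat <- read.fwf(path, skip = 2, widths = c(5, 5, 6, 6, 6))
print(dat)

# Clean up the temporary file.
unlink(path)
```

If the widths are off by even one character, digits end up in the wrong column, so it is worth comparing a few rows against the raw file.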

Although it's easy to replace NA values with any value you want, that's a bad idea, because R has many good ways of dealing with NA values.
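
If some downstream tool really does need a sentinel such as the -999 asked about, the replacement itself is short. A minimal sketch, using a small stand-in data frame (my own, built from a few of the sample values) rather than the real read.fwf result:

```r
# Small stand-in for the data frame returned by read.fwf.
dat <- data.frame(
  V1 = c(31, 12, 532, 614),
  V2 = c(3213, 178, 31, NA),
  V3 = c(313, NA, 6164, 131)
)

# is.na(filled) is a logical matrix; assigning through it
# overwrites every NA in the data frame in one step.
filled <- dat
filled[is.na(filled)] <- -999
print(filled)
```

Working on a copy keeps the NA values in dat, so functions with na.rm or na.omit() still behave correctly on the original.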

For example, to take the mean of column two while ignoring the missing value, use:

> mean(dat$V2, na.rm=TRUE)
[1] 2324.857

R has other functions to deal with missing data. For example, you can use na.omit() to completely remove rows with missing data.

> na.omit(dat)
  V1   V2  V3 V4 V5
1 31 3213 313 64 63
2 31 3213 313 64 63
3 31 3213 313 64 63
4 31 3213 313 64 63
5 31 3213 313 64 63
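
Before dropping rows it can help to see how much is actually missing. A rough sketch with base R's is.na() and complete.cases(), again on a small stand-in data frame of my own rather than the full dataset:

```r
# Small stand-in for the data frame returned by read.fwf.
dat <- data.frame(
  V1 = c(31L, 12L, 532L, 614L),
  V2 = c(3213L, 178L, 31L, NA),
  V3 = c(313L, NA, 6164L, 131L)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(dat))
print(na_counts)

# TRUE for rows that contain no NA at all.
complete <- complete.cases(dat)
print(complete)

# Keeps the same rows that na.omit(dat) would keep.
print(dat[complete, ])
```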
answered Dec 11 '22 09:12 by CHP