Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting a character to a numeric value in R

I have a file that I read in into R and is translated to a dataframe (called CA1) to have the structure as followed:

   Station_ID Guage_Type   Lat   Long     Date Time_Zone Time_Frame H0 H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
 1    4457700         HI 41.52 124.03 19480701         8        LST  0  0  0  0  0  0   0   0   0   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
 2    4457700         HI 41.52 124.03 19480705         8        LST  0  1  1  1  1  1   2   2   2   4   5   5   4   7   1   1   0   0  10  13   5   1   1   3
 3    4457700         HI 41.52 124.03 19480706         8        LST  1  1  1  0  1  1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 4    4457700         HI 41.52 124.03 19480727         8        LST  3  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 5    4457700         HI 41.52 124.03 19480801         8        LST  0  0  0  0  0  0   0   0   0   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
 6    4457700         HI 41.52 124.03 19480817         8        LST  0  0  0  0  0  0 ACC ACC ACC ACC ACC ACC   6   1   0   0   0   0   0   0   0   0   0   0

H0 through H23 are read in as character() since there will be cases when the value will not be numeric and will have values such as MIS, ACC, or DEL.

My question: is there a way to typecast the values for each column H0 through H23 to be numeric and have the character values (MIS, ACC, DEL) as NA or NAN which I can test for it if they are (is.nan or is.na) so I can run some numeric models on it. Or would it be best to have the character values to be changed to an identifier, such as -9999?

I have tried many ways. I have found a few on this site but none of work. Such as:

 for (i in 8:31)
 {
     CA1[6,i] <- as.numeric(as.character(CA1[6,i]))
 }

which of course gives warnings but as I test if two specific values is_numeric() (CA1[6,8] and CA1[6,19]) I get a false statement for both. The first I don't understand why, but the second I do since it is a "". However, I can test that with is.na(CA1[6,19]) and returns true, which is just fine for me to know it is not numeric.

A second way I tried is:

 for (i in 8:31)
 {
     CA1[6,i] <- as.numeric(levels(CA1[6,i]))[CA1[6,i]]
 }

which I get the same results as before.

Is there a way of doing what I am trying to do in an efficient manner? Your help is greatly appreciated. Thank you

like image 328
Luciano Rodriguez Avatar asked May 04 '12 09:05

Luciano Rodriguez


1 Answers

The immediate problem is each column of a data frame can only contain values of one type. The 6 in CA1[6,i] in your code means that only a single value is being converted in each column, so, when it is inserted after conversion, it has to be coerced back to a string to match the rest of the column.

You can solve this by converting the whole column in one go, so that the column is entirely replaced. i.e. remove the 6:

 for (i in 8:31)
 {
     CA1[,i] <- as.numeric(as.character(CA1[,i]))
 }
like image 123
huon Avatar answered Oct 03 '22 04:10

huon