Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I test for numeric values in a dataframe of characters, and convert those to numeric?

Tags:

dataframe

r

apply

I have a dataframe somewhat like the following:

> theDF
   ID        Ticker INDUSTRY_SECTOR              VAR             CVAR
1   1      USD CASH                                0                0
12  2      ZAR CASH                 -181412.82055904 -301731.22832191
23  3 BAT SJ EQUITY       Financial  61711.951234826 102641.162795691
34  4 HCI SJ EQUITY       Financial 1095.16002541256 1821.50290513369
45  5 PSG SJ EQUITY       Financial 16498.2192382422  27440.331617902

We can see these are all character columns:

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

I would like something that will change ONLY the numeric-type vectors to numeric. Basically, if it "looks like" a numeric, make it numeric, otherwise leave it be. I cannot find anything on StackOverflow which does not require knowing the names or columns you want to convert before hand. This DF will not always be in the same order, or have the columns, so I need some dynamic way to check if the columns "look like" numeric and make those columns numerics.

This (obviously) gives me a bunch of NA;s for the character columns:

> apply(theDF, 2, as.numeric)
     ID Ticker INDUSTRY_SECTOR        VAR        CVAR
[1,]  1     NA              NA       0.00       0.000
[2,]  2     NA              NA -181412.82 -301731.228
[3,]  3     NA              NA   61711.95  102641.163
[4,]  4     NA              NA    1095.16    1821.503
[5,]  5     NA              NA   16498.22   27440.332

I tried something like this, but not only does it not work, it seems horribly inefficient:

> apply(theDF, 2, function(x) tryCatch(as.numeric(x),error=function(e) e, warning=function(w) x))
     ID  Ticker          INDUSTRY_SECTOR VAR                CVAR              
[1,] "1" "USD CASH"      ""              "0"                "0"               
[2,] "2" "ZAR CASH"      ""              "-181412.82055904" "-301731.22832191"
[3,] "3" "BAT SJ EQUITY" "Financial"     "61711.951234826"  "102641.162795691"
[4,] "4" "HCI SJ EQUITY" "Financial"     "1095.16002541256" "1821.50290513369"
[5,] "5" "PSG SJ EQUITY" "Financial"     "16498.2192382422" "27440.331617902" 

Is there a better way to do this?

EDIT: People keep asking for this, so here goes...

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> apply(theDF, 2, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
like image 749
lukehawk Avatar asked Mar 09 '16 15:03

lukehawk


1 Answers

Looks like a job for type.convert().

theDF[] <- lapply(theDF, type.convert, as.is = TRUE)
## check the result
sapply(theDF, class)
#          ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
#   "integer"     "character"     "character"       "numeric"       "numeric" 

type.convert() coerces a vector to its "most appropriate" type. Setting as.is = TRUE allows us to keep characters as such, where they otherwise would be coerced to factors.

Update: For columns that are not character, they will need to be first coerced to character.

theDF[] <- lapply(theDF, function(x) type.convert(as.character(x), as.is = TRUE))
like image 54
Rich Scriven Avatar answered Nov 15 '22 10:11

Rich Scriven