How to determine if a character vector is a valid numeric or integer vector

Tags: r, lapply, rbind

I am trying to turn a nested list structure into a dataframe. The list looks similar to the following (it is serialized data from parsed JSON read in using the httr package).

  myList <- list(object1 = list(w=1, x=list(y=0.1, z="cat")), object2 = list(w=NULL, x=list(z="dog")))

EDIT: my original example data was too simple. The actual data are ragged, meaning that not all variables exist for every object, and some of the list elements are NULL. I edited the data to reflect this.

unlist(myList) does a great job of recursively flattening the list, and I can then use lapply to flatten all the objects nicely.

  flatList <- lapply(myList, FUN= function(object) {return(as.data.frame(rbind(unlist(object))))}) 

And finally, I can button it up using plyr::rbind.fill

  myDF <- do.call(plyr::rbind.fill, flatList)
  str(myDF)

  #'data.frame':    2 obs. of  3 variables:
  #$ w  : Factor w/ 2 levels "1","2": 1 2
  #$ x.y: Factor w/ 2 levels "0.1","0.2": 1 2
  #$ x.z: Factor w/ 2 levels "cat","dog": 1 2

The problem is that w and x.y are now being interpreted as character vectors, which by default get parsed as factors in the dataframe. I believe that unlist() is the culprit, but I can't figure out another way to recursively flatten the list structure. A workaround would be to post-process the dataframe, and assign data types then. What is the best way to determine if a vector is a valid numeric or integer vector?
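
To make the suspected cause concrete: unlist() returns a single atomic vector, so when the leaves mix numbers and strings, everything is coerced to character. A minimal sketch (obj and num_only are illustrative names, not part of the data above):

```r
# One object with mixed numeric and character leaves
obj <- list(w = 1, x = list(y = 0.1, z = "cat"))

flat <- unlist(obj)
class(flat)   # "character": the numbers were coerced to strings
names(flat)   # "w" "x.y" "x.z"

# With numeric leaves only, the type survives
num_only <- unlist(list(w = 1, x = list(y = 0.1)))
class(num_only)   # "numeric"
```

So the type information is already gone before rbind() or as.data.frame() ever see the values.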

Andrew Barr asked Jun 09 '14 21:06


2 Answers

As discussed here, checking if as.numeric returns NA values is a simple approach to checking if a character string contains numeric data. Now you can do something like:

myDF2 <- lapply(myDF, function(col) {
  if (suppressWarnings(all(!is.na(as.numeric(as.character(col)))))) {
    as.numeric(as.character(col))
  } else {
    col
  }
})
str(myDF2)
# List of 3
#  $ w  : num [1:2] 1 2
#  $ x.y: num [1:2] 0.1 0.2
#  $ x.z: Factor w/ 2 levels "cat","dog": 1 2
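
Note that lapply() on a data frame returns a plain list, which is why str() reports "List of 3". If a data frame is wanted back, one possible variation is to assign into df[]. This is only a sketch: the data frame below is rebuilt by hand with factor columns, and to_numeric_if_possible is a made-up helper name.

```r
# Stand-in for the flattened data, with strings stored as factors
df <- data.frame(w   = c("1", "2"),
                 x.y = c("0.1", "0.2"),
                 x.z = c("cat", "dog"),
                 stringsAsFactors = TRUE)

# Convert a column only if every value parses as a number
to_numeric_if_possible <- function(col) {
  num <- suppressWarnings(as.numeric(as.character(col)))
  if (!anyNA(num)) num else col
}

# Assigning into df[] keeps the data.frame class and the column names
df[] <- lapply(df, to_numeric_if_possible)
str(df)
```

As written, a column that already contains NA is left unconverted, since the check cannot tell an existing NA from a failed parse.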
josliber answered Oct 20 '22 18:10


When NAs are included, @josliber's original function doesn't work (though it answers the question well for the sample data). @Amy M's function should work but requires loading the Hmisc package.

What about something like this:

can.be.numeric <- function(x) {
    stopifnot(is.atomic(x) || is.list(x)) # check that x is a vector
    numNAs <- sum(is.na(x))
    # as.character() first, so a factor is judged by its labels, not its integer codes
    numNAs_new <- suppressWarnings(sum(is.na(as.numeric(as.character(x)))))
    return(numNAs_new == numNAs)
}

It counts the NAs in the input vector x and the NAs in the output of as.numeric(as.character(x)), and returns TRUE if the vector can be "safely" converted to numeric (i.e. without adding any additional NA values).

UPDATE: On request, here is how to use the function. Call it on each column and convert only the columns that can be numeric.

myDF2 <- lapply(myDF, function(col) {
  if (can.be.numeric(col)) {
    as.numeric(as.character(col)) # via character, so factor columns convert correctly
  } else {
    col
  }
})
str(as.data.frame(myDF2))
# 'data.frame': 2 obs. of  3 variables:
#  $ w  : num  1 NA
#  $ x.y: num  0.1 NA
#  $ x.z: chr  "cat" "dog"
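
Another option that may be worth trying is base R's type.convert() (from utils), which picks logical, integer, double or character for a vector of strings and treats NA as missing rather than as a parse failure. A minimal sketch, assuming the flattened columns arrive as character and that telling integer from double matters (the data frame here is typed in by hand to mimic the ragged data):

```r
# Stand-in for the ragged, flattened data: everything is a string, some values missing
df <- data.frame(w   = c("1", NA),
                 x.y = c("0.1", NA),
                 x.z = c("cat", "dog"),
                 stringsAsFactors = FALSE)

# as.is = TRUE keeps non-numeric columns as character instead of making factors;
# as.character() guards against factor input
df[] <- lapply(df, function(col) type.convert(as.character(col), as.is = TRUE))

sapply(df, class)
```

Here w should come back as integer, x.y as numeric (double) and x.z as character, with the NAs preserved.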
Stefan Avey answered Oct 20 '22 18:10