How to determine if a character vector is a valid numeric or integer vector?

I am trying to turn a nested list structure into a dataframe. The list looks similar to the following (it is serialized data from parsed JSON read in using the httr package).

  myList <- list(object1 = list(w=1, x=list(y=0.1, z="cat")), object2 = list(w=NULL, x=list(z="dog")))

EDIT: my original example data was too simple. The actual data are ragged, meaning that not all variables exist for every object, and some of the list elements are NULL. I edited the data to reflect this.

unlist(myList) does a great job of recursively flattening the list, and I can then use lapply to flatten all the objects nicely.

  flatList <- lapply(myList, FUN= function(object) {return(as.data.frame(rbind(unlist(object))))})

And finally, I can button it up using plyr::rbind.fill

  myDF <- do.call(plyr::rbind.fill, flatList)
  str(myDF)

  #'data.frame':    2 obs. of  3 variables:
  #$ w  : Factor w/ 2 levels "1","2": 1 2
  #$ x.y: Factor w/ 2 levels "0.1","0.2": 1 2
  #$ x.z: Factor w/ 2 levels "cat","dog": 1 2

The problem is that w and x.y are now character vectors, which by default get converted to factors in the data frame. I believe that unlist() is the culprit, but I can't figure out another way to recursively flatten the list structure. A workaround would be to post-process the data frame and assign data types afterwards. What is the best way to determine whether a character vector is a valid numeric or integer vector?
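
The coercion can be seen directly with a minimal sketch built on the first object from the example (the names obj and flat are only for illustration). Because unlist() has to return a single atomic vector, mixing numbers and strings yields a character vector, and NULL elements are dropped altogether:

```r
# unlist() must return one atomic vector, so mixed types are coerced
# to the most general one (here: character).
obj <- list(w = 1, x = list(y = 0.1, z = "cat"))
flat <- unlist(obj)

class(flat)   # "character"
names(flat)   # "w" "x.y" "x.z"

# NULL elements vanish entirely, which is why the flattened rows are ragged.
names(unlist(list(w = NULL, x = list(z = "dog"))))   # "x.z"
```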

asked Jun 09 '14 21:06

Andrew Barr

2 Answers

As discussed here, a simple way to check whether a character vector contains numeric data is to see whether as.numeric returns any NA values. You can then convert only the columns that pass the check:

myDF2 <- lapply(myDF, function(col) {
  if (suppressWarnings(all(!is.na(as.numeric(as.character(col)))))) {
    as.numeric(as.character(col))
  } else {
    col
  }
})
str(myDF2)
# List of 3
#  $ w  : num [1:2] 1 2
#  $ x.y: num [1:2] 0.1 0.2
#  $ x.z: Factor w/ 2 levels "cat","dog": 1 2
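
A possible alternative, not part of the answer above, is base R's utils::type.convert(), which chooses between logical, integer, numeric and character for a character vector and leaves existing NA values alone. The sketch below assumes a small stand-in data frame df in place of myDF; note that type.convert() will also turn strings such as "TRUE" or "F" into logicals, which may or may not be what you want.

```r
# Assumption: df is a stand-in for myDF, with character columns as produced
# by flattening with unlist().
df <- data.frame(w = c("1", NA), x.y = c("0.1", NA), x.z = c("cat", "dog"),
                 stringsAsFactors = FALSE)

# type.convert() picks the narrowest type that fits all non-NA values;
# as.is = TRUE keeps non-numeric strings as character instead of factor.
converted <- lapply(df, function(col) {
  utils::type.convert(as.character(col), as.is = TRUE)
})
df2 <- as.data.frame(converted, stringsAsFactors = FALSE)

sapply(df2, class)
```

Here w comes back as integer, x.y as numeric (double) and x.z stays character, so the integer/numeric distinction is made for you.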

answered Oct 20 '22 18:10

josliber

When NAs are included, @josliber's original function doesn't work (though it answers the question well for the sample data). @Amy M's function should work but requires loading the Hmisc package.

What about something like this:

can.be.numeric <- function(x) {
    stopifnot(is.atomic(x) || is.list(x)) # check if x is a vector
    # as.numeric() on a factor returns the level codes, so compare the labels
    if (is.factor(x)) x <- as.character(x)
    numNAs <- sum(is.na(x))
    numNAs_new <- suppressWarnings(sum(is.na(as.numeric(x))))
    return(numNAs_new == numNAs)
}

It counts the NAs in the input vector x and in the output of as.numeric(x), and returns TRUE if the vector can be "safely" converted to numeric (i.e. without introducing any additional NA values).

UPDATE: In response to a request to show how to use the function: call it on each column and convert only the columns that can be numeric.

myDF2 <- lapply(myDF, function(col) {
  if (can.be.numeric(col)) {
    as.numeric(as.character(col)) # as.character() guards against factor columns
  } else {
    col
  }
})
str(as.data.frame(myDF2))
# 'data.frame': 2 obs. of  3 variables:
#  $ w  : num  1 NA
#  $ x.y: num  0.1 NA
#  $ x.z: chr  "cat" "dog"
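
The question also asks about integer vectors. One way to extend the same idea is sketched below; can.be.integer is a hypothetical helper, not something from either answer, and it assumes that "integer" means every non-NA entry parses as a whole number that fits in R's integer range.

```r
# Hypothetical helper: TRUE when every non-NA entry of x parses as a whole
# number within the integer range, so storing it as integer loses nothing.
can.be.integer <- function(x) {
  x <- as.character(x)                              # also handles factor input
  num <- suppressWarnings(as.numeric(x))
  if (any(is.na(num) & !is.na(x))) return(FALSE)    # some entry is not numeric
  vals <- num[!is.na(num)]
  all(vals == floor(vals)) && all(abs(vals) <= .Machine$integer.max)
}

can.be.integer(c("1", "2", NA))    # TRUE
can.be.integer(c("0.1", "2"))      # FALSE
can.be.integer(c("cat", "dog"))    # FALSE
```

A column that passes this check could be converted with as.integer(as.character(col)); one that only passes the numeric check would stay double.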

answered Oct 20 '22 18:10

Stefan Avey

Tags: r, lapply, rbind
