I am trying to turn a nested list structure into a dataframe. The list looks similar to the following (it is serialized data from parsed JSON read in using the httr package).
myList <- list(object1 = list(w=1, x=list(y=0.1, z="cat")), object2 = list(w=NULL, x=list(z="dog")))
EDIT: my original example data was too simple. The actual data are ragged, meaning that not all variables exist for every object, and some of the list elements are NULL. I edited the data to reflect this.
unlist(myList) does a great job of recursively flattening the list, and I can then use lapply to flatten all the objects nicely.
flatList <- lapply(myList, FUN= function(object) {return(as.data.frame(rbind(unlist(object))))})
And finally, I can button it up using plyr::rbind.fill
myDF <- do.call(plyr::rbind.fill, flatList)
str(myDF)
#'data.frame': 2 obs. of 3 variables:
#$ w : Factor w/ 2 levels "1","2": 1 2
#$ x.y: Factor w/ 2 levels "0.1","0.2": 1 2
#$ x.z: Factor w/ 2 levels "cat","dog": 1 2
The problem is that w and x.y are now being interpreted as character vectors, which by default get parsed as factors in the dataframe. I believe that unlist() is the culprit, but I can't figure out another way to recursively flatten the list structure. A workaround would be to post-process the dataframe and assign data types then. What is the best way to determine if a vector is a valid numeric or integer vector?
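To make the suspicion about unlist() concrete, here is a minimal sketch (the variable names are made up for illustration). An atomic vector can hold only one type, so when a list mixes numbers and strings, unlist() coerces everything to character; a list holding only numbers stays numeric.

```r
# A list that mixes numbers and strings, shaped like one object in the question
mixed <- list(w = 1, x = list(y = 0.1, z = "cat"))
flat_mixed <- unlist(mixed)
print(class(flat_mixed))   # the numbers were coerced to strings
print(names(flat_mixed))   # nested names are joined with "."

# A list holding only numbers keeps its numeric type
numbers_only <- list(w = 1, x = list(y = 0.1))
flat_numbers <- unlist(numbers_only)
print(class(flat_numbers))
```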
To check whether all values in a numeric vector are whole numbers, apply floor() to the vector, subtract the result from the original values, and check whether every difference is zero. A zero difference means the value is a whole number; any other difference means it is not.
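The floor() check could be sketched as follows. The helper name is_whole is hypothetical, and ignoring NA values is an assumption made here, not something the text above specifies. Values produced by floating-point arithmetic may differ from a whole number by a tiny amount and would fail this exact comparison.

```r
# Hypothetical helper: TRUE if every non-missing value is a whole number
is_whole <- function(x) {
  stopifnot(is.numeric(x))
  # Assumption: missing values are ignored rather than treated as failures
  all(x - floor(x) == 0, na.rm = TRUE)
}

print(is_whole(c(1, 2, 3)))
print(is_whole(c(1, 2.5)))
```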
The is.numeric() function in R checks whether the object passed to it is of numeric type, that is, either integer or double. Call is.numeric() with the vector as its argument: it returns TRUE if the vector is of type integer or double, and FALSE otherwise. Note that it returns FALSE for factors and for character vectors, even when their values look like numbers.
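A few calls show what is.numeric() does and does not detect. It looks at the storage type, not at the contents, so it cannot tell whether a character or factor column holds numeric-looking values.

```r
int_vec <- 1:3
dbl_vec <- c(0.1, 0.2)
chr_vec <- c("1", "2")
fct_vec <- factor(c("1", "2"))

print(is.numeric(int_vec))  # integer storage
print(is.numeric(dbl_vec))  # double storage
print(is.numeric(chr_vec))  # strings, even if they look like numbers
print(is.numeric(fct_vec))  # factors are never reported as numeric
```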
numeric() creates a double vector of the specified length, with every element equal to 0. as.numeric() attempts to coerce its argument to numeric type; values that cannot be parsed become NA, with a warning.
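A short sketch of both functions, including the factor pitfall that matters for the data frame in the question: as.numeric() on a factor returns the internal integer codes, not the labels, so the factor has to go through as.character() first.

```r
zeros <- numeric(3)                               # three zeros
parsed <- as.numeric("0.1")                       # a string that parses cleanly
unparsed <- suppressWarnings(as.numeric("cat"))   # NA (a warning is normally raised)

f <- factor(c("10", "20"))
codes <- as.numeric(f)                  # internal codes, not the labels
labels <- as.numeric(as.character(f))   # the values the labels spell out

print(zeros)
print(codes)
print(labels)
```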
As discussed here, checking if as.numeric returns NA values is a simple approach to checking if a character string contains numeric data. Now you can do something like:
myDF2 <- lapply(myDF, function(col) {
  if (suppressWarnings(all(!is.na(as.numeric(as.character(col)))))) {
    as.numeric(as.character(col))
  } else {
    col
  }
})
str(myDF2)
# List of 3
# $ w : num [1:2] 1 2
# $ x.y: num [1:2] 0.1 0.2
# $ x.z: Factor w/ 2 levels "cat","dog": 1 2
When NAs are included, @josliber's original function doesn't work (though it answered the question well for the sample data). @Amy M's function should work, but it requires loading the Hmisc package.
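To see why missing values break the all(!is.na(...)) check, consider a column like the one the ragged data produces. This is a minimal sketch with a made-up column; the existing NA makes the check fail even though the only real value is a valid number.

```r
# One numeric-looking value plus a missing value, as with w in the ragged data
col <- c("1", NA)

# The check from the earlier answer: every converted value must be non-NA
all_parsed <- suppressWarnings(all(!is.na(as.numeric(as.character(col)))))
print(all_parsed)  # the pre-existing NA alone makes this FALSE
```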
What about something like this:
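can.be.numeric <- function(x) {
  stopifnot(is.atomic(x) || is.list(x)) # check if x is a vector
  # as.numeric() on a factor returns its integer codes, so compare the labels instead
  if (is.factor(x)) x <- as.character(x)
  numNAs <- sum(is.na(x))
  numNAs_new <- suppressWarnings(sum(is.na(as.numeric(x))))
  return(numNAs_new == numNAs)
}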
It counts the NAs in the input vector x and the NAs in the output of as.numeric(x), and returns TRUE if the vector can be "safely" converted to numeric (i.e. without adding any additional NA values).
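A few calls on plain vectors illustrate the NA-counting idea. The definition is restated so the block runs on its own; the as.character() step for factors is a defensive assumption, since as.numeric() on a factor returns codes rather than labels.

```r
can.be.numeric <- function(x) {
  stopifnot(is.atomic(x) || is.list(x))
  if (is.factor(x)) x <- as.character(x)  # assumption: compare labels, not codes
  numNAs <- sum(is.na(x))
  numNAs_new <- suppressWarnings(sum(is.na(as.numeric(x))))
  numNAs_new == numNAs
}

with_na   <- can.be.numeric(c("1", NA))        # the existing NA is not counted against it
decimals  <- can.be.numeric(c("0.1", "0.2"))
words     <- can.be.numeric(c("cat", "dog"))   # conversion would add two NAs
one_bad   <- can.be.numeric(c("1", "dog"))     # a single unparseable value is enough to fail

print(c(with_na, decimals, words, one_bad))
```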
UPDATE: In response to a request to show how to use the function: call it on each column and convert only the columns that can be numeric.
myDF2 <- lapply(myDF, function(col) {
  if (can.be.numeric(col)) {
    as.numeric(as.character(col)) # as.character() guards against factor columns
  } else {
    col
  }
})
str(as.data.frame(myDF2))
# 'data.frame': 2 obs. of 3 variables:
# $ w : num 1 NA
# $ x.y: num 0.1 NA
# $ x.z: chr "cat" "dog"
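As a possible alternative to a hand-written check, base R's utils::type.convert() guesses the type of a character vector. The sketch below is not the method from the answers above; it assumes the columns are already character (not factor), and the data frame is rebuilt by hand to match the output shown. With as.is = TRUE, strings stay character instead of becoming factors, and a column of whole numbers comes back as integer rather than double.

```r
# Hand-built stand-in for the flattened, ragged data (all columns as character)
df_chr <- data.frame(
  w   = c("1", NA),
  x.y = c("0.1", NA),
  x.z = c("cat", "dog"),
  stringsAsFactors = FALSE
)

# Convert each column independently; df_typed[] keeps the data.frame structure
df_typed <- df_chr
df_typed[] <- lapply(df_chr, function(col) utils::type.convert(col, as.is = TRUE))
str(df_typed)
```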