Dataframe within dataframe?

3 Answers

@akrun solved 90% of my problem. But I had data.frames buried within data.frames, buried within data.frames and so on, without knowing the depth to which this was happening.

In this case, I thought sharing my recursive solution might be helpful to others searching this thread as I was:

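    unnest_dataframes <- function(x) {
        # data.frame() expands one level of nested data.frame (or matrix) columns
        y <- do.call(data.frame, x)
        # Recurse while any column is still a data.frame, and return that result
        if (any(vapply(y, is.data.frame, logical(1)))) return(unnest_dataframes(y))
        y
    }

    new_data <- unnest_dataframes(df)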
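To illustrate why the recursion matters, here is a minimal sketch on made-up data. The objects `inner`, `middle`, and `outer` and the helper `flatten_df` are hypothetical names of mine, not from the question. As far as I can tell, a single `do.call(data.frame, ...)` pass removes only one level of nesting, so the helper has to return the result of its recursive call.

```r
# Hypothetical data: a data.frame column that itself holds a data.frame column
inner  <- data.frame(p = 1:2, q = c("a", "b"))
middle <- data.frame(m = 3:4)
middle$inner <- inner          # one level of nesting
outer  <- data.frame(id = 1:2)
outer$middle <- middle         # two levels of nesting

# One pass flattens only the outermost level
one_pass <- do.call(data.frame, outer)
still_nested <- any(vapply(one_pass, is.data.frame, logical(1)))

# Hypothetical helper: repeat the pass until no data.frame column is left
flatten_df <- function(x) {
  y <- do.call(data.frame, x)
  if (any(vapply(y, is.data.frame, logical(1)))) flatten_df(y) else y
}

flat <- flatten_df(outer)
names(flat)
```

The column names of `flat` are built from the nesting path, so it is worth checking them for clashes with existing columns.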

This can still run into problems, so it can help to separate all columns of class "data.frame" from the original data set, unnest them, and then cbind() everything back together, like so:

    # Find all columns that are data.frames
    # (assuming your data frame is stored in the variable 'y')
    data.frame.cols <- unname(vapply(y, is.data.frame, logical(1)))
    # drop = FALSE keeps a data.frame even when a single column is selected
    z <- y[, !data.frame.cols, drop = FALSE]

    # All columns of class "data.frame"
    dfs <- y[, data.frame.cols, drop = FALSE]

    # Recursively unnest each of these columns
    unnest_dataframes <- function(x) {
        y <- do.call(data.frame, x)
        if (any(vapply(y, is.data.frame, logical(1)))) {
            # Keep the result of the recursive call instead of discarding it
            y <- unnest_dataframes(y)
        } else {
            cat('Nested data.frames successfully unpacked\n')
        }
        y
    }

    df2 <- unnest_dataframes(dfs)

    # Combine with original data
    all_columns <- cbind(z, df2)

answered Oct 11 '22 08:10

RDRR

This happens because you assigned the 2-column matrix returned by apply to a single new column, so that one column holds the whole matrix. You can convert it back to a normal data.frame with

    do.call(data.frame, df)
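A small sketch of that situation, on made-up data: `tolower` stands in for the question's `fun.split`, which is not shown in this thread, so treat it as an assumption.

```r
# Hypothetical data; tolower is only a stand-in for fun.split
df <- data.frame(id = 1:3,
                 var1 = c("A", "B", "C"),
                 var2 = c("F", "G", "H"))

# Assigning a 2-column matrix to ONE new column stores the whole matrix there
df$new.letters <- apply(df[2:3], 2, tolower)
ncol(df)                     # the matrix counts as a single column
is.matrix(df$new.letters)

# do.call(data.frame, ...) splits the matrix into ordinary columns
flat <- do.call(data.frame, df)
ncol(flat)
names(flat)
```

After the conversion, the matrix columns appear as `new.letters.var1` and `new.letters.var2`, which matches what the printed output already suggested.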

A more straightforward method is to assign 2 columns. I use lapply instead of apply because the columns may be of different classes: apply returns a matrix, so with mixed classes all columns become 'character'. lapply returns a list and preserves the class of each column.

    df[paste0('new.letters', names(df)[2:3])] <- lapply(df[2:3], fun.split)
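To make the difference concrete, here is a minimal sketch with one numeric and one character column. The data and the `copy.` prefix are made up, and `identity` is used only so that any change in type comes from apply or lapply themselves.

```r
# Hypothetical data with columns of different classes
df <- data.frame(id = 1:3, num = c(1.5, 2.5, 3.5), txt = c("a", "b", "c"))

# apply() converts the data frame to a matrix first, so the numeric
# column is coerced to character along with the text column
via_apply <- apply(df[2:3], 2, identity)
typeof(via_apply)

# lapply() works column by column and returns a list, keeping each class
via_lapply <- lapply(df[2:3], identity)
vapply(via_lapply, function(col) class(col)[1], character(1))

# Assigning the list to two new names creates two ordinary columns
df[paste0("copy.", names(df)[2:3])] <- via_lapply
ncol(df)
```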

answered Oct 11 '22 06:10

akrun

In this case R doesn't behave as one would expect, but maybe if we dig deeper we can work out why. What is a data frame? As Norman Matloff says in his book (chapter 5):

a data frame is a list, with the components of that list being equal-length vectors
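A quick way to check that description yourself, on a small made-up data frame (the column names are only for illustration):

```r
# Hypothetical data frame
df <- data.frame(id = 1:3, var1 = c("A", "B", "C"))

is.list(df)      # a data frame is a list underneath
typeof(df)
length(df)       # number of components, i.e. columns
lengths(df)      # every component has the same length

# List-style and data-frame-style access reach the same component
identical(df[["id"]], df$id)
```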

The following code might be useful to understand.

    class(df$new.letters)
    [1] "matrix"

    str(df)
    'data.frame':   10 obs. of  4 variables:
     $ id         : int  1 2 3 4 5 6 7 8 9 10
     $ var1       : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
     $ var2       : Factor w/ 10 levels "F","G","H","I",..: 1 2 3 4 5 6 7 8 9 10
     $ new.letters: chr [1:10, 1:2] "a" "b" "c" "d" ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : NULL
      .. ..$ : chr  "var1" "var2"

Maybe the reason why it looks strange is in the print methods. Consider this:

    colnames(df$new.letters)
    [1] "var1" "var2"

Maybe there is something in the print methods that combines the sub-names of objects and displays them all.

For example here the vectors that constitute the df are:

    names(df)
    [1] "id"          "var1"        "var2"        "new.letters"

but in this case the vector new.letters also has a dim attribute (in fact it is a matrix) whose columns are named var1 and var2 too. See this code:

    attributes(df$new.letters)
    $dim
    [1] 10  2

    $dimnames
    $dimnames[[1]]
    NULL

    $dimnames[[2]]
    [1] "var1" "var2"

but when we print we see all of them as if they were separate vectors (and so separate columns of the data.frame!).
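The mismatch between the stored names and the printed header can be reproduced on a small made-up example. This sketch assumes that printing goes through a matrix conversion, which is where the combined names such as `new.letters.var1` come from.

```r
# Hypothetical data frame with a 2-column character matrix as one column
df <- data.frame(id = 1:3)
df$new.letters <- matrix(c("a", "b", "c", "f", "g", "h"), ncol = 2,
                         dimnames = list(NULL, c("var1", "var2")))

names(df)                     # the data frame itself has two components

# Converting to a matrix expands the matrix column and pastes the names
expanded <- colnames(as.matrix(df))
expanded

# The printed header shows the expanded names as well
header <- capture.output(print(df))[1]
header
```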

Edit: Print methods

Just out of curiosity, and in order to improve this answer, I looked inside the methods of the print function:

    methods(print)

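The previous code produces a very long list of methods for the generic function print. I did not spot one for data.frame in it (note that base R does define print.data.frame, which formats the columns with format.data.frame and converts them to a matrix before printing). The one I looked at instead, although I am sure there is a more technical way to find this out, is listof.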

    getS3method("print", "listof")
    function (x, ...) 
    {
        nn <- names(x)
        ll <- length(x)
        if (length(nn) != ll) 
            nn <- paste("Component", seq.int(ll))
        for (i in seq_len(ll)) {
            cat(nn[i], ":\n")
            print(x[[i]], ...)
            cat("\n")
        }
        invisible(x)
    }
    <bytecode: 0x101afe1c8>
    <environment: namespace:base>

Maybe I am wrong, but it seems to me that this code might contain useful information about why that happens, specifically the if (length(nn) != ll) check.
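As a cross-check, a more direct way to see which print method a data frame uses is to ask for it by class. This is only a sketch; `df` is a made-up object, and the body of the method may differ between R versions.

```r
# Hypothetical data frame
df <- data.frame(id = 1:3)

# The class decides which S3 method print() dispatches to
class(df)

# Look the method up directly instead of scanning methods(print)
m <- getS3method("print", "data.frame")
is.function(m)

# Printing m at the console shows its source, including the conversion
# of the formatted data frame to a matrix
```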

answered Oct 11 '22 06:10

SabDeM

Tags: dataframe, r

Maximilian
