 

Dataframe within dataframe?

Tags: dataframe, r

Consider this example:

df <- data.frame(id=1:10,var1=LETTERS[1:10],var2=LETTERS[6:15])

fun.split <- function(x) tolower(as.character(x))
df$new.letters <- apply(df[ ,2:3],2,fun.split)

df$new.letters.var1
#NULL

colnames(df)
# [1] "id"          "var1"        "var2"        "new.letters"

df$new.letters
#       var1 var2
# [1,]  "a"  "f" 
# [2,]  "b"  "g" 
# [3,]  "c"  "h" 
# [4,]  "d"  "i" 
# [5,]  "e"  "j" 
# [6,]  "f"  "k" 
# [7,]  "g"  "l" 
# [8,]  "h"  "m" 
# [9,]  "i"  "n" 
# [10,] "j"  "o" 

Could someone kindly explain what is going on here? Is this a new data frame within a data frame?

I expected this:

colnames(df)
# id var1 var2 new.letters.var1 new.letters.var2
Maximilian asked Jun 17 '15 at 15:06



3 Answers

@akrun solved 90% of my problem. But I had data.frames buried within data.frames, buried within data.frames and so on, without knowing the depth to which this was happening.

In this case, I thought sharing my recursive solution might be helpful to others searching this thread as I was:

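    unnest_dataframes <- function(x) {
        # data.frame() splits matrix and data.frame columns into ordinary columns
        y <- do.call(data.frame, x)
        # If a data.frame column is still present, recurse and keep the result
        # (is.data.frame() also works when class() returns several classes)
        if (any(vapply(y, is.data.frame, logical(1)))) {
            y <- unnest_dataframes(y)
        }
        y
    }

    new_data <- unnest_dataframes(df)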

This approach can still run into problems, so it can help to separate all columns of class "data.frame" from the original data set, unnest them, and then cbind() everything back together:

  # Find all columns that are data.frames
  # (assuming your data frame is stored in variable 'y')
  data.frame.cols <- vapply(y, is.data.frame, logical(1))
  # Single-bracket indexing keeps the result a data.frame, even for one column
  z <- y[!data.frame.cols]

  # All columns of class "data.frame"
  dfs <- y[data.frame.cols]

  # Recursively unnest each of these columns
  unnest_dataframes <- function(x) {
    y <- do.call(data.frame, x)
    if (any(vapply(y, is.data.frame, logical(1)))) {
      # Keep the result of the recursive call
      y <- unnest_dataframes(y)
    } else {
      cat('Nested data.frames successfully unpacked\n')
    }
    y
  }

  df2 <- unnest_dataframes(dfs)

  # Combine with original data
  all_columns <- cbind(z, df2)
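For readers who prefer to avoid recursion, here is a minimal sketch of an iterative variant of the same do.call(data.frame, ...) idea. The function name flatten_nested and the nested test data are hypothetical, made up for illustration. Each pass of the while loop removes one level of nesting, and the check also catches matrix columns like the one in the question.

```r
# Hypothetical helper: rebuild the data frame with data.frame() until no
# column is a data.frame or a matrix; each pass removes one level of nesting
flatten_nested <- function(x) {
  is_nested <- function(col) is.data.frame(col) || is.matrix(col)
  while (any(vapply(x, is_nested, logical(1)))) {
    x <- do.call(data.frame, x)
  }
  x
}

# Hypothetical test data: a data.frame column nested two levels deep
inner <- data.frame(b = 1:2)
inner$c <- data.frame(d = c("x", "y"), e = c(TRUE, FALSE))
outer <- data.frame(a = 1:2)
outer$inner <- inner

flat <- flatten_nested(outer)
names(flat)
# "a" "inner.b" "inner.c.d" "inner.c.e"
```

Each nesting level adds its column name as a prefix, so the flattened names record where every value came from.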
RDRR answered Oct 11 '22 at 08:10


This happens because you assigned the two-column matrix returned by apply to a single new column, so that column now holds a matrix. You can convert the result back to an ordinary data.frame with

 do.call(data.frame, df)
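A minimal sketch that reproduces the question's data, to show what the single matrix column looks like and what do.call(data.frame, df) turns it into. The name flat is only for illustration.

```r
# Rebuild the question's data frame
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])
fun.split <- function(x) tolower(as.character(x))
df$new.letters <- apply(df[, 2:3], 2, fun.split)

# The new "column" is a single 10 x 2 character matrix
is.matrix(df$new.letters)   # TRUE
df$new.letters[, "var1"]    # "a" "b" ... "j"

# data.frame() splits a matrix argument into one column per matrix column,
# naming each "<argument name>.<matrix column name>"
flat <- do.call(data.frame, df)
names(flat)
# "id" "var1" "var2" "new.letters.var1" "new.letters.var2"
```

Note that the matrix column can already be used without converting anything, for example df$new.letters[, "var1"].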

A more straightforward method is to assign two columns. I use lapply instead of apply because the columns may be of different classes: apply first converts the data to a matrix, so with mixed classes every column becomes character, whereas lapply returns a list and preserves each column's class.

df[paste0('new.letters.', names(df)[2:3])] <- lapply(df[2:3], fun.split)
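To illustrate the point about classes, a small sketch with a hypothetical data frame mixed (not from the question) that holds one numeric and one character column:

```r
# A hypothetical data frame with one numeric and one character column
mixed <- data.frame(num = c(1.5, 2.5), chr = c("x", "y"),
                    stringsAsFactors = FALSE)

# apply() first converts the data frame to a matrix, and a matrix holds a
# single type, so the numeric column comes back as character
via_apply <- apply(mixed, 2, function(col) col)
class(via_apply[, "num"])    # "character"

# lapply() iterates over the list of columns, so each keeps its own class
via_lapply <- lapply(mixed, function(col) col)
sapply(via_lapply, class)    # num = "numeric", chr = "character"
```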
akrun answered Oct 11 '22 at 06:10


In this case R doesn't behave as one would expect, but if we dig deeper we can make sense of it. What is a data frame? As Norman Matloff says in his book (chapter 5):

a data frame is a list, with the components of that list being equal-length vectors

The following code may help to clarify this.

class(df$new.letters)
[1] "matrix"


str(df)
'data.frame':   10 obs. of  4 variables:
 $ id         : int  1 2 3 4 5 6 7 8 9 10
 $ var1       : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
 $ var2       : Factor w/ 10 levels "F","G","H","I",..: 1 2 3 4 5 6 7 8 9 10
 $ new.letters: chr [1:10, 1:2] "a" "b" "c" "d" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "var1" "var2"
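Matloff's definition can be checked directly on the question's data. A minimal sketch that rebuilds df and confirms it is a list of four components, each spanning 10 rows, the fourth being a 10 x 2 matrix:

```r
# Rebuild the question's data frame
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])
df$new.letters <- apply(df[, 2:3], 2, function(x) tolower(as.character(x)))

is.list(df)              # TRUE: a data frame is a list
length(df)               # 4: the matrix is a single component
dim(df$new.letters)      # 10 2

# NROW() counts rows for the matrix and elements for the vectors, so every
# component has the same "length" of 10
vapply(df, NROW, integer(1))
```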

Perhaps the reason it looks strange lies in the print methods. Consider this:

colnames(df$new.letters)
[1] "var1" "var2"

Presumably something in the print methods combines these sub-names with the name of the component and displays them all.

For example, the vectors that make up df are:

names(df)
[1] "id"          "var1"        "var2"        "new.letters"

but in this case the component new.letters also has a dim attribute (in fact it is a matrix), and its columns have the names var1 and var2. See this code:

attributes(df$new.letters)
$dim
[1] 10  2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "var1" "var2"

but when the data frame is printed, we see all of them as if they were separate vectors (and therefore separate columns of the data.frame!).

Edit: Print methods

Out of curiosity, and to improve this answer, I looked at the methods of the print generic:

methods(print)

The previous code produces a very long list of methods for the generic function print. The list does include print.data.frame, which is the method dispatched for data frames; another one that seemed related to me (though there is surely a more rigorous way to find out) is listof.

getS3method("print", "listof")
function (x, ...) 
{
    nn <- names(x)
    ll <- length(x)
    if (length(nn) != ll) 
        nn <- paste("Component", seq.int(ll))
    for (i in seq_len(ll)) {
        cat(nn[i], ":\n")
        print(x[[i]], ...)
        cat("\n")
    }
    invisible(x)
}
<bytecode: 0x101afe1c8>
<environment: namespace:base>

Maybe I am wrong, but it seems to me that this code might contain useful information about why that happens, specifically in the if (length(nn) != ll) check.
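For comparison, a sketch of where the combined names come from, assuming print.data.frame works as in current base R: before printing, it formats the data frame and converts it to a character matrix with as.matrix(), and as.matrix() labels the pieces of a matrix column as <column>.<subcolumn>. The variable m is only for illustration.

```r
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])
df$new.letters <- apply(df[, 2:3], 2, function(x) tolower(as.character(x)))

# A print method for data frames exists in base R
!is.null(getS3method("print", "data.frame", optional = TRUE))   # TRUE

# Assumption: this mirrors the step print.data.frame performs before printing,
# turning the formatted data frame into a character matrix
m <- as.matrix(format(df))
colnames(m)
# "id" "var1" "var2" "new.letters.var1" "new.letters.var2"
length(df)   # still 4: the data frame itself is unchanged
```

So the five headers exist only in the printed matrix, while df itself keeps four components.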

SabDeM answered Oct 11 '22 at 06:10