Behavior of <- NULL on lists versus data.frames for removing data

Tags:

dataframe

r

Many R users eventually figure out lots of ways to remove elements from their data. One way is to use NULL, particularly when you want to do something like drop a column from a data.frame or drop an element from a list.

Eventually, a user comes across a situation where they want to drop several columns from a data.frame at once, and they hit upon <- list(NULL) as the solution (since using <- NULL will result in an error).

A data.frame is a special type of list, so it wouldn't be too tough to imagine that the approaches for removing items from a list should be the same as removing columns from a data.frame. However, they produce different results, as can be seen in the example below.

## Make some small data--two data.frames and two lists
cars1 <- cars2 <- head(mtcars)[1:4]
cars3 <- cars4 <- as.list(cars2)

## Demonstration that the `list(NULL)` approach works
cars1[c("mpg", "cyl")] <- list(NULL)
cars1
#                   disp  hp
# Mazda RX4          160 110
# Mazda RX4 Wag      160 110
# Datsun 710         108  93
# Hornet 4 Drive     258 110
# Hornet Sportabout  360 175
# Valiant            225 105

## Demonstration that simply using `NULL` does not work
cars2[c("mpg", "cyl")] <- NULL
# Error in `[<-.data.frame`(`*tmp*`, c("mpg", "cyl"), value = NULL) : 
#   replacement has 0 items, need 12

Switch to applying the same concept to a list, and compare the difference in behavior.

## Does not fully drop the items, but sets them to `NULL`
cars3[c("mpg", "cyl")] <- list(NULL)
cars3
# $mpg
# NULL
# 
# $cyl
# NULL
# 
# $disp
# [1] 160 160 108 258 360 225
# 
# $hp
# [1] 110 110  93 110 175 105

## *Does* drop the `list` items while this would
##   have produced an error with a `data.frame`
cars4[c("mpg", "cyl")] <- NULL
cars4
# $disp
# [1] 160 160 108 258 360 225
# 
# $hp
# [1] 110 110  93 110 175 105
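
A quick way to confirm the difference on plain lists is to compare lengths and names after each assignment. This is only a minimal sketch: the names `lst_a`, `lst_b`, and `lst_a_clean` are made up for illustration, and the `Filter()` step is one possible way to clean up afterwards, not something from the question.

```r
lst_a <- lst_b <- as.list(head(mtcars)[1:4])

# Assigning list(NULL) keeps the slots but empties them
lst_a[c("mpg", "cyl")] <- list(NULL)
length(lst_a)
names(lst_a)

# Assigning a bare NULL removes the slots entirely
lst_b[c("mpg", "cyl")] <- NULL
length(lst_b)
names(lst_b)

# Leftover NULL placeholders can be filtered out afterwards
lst_a_clean <- Filter(Negate(is.null), lst_a)
names(lst_a_clean)
```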

The main questions I have are, if a data.frame is a list, why does it behave so differently in this scenario? Is there a foolproof way of knowing when an element will be dropped, when it will produce an error, and when it will simply be given a NULL value? Or do we depend on trial-and-error for this?

833

asked Oct 17 '13 18:10

A5C1D2H2I1M1N2O1R2T1

1 Answer

DISCLAIMER : This is a relatively long answer, not very clear, and not very interesting, so feel free to skip it or to only read the (sort of) conclusion.

I've tried a bit of tracing on [<-.data.frame, as suggested by Ari B. Friedman. Debugging starts on line 162 of the function, where there is a test to determine if value (the replacement value argument) is not a list.

Case 1 : `value` is not a list

Then it is considered as a vector. Matrices and arrays are considered as one vector, like the help page says :

Note that when the replacement value is an array (including a matrix) it is not treated as a series of columns (as 'data.frame' and 'as.data.frame' do) but inserted as a single column.

If only one column of the data frame is selected in the LHS, then the only constraint is that the number of rows to be replaced must be equal to or a multiple of length(value). If this is the case, value is recycled with rep if necessary and converted to a list. If length(value)==0, there is no recycling (as it is impossible), and value is just converted to a list.

If several columns of the data frame are selected in the LHS, then the constraint is a bit more complex: the total number of elements to be replaced, i.e. the number of rows * the number of columns, must be equal to or a multiple of length(value).

The exact test is the following :

(m < n * p && (m == 0L || (n * p)%%m))

Where n is the number of rows, p the number of columns, and m the length of value. If the condition is FALSE, then value is converted into an n x p matrix (thus recycled if necessary) and the matrix is split by columns into a list.
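
To see how this test behaves for a few lengths, it can be wrapped in a small function. This is a sketch only: `replacement_rejected` is a hypothetical helper that mirrors the quoted condition, not a function from base R, and the matrix/split step merely imitates the conversion described above.

```r
# Hypothetical helper mirroring the quoted test; not part of base R.
# n = number of rows, p = number of columns, m = length(value)
replacement_rejected <- function(n, p, m) {
  m < n * p && (m == 0L || (n * p) %% m != 0)
}

replacement_rejected(6, 2, 0)   # TRUE: zero-length value such as NULL
replacement_rejected(6, 2, 12)  # FALSE: exactly n * p elements
replacement_rejected(6, 2, 3)   # FALSE: 12 is a multiple of 3
replacement_rejected(6, 2, 5)   # TRUE: 12 is not a multiple of 5

# When the test passes, the value is recycled into an n x p matrix
# and the matrix is split by column into a list
value <- 1:3
mat <- matrix(value, 6, 2)
cols <- split(c(mat), col(mat))
lengths(cols)
```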

If value is NULL, then the condition is TRUE as m==0, and the function is stopped. Note that the problem occurs for every value of length 0. For example,

cars1[,c("mpg")] <- numeric(0)

works, whereas :

cars1[,c("mpg","disp")] <- numeric(0)

fails in the same way as cars1[,c("mpg","disp")] <- NULL
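
The failing case can be checked without stopping a script by wrapping the assignment in `tryCatch()`. This is a hedged sketch: `df` and `msg` are names chosen here, and since the behavior may differ between R versions, the code reports whatever happens rather than assuming an error.

```r
df <- head(mtcars)[1:4]

# Capture the error message (if any) instead of halting
msg <- tryCatch({
  df[, c("mpg", "disp")] <- NULL
  "no error"
}, error = function(e) conditionMessage(e))

msg
```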

Case 2 : `value` is a list

If value is a list, then it is used to replace several columns at the same time. For example :

cars1[,c("mpg","disp")] <- list(1,2)

will replace cars1$mpg with a vector of 1s, and cars1$disp with a vector of 2s.
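
A small check of this, run on a fresh copy of the data (the name `df` is chosen here so the result does not depend on earlier modifications to cars1):

```r
df <- head(mtcars)[1:4]

# Each list element is recycled down its own column
df[, c("mpg", "disp")] <- list(1, 2)

unique(df$mpg)   # 1
unique(df$disp)  # 2
nrow(df)         # still 6 rows
```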

There is a sort of "double recycling" which happens here :

First, the length of the value list must be less than or equal to the number of columns to be replaced. If it is less, then a classic recycling is done.
Second, the length of each element of the value list is compared with the number of rows to be replaced. If it is less, another recycling is done for that element to match the number of rows (which must be a multiple of its length). If it is more, a warning is displayed.

When the value in RHS is list(NULL), nothing really happens, as recycling is impossible (rep(NULL, 10) is always NULL). But the code continues and in the end each column to be replaced is assigned NULL, i.e. is removed.
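
Both points can be verified in a couple of lines. This is a minimal sketch on a fresh copy of the data; the name `df` is chosen here for illustration.

```r
# Recycling NULL yields NULL again
is.null(rep(NULL, 10))  # TRUE

df <- head(mtcars)[1:4]

# list(NULL) is a list, so it takes the list branch and the columns are removed
df[c("mpg", "cyl")] <- list(NULL)
names(df)  # "disp" "hp"
```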

Summary and (sort of) conclusion

data.frame and list behave differently because of the specific constraint on data frames, where each element must be of the same length. Removing several columns by assigning NULL fails not because of the NULL value by itself, but because NULL is of length 0. The error comes from a test which verifies that the number of elements to be replaced (number of rows * number of columns) is a multiple of the length of the assigned value.

Handling the case of value=NULL for multiple columns doesn't seem difficult (by adding about four lines of simple code), but it requires treating NULL as a special case. I'm not able to determine whether it is left unhandled because it would break the logic of the function implementation, or because it would have side effects I am not aware of.
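
In the meantime, one way to sidestep the difference is to select what to keep instead of assigning NULL, since `[` with names behaves the same on both types. The sketch below is a workaround, not a patch to `[<-.data.frame`; `drop_items` is a hypothetical helper name and it assumes every element is named.

```r
# Hypothetical helper, not part of base R: keep everything except `drop`
drop_items <- function(x, drop) {
  x[setdiff(names(x), drop)]
}

df_small  <- drop_items(head(mtcars)[1:4], c("mpg", "cyl"))
lst_small <- drop_items(as.list(head(mtcars)[1:4]), c("mpg", "cyl"))

names(df_small)   # "disp" "hp"
names(lst_small)  # "disp" "hp"
```

The input type is preserved: a data.frame goes in and comes out as a data.frame, and a plain list stays a plain list.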

150

answered Sep 29 '22 14:09

juba
