Why does 'out of bounds' indexing differ between a matrix and a data.frame?

Tags:

dataframe

data-structures

r

subset

I'm sure this is kind of basic, but I'd just like to really understand the logic of R data structures here.

If I subset a matrix with an out-of-bounds index, I get exactly that error:

m <- matrix(data = c("foo", "bar"), nrow = 1)
m[2,]
# Error in m[2, ] : subscript out of bounds

If I do the same to a data frame, however, I get a row of all NA values:

df <- data.frame(foo = "foo", bar = "bar")
df[2,]
#    foo  bar
# NA <NA> <NA>

If I subset a non-existent data frame column, I get the familiar error:

df[, 3]
# Error in `[.data.frame`(df, , 3) : undefined columns selected

I know (roughly) that data frame rows are weird and to be treated carefully, but I don't quite see the connection to the above behavior.

Can someone explain why R behaves in this way for non-existent df rows?

Update

To be sure, returning NA for out-of-bounds subsets is normal R behavior for 1D vectors:

vec <- c("foo", "bar")
vec[3]
# [1] NA

So in a way, the odd one out here is matrix subsetting, not data frame subsetting, depending on where you start from. Still, the different 2D subsetting behavior (m[2, ] vs df[2, ]) might strike a dense user (as I am right now) as inconsistent.
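One way to see the connection to the 1D vector case, as a minimal sketch: a data frame is a list of column vectors, so indexing each column with an out-of-bounds position gives NA per column, which matches the all-NA row. The names `by_column` and `oob_row` are only illustrative, and `stringsAsFactors = FALSE` is an added assumption so the columns are character vectors regardless of the R version. This shows the observed behavior, not R's internals.

```r
# Data frame from the question; stringsAsFactors = FALSE keeps the columns
# as character vectors regardless of the R version (assumption for the sketch).
df <- data.frame(foo = "foo", bar = "bar", stringsAsFactors = FALSE)

# Index each column like a plain 1D vector with an out-of-bounds position.
by_column <- lapply(df, function(col) col[2])
str(by_column)

# The out-of-bounds row from the question: one row, every cell NA.
oob_row <- df[2, ]
nrow(oob_row)
all(is.na(oob_row))
```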

asked Nov 23 '18 14:11

maxheld

1 Answer

"Can someone explain why R behaves in this way[?]"

Short answer: No, probably not.

Longer answer: Once upon a time I was thinking about something similar and read this thread on R-devel: "Definition of [[". Basically it boils down to:

"The semantics of [ and [[ don't seem to be fully specified in the Reference manual. [...] I assume that these are features, not bugs, but I can't find documentation for them"

Duncan Murdoch, a former member of the R core team, gives a very nice reply:

"There is more documentation in the man page for Extract, but I think it is incomplete. The most complete documentation is of course the source code*, but it may not answer the question of what's intentional and what's accidental"

As mentioned in the R-devel thread, the only description in the manual is in section 3.4.1, "Indexing by vectors":

"If i is positive and exceeds length(x) then the corresponding selection is NA"

But this applies only to "indexing of simple vectors". Similar out-of-bounds indexing for "non-simple" vectors does not seem to be described. Duncan Murdoch again:

"So what is a simple vector? That is not explicitly defined, and it probably should be."

Thus, it may well be that no one knows the answer to your "why" question.
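To make the documented rule concrete, here is a small sketch of `[` with an out-of-bounds index on a few objects. The object names are only illustrative, and the error text is captured with tryCatch() rather than assumed.

```r
vec <- c("foo", "bar")
lst <- list("foo", "bar")
m   <- matrix(c("foo", "bar"), nrow = 1)

vec[3]   # atomic vector: NA, as described in the manual
lst[3]   # list: a list holding one NULL element
m[3]     # matrix indexed with a single vector, like a plain vector: NA

# Matrix-style indexing with a row that does not exist signals an error;
# capture its message instead of stopping the script.
msg <- tryCatch(m[2, ], error = function(e) conditionMessage(e))
msg
```

Note that the same matrix gives NA or an error depending only on whether it is indexed like a vector or with row and column positions.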

See also "8.2.13 nonexistent value in subscript" in the excellent R Inferno by Patrick Burns, and the section "Missing/out of bounds indices" in Hadley Wickham's book Advanced R.

*See the source code for the [ subset operator. A search for R_MSG_subs_o_b (which corresponds to the error message "subscript out of bounds") provides no obvious clue as to why out-of-bounds (OOB) [ indexing of matrices and OOB [[ indexing give an error, whereas OOB [ indexing of "simple vectors" results in NA.
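The contrast between [ and [[ described in this footnote can be reproduced with a short sketch. `oob_message()` is a hypothetical helper introduced here only to capture the error text; it is not part of R.

```r
# Hypothetical helper: evaluate an expression and return its error message,
# or NA_character_ if no error is signalled.
oob_message <- function(expr) {
  tryCatch({
    expr            # forces the lazily evaluated argument inside tryCatch
    NA_character_
  }, error = function(e) conditionMessage(e))
}

vec <- c("foo", "bar")

vec[3]                  # `[` on a simple vector: NA
oob_message(vec[3])     # no error, so the helper returns NA
oob_message(vec[[3]])   # `[[` on the same vector signals an error
```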

answered Oct 21 '22 19:10

Henrik
