Extract columns from data table by numeric indices stored in a vector

Tags:

r

data.table

indices

I want to extract the 4th, 5th, and 6th columns from a data table named dt.

The following method works:

    dt[, c(4,5,6)]

But the following doesn't:

    a = c(4,5,6)
    dt[, a]

In fact, the second method gives me a result of:

    4 5 6

Can someone tell me why this is happening? The two methods look equivalent to me.


asked Mar 12 '18 02:03

Amazonian

2 Answers

We can prefix the object 'a' with double dots (..), which tells data.table to look 'a' up in the calling scope and use its value to select the columns:

    dt[, ..a]
    #   col4 col5 col6
    #1:    4    5    6
    #2:    5    6    7
    #3:    6    7    8
    #4:    7    8    9

Another option is with = FALSE:

    dt[, a, with = FALSE]

Data

    library(data.table)
    dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6, col4 = 4:7, col5 = 5:8, col6 = 6:9)
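A quick self-contained check of the two options, assuming a data.table version that supports the .. prefix (v1.10.2 or later): both select the same columns as the literal dt[, c(4, 5, 6)]. The object names by_literal, by_dots and by_with are just illustrative.

```r
library(data.table)

# Same example data as above
dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6,
                 col4 = 4:7, col5 = 5:8, col6 = 6:9)
a <- c(4, 5, 6)

by_literal <- dt[, c(4, 5, 6)]       # literal numbers: with = FALSE is applied automatically
by_dots    <- dt[, ..a]              # .. prefix: `a` is looked up in the calling scope
by_with    <- dt[, a, with = FALSE]  # explicit with = FALSE

# All three are the same three-column data.table
stopifnot(identical(by_literal, by_dots),
          identical(by_literal, by_with),
          identical(names(by_dots), c("col4", "col5", "col6")))
print(by_dots)
```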

answered Nov 09 '22 23:11

akrun

@akrun's answer gives you the correct alternative. If you want to know why you need it, here's a more detailed explanation:

The way the data.table subset operation works, in most cases the j expression in dt[i, j, by] (here with no i or by) is evaluated in the frame of the data table and returned as is, whether or not it has anything to do with the data table outside the brackets. In versions earlier than 1.9.8, your first code snippet, dt[, c(4, 5, 6)], evaluated to the numeric vector c(4, 5, 6), not the 4th, 5th, and 6th columns. This changed in data.table v1.9.8 (released November 2016; see the "potentially breaking changes" listed under v1.9.8 in the NEWS file), because people, unsurprisingly, expected dt[, c(4, 5, 6)] to return the 4th, 5th, and 6th columns. Now, if the j expression contains no unquoted variable names (for example, it consists only of column numbers or quoted column names), with is automatically set to FALSE. This produces behavior similar to, though not exactly the same as, subsetting a data frame.
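A minimal sketch of that rule, assuming data.table v1.9.8 or later: j expressions made only of numbers or quoted names are treated as column selectors, while an unquoted column name is evaluated like any other expression. The data are the same as in the other answer; the object names are illustrative.

```r
library(data.table)

dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6,
                 col4 = 4:7, col5 = 5:8, col6 = 6:9)

# No unquoted variable names in j: with = FALSE is set automatically
by_range <- dt[, 4:6]
by_names <- dt[, c("col4", "col5", "col6")]
one_col  <- dt[, "col4"]   # a one-column data.table, not a vector

# An unquoted column name is evaluated in the frame of dt: you get the column itself
vec <- dt[, col4]

stopifnot(is.data.table(by_range),
          identical(by_range, by_names),
          is.data.table(one_col),
          identical(vec, 4:7))
```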

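So your second code snippet, where dt[, a] evaluates to a rather than using a to subset the columns, shows the default behavior, and the first is the special case. Because a is an unquoted variable name, with is not switched to FALSE; a is evaluated in the frame of dt and, since dt has no column called a, its value is taken from the calling environment.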
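The scoping can be seen directly in a small sketch. It uses a hypothetical table dt_a that happens to contain a column literally named a, and compound expressions (rev(a), sum(a)) rather than a lone symbol, so the result does not depend on how a particular data.table release handles a bare symbol that is not a column. The .. prefix is assumed to be available (v1.10.2 or later).

```r
library(data.table)

a <- c(4, 5, 6)  # column numbers held in the calling scope

dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6,
                 col4 = 4:7, col5 = 5:8, col6 = 6:9)

# Hypothetical table with an extra column that is literally named `a`
dt_a <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6,
                   col4 = 4:7, col5 = 5:8, col6 = 6:9,
                   a = c(10L, 20L, 30L, 40L))

# dt has no column `a`, so the expression picks `a` up from the calling scope
from_scope <- dt[, rev(a)]

# In dt_a, the column `a` is found first, inside the frame of the table
from_frame <- dt_a[, sum(a)]

# The .. prefix skips the table's frame and uses the calling-scope `a`
selected <- dt_a[, ..a]

stopifnot(identical(from_scope, c(6, 5, 4)),
          identical(from_frame, 100L),
          identical(names(selected), c("col4", "col5", "col6")))
```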

To illustrate the odd but standard behavior here, try:

    dt[, diag(5)]
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    0    0    0    0
    # [2,]    0    1    0    0    0
    # [3,]    0    0    1    0    0
    # [4,]    0    0    0    1    0
    # [5,]    0    0    0    0    1

No matter what your dt is, so long as it is a data.table, this evaluates to the 5x5 identity matrix.
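To check that the result has nothing to do with the table, here is a quick sketch comparing two unrelated data.tables (dt1 and dt2 are hypothetical):

```r
library(data.table)

dt1 <- data.table(col1 = 1:4, col2 = 2:5)
dt2 <- data.table(x = letters)  # a different shape and different columns

m1 <- dt1[, diag(5)]
m2 <- dt2[, diag(5)]

# j never refers to a column, so both calls just return diag(5)
stopifnot(identical(dim(m1), c(5L, 5L)),
          all(m1 == diag(5)),
          identical(m1, m2))
```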

answered Nov 10 '22 00:11

De Novo
