R - Create a new variable where each observation depends on another table and other variables in the data frame

Tags: r, data.table

I have the two following tables:

df <- data.frame(eth = c("A","B","B","A","C"),ZIP1 = c(1,1,2,3,5))
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),A = c(56,98,43,4,90,19,59), B = c(49,10,69,30,10,4,95),C = c(69,2,59,8,17,84,30))

eth    ZIP1         ZIP2    A    B    C
A      1            1      56   49   69
B      1            2      98   10   2
B      2            3      43   69   59
A      3            4      4    30   8
C      5            5      90   10   17
                    6      19   4    84
                    7      59   95   30

I would like to create a variable Inc in the df data frame where, for each observation, the value is the cell of Inc at the intersection of the observation's ZIP (the row where ZIP2 equals ZIP1) and its eth (the column named A, B or C). In my example, this would give:

eth    ZIP1   Inc
A      1      56
B      1      49
B      2      10
A      3      43
C      5      17

A loop or some other brute-force approach could solve it, but it is slow on my dataset, so I'm looking for a more efficient way, perhaps using data.table. This seems like a very standard question, and I apologize if it is; my inability to formulate a precise title for this problem (as you may have noticed) may be why I haven't found a similar question when searching the forum.

Thanks!


asked Nov 13 '15 23:11

Yurienu

2 Answers

Sure, it can be done in data.table:

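library(data.table)
setDT(df)
setDT(Inc)  # data.table's melt() expects a data.table, not a plain data.frame

# reshape Inc to one row per (ZIP2, eth) pair, then update-join it into df
df[ melt(Inc, id.var="ZIP2", variable.name="eth", value.name="Inc"), 
  Inc := i.Inc
, on=c(ZIP1 = "ZIP2","eth") ]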

The syntax for this "merge-assign" operation is X[i, Xcol := expression, on=merge_cols].
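A minimal sketch of the merge-assign (update join) pattern on two small, made-up tables; X, Y, key_col and val are hypothetical names, not from the question. Rows of X that match a row of Y on the key receive Y's value, and unmatched rows get NA.

```r
library(data.table)

# Hypothetical toy tables: X is updated in place, Y supplies the values
X <- data.table(key_col = c(1, 2, 3))
Y <- data.table(key_col = c(1, 3), val = c("a", "c"))

# Update join: match rows of X to rows of Y on key_col and copy
# Y's val (referred to as i.val) into a new column of X
X[Y, new_val := i.val, on = "key_col"]

print(X)
# key_col 2 has no match in Y, so its new_val is NA
```

Because := modifies X by reference, no reassignment (X <- ...) is needed.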

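You can run the i = melt(Inc, id.var="ZIP2", variable.name="eth", value.name="Inc") part on its own to see how it works: it reshapes Inc to long format, with one row per (ZIP2, eth) pair. Inside the merge, columns from i can be referred to with i.* prefixes, so i.Inc is the value column of the melted table.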
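Running the melt step on its own, as suggested above, shows the intermediate long table. This sketch recreates the question's data as data.tables so it runs standalone (Inc is built with data.table() because melt() from data.table expects a data.table), stores the long table in a helper variable named long (a name introduced here for illustration), and then performs the same update join.

```r
library(data.table)

df  <- data.table(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
Inc <- data.table(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Long format: one row per (ZIP2, eth) pair; eth holds the former
# column names A, B, C (as a factor) and Inc holds the cell values
long <- melt(Inc, id.vars = "ZIP2", variable.name = "eth", value.name = "Inc")
print(long)  # 21 rows = 7 ZIPs x 3 eth values

# Join on ZIP1 == ZIP2 and matching eth, copying i.Inc into df
df[long, Inc := i.Inc, on = c(ZIP1 = "ZIP2", "eth")]
print(df)
```

data.table allows joining the character eth column in df to the factor eth column in the melted table, so no explicit conversion is needed here.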

Alternately...

setDT(df)
setDT(Inc)
# within each eth group, join Inc on ZIP2 = ZIP1 and keep the column named by eth
df[, Inc := Inc[.(ZIP1), eth, on="ZIP2", with=FALSE], by=eth]

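This is built on a similar idea: for each group of rows sharing an eth value, Inc is joined on ZIP2 = ZIP1 and only the column named by eth is kept (with=FALSE tells data.table to treat eth as a column name). The package vignettes are a good place to start for this sort of syntax.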
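To unpack the alternate one-liner, here is a sketch of what the inner lookup returns for a single group, using eth = "B" (whose rows have ZIP1 = 1 and 2). The variables eth and ZIP1 are set by hand here to stand in for the values that by=eth would supply; the data are recreated as a data.table so the block runs on its own.

```r
library(data.table)

Inc <- data.table(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Stand-ins for the values the by=eth grouping supplies for the "B" group
eth  <- "B"
ZIP1 <- c(1, 2)

# Join Inc on ZIP2 == ZIP1 and keep only the column whose name is stored in eth;
# with=FALSE makes data.table look up eth in the calling scope as a column name
res <- Inc[.(ZIP1), eth, on = "ZIP2", with = FALSE]
print(res)
```

In the grouped version, this one-column result is what := writes into the rows of that group.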


answered Sep 28 '22 04:09

Frank

We can use base R row/column (matrix) indexing:

df$Inc <- Inc[cbind(match(df$ZIP1, Inc$ZIP2), match(df$eth, colnames(Inc)))]

df
#  eth ZIP1 Inc
#1   A    1  56
#2   B    1  49
#3   B    2  10
#4   A    3  43
#5   C    5  17
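A small sketch that builds and prints the two-column index matrix before using it: each row of the matrix is a (row, column) position in Inc, and indexing with it picks one cell per observation. It assumes df and Inc are plain data.frames (recreated here); data.table's [ method rejects a two-column matrix index, so if Inc has already been converted with setDT(), index as.matrix(Inc) instead.

```r
# Recreate the question's data as plain data.frames
df  <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Row of Inc for each observation: position of ZIP1 within Inc$ZIP2
rows <- match(df$ZIP1, Inc$ZIP2)
# Column of Inc for each observation: position of eth among the column names
cols <- match(df$eth, colnames(Inc))

idx <- cbind(rows, cols)
print(idx)

# Indexing a data.frame with a two-column matrix goes through as.matrix(),
# returning one value per (row, column) pair
df$Inc <- Inc[idx]
print(df)
```

Because the lookup goes through as.matrix(), it works cleanly here since every column of Inc is numeric; a character column in Inc would turn the whole matrix, and therefore the result, into character.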

answered Sep 28 '22 02:09

akrun
