
Maintain attributes of data frame columns after merge

Tags: merge, r

It seems that merge causes columns of a data frame to lose their attributes:

attr(mtcars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear=3:5, my.opinion=c("not enough", "just right", "too many"))
merged.df <- merge(new.df, mtcars)

attr(merged.df$mpg, "units") returns NULL.

Is there a way to get merge to preserve attributes of columns?

(A workaround would be to query the attributes of each column of each data frame before the merge, and then to re-assign them after the merge. However, that seems inelegant.)
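
For reference, the workaround described above could be wrapped in a small helper along these lines. This is only a sketch: merge_keep_attrs is a hypothetical name (not part of base R), cars is just a copy of mtcars used for illustration, and the helper copies only attributes that the merged column lacks, skipping names, dim and dimnames because they depend on the original length.

```r
# Hypothetical helper (not part of base R): merge two data frames, then
# restore column attributes that merge() dropped.
merge_keep_attrs <- function(x, y, ...) {
  merged <- merge(x, y, ...)
  # Attributes that depend on the original vector length are not copied.
  skip <- c("names", "dim", "dimnames")
  for (src in list(x, y)) {
    for (nm in intersect(names(src), names(merged))) {
      src_attrs <- attributes(src[[nm]])
      # Copy only attributes the merged column does not already have,
      # so class and factor levels set by merge() are left alone.
      to_copy <- setdiff(names(src_attrs),
                         c(skip, names(attributes(merged[[nm]]))))
      for (a in to_copy) {
        attr(merged[[nm]], a) <- src_attrs[[a]]
      }
    }
  }
  merged
}

cars <- mtcars  # work on a copy of the built-in data set
attr(cars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear = 3:5,
                     my.opinion = c("not enough", "just right", "too many"))

merged.df <- merge_keep_attrs(new.df, cars)
attr(merged.df$mpg, "units")
```

Columns are matched by name, so a non-key column that merge renames with a .x or .y suffix would not get its attributes back in this sketch.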

Drew Steen asked Nov 30 '13 23:11



2 Answers

If you don't mind using dplyr, its inner_join seems to keep the column attributes.

Your data:

attr(mtcars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear=3:5, my.opinion=c("not enough", "just right", "too many"))

> attr(mtcars$mpg, "units")
[1] "miles.per.gallon"

Use inner_join from dplyr, joining on gear:

inner.df <- dplyr::inner_join(new.df, mtcars, by = "gear")

The resulting data frame is as follows:

> inner.df
   gear my.opinion  mpg cyl  disp  hp drat    wt  qsec vs am carb
1     3 not enough 21.4   6 258.0 110 3.08 3.215 19.44  1  0    1
2     3 not enough 18.7   8 360.0 175 3.15 3.440 17.02  0  0    2
3     3 not enough 18.1   6 225.0 105 2.76 3.460 20.22  1  0    1
4     3 not enough 14.3   8 360.0 245 3.21 3.570 15.84  0  0    4
5     3 not enough 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3
6     3 not enough 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
7     3 not enough 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3
8     3 not enough 10.4   8 472.0 205 2.93 5.250 17.98  0  0    4
9     3 not enough 10.4   8 460.0 215 3.00 5.424 17.82  0  0    4
10    3 not enough 14.7   8 440.0 230 3.23 5.345 17.42  0  0    4
11    3 not enough 21.5   4 120.1  97 3.70 2.465 20.01  1  0    1
12    3 not enough 15.5   8 318.0 150 2.76 3.520 16.87  0  0    2
13    3 not enough 15.2   8 304.0 150 3.15 3.435 17.30  0  0    2
14    3 not enough 13.3   8 350.0 245 3.73 3.840 15.41  0  0    4
15    3 not enough 19.2   8 400.0 175 3.08 3.845 17.05  0  0    2
16    4 just right 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4
17    4 just right 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4
18    4 just right 22.8   4 108.0  93 3.85 2.320 18.61  1  1    1
19    4 just right 24.4   4 146.7  62 3.69 3.190 20.00  1  0    2
20    4 just right 22.8   4 140.8  95 3.92 3.150 22.90  1  0    2
21    4 just right 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4
22    4 just right 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4
23    4 just right 32.4   4  78.7  66 4.08 2.200 19.47  1  1    1
24    4 just right 30.4   4  75.7  52 4.93 1.615 18.52  1  1    2
25    4 just right 33.9   4  71.1  65 4.22 1.835 19.90  1  1    1
26    4 just right 27.3   4  79.0  66 4.08 1.935 18.90  1  1    1
27    4 just right 21.4   4 121.0 109 4.11 2.780 18.60  1  1    2
28    5   too many 26.0   4 120.3  91 4.43 2.140 16.70  0  1    2
29    5   too many 30.4   4  95.1 113 3.77 1.513 16.90  1  1    2
30    5   too many 15.8   8 351.0 264 4.22 3.170 14.50  0  1    4
31    5   too many 19.7   6 145.0 175 3.62 2.770 15.50  0  1    6
32    5   too many 15.0   8 301.0 335 3.54 3.570 14.60  0  1    8

The attribute is maintained:

> attr(inner.df$mpg, "units")
[1] "miles.per.gallon"
storm surge answered Oct 24 '22 07:10



There's also data.table, whose join keeps the attribute as well:

library(data.table)

dt1 = as.data.table(mtcars)
dt2 = as.data.table(new.df)

inner.dt <- dt1[dt2, on = "gear"]

attr(inner.dt$mpg, "units")

...

> attr(inner.dt$mpg, "units")
[1] "miles.per.gallon"

But in this benchmark the data.table join was slower than dplyr's inner_join:

library(microbenchmark)
microbenchmark(dplyr::inner_join(new.df, mtcars,"gear"),
               dt1[dt2, on = "gear"])

...

> microbenchmark(dplyr::inner_join(new.df, mtcars,"gear"),
+                    dt1[dt2, on = "gear"])
Unit: microseconds
             expr     min       lq     mean  median      uq      max neval
 dplyr            544.877 568.5840 625.6442 606.319 658.870 1005.197   100
 data.table       860.186 892.1915 961.2788 938.618 979.711 1510.166   100
SCDCE answered Oct 24 '22 08:10
