It seems that merge causes columns of a data frame to lose their attributes:
attr(mtcars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear=3:5, my.opinion=c("not enough", "just right", "too many"))
merged.df <- merge(new.df, mtcars)
attr(merged.df$mpg, "units")
returns NULL.
Is there a way to get merge to preserve attributes of columns?
(A workaround would be to query the attributes of each column of each data frame before the merge, and then to re-assign them after the merge. However, that seems inelegant.)
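For reference, the workaround described above could be sketched roughly as follows in base R. The helper name merge_keep_attr and its attr_name parameter are hypothetical, not part of any package. The sketch assumes that only one named attribute needs to survive and that column names are unchanged by the merge, so columns that merge renames with .x/.y suffixes are not handled.

```r
# Hypothetical helper (not from any package): merge two data frames, then
# copy one named attribute from the input columns back onto the
# same-named columns of the result.
merge_keep_attr <- function(x, y, attr_name = "units", ...) {
  merged <- merge(x, y, ...)
  # y is processed first and x last, so x wins if both define the attribute.
  for (src in list(y, x)) {
    for (col in intersect(names(src), names(merged))) {
      value <- attr(src[[col]], attr_name)
      if (!is.null(value)) {
        attr(merged[[col]], attr_name) <- value
      }
    }
  }
  merged
}

# Work on a copy so the built-in mtcars data set is left untouched.
cars <- mtcars
attr(cars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear = 3:5,
                     my.opinion = c("not enough", "just right", "too many"))

merged.df <- merge_keep_attr(new.df, cars)
attr(merged.df$mpg, "units")
```

Copying a single named attribute, rather than every attribute, avoids overwriting things like class or factor levels on the merged columns.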
If you don't mind using dplyr, this one seems to work.
Your data:
attr(mtcars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear=3:5, my.opinion=c("not enough", "just right", "too many"))
> attr(mtcars$mpg, "units")
[1] "miles.per.gallon"
Use the inner_join function from dplyr:
inner.df <- dplyr::inner_join(new.df, mtcars, by = "gear")
The resulting data frame is as follows:
> inner.df
gear my.opinion mpg cyl disp hp drat wt qsec vs am carb
1 3 not enough 21.4 6 258.0 110 3.08 3.215 19.44 1 0 1
2 3 not enough 18.7 8 360.0 175 3.15 3.440 17.02 0 0 2
3 3 not enough 18.1 6 225.0 105 2.76 3.460 20.22 1 0 1
4 3 not enough 14.3 8 360.0 245 3.21 3.570 15.84 0 0 4
5 3 not enough 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3
6 3 not enough 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3
7 3 not enough 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3
8 3 not enough 10.4 8 472.0 205 2.93 5.250 17.98 0 0 4
9 3 not enough 10.4 8 460.0 215 3.00 5.424 17.82 0 0 4
10 3 not enough 14.7 8 440.0 230 3.23 5.345 17.42 0 0 4
11 3 not enough 21.5 4 120.1 97 3.70 2.465 20.01 1 0 1
12 3 not enough 15.5 8 318.0 150 2.76 3.520 16.87 0 0 2
13 3 not enough 15.2 8 304.0 150 3.15 3.435 17.30 0 0 2
14 3 not enough 13.3 8 350.0 245 3.73 3.840 15.41 0 0 4
15 3 not enough 19.2 8 400.0 175 3.08 3.845 17.05 0 0 2
16 4 just right 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
17 4 just right 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
18 4 just right 22.8 4 108.0 93 3.85 2.320 18.61 1 1 1
19 4 just right 24.4 4 146.7 62 3.69 3.190 20.00 1 0 2
20 4 just right 22.8 4 140.8 95 3.92 3.150 22.90 1 0 2
21 4 just right 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
22 4 just right 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
23 4 just right 32.4 4 78.7 66 4.08 2.200 19.47 1 1 1
24 4 just right 30.4 4 75.7 52 4.93 1.615 18.52 1 1 2
25 4 just right 33.9 4 71.1 65 4.22 1.835 19.90 1 1 1
26 4 just right 27.3 4 79.0 66 4.08 1.935 18.90 1 1 1
27 4 just right 21.4 4 121.0 109 4.11 2.780 18.60 1 1 2
28 5 too many 26.0 4 120.3 91 4.43 2.140 16.70 0 1 2
29 5 too many 30.4 4 95.1 113 3.77 1.513 16.90 1 1 2
30 5 too many 15.8 8 351.0 264 4.22 3.170 14.50 0 1 4
31 5 too many 19.7 6 145.0 175 3.62 2.770 15.50 0 1 6
32 5 too many 15.0 8 301.0 335 3.54 3.570 14.60 0 1 8
The attribute is maintained:
> attr(inner.df$mpg, "units")
[1] "miles.per.gallon"
There's also data.table
library(data.table)
dt1 = as.data.table(mtcars)
dt2 = as.data.table(new.df)
inner.dt <- dt1[dt2, on = "gear"]
attr(inner.dt$mpg, "units")
...
> attr(inner.dt$mpg, "units")
[1] "miles.per.gallon"
but in this benchmark on such a small data set, it is slower than dplyr:
library(microbenchmark)
microbenchmark(dplyr::inner_join(new.df, mtcars,"gear"),
dt1[dt2, on = "gear"])
...
> microbenchmark(dplyr::inner_join(new.df, mtcars,"gear"),
+ dt1[dt2, on = "gear"])
Unit: microseconds
       expr     min       lq     mean  median      uq      max neval
      dplyr 544.877 568.5840 625.6442 606.319 658.870 1005.197   100
 data.table 860.186 892.1915 961.2788 938.618 979.711 1510.166   100