Here is a short data.table:
DT <- data.table(Tag1 = c(22,253,6219,6219,252862,252864,312786,312812),
Tag2 = c(22,255,6220,252857,252863,252865,251191,252863),
Date= as.Date(as.character(c("7/25/2008","6/15/2000","6/30/2000","9/6/2002","9/6/2002","9/6/2002","9/3/2003","9/5/2003")),format = "%m/%d/%Y"))
DT
Tag1 Tag2 Date
1: 22 22 2008-07-25
2: 253 255 2000-06-15
3: 6219 6220 2000-06-30
4: 6219 252857 2002-09-06
5: 252862 252863 2002-09-06
6: 252864 252865 2002-09-06
7: 312786 251191 2003-09-03
8: 312812 252863 2003-09-05
I would like to sort the data.table in ascending order by 3 columns: Tag1, Tag2 and Date. I tested:
> test <- DT[order(Tag1, Tag2, Date)]
> test
Tag1 Tag2 Date
1: 22 22 2008-07-25
2: 253 255 2000-06-15
3: 6219 6220 2000-06-30
4: 6219 252857 2002-09-06
5: 252862 252863 2002-09-06
6: 252864 252865 2002-09-06
7: 312786 251191 2003-09-03
8: 312812 252863 2003-09-05
However, I would like to sort the data.table as follows:
> test
Tag1 Tag2 Date
1: 22 22 2008-07-25
2: 253 255 2000-06-15
3: 6219 6220 2000-06-30
4: 6219 252857 2002-09-06
5: 252862 252863 2002-09-06
6: 312812 252863 2003-09-05
7: 252864 252865 2002-09-06
8: 312786 251191 2003-09-03
In particular, the duplicated values for Tag1 or Tag1 should be disposed one below the other (for example: 6219 for Tag1 and 252863 for Tag2). How can I do this ?
EDIT:
The proposed solutions work for a short data.table (like the data.table above). Here is a longer version:
DT <- data.table(Tag1 = c(252860, 252862, 312812, 252864, 252866, 252868, 252870, 318880, 252872, 252874, 252876, 252878, 252880, 252880, 252881, 252883,
252885, 252887, 311264, 252889, 252889, 252892, 318879, 318880, 318881), Tag2 = c(252861, 252863, 252863, 252865, 252867, 252869, 252871, 252871, 252873,
252875, 252877, 252879, 414611, 905593, 252882, 252884, 252886, 252888, 252888, 252890, 318904, 252893, 318878, 414547, 318882), Date = c("9/6/2002",
"9/6/2002", "9/5/2003", "9/6/2002", "9/6/2002", "9/6/2002", "9/6/2002", "10/8/2003", "9/6/2002", "9/6/2002", "9/6/2002", "9/6/2002", "10/5/2004",
"9/6/2002", "9/6/2002", "9/6/2002", "9/10/2002", "9/10/2002", "7/15/2003", "9/10/2002", "10/15/2003", "9/10/2002", "10/8/2003", "9/29/2004","10/8/2003"))
Here is the expected result (i.e., data.table "After"). In particular, the data.table "After" should respect two conditions:
1) the rows are sorted by date in ascending order
2) the duplicated values for Tag1 or Tag1 are disposed one below the other (eventually not necessary in ascending order)
All duplicated values for Tag1 and Tag2 are in yellow.
Old Order
df[order(Tag1, Tag2, Date)]
# Tag1 Tag2 Date
# 1: 22 22 2008-07-25
# 2: 253 255 2000-06-15
# 3: 6219 6220 2000-06-30
# 4: 6219 252857 2002-09-06
# 5: 252862 252863 2002-09-06
# 6: 252864 252865 2002-09-06
# 7: 312786 251191 2003-09-03
# 8: 312812 252863 2003-09-05
New Order
Sort Date
column in descending order, then sort Tag1
in ascending order grouped by Tag2
.
setcolorder(dt1 <- df[order(-Date)][order(Tag1), .SD, by = Tag2], colnames(df))
dt1
# Tag1 Tag2 Date
# 1: 22 22 2008-07-25
# 2: 253 255 2000-06-15
# 3: 6219 252857 2002-09-06
# 4: 6219 6220 2000-06-30
# 5: 252862 252863 2002-09-06
# 6: 312812 252863 2003-09-05
# 7: 252864 252865 2002-09-06
# 8: 312786 251191 2003-09-03
Solution of @akrun in the comments disturbs the structure of the data. Here is the comparison. Look at #4: 6219 should have 252857 instead of 251191
df[,lapply(df, sort)]
# Tag1 Tag2 Date
# 1: 22 22 2000-06-15
# 2: 253 255 2000-06-30
# 3: 6219 6220 2002-09-06
# 4: 6219 251191 2002-09-06
# 5: 252862 252857 2002-09-06
# 6: 252864 252863 2003-09-03
# 7: 312786 252863 2003-09-05
# 8: 312812 252865 2008-07-25
The data in the example are already sorted so here is a shortened version in unsorted order to illustrate changing the sort order.
> library(data.table)
> DT <- data.table(Tag1 = c(22,253,22,22),
Tag2 = c(1,255,2,2),
Date = as.Date(as.character(c(
"1/1/2010","4/4/2000","3/3/2003","2/2/2000")), format = "%m/%d/%Y"))
> DT
Tag1 Tag2 Date
1: 22 1 2010-01-01
2: 253 255 2000-04-04
3: 22 2 2003-03-03
4: 22 2 2000-02-02
Use order
to create a new data.table in a particular sorted order. Order function takes a single field or multiple fields to sort on in the specified order.
# sorts first by Tag1 then by Tag2 and finally by Date (in ascending order)
> DT2 <- DT[order(Tag1, Tag2, Date)]
> DT2
Tag1 Tag2 Date
1: 22 1 2010-01-01
2: 22 2 2000-02-02
3: 22 2 2003-03-03
4: 253 255 2000-04-04
To re-order the data.table in place use setorder
function.
setorder(DT, Tag1, Tag2, Date)
If want to sort in descending order then use -
prefix on the field names.
> DT[order(Tag1, Tag2, -Date)]
> setorder(DT, Tag1, Tag2, -Date)
In an update to the question, OP indicated that the previous answers (including the accepted answer) do not work for a longer data set. Using the new data after the question edit, we can sort like this:
The trick is to start with the table ordered by date then tags, in ascending order (the initial data in OP are already in this state, but for generality I start my ensuring the data are ordered by setkey(DT, asDate, Tag1, Tag2)
). Then number the distinct values of tag2, in the order they appear and sort by these group numbers (the next two lines). This will make sure that identical values of tag2 follow each other, without otherwise disturbing the order of the table. Next, do the same thing for tag1.
DT[, asDate := as.Date(Date, format = "%m/%d/%Y")]
setkey(DT, asDate, Tag1, Tag2)
DT[, g2 := .GRP, Tag2]
setkey(DT, g2)
DT[, g1 := .GRP, Tag1]
setkey(DT, g1)
DT[, c("g1", "g2", "asDate") := NULL][]
Tag1 Tag2 Date
1: 252860 252861 9/6/2002
2: 252862 252863 9/6/2002
3: 312812 252863 9/5/2003
4: 252864 252865 9/6/2002
5: 252866 252867 9/6/2002
6: 252868 252869 9/6/2002
7: 252870 252871 9/6/2002
8: 318880 252871 10/8/2003
9: 318880 414547 9/29/2004
10: 252872 252873 9/6/2002
11: 252874 252875 9/6/2002
12: 252876 252877 9/6/2002
13: 252878 252879 9/6/2002
14: 252880 905593 9/6/2002
15: 252880 414611 10/5/2004
16: 252881 252882 9/6/2002
17: 252883 252884 9/6/2002
18: 252885 252886 9/10/2002
19: 252887 252888 9/10/2002
20: 311264 252888 7/15/2003
21: 252889 252890 9/10/2002
22: 252889 318904 10/15/2003
23: 252892 252893 9/10/2002
24: 318879 318878 10/8/2003
25: 318881 318882 10/8/2003
Tag1 Tag2 Date
Notes
This works perfectly on the sample data. However, a word of caution. There is no guarantee that a solution even exists for all possible data (either by this method or by any method), particularly if repeated tags are a large fraction of all the tags. Consider, for example, the following columns in which it is not possible to order the rows such that the same letters always appear consecutively in both columns simultaneously:
a b
a c
b a
b c
c a
c b
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With