I am trying to exclude some rows from a data.table based on, say, day and month: for example summer holidays, which always begin on 15 June and end on 15 July. I can extract those days from a Date column, but because as.Date is awfully slow to work with, I keep separate integer columns for Month and Day and want to use only those.
It is easy to select the given entries by
DT[Month==6][Day>=15]
DT[Month==7][Day<=15]
Is there a way to take the "difference" of the two data.tables (the original one and the rows I selected)? (Why not subset? Maybe I am missing something simple, but I don't want to exclude days like 10/6 or 31/7.)
I am aware of a way to do it with a join, but only day by day:
setkey(DT, Month, Day)
DT[-DT[J(Month,Day), which= TRUE]]
Can anyone help me solve this in a more general way?
Great question. I've edited the question title to match the question.
A simple approach that avoids as.Date and reads nicely:
DT[!(Month*100L+Day) %between% c(0615L,0715L)]
That's probably fast enough in many cases. If you have a lot of different ranges, then you may want to step up a gear :
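DT[,mmdd:=Month*100L+Day]
setkey(DT,mmdd)   # the J() lookups below need DT keyed by mmdd
from = DT[J(0615L),mult="first",which=TRUE]   # first row dated 15 June
to = DT[J(0715L),mult="last",which=TRUE]   # last row dated 15 July
DT[-(from:to)]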
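As a rough check that the scan and the keyed lookup agree, here is a minimal sketch on made-up data. The toy table (one row per day of 2013 plus a `value` column) is an assumption for illustration only; it takes the last matching row for the upper bound so that every row dated 15 July is dropped.

```r
library(data.table)

# Hypothetical toy data: one row per day of 2013, integer Month and Day
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
DT <- data.table(Month = as.integer(format(dates, "%m")),
                 Day   = as.integer(format(dates, "%d")),
                 value = seq_along(dates))

# Approach 1: vector scan on a combined month-day integer
res1 <- DT[!((Month * 100L + Day) %between% c(615L, 715L))]

# Approach 2: key on mmdd, then find the range ends by binary search
DT[, mmdd := Month * 100L + Day]
setkey(DT, mmdd)
from <- DT[J(615L), mult = "first", which = TRUE]  # first row of 15 June
to   <- DT[J(715L), mult = "last",  which = TRUE]  # last row of 15 July
res2 <- DT[-(from:to)]
```

Both results keep the same rows here. Note that the keyed version assumes both end dates are actually present in the data; if one is missing, the lookup returns NA and `from:to` fails.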
That's a bit long and error prone because it's DIY. So one idea is that a list column in an i table would represent a range query (FR#203, like a binary search %between%). Then a not-join (also not yet implemented, FR#1384) could be combined with the list column range query to do exactly what you asked :
setkey(DT,mmdd)
DT[-J(list(0615,0715))]
That would extend to multiple different ranges, or the same range for many different ids, in the usual way; i.e., more rows added to i.
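Until something like that exists, several ranges can be handled with a plain vector scan. The following is only a sketch under assumptions: the `ranges` table and its `from`/`to` columns are hypothetical names, and the bounds are treated as inclusive. It is not the list-column syntax proposed above.

```r
library(data.table)

# Hypothetical toy data: one row per day of 2013
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
DT <- data.table(Month = as.integer(format(dates, "%m")),
                 Day   = as.integer(format(dates, "%d")))
DT[, mmdd := Month * 100L + Day]

# Hypothetical holiday ranges, one row per range (inclusive bounds)
ranges <- data.table(from = c(615L, 1220L), to = c(715L, 1231L))

# One logical vector per range, combined with OR: TRUE = row is in a holiday
excluded <- Reduce(`|`, Map(function(lo, hi) DT$mmdd %between% c(lo, hi),
                            ranges$from, ranges$to))

# Keep only the rows that fall outside every range
keep <- !excluded
res <- DT[keep]
```

Adding another holiday period is then just another row in `ranges`. This scans the whole column once per range, so it trades the speed of a keyed lookup for simplicity.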