
Remove a range in data.table

Tags: r, data.table

I am trying to exclude some rows from a data.table based on, say, day and month: for example, summer holidays that always begin on the 15th of June and end on the 15th of July. I can extract those days based on Date, but since as.Date is awfully slow to work with, I have separate integer columns for Month and Day and I want to do it using only them.

It is easy to select the given entries by

DT[Month==6][Day>=15]
DT[Month==7][Day<=15]

Is there a way to take the "difference" of the two data.tables (the original one and the rows I selected)? (Why not subset? Maybe I am missing something simple, but I don't want to exclude days like 10/6 or 31/7.)

I am aware of a way to do it with a join, but only day by day:

setkey(DT, Month, Day)
DT[-DT[J(Month,Day), which= TRUE]]

Can anyone help me solve this in a more general way?

asked Oct 22 '12 18:10 by krhlk



1 Answer

Great question. I've edited the question title to match the question.

A simple approach that avoids as.Date and reads nicely:

DT[!(Month*100L+Day) %between% c(0615L,0715L)]
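
To see the encoding at work, here is a small sketch on made-up data (the toy table and its Value column are assumptions for illustration, not taken from the question). Month*100L + Day maps 15 June to 615 and 15 July to 715, so the holiday becomes one contiguous integer interval:

```r
library(data.table)

# Toy data (assumed for illustration): one row per day of a non-leap year
dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
DT <- data.table(Month = as.integer(format(dates, "%m")),
                 Day   = as.integer(format(dates, "%d")),
                 Value = seq_along(dates))

# Keep everything outside 15 June .. 15 July (bounds inclusive)
kept <- DT[!(Month*100L + Day) %between% c(615L, 715L)]

# Days such as 10 June and 31 July survive; 15 June and 15 July do not
kept[Month == 6L & Day == 10L]
kept[Month == 7L & Day == 31L]
nrow(DT) - nrow(kept)   # number of rows dropped
```

Note that this only works for ranges that stay within one calendar year; a range such as 20 December to 5 January would need two intervals.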

That's probably fast enough in many cases. If you have a lot of different ranges, then you may want to step up a gear:

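DT[,mmdd:=Month*100L+Day]
setkey(DT,mmdd)                              # the J() lookups below need a key on mmdd
from = DT[J(615L),mult="first",which=TRUE]   # first row of 15 June
to = DT[J(715L),mult="last",which=TRUE]      # last row of 15 July, in case a day has several rows
DT[-(from:to)]                               # assumes both boundary days are present in DT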

That's a bit long and error-prone because it's DIY. So one idea is that a list column in an i table would represent a range query (FR#203, like a binary-search %between%). Then a not-join (also not yet implemented, FR#1384) could be combined with the list-column range query to do exactly what you asked:

setkey(DT,mmdd)
DT[-J(list(0615,0715))]

That would extend to multiple different ranges, or the same range for many different ids, in the usual way; i.e., more rows added to i.
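
The range-join syntax above was only a proposal when this was written. As a hedged sketch of the same idea, later data.table releases provide inrange(), which tests each value against several [lower, upper] intervals at once. The ranges table and the second holiday period below are invented for illustration:

```r
library(data.table)

# Toy data (assumed for illustration): one row per day of a non-leap year
dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
DT <- data.table(Month = as.integer(format(dates, "%m")),
                 Day   = as.integer(format(dates, "%d")))
DT[, mmdd := Month*100L + Day]

# One row per range to exclude (hypothetical values); add rows for more ranges
ranges <- data.table(from = c(615L, 1224L),
                     to   = c(715L, 1231L))

# TRUE where mmdd falls inside any of the ranges (bounds inclusive by default)
excluded <- inrange(DT$mmdd, ranges$from, ranges$to)
kept <- DT[!excluded]
```

Each row of ranges is one interval, so excluding another period means adding a row rather than writing another condition.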

answered Sep 21 '22 15:09 by Matt Dowle