 

py-datatable 'in' operator?

I am unable to perform a standard 'in' operation against a predefined list of items. I would like to do something like this:

# Construct a simple example frame
from datatable import *
df = Frame(V1=['A','B','C','D'], V2=[1,2,3,4])

# Filter frame to a list of items (THIS DOES NOT WORK)
items = ['A','B']
df[f.V1 in items,:]

This example results in the error:

TypeError: A boolean value cannot be used as a row selector

Unfortunately, there doesn't appear to be a built-in function for 'in' operations. I would like something like the %in% operator that is native to R. Is there any way to accomplish this in Python?
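Why the first attempt fails: Python's `in` operator always returns a plain `bool`, so a lazy column expression cannot survive it. The sketch below uses a hypothetical stand-in class `LazyColumn` (not datatable's real expression type) whose `==` returns another lazy object, which is the relevant behavior of `f.V1`. Even so, `col in items` collapses to a boolean, which is exactly the kind of value the row selector rejects.

```python
# Hypothetical stand-in for a lazy column expression such as f.V1.
# It is NOT datatable's class; it only mimics one behavior:
# comparing it with == builds a new expression object instead of a bool.
class LazyColumn:
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Return a new lazy expression rather than True/False.
        return LazyColumn(f"({self.text} == {other!r})")


items = ['A', 'B']
col = LazyColumn('V1')

# Comparing directly keeps the expression lazy...
print(type(col == 'A').__name__)  # LazyColumn

# ...but `in` runs list.__contains__, which tests each element with ==
# and converts the outcome to a bool. The lazy object is truthy, so the
# whole membership test becomes a plain True.
result = col in items
print(type(result).__name__, result)  # bool True
```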

I can get the result by chaining multiple 'equals' comparisons, but this is inconvenient when you want to match a large number of items:

df[(f.V1 == 'A') | (f.V1 == 'B'),:]

datatable 0.10.1
python 3.6

Asked by Dale Kube on Jun 14 '20 at 22:06



2 Answers

You could also try this out:

First, import the necessary packages:

import datatable as dt
from datatable import f
import functools
import operator

Create a sample datatable:

DT = dt.Frame(V1=['A','B','C','D','E','B','A'], V2=[1,2,3,4,5,6,7])

Make a list of the values to keep; in your case:

sel_obs = ['A','B']

Now build the filter expression with the functools and operator modules:

filter_rows = functools.reduce(operator.or_,(f.V1==obs for obs in sel_obs))
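To see what `functools.reduce(operator.or_, ...)` builds, here is a sketch with a hypothetical `Expr` class (not part of datatable) that only records its expression as text. `reduce` applies `|` pairwise from left to right, so three values produce the same chain of `|` you would otherwise type by hand.

```python
import functools
import operator


class Expr:
    """Hypothetical stand-in for a datatable expression; it only records text."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return Expr(f"({self.text} == {other!r})")

    def __or__(self, other):
        # operator.or_(a, b) evaluates a | b, which lands here.
        return Expr(f"({self.text} | {other.text})")


V1 = Expr("V1")
sel_obs = ['A', 'B', 'C']

# Fold the per-value comparisons into one expression, left to right.
combined = functools.reduce(operator.or_, (V1 == obs for obs in sel_obs))
print(combined.text)
# (((V1 == 'A') | (V1 == 'B')) | (V1 == 'C'))
```

Note that `functools.reduce` raises `TypeError` on an empty iterable when no initial value is given, so an empty `sel_obs` needs a separate check before building the filter.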

Finally, apply the filter to the Frame:

# Keep the rows where the combined filter is True, all columns
DT[filter_rows, :]

The output:

Out[6]: 
   | V1  V2
-- + --  --
 0 | A    1
 1 | B    2
 2 | B    6
 3 | A    7

[4 rows x 2 columns]

You can swap in other operators to build different kinds of filters.
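As one example of swapping operators, an R-style `!(V1 %in% sel_obs)` follows from De Morgan's law: AND together `!=` tests instead of ORing `==` tests. In datatable this would presumably be `functools.reduce(operator.and_, (f.V1 != obs for obs in sel_obs))`, assuming `!=` and `&` work on `f` expressions the way `==` and `|` do. The runnable sketch below checks the logic on plain Python rows (booleans stand in for column expressions; this is not datatable code).

```python
import functools
import operator

# The sample data from above as plain Python rows (V1, V2).
rows = [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5), ('B', 6), ('A', 7)]
sel_obs = ['A', 'B']


def keep(row):
    # "V1 in sel_obs": OR together one equality test per value.
    return functools.reduce(operator.or_, (row[0] == obs for obs in sel_obs))


def drop(row):
    # "V1 not in sel_obs": by De Morgan's law, AND together the != tests.
    return functools.reduce(operator.and_, (row[0] != obs for obs in sel_obs))


kept = [r for r in rows if keep(r)]
dropped = [r for r in rows if drop(r)]
print(kept)     # [('A', 1), ('B', 2), ('B', 6), ('A', 7)]
print(dropped)  # [('C', 3), ('D', 4), ('E', 5)]
```

The two selections are complementary: together they cover every row exactly once.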

@sammyweemy's solution should also work.

Answered by myamulla_ciencia on Oct 14 '22 at 06:10



It turns out that when you pass a list of expressions as the row selector in Python datatable, it returns the rows matching any of them, so it works like an or.

So you can just do:

import datatable
df = datatable.Frame(V1=['A','B','C','D'], V2=[1,2,3,4])

items = ['A','B']
# One equality test per item; the list itself is used as the row selector
df[[datatable.f.V1 == i for i in items],:]

Note a few caveats: this behavior is not described in the docs, and I don't know whether it will always work. Moreover, it only works for filtering on one column: if you tried to filter rows where V1==A or V2==1, the list approach would create duplicates.
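The duplicate caveat is what you would expect if each expression in the list selects its own rows and those selections are then concatenated, rather than ORed. The plain-Python model below assumes exactly that behavior (it models the description above and is not datatable's implementation): `V1 == 'A'` and `V2 == 1` both match row 0, so concatenation returns that row twice, while a true OR returns it once.

```python
# Sample frame from the answer, as plain Python columns.
V1 = ['A', 'B', 'C', 'D']
V2 = [1, 2, 3, 4]
n = len(V1)

# Two filters that overlap: both match row 0.
predicates = [
    lambda i: V1[i] == 'A',
    lambda i: V2[i] == 1,
]

# Assumed list-selector behavior: select rows per predicate, then concatenate.
concatenated = [i for p in predicates for i in range(n) if p(i)]

# A true OR keeps each matching row once, in original order.
ored = [i for i in range(n) if any(p(i) for p in predicates)]

print(concatenated)  # [0, 0]
print(ored)          # [0]
```

When every expression tests the same column against different values, the selections cannot overlap, which is why the list trick behaves like an OR in that case.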

If you need fancier filtering, you can adjust the expression inside the list, for example:

df[[(datatable.f.V1 == i) & (datatable.f.V2 >= 2) for i in items], :]

This returns just the second row of the example (V1 = 'B', V2 = 2), as expected.

Answered by ira on Oct 14 '22 at 05:10
