
Select subset of rows of dataframe using multiple conditions

I would like to select the subset of a dataframe's rows that satisfy multiple conditions on multiple columns. I know I could do this sequentially, first selecting the subset that matches the first condition, then the portion of those that match the second, and so on, but it seems like it should be possible in a single step. The following looks like it should work, but doesn't. Apparently it does work like this in other languages' implementations of DataFrame. Any thoughts?

using DataFrames
df = DataFrame()
df[:A]=[ 1, 3, 4, 7, 9]
df[:B]=[ "a", "c", "c", "D", "c"]
df[(df[:A].<5)&&(df[:B].=="c"),:] 

type: non-boolean (DataArray{Bool,1}) used in boolean context
while loading In[18], in expression starting on line 5
ARM asked Apr 02 '15 19:04





2 Answers

This is a Julia thing, not so much a DataFrame thing: you want & instead of &&. For example:

julia> [true, true] && [false, true]
ERROR: TypeError: non-boolean (Array{Bool,1}) used in boolean context

julia> [true, true] & [false, true]
2-element Array{Bool,1}:
 false
  true

julia> df[(df[:A].<5)&(df[:B].=="c"),:]
2x2 DataFrames.DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 3 | "c" |
| 2   | 4 | "c" |

FWIW, this works the same way in pandas in Python:

>>> df[(df.A < 5) & (df.B == "c")]
   A  B
1  3  c
2  4  c
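The transcripts above are from an old Julia release. In current Julia (1.x) the plain `&` is, as far as I know, no longer defined for arrays, so the element-wise form has to be written with the dotted operator `.&`. Below is a minimal sketch using plain vectors in place of the DataFrame columns; the names `mask`, `selected_A` and `selected_B` are only illustrative. With DataFrames.jl the same Boolean mask would presumably be used as the row index, as in `df[mask, :]`.

```julia
# Plain vectors standing in for the two DataFrame columns from the question.
A = [1, 3, 4, 7, 9]
B = ["a", "c", "c", "D", "c"]

# `.<` and `.==` compare element by element; `.&` combines the two Bool vectors.
mask = (A .< 5) .& (B .== "c")

# Indexing with a Bool vector keeps the positions where the mask is true.
selected_A = A[mask]
selected_B = B[mask]
```

The parentheses around each comparison matter, because `&` binds more tightly than the comparison operators.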
DSM answered Sep 25 '22 22:09



I now have the same problem as jwimberley (https://stackoverflow.com/users/5526072/jwimberley). It appeared when I updated from Julia 0.5 to Julia 0.6, and I am now using DataFrames v0.10.1.

Update: I made the following change to fix it:

r[(r[:l] .== l) & (r[:w] .== w), :] # julia 0.5

r[.&(r[:l] .== l, r[:w] .== w), :] # julia 0.6

but this gets very slow with long chains of conditions (the time taken appears to grow like 2^n for n chained conditions), so maybe Query is the better way now:

# r is a dataframe
using Query
q1 = @from i in r begin
    @where i.l == l && i.w == w && i.nl == nl && i.lt == lt && 
    i.vz == vz && i.vw == vw && i.vδ == vδ && 
    i.ζx == ζx && i.ζy == ζy && i.ζδx == ζδx
    @select {absu=i.absu, i.dBU}
    @collect DataFrame
end

This is only an example, but it is fast. The approach is described in the DataFrames documentation.
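The `@where` clause can use `&&` because it is evaluated one row at a time, on scalar values, rather than on whole columns. Here is a minimal sketch of the same row-wise idea in plain Julia without Query, reusing the data from the question. The names `rows` and `matches` are hypothetical, and the NamedTuples only stand in for DataFrame rows. Recent DataFrames.jl versions should accept a similar predicate through `filter(f, df)`, but check the documentation for your version.

```julia
# Hypothetical rows as NamedTuples, standing in for the rows of a DataFrame.
rows = [(A = a, B = b) for (a, b) in zip([1, 3, 4, 7, 9], ["a", "c", "c", "D", "c"])]

# Each row is tested on its own, so the scalar short-circuit `&&` is valid here.
matches = filter(r -> r.A < 5 && r.B == "c", rows)
```

Because the predicate short-circuits per row, adding more conditions does not build intermediate Boolean arrays for each one.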

kilgore trout answered Sep 24 '22 22:09
