I would like to select a subset of a dataframe that satisfies multiple conditions on multiple rows. I know I could this sequentially -- first selecting the subset that matches the first condition, then the portion of those that match the second, etc, but it seems like it should be able to be done in a single step. The following seems like it should work, but doesn't. Apparently it does work like this in other languages' implementations of DataFrame. Any thoughts?
using DataFrames
df = DataFrame()
df[:A]=[ 1, 3, 4, 7, 9]
df[:B]=[ "a", "c", "c", "D", "c"]
df[(df[:A].<5)&&(df[:B].=="c"),:]
type: non-boolean (DataArray{Bool,1}) used in boolean context
while loading In[18], in expression starting on line 5
To select the rows based on mutiple condition we can use the & operator.In this example we have passed mutiple conditon using this code dfobj. loc[(dobj['Name'] == 'Rack') & (dobj['Marks'] == 100)]. This code will return a subset of dataframe rows where name='Rack' and marks =100.
Using Loc to Filter With Multiple Conditions The loc function in pandas can be used to access groups of rows or columns by label. Add each condition you want to be included in the filtered result and concatenate them with the & operator. You'll see our code sample will return a pd. dataframe of our filtered rows.
This is a Julia thing, not so much a DataFrame thing: you want &
instead of &&
. For example:
julia> [true, true] && [false, true]
ERROR: TypeError: non-boolean (Array{Bool,1}) used in boolean context
julia> [true, true] & [false, true]
2-element Array{Bool,1}:
false
true
julia> df[(df[:A].<5)&(df[:B].=="c"),:]
2x2 DataFrames.DataFrame
| Row | A | B |
|-----|---|-----|
| 1 | 3 | "c" |
| 2 | 4 | "c" |
FWIW, this works the same way in pandas in Python:
>>> df[(df.A < 5) & (df.B == "c")]
A B
1 3 c
2 4 c
I have the same now as https://stackoverflow.com/users/5526072/jwimberley , occurring on my update to julia 0.6 from 0.5, and now using dataframes v 0.10.1.
Update: I made the following change to fix:
r[(r[:l] .== l) & (r[:w] .== w), :] # julia 0.5
r[.&(r[:l] .== l, r[:w] .== w), :] # julia 0.6
but this gets very slow with long chains (time taken \propto 2^chains) so maybe Query is the better way now:
# r is a dataframe
using Query
q1 = @from i in r begin
@where i.l == l && i.w == w && i.nl == nl && i.lt == lt &&
i.vz == vz && i.vw == vw && i.vδ == vδ &&
i.ζx == ζx && i.ζy == ζy && i.ζδx == ζδx
@select {absu=i.absu, i.dBU}
@collect DataFrame
end
for example. This is fast. It's in the DataFrames documentation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With