This seems like something that should be almost dead simple, yet I cannot accomplish it.
I have a dataframe df in julia, where one column is of type Array{Union{Missing, Int64},1}. The values in that column are: [missing, 1, 2].
I would simply like to subset the dataframe df to just see those rows that correspond to a condition, such as where the column is equal to 2.
What I have tried --> result:
df[df[:col].==2] --> MethodError: no method matching getindex
df[df[:col].==2, :] --> ArgumentError: invalid row index of type Bool
df[df[:col].==2, :col] --> BoundsError: attempt to access String
(Note that doing just df[!, :col] results in: 1339-element Array{Union{Missing, Int64},1}: [...eliding output...], with my favorite warning so far in julia: Warning: getindex(df::DataFrame, col_ind::ColumnIndex) is deprecated, use df[!, col_ind] instead. Having just used that would seem to exempt me from the warning, but whatever.)
This cannot be as hard as it seems.
Just as FYI, I can get what I want by using Query and writing a multi-line SQL-style query just to subset data, which seems...burdensome.
There are two ways to solve your problem:
1. Use isequal instead of ==, as == implements 3-valued logic (comparing with missing returns missing, not false). Any one of the following will work:
df[isequal.(df.col, 2), :] # new data frame
filter(:col => isequal(2), df) # new data frame
filter!(:col => isequal(2), df) # update old data frame in place
2. If you want to use ==, use coalesce on top of it, e.g.:
df[coalesce.(df.col .== 2, false), :] # new data frame
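To make the two options concrete, here is a minimal sketch on a toy table. The data frame and its `label` column are made up for illustration (they are not from the question), and the sketch assumes a DataFrames.jl version that supports the `filter(:col => f, df)` form used above.

```julia
using DataFrames

# Hypothetical toy table; `label` only exists to make the rows recognisable.
df = DataFrame(col = [missing, 1, 2], label = ["a", "b", "c"])

# Option 1: isequal never returns missing, so the row mask is purely Bool.
by_isequal = df[isequal.(df.col, 2), :]

# The same selection with filter, which also returns a new data frame.
by_filter = filter(:col => isequal(2), df)

# Option 2: keep ==, then turn the missing comparison results into false.
by_coalesce = df[coalesce.(df.col .== 2, false), :]
```

All three results should contain only the row where `col` is 2, and the original `df` is left unchanged.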
None of this is specific to DataFrames.jl. Indexing works the same way in Julia Base:
julia> x = [1, 2, missing]
3-element Array{Union{Missing, Int64},1}:
1
2
missing
julia> x[x .== 2]
ERROR: ArgumentError: unable to check bounds for indices of type Missing
julia> x[isequal.(x, 2)]
1-element Array{Union{Missing, Int64},1}:
2
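The reason for the error can be seen by looking at the masks themselves. The following is a small sketch in Julia Base only; the variable names `mask`, `strict_mask`, `filled_mask` and `selected` are chosen here for illustration.

```julia
x = [1, 2, missing]

# == propagates missing, so the broadcast mask is not purely Bool.
mask = x .== 2                        # elements: false, true, missing

# isequal treats missing as an ordinary value and always returns a Bool.
strict_mask = isequal.(x, 2)

# coalesce replaces each missing in the mask with the fallback value false.
filled_mask = coalesce.(mask, false)

# Both Bool-only masks are valid indices; `mask` itself is not.
selected = x[filled_mask]
```

Because `mask` contains a `missing`, Julia cannot decide whether the third element should be kept, which is why indexing with it is rejected.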
(In general you can expect that, where possible, DataFrames.jl will work consistently with Julia Base, except for some corner cases where this is not possible. The major differences come from the fact that a DataFrame has heterogeneous column element types while a Matrix in Julia Base has a homogeneous element type.)
A DataFrame is a two-dimensional object. It has rows and columns. In Julia, df[...] notation is normally used to access an object via locations in its dimensions. Therefore df[:col] is not a valid way to index into a DataFrame. You are trying to use one indexing dimension, while specifying both row and column indices is required. You are getting a warning because you are using an invalid indexing approach (in the next release of DataFrames.jl this warning will be gone and you will just get an error).
Actually your example df[df[:col].==2] shows why we disallow single-dimensional indexing. In df[:col] you try to use a single-dimensional index to subset columns, but in the outer df[df[:col].==2] you want to subset rows using a single-dimensional index.
The easiest way to get a column from a data frame is df.col or df."col" (the second way is usually used if you have characters like spaces in the column name). This way you can access column :col without copying it. An equivalent way to write this selection using indexing is df[!, :col]. If you want to copy the column, write df[:, :col].
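The difference between the copying and non-copying forms can be checked directly. This is a small sketch on a made-up one-column data frame; the variable names are illustrative only.

```julia
using DataFrames

df = DataFrame(col = [missing, 1, 2])

no_copy = df.col             # the column vector stored in df
also_no_copy = df[!, :col]   # same vector, written with indexing
by_string = df."col"         # same vector, name given as a string
copied = df[:, :col]         # a fresh copy of the column

# Mutating the non-copying handle changes the data frame itself...
no_copy[2] = 10
# ...while mutating the copy leaves the data frame untouched.
copied[3] = 99
```

After these two assignments, `df[2, :col]` reflects the change made through `no_copy`, while `df[3, :col]` still holds its original value.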
Indeed, in Julia Base, if a is an array (of whatever dimension) then a[i] is a valid indexing operation if i is an integer or a CartesianIndex. Doing df[i], where i is an integer, is not allowed for a DataFrame, as it was judged that following the convention from Julia Base would be too confusing for users (it is related to the storage mode of arrays, which is not the same as for a DataFrame). You are, though, allowed to write df[i] when i is a CartesianIndex (as this is unambiguous). I guess this is not something you are looking for.
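For reference, this is how the two kinds of single index behave on a plain matrix in Julia Base. It is a minimal sketch with a made-up 2×2 matrix, and it only illustrates the Base convention mentioned above, not DataFrames.jl itself.

```julia
a = [10 20; 30 40]                        # 2×2 Matrix, stored column by column

# An integer index walks the storage order (down the first column first).
second_stored = a[2]                      # 30
third_stored = a[3]                       # 20

# A CartesianIndex names the row and the column explicitly.
by_cartesian = a[CartesianIndex(2, 1)]    # row 2, column 1: 30
```

The integer form depends on knowing the memory layout, whereas the CartesianIndex form does not, which is why only the latter carries over unambiguously to a table.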
All the rules for what is allowed when indexing a DataFrame are described in detail in the DataFrames.jl documentation on indexing. Also, during JuliaCon 2020 there is going to be a workshop in which the design of indexing in DataFrames.jl will be discussed in detail (how it works, why it works this way, and how it is implemented internally).