I often want to find the unique combinations of some grouping variables in a data table. With R + dplyr, my normal workflow is to use group_by(data, var1, var2, var3) %>% summarise, which returns a new table with the columns var1, var2, var3, with one row for each unique combination of values found in data.
What's the idiomatic way to do this in DataFrames.jl?
In DataFrames.jl, a DataFrame is a collection of rows. So the right mental model here is to first select only the columns you care about, then get the unique rows from that table, as in
select(data, [:var1, :var2, :var3]) |> unique!
Or, if you hate the pipe or love extra parens:
unique!(select(data, [:var1, :var2, :var3]))
unique! is recommended here because select makes a copy of the underlying columns, so mutating the result in place cannot affect data. Alternatively, you could use a view or indexing, but these share their columns with data and therefore require unique (which does not mutate the underlying column vectors) so as not to corrupt the original data frame:
unique(data[!, [:var1, :var2, :var3]])
unique(view(data, :, [:var1, :var2, :var3]))
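To make the difference concrete, here is a minimal sketch of the three variants side by side. The toy data (including the extra value column) is made up for illustration; only the column names var1, var2, var3 come from the question.

```julia
using DataFrames

# Hypothetical toy data: five rows, three distinct (var1, var2, var3) combinations.
data = DataFrame(var1  = [1, 1, 2, 2, 1],
                 var2  = ["a", "a", "b", "b", "a"],
                 var3  = [true, true, false, true, true],
                 value = [10, 20, 30, 40, 50])

cols = [:var1, :var2, :var3]

# select copies the chosen columns, so the in-place unique! only touches the copy.
a = unique!(select(data, cols))

# data[!, cols] reuses data's column vectors, so use the non-mutating unique.
b = unique(data[!, cols])

# A view also aliases data; unique returns a freshly allocated DataFrame.
c = unique(view(data, :, cols))

# All three give the same table, and data itself still has all of its rows.
println(a == b == c)
println(nrow(a), " unique combinations; data still has ", nrow(data), " rows")
```

In all three cases the original data frame is left untouched; the variants differ only in how many intermediate copies they allocate.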
Alternatively you can write:
keys(groupby(data, [:var1, :var2, :var3]))
to get a vector of unique grouping keys. Then you can collect them into a DataFrame if you want by writing:
groupby(data, [:var1, :var2, :var3]) |> keys |> DataFrame
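As a rough sketch of what the grouping-key approach gives you, again on made-up toy data (only the column names come from the question):

```julia
using DataFrames

# Hypothetical toy data: five rows, three distinct (var1, var2, var3) combinations.
data = DataFrame(var1  = [1, 1, 2, 2, 1],
                 var2  = ["a", "a", "b", "b", "a"],
                 var3  = [true, true, false, true, true],
                 value = [10, 20, 30, 40, 50])

gd = groupby(data, [:var1, :var2, :var3])

# keys(gd) holds one GroupKey per group; each behaves much like a NamedTuple.
ks = keys(gd)
k = first(ks)
println((k.var1, k.var2, k.var3))

# Collect the keys into a regular DataFrame with one row per combination.
combos = DataFrame(ks)
println(nrow(combos))
```

This route is handy when you need the GroupedDataFrame anyway, since the keys come for free once the grouping has been done.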