Summary table of unique value combinations in DataFrames.jl

Question

I often want to find the unique combinations of some grouping variables in a data table. With R + dplyr, my normal workflow is group_by(data, var1, var2, var3) %>% summarise(), which returns a new table with the columns var1, var2, var3 and one row for each unique combination of values found in data.

What's the idiomatic way to do this in DataFrames.jl?

Dave Kleinschmidt · Accepted Answer

In DataFrames.jl, unique operates on whole rows of a DataFrame. So the right mental model here is to first select only the columns you care about, then take the unique rows of that table, as in

select(data, [:var1, :var2, :var3]) |> unique!

Or, if you dislike the pipe and prefer nested parentheses:

unique!(select(data, [:var1, :var2, :var3]))
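A minimal runnable sketch of this approach, assuming a small hypothetical data frame (the column names var1, var2, var3, value and their values are made up for illustration):

```julia
using DataFrames

# Hypothetical toy data: three grouping columns plus one value column.
data = DataFrame(var1 = ["a", "a", "b", "b", "a"],
                 var2 = [1, 1, 2, 2, 1],
                 var3 = [true, true, false, true, true],
                 value = [10, 20, 30, 40, 50])

# select copies the chosen columns by default (copycols=true), so removing
# duplicate rows in place with unique! cannot affect `data`.
combos = select(data, [:var1, :var2, :var3]) |> unique!

println(combos)      # one row per distinct (var1, var2, var3) combination
println(nrow(data))  # the original still has all of its rows
```

unique! keeps the first occurrence of each combination, so the rows of combos follow the order in which the combinations first appear in data.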

unique! is safe here, and avoids a second copy, because select already copies the underlying columns. Alternatively, you could use a view or non-copying indexing with !, but both of these share their column vectors with data, so you must use unique (which returns a new data frame instead of mutating the column vectors in place) so as not to corrupt the original data frame:

unique(data[!, [:var1, :var2, :var3]])
unique(view(data, :, [:var1, :var2, :var3]))
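The column sharing that makes unique! risky here can be checked directly. This is a small sketch with hypothetical toy data; it only calls the non-mutating unique, because unique! on data[!, cols] would delete elements from vectors that data itself still uses.

```julia
using DataFrames

# Hypothetical toy data.
data = DataFrame(var1 = ["a", "a", "b"],
                 var2 = [1, 1, 2],
                 value = [10, 20, 30])

cols = [:var1, :var2]

# data[!, cols] builds a new DataFrame that reuses data's column vectors.
shared = data[!, cols]
println(shared.var1 === data.var1)   # true: the very same vector object

# data[:, cols] (like select) copies the columns instead.
copied = data[:, cols]
println(copied.var1 === data.var1)   # false

# unique allocates a fresh result, so the shared columns stay untouched.
u1 = unique(shared)
u2 = unique(view(data, :, cols))
println(nrow(data))                  # still 3
```

If you want to use unique! anyway, indexing with : instead of ! (as in data[:, cols]) gives you a copy that is safe to mutate.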

Bogumił Kamiński · Answer

Alternatively, you can write:

keys(groupby(data, [:var1, :var2, :var3]))

to get a vector of the unique grouping keys. If you need a DataFrame, you can then collect them into one by writing:

groupby(data, [:var1, :var2, :var3]) |> keys |> DataFrame
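A sketch of the groupby route on hypothetical toy data. The combine(gdf, nrow) line is an addition not taken from the answers above; it is included as an assumption of what is closest to dplyr's summarise(n = n()), since it returns the key columns plus a row count. Group order can depend on groupby's sort keyword, so the comparison with the unique-based result is done after sorting.

```julia
using DataFrames

# Hypothetical toy data.
data = DataFrame(var1 = ["a", "a", "b", "b", "a"],
                 var2 = [1, 1, 2, 2, 1],
                 value = [10, 20, 30, 40, 50])

gdf = groupby(data, [:var1, :var2])

# Each element of keys(gdf) is a GroupKey; its fields can be read like a NamedTuple.
for k in keys(gdf)
    println(k.var1, " ", k.var2)
end

# Collect the keys into a DataFrame: one row per unique combination.
combos = gdf |> keys |> DataFrame

# Same combinations as the select + unique approach (compared after sorting).
same = sort(combos) == sort(unique(select(data, [:var1, :var2])))

# Assumed extra: key columns plus the number of rows per group (column :nrow).
counts = combine(gdf, nrow)
```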

Tags: dataframe, julia
