I have performed calculations on subsets of a DataFrame by using the groupby function:
using RDatasets, Statistics, StatsPlots  # Statistics provides mean; StatsPlots provides @df
iris = dataset("datasets", "iris")
describe(iris)
iris_grouped = groupby(iris,:Species)
iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame)
Now I would like to plot the results, but I get an error message for the following plot:
@df iris_avg bar(:Species,:SepalLength)
Only tables are supported
What would be the best way to plot the data? My idea would be to create a single DataFrame and go from there. How would I do this, i.e., how do I convert a GroupedDataFrame to a single DataFrame? Thanks!
To convert a GroupedDataFrame into a DataFrame, just call DataFrame on it. In your case:
julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │
You could also have called combine on the original GroupedDataFrame:
julia> combine(:SepalLength => mean, iris_grouped)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │
or by on the original DataFrame:
julia> by(:SepalLength => mean, iris, :Species)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │
I wrote the transformation as the first argument above, but typically you would pass it last, since that lets you pass multiple transformations, e.g.:
julia> by(iris, :Species, :SepalLength => mean, :SepalWidth => minimum)
3×3 DataFrame
│ Row │ Species      │ SepalLength_mean │ SepalWidth_minimum │
│     │ Categorical… │ Float64          │ Float64            │
├─────┼──────────────┼──────────────────┼────────────────────┤
│ 1   │ setosa       │ 5.006            │ 2.3                │
│ 2   │ versicolor   │ 5.936            │ 2.0                │
│ 3   │ virginica    │ 6.588            │ 2.2                │
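For readers on a newer DataFrames.jl (1.0 or later), where, as far as I know, by is no longer available, the same aggregation can be sketched with combine on a grouped frame. The small df below is a made-up stand-in for iris, so its values are illustrative only, and the names df, gdf, avg, and named are hypothetical.

```julia
using DataFrames, Statistics

# Hand-made stand-in for the iris data (assumption: values are illustrative only)
df = DataFrame(Species = ["setosa", "setosa", "virginica", "virginica"],
               SepalLength = [5.0, 5.2, 6.4, 6.6],
               SepalWidth = [3.4, 3.0, 2.8, 3.2])

gdf = groupby(df, :Species)

# Grouped frame first, then one or more `source => function` pairs
avg = combine(gdf, :SepalLength => mean, :SepalWidth => minimum)

# A third element in the pair sets the output column name explicitly
named = combine(gdf, :SepalLength => mean => :sepal_mean)
```

Both results are ordinary DataFrames, so no extra conversion step is needed before plotting.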
I think you might be better off using the by function to get to your iris_avg directly. by splits a DataFrame into groups by the given column, applies the given function to each group, and combines the results into a single DataFrame. Often, it's used with a do block.
julia> by(iris, :Species) do df
           DataFrame(sepal_mean = mean(df.SepalLength))
       end
3×2 DataFrame
│ Row │ Species      │ sepal_mean │
│     │ Categorical… │ Float64    │
├─────┼──────────────┼────────────┤
│ 1   │ setosa       │ 5.006      │
│ 2   │ versicolor   │ 5.936      │
│ 3   │ virginica    │ 6.588      │
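The do-block style carries over to combine on a grouped frame, which may be useful if by is not available in your DataFrames.jl version. This is a minimal sketch under that assumption; df is a made-up stand-in for iris, and sdf and result are hypothetical names.

```julia
using DataFrames, Statistics

# Hand-made stand-in for the iris data (assumption: values are illustrative only)
df = DataFrame(Species = ["setosa", "setosa", "virginica", "virginica"],
               SepalLength = [5.0, 5.2, 6.4, 6.6])

# The block receives each group as a sub-DataFrame; returning a NamedTuple
# yields one output row per group, with the grouping column added in front
result = combine(groupby(df, :Species)) do sdf
    (sepal_mean = mean(sdf.SepalLength),)
end
```

Returning a NamedTuple rather than a DataFrame from the block keeps the per-group work light, but either return type should be accepted.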
Or equivalently,
julia> by(iris, :Species, SepalLength_mean = :SepalLength => mean)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │
See here for more details/examples.
Alternatively, you can do it in several steps as you've done, then use the DataFrame constructor to convert the result to a proper DataFrame:
julia> iris_grouped = groupby(iris,:Species);
julia> iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame);
julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │
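Once the result is a plain DataFrame, the @df macro from StatsPlots should accept it. Note that the aggregated column is called SepalLength_mean, not SepalLength, so the plot call has to use the new name. A minimal sketch, assuming StatsPlots is installed; df is a made-up stand-in for iris, and avg and p are hypothetical names.

```julia
using DataFrames, Statistics, StatsPlots

# Hand-made stand-in for the iris data (assumption: values are illustrative only)
df = DataFrame(Species = ["setosa", "setosa", "virginica", "virginica"],
               SepalLength = [5.0, 5.2, 6.4, 6.6])

# Aggregate to a plain DataFrame with columns Species and SepalLength_mean
avg = combine(groupby(df, :Species), :SepalLength => mean)

# Plot one bar per species, referring to the aggregated column by its new name
p = @df avg bar(:Species, :SepalLength_mean, legend = false)
```

With the variables from the question, the analogous call would presumably be @df DataFrame(iris_avg) bar(:Species, :SepalLength_mean).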