I am given a data set that looks something like this
and I am trying to graph all the points with a 1 on the first column separate from the points with a 0, but I want to put them in the same chart.
I know the final result should be something similar to this
But I can't find a way to filter the points in Julia. I'm using LinearAlgebra, CSV, Plots, DataFrames for my project, and so far I haven't found a way to make DataFrames storage types work nicely with Plots functions. I keep running into errors like Cannot convert Float64 to series data for plotting
when I try plotting the points individually with a for loop as a filter as shown in the code below
filter = select(data, :1)
newData = select(data, 2:3)
#graph one initial point to create the plot
plot(newData[1,1], newData[1,2], seriestype = :scatter, title = "My Scatter Plot")
#add the additional points with the 1 in front
for i in 2:size(newData)
if filter[i] == 1
plot!(newData[i, 1], newData[i, 2], seriestype = :scatter, title = "My Scatter Plot")
end
end
Other approaches have given me other errors, but I haven't recorded those.
I'm using Julia 1.4.0 and the latest versions of all of the packages mentioned.
Quick Edit:
It might help to know that I am trying to replicate the Nonlinear dimensionality reduction section of this article https://sebastianraschka.com/Articles/2014_kernel_pca.html#principal-component-analysis
Highlight the two columns you want to include in your scatter plot. Then, go to the “Insert” tab of your Excel menu bar and click on the scatter plot icon in the “Recommended Charts” area of your ribbon. Select “Scatter” from the options in the “Recommended Charts” section of your ribbon.
With Plots.jl you can do the following (I am passing a fully reproducible code):
julia> df = DataFrame(c=rand(Bool, 100), x = 2 .* rand(100) .- 1);
julia> df.y = ifelse.(df.c, 1, -1) .* df.x .^ 2;
julia> plot(df.x, df.y, color=ifelse.(df.c, "blue", "red"), seriestype=:scatter, legend=nothing)
However, in this case I would additionally use StatsPlots.jl as then you can just write:
julia> using StatsPlots
julia> @df df plot(:x, :y, group=:c, seriestype=:scatter, legend=nothing)
If you want to do it manually by groups it is easiest to use the groupby
function:
julia> gdf = groupby(df, :c);
julia> summary(gdf) # check that we have 2 groups in data
"GroupedDataFrame with 2 groups based on key: c"
julia> plot(gdf[1].x, gdf[1].y, seriestype=:scatter, legend=nothing)
julia> plot!(gdf[2].x, gdf[2].y, seriestype=:scatter)
Note that gdf
variable is bound to a GroupedDataFrame
object from which you can get groups defined by the grouping column (:c
) in this case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With