
How to make a scatter plot based on the values of a column in the data set?

I am given a data set that looks something like this:

[image: data]

I am trying to plot all the points with a 1 in the first column separately from the points with a 0, but I want to put them in the same chart.

I know the final result should be something similar to this:

[image]

But I can't find a way to filter the points in Julia. I'm using LinearAlgebra, CSV, Plots, and DataFrames for my project, and so far I haven't found a way to make DataFrames storage types work nicely with Plots functions. I keep running into errors like "Cannot convert Float64 to series data for plotting" when I try plotting the points individually, using a for loop as a filter, as shown in the code below:

filter = select(data, :1)
newData = select(data, 2:3)

#graph one initial point to create the plot
plot(newData[1,1], newData[1,2], seriestype = :scatter, title = "My Scatter Plot")

#add the additional points with the 1 in front
for i in 2:size(newData)
    if filter[i] == 1
        plot!(newData[i, 1], newData[i, 2], seriestype = :scatter, title = "My Scatter Plot")
    end
end

Other approaches have given me other errors, but I haven't recorded those.

I'm using Julia 1.4.0 and the latest versions of all of the packages mentioned.

Quick Edit:

It might help to know that I am trying to replicate the "Nonlinear dimensionality reduction" section of this article: https://sebastianraschka.com/Articles/2014_kernel_pca.html#principal-component-analysis

KeyboardHunter asked May 07 '20 06:05


1 Answer

With Plots.jl you can do the following (this is a fully reproducible example):

julia> using DataFrames, Plots

julia> df = DataFrame(c=rand(Bool, 100), x = 2 .* rand(100) .- 1);

julia> df.y = ifelse.(df.c, 1, -1) .* df.x .^ 2;

julia> plot(df.x, df.y, color=ifelse.(df.c, "blue", "red"), seriestype=:scatter, legend=nothing)

However, in this case I would also use StatsPlots.jl, because then you can just write:

julia> using StatsPlots

julia> @df df plot(:x, :y, group=:c, seriestype=:scatter, legend=nothing)
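As far as I know, the group keyword is also accepted by plain Plots.jl when you pass the columns as vectors, so StatsPlots.jl is a convenience rather than a requirement here. A minimal sketch, assuming a small hand-written data frame (the values below are made up for illustration):

```julia
using DataFrames, Plots

# Made-up sample data: c is the class label, x and y are the coordinates.
df = DataFrame(c = [true, false, true, false],
               x = [-0.5, 0.5, 0.25, -0.25])
df.y = ifelse.(df.c, 1, -1) .* df.x .^ 2

# Passing the label column as `group` asks Plots to draw one series per
# distinct value of c, each with its own color.
p = scatter(df.x, df.y, group = df.c, title = "Grouped by c")
```

With group, each class becomes its own series, so a legend (if you keep one) lists the classes separately.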

If you want to do it manually, group by group, it is easiest to use the groupby function:

julia> gdf = groupby(df, :c);

julia> summary(gdf) # check that we have 2 groups in data
"GroupedDataFrame with 2 groups based on key: c"

julia> plot(gdf[1].x, gdf[1].y, seriestype=:scatter, legend=nothing)

julia> plot!(gdf[2].x, gdf[2].y, seriestype=:scatter)
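The two calls above can be generalized to a loop over however many groups there are. This is only a sketch: it assumes a DataFrames.jl version in which pairs works on a GroupedDataFrame, and the sample values and the label text are made up for illustration.

```julia
using DataFrames, Plots

# Made-up sample data with the same columns as in the example above.
df = DataFrame(c = [true, false, true, false],
               x = [-0.5, 0.5, 0.25, -0.25])
df.y = ifelse.(df.c, 1, -1) .* df.x .^ 2

gdf = groupby(df, :c)

# Start from an empty plot and add one scatter series per group.
p = plot(title = "Groups of c")
for (key, sub) in pairs(gdf)
    # key holds the grouping value, sub is the data frame for that group.
    scatter!(p, sub.x, sub.y, label = "c = $(key.c)")
end
```

This avoids hard-coding gdf[1] and gdf[2], which matters if the grouping column has more than two distinct values.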

Note that the gdf variable is bound to a GroupedDataFrame object, from which you can get the groups defined by the grouping column (:c in this case).
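Without groupby, the same split can be sketched with boolean row masks, which is close to what the loop in the question attempts. The error quoted in the question most likely comes from passing single Float64 values to plot; the plotting functions expect collections such as vectors. The column names (:label, :x, :y) and the values below are assumptions, standing in for the data shown in the question.

```julia
using DataFrames, Plots

# Hypothetical stand-in for the question's data: a 0/1 label column
# followed by two coordinate columns.
data = DataFrame(label = [1, 0, 1, 0, 1],
                 x = [0.1, 0.4, -0.3, 0.8, -0.6],
                 y = [0.2, -0.5, 0.7, -0.1, 0.3])

# Boolean masks pick out the rows of each class.
ones_df  = data[data.label .== 1, :]
zeros_df = data[data.label .== 0, :]

# Pass whole columns (vectors) to the plotting calls, not single numbers.
p = scatter(ones_df.x, ones_df.y, label = "1", title = "My Scatter Plot")
scatter!(p, zeros_df.x, zeros_df.y, label = "0")
```

Each scatter call draws one class in a single series, so no per-point loop is needed.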

Bogumił Kamiński answered Oct 10 '22 00:10