 

Adding the line of identity to a scatter plot using altair

I have created a basic scatter plot to compare two variables using altair. I expect the variables to be strongly correlated, so the points should lie on or close to the line of identity.

How can I add the line of identity to the plot?

I would like it to be a line similar to those created by mark_rule, but extending diagonally instead of vertically or horizontally.

Here is as far as I have gotten:

import altair as alt
import numpy as np
import pandas as pd

norm = np.random.multivariate_normal([0, 0], [[2, 1.8],[1.8, 2]], 100)

df = pd.DataFrame(norm, columns=['var1', 'var2'])

chart = alt.Chart(df, width=500, height=500).mark_circle(size=100).encode(
    alt.X('var1'),
    alt.Y('var2'),
).interactive()

line = alt.Chart(
    pd.DataFrame({'var1': [-4, 4], 'var2': [-4, 4]})).mark_line().encode(
            alt.X('var1'),
            alt.Y('var2'),
).interactive()

chart + line

There are two problems with this example: the line doesn't extend indefinitely when zooming (as a rule mark does), and the plot is automatically scaled to the line's endpoints instead of only to the points.

Asked Jan 16 '20 at 13:01 by Rikard N



1 Answer

It's not perfect, but you could make the line much longer and set the scale domain explicitly, so the axes stay at [-4, 4] instead of stretching to the line's endpoints.

import altair as alt
import numpy as np
import pandas as pd

norm = np.random.multivariate_normal([0, 0], [[2, 1.8],[1.8, 2]], 100)

df = pd.DataFrame(norm, columns=['var1', 'var2'])

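chart = alt.Chart(df, width=500, height=500).mark_circle(size=100, clip=True).encode(
    alt.X('var1', scale=alt.Scale(domain=[-4, 4])),
    alt.Y('var2', scale=alt.Scale(domain=[-4, 4])),
).interactive()

# clip=True keeps the parts of the long line that fall outside the
# [-4, 4] domain from being drawn outside the plot area.
# The scales are shared, so .interactive() on the first layer is enough.
line = alt.Chart(
    pd.DataFrame({'var1': [-100, 100], 'var2': [-100, 100]})).mark_line(clip=True).encode(
            alt.X('var1'),
            alt.Y('var2'),
)

chart + line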
Answered Oct 03 '22 at 03:10 by dominik