I have created a basic scatter plot to compare two variables using Altair. I expect the variables to be strongly correlated, so the points should end up on or close to the line of identity.
How can I add the line of identity to the plot?
I would like it to be a line similar to those created by mark_rule, but extending diagonally instead of vertically or horizontally.
Here is as far as I have gotten:
import altair as alt
import numpy as np
import pandas as pd

norm = np.random.multivariate_normal([0, 0], [[2, 1.8], [1.8, 2]], 100)
df = pd.DataFrame(norm, columns=['var1', 'var2'])

chart = alt.Chart(df, width=500, height=500).mark_circle(size=100).encode(
    alt.X('var1'),
    alt.Y('var2'),
).interactive()

line = alt.Chart(
    pd.DataFrame({'var1': [-4, 4], 'var2': [-4, 4]})
).mark_line().encode(
    alt.X('var1'),
    alt.Y('var2'),
).interactive()

chart + line
The problems with this example are that the line doesn't extend forever when zooming (like a rule mark) and that the plot is automatically scaled to the line endings instead of only to the points.
For this, we use the Chart() function in Altair to load the data and then the mark_point() function to make a scatter plot. We then pass the x and y aesthetics to the encode() function.
A scatterplot shows the relationship between two quantitative variables measured for the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point on the graph.
It's not perfect, but you could make the line much longer than the data range and set the scale domain on the scatter chart so that the axes stay scaled to the points.
import altair as alt
import numpy as np
import pandas as pd

norm = np.random.multivariate_normal([0, 0], [[2, 1.8], [1.8, 2]], 100)
df = pd.DataFrame(norm, columns=['var1', 'var2'])

chart = alt.Chart(df, width=500, height=500).mark_circle(size=100).encode(
    alt.X('var1', scale=alt.Scale(domain=[-4, 4])),
    alt.Y('var2', scale=alt.Scale(domain=[-4, 4])),
).interactive()

line = alt.Chart(
    pd.DataFrame({'var1': [-100, 100], 'var2': [-100, 100]})
).mark_line().encode(
    alt.X('var1'),
    alt.Y('var2'),
).interactive()

chart + line
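The hard-coded [-4, 4] domain and [-100, 100] line endpoints above could also be derived from the data. The following is only a rough sketch of that idea: the helper name identity_line_limits and its pad and extend parameters are made up for illustration and are not part of Altair.

```python
def identity_line_limits(xs, ys, pad=0.1, extend=25.0):
    """Return (domain, ends) for drawing a line of identity.

    Hypothetical helper, not an Altair function.
    domain: [low, high] shared by both axes, covering every point
            plus a margin of `pad` times the data span.
    ends:   [low, high] for the line itself, reaching `extend` data
            spans either side of the centre so that it still fills
            the view after moderate zooming or panning.
    """
    values = list(xs) + list(ys)
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0  # avoid a zero-width domain
    margin = pad * span
    domain = [lo - margin, hi + margin]
    centre = (lo + hi) / 2
    reach = extend * span
    ends = [centre - reach, centre + reach]
    return domain, ends


domain, ends = identity_line_limits([-2, 2], [-1, 2], pad=0.25)
print(domain, ends)
```

With the DataFrame from the answer, the call would be identity_line_limits(df['var1'], df['var2']). The returned domain would then go into alt.Scale(domain=domain) on both axes of the scatter chart, and ends would replace [-100, 100] in the DataFrame used for the line. The line is still finite, so this shares the limitation of the answer above: zooming out far enough will reveal its ends.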