I am trying to link several Altair charts that share aspects of the same data. I can do this by merging all the data into one data frame, but because of the nature of the data the merged data frame is much larger than is needed to have two separate data frames for each of the two charts. This is because the columns unique to each chart have many repeated rows for each entry in the shared column.
Would using transform_lookup save space over just using the merged data frame, or does transform_lookup end up doing the whole merge internally?
No, the entire dataset is still included in the vegaspec when you use transform_lookup. You can see this by printing the json spec of the charts you create. With the example from the docs:
import altair as alt
import pandas as pd
from vega_datasets import data
people = data.lookup_people().head(3)
people
name age height
0 Alan 25 180
1 George 32 174
2 Fred 39 182
groups = data.lookup_groups().head(3)
groups
group person
0 1 Alan
1 1 George
2 1 Fred
With pandas merge:
merged = pd.merge(groups, people, how='left',
left_on='person', right_on='name')
print(alt.Chart(merged).mark_bar().encode(
x='mean(age):Q',
y='group:O'
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"name": "data-b41b97ffc89b39c92e168871d447e720"
},
"datasets": {
"data-b41b97ffc89b39c92e168871d447e720": [
{
"age": 25,
"group": 1,
"height": 180,
"name": "Alan",
"person": "Alan"
},
{
"age": 32,
"group": 1,
"height": 174,
"name": "George",
"person": "George"
},
{
"age": 39,
"group": 1,
"height": 182,
"name": "Fred",
"person": "Fred"
}
]
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar"
}
With transform lookup all the data is there but as to separate dataset (so technically it takes a little bit of more space with the additional braces and the transform):
print(alt.Chart(groups).mark_bar().encode(
x='mean(age):Q',
y='group:O'
).transform_lookup(
lookup='person',
from_=alt.LookupData(data=people, key='name',
fields=['age'])
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"name": "data-5fe242a79352d1fe243b588af570c9c6"
},
"datasets": {
"data-2b374d1509415e1d327c3a7521f8117c": [
{
"age": 25,
"height": 180,
"name": "Alan"
},
{
"age": 32,
"height": 174,
"name": "George"
},
{
"age": 39,
"height": 182,
"name": "Fred"
}
],
"data-5fe242a79352d1fe243b588af570c9c6": [
{
"group": 1,
"person": "Alan"
},
{
"group": 1,
"person": "George"
},
{
"group": 1,
"person": "Fred"
}
]
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar",
"transform": [
{
"from": {
"data": {
"name": "data-2b374d1509415e1d327c3a7521f8117c"
},
"fields": [
"age",
"height"
],
"key": "name"
},
"lookup": "person"
}
]
}
When transform_lookup can save space is if you use it with the URLs of two dataset:
people = data.lookup_people.url
groups = data.lookup_groups.url
print(alt.Chart(groups).mark_bar().encode(
x='mean(age):Q',
y='group:O'
).transform_lookup(
lookup='person',
from_=alt.LookupData(data=people, key='name',
fields=['age'])
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"url": "https://vega.github.io/vega-datasets/data/lookup_groups.csv"
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar",
"transform": [
{
"from": {
"data": {
"url": "https://vega.github.io/vega-datasets/data/lookup_people.csv"
},
"fields": [
"age",
"height"
],
"key": "name"
},
"lookup": "person"
}
]
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With