
Does `transform_lookup` save space?

Tags:

python

altair

I am trying to link several Altair charts that share aspects of the same data. I can do this by merging all the data into one data frame, but because of the nature of the data, the merged data frame is much larger than the two separate data frames (one per chart) that would otherwise be needed. This is because the columns unique to each chart have many repeated rows for each entry in the shared column.

Would using transform_lookup save space over just using the merged data frame, or does transform_lookup end up doing the whole merge internally?

Asked by Jesse Bloom, Dec 27 '25 16:12


1 Answer

No, the entire dataset is still included in the Vega-Lite spec when you use transform_lookup. You can see this by printing the JSON spec of the charts you create. With the example from the docs:

import altair as alt
import pandas as pd
from vega_datasets import data

people = data.lookup_people().head(3)
people
     name  age  height
0    Alan   25     180
1  George   32     174
2    Fred   39     182
groups = data.lookup_groups().head(3)
groups
   group  person
0      1    Alan
1      1  George
2      1    Fred

With pandas merge:

merged = pd.merge(groups, people, how='left',
                  left_on='person', right_on='name')

print(alt.Chart(merged).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).to_json())
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-b41b97ffc89b39c92e168871d447e720"
  },
  "datasets": {
    "data-b41b97ffc89b39c92e168871d447e720": [
      {
        "age": 25,
        "group": 1,
        "height": 180,
        "name": "Alan",
        "person": "Alan"
      },
      {
        "age": 32,
        "group": 1,
        "height": 174,
        "name": "George",
        "person": "George"
      },
      {
        "age": 39,
        "group": 1,
        "height": 182,
        "name": "Fred",
        "person": "Fred"
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar"
}

With transform_lookup all the data is still there, but as two separate datasets (so technically it takes slightly more space because of the additional braces and the transform):

print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-5fe242a79352d1fe243b588af570c9c6"
  },
  "datasets": {
    "data-2b374d1509415e1d327c3a7521f8117c": [
      {
        "age": 25,
        "height": 180,
        "name": "Alan"
      },
      {
        "age": 32,
        "height": 174,
        "name": "George"
      },
      {
        "age": 39,
        "height": 182,
        "name": "Fred"
      }
    ],
    "data-5fe242a79352d1fe243b588af570c9c6": [
      {
        "group": 1,
        "person": "Alan"
      },
      {
        "group": 1,
        "person": "George"
      },
      {
        "group": 1,
        "person": "Fred"
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar",
  "transform": [
    {
      "from": {
        "data": {
          "name": "data-2b374d1509415e1d327c3a7521f8117c"
        },
        "fields": [
          "age",
          "height"
        ],
        "key": "name"
      },
      "lookup": "person"
    }
  ]
}
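To check the size claim for the embedded case without installing Altair, here is a minimal sketch that uses only the standard-library `json` module. It rebuilds the three-row `people` and `groups` records from the example by hand, merges them on `person`/`name`, and compares serialized lengths. The `json_size` helper and the hand-written merge are illustrative assumptions, not what Altair does internally, and the counts cover only the data records, not the rest of the spec.

```python
import json

# The three rows from the docs example, written out by hand (no Altair needed).
people = [
    {"name": "Alan", "age": 25, "height": 180},
    {"name": "George", "age": 32, "height": 174},
    {"name": "Fred", "age": 39, "height": 182},
]
groups = [
    {"group": 1, "person": "Alan"},
    {"group": 1, "person": "George"},
    {"group": 1, "person": "Fred"},
]

# Hand-written left merge of groups onto people (person == name).
people_by_name = {p["name"]: p for p in people}
merged = [{**g, **people_by_name[g["person"]]} for g in groups]


def json_size(records):
    """Number of characters the records take when serialized as JSON."""
    return len(json.dumps(records))


merged_size = json_size(merged)
separate_size = json_size(groups) + json_size(people)

print("merged:  ", merged_size)
print("separate:", separate_size)
```

For these one-to-one rows the two separate lists come out a few characters longer than the merged list, because of the extra brackets. On data where the merge duplicates many rows the comparison may differ, so measuring the actual frames is the safer check.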

Where transform_lookup can save space is when you pass it the URLs of the two datasets, so that no data is embedded in the spec:

people = data.lookup_people.url
groups = data.lookup_groups.url
print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "url": "https://vega.github.io/vega-datasets/data/lookup_groups.csv"
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar",
  "transform": [
    {
      "from": {
        "data": {
          "url": "https://vega.github.io/vega-datasets/data/lookup_people.csv"
        },
        "fields": [
          "age",
          "height"
        ],
        "key": "name"
      },
      "lookup": "person"
    }
  ]
}
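As a rough illustration of why the URL form helps, the sketch below hand-builds Vega-Lite-style dictionaries with the standard library only and compares their serialized sizes. The `lookup_spec` and `inline_spec` helpers and the synthetic rows are hypothetical, made up for this comparison. The inline variant uses Vega-Lite's `{"values": [...]}` data form rather than the named datasets Altair emits.

```python
import json


def lookup_spec(primary, secondary):
    """Hand-written Vega-Lite-style dict with a lookup transform (illustrative)."""
    return {
        "data": primary,
        "mark": "bar",
        "encoding": {
            "x": {"aggregate": "mean", "field": "age", "type": "quantitative"},
            "y": {"field": "group", "type": "ordinal"},
        },
        "transform": [
            {
                "lookup": "person",
                "from": {"data": secondary, "key": "name",
                         "fields": ["age", "height"]},
            }
        ],
    }


def inline_spec(n_rows):
    """Same spec, but with n_rows of synthetic data embedded inline."""
    groups = [{"group": 1, "person": f"p{i}"} for i in range(n_rows)]
    people = [{"name": f"p{i}", "age": 30, "height": 175} for i in range(n_rows)]
    return lookup_spec({"values": groups}, {"values": people})


# URL-based spec: only the two URLs are stored, never the rows themselves.
url_spec = lookup_spec(
    {"url": "https://vega.github.io/vega-datasets/data/lookup_groups.csv"},
    {"url": "https://vega.github.io/vega-datasets/data/lookup_people.csv"},
)

url_size = len(json.dumps(url_spec))
inline_sizes = {n: len(json.dumps(inline_spec(n))) for n in (3, 300, 30000)}

print("url spec:", url_size)
for n, size in inline_sizes.items():
    print(f"inline spec with {n} rows:", size)
```

The inline spec grows with the number of rows, while the URL spec stays the same size however large the referenced files are.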
Answered by joelostblom, Dec 30 '25 06:12


