does `transform_lookup` save space?

Question

I am trying to link several Altair charts that share aspects of the same data. I can do this by merging all the data into one data frame, but because of the nature of the data, the merged data frame is much larger than the two separate data frames (one per chart) would be. This is because the columns unique to each chart have many repeated rows for each entry in the shared column.

Would using transform_lookup save space over just using the merged data frame, or does transform_lookup end up doing the whole merge internally?

joelostblom · Accepted Answer

No, the entire dataset is still included in the Vega-Lite spec when you use transform_lookup. You can see this by printing the JSON spec of the charts you create. With the example from the docs:

import altair as alt
import pandas as pd
from vega_datasets import data

people = data.lookup_people().head(3)
people

     name  age  height
0    Alan   25     180
1  George   32     174
2    Fred   39     182

groups = data.lookup_groups().head(3)
groups

   group  person
0      1    Alan
1      1  George
2      1    Fred

With pandas merge:

merged = pd.merge(groups, people, how='left',
                  left_on='person', right_on='name')

print(alt.Chart(merged).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-b41b97ffc89b39c92e168871d447e720"
  },
  "datasets": {
    "data-b41b97ffc89b39c92e168871d447e720": [
      {
        "age": 25,
        "group": 1,
        "height": 180,
        "name": "Alan",
        "person": "Alan"
      },
      {
        "age": 32,
        "group": 1,
        "height": 174,
        "name": "George",
        "person": "George"
      },
      {
        "age": 39,
        "group": 1,
        "height": 182,
        "name": "Fred",
        "person": "Fred"
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar"
}

With transform_lookup, all the data is still there, but as two separate datasets (so technically it takes slightly more space because of the additional braces and the transform):

print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-5fe242a79352d1fe243b588af570c9c6"
  },
  "datasets": {
    "data-2b374d1509415e1d327c3a7521f8117c": [
      {
        "age": 25,
        "height": 180,
        "name": "Alan"
      },
      {
        "age": 32,
        "height": 174,
        "name": "George"
      },
      {
        "age": 39,
        "height": 182,
        "name": "Fred"
      }
    ],
    "data-5fe242a79352d1fe243b588af570c9c6": [
      {
        "group": 1,
        "person": "Alan"
      },
      {
        "group": 1,
        "person": "George"
      },
      {
        "group": 1,
        "person": "Fred"
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar",
  "transform": [
    {
      "from": {
        "data": {
          "name": "data-2b374d1509415e1d327c3a7521f8117c"
        },
        "fields": [
          "age",
          "height"
        ],
        "key": "name"
      },
      "lookup": "person"
    }
  ]
}
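
To check the size difference rather than eyeball it, here is a minimal sketch that uses only the standard library. It assumes the embedded data is serialized as lists of records, as in the "datasets" blocks of the two specs above; the helper names left_join and json_size are made up for illustration and are not part of Altair or pandas.

```python
import json

# Records copied from the "datasets" blocks in the two specs above.
people = [
    {"age": 25, "height": 180, "name": "Alan"},
    {"age": 32, "height": 174, "name": "George"},
    {"age": 39, "height": 182, "name": "Fred"},
]
groups = [
    {"group": 1, "person": "Alan"},
    {"group": 1, "person": "George"},
    {"group": 1, "person": "Fred"},
]


def left_join(left, right, left_on, right_on):
    """Hypothetical helper: left-join two lists of records on one key."""
    index = {row[right_on]: row for row in right}
    return [{**row, **index.get(row[left_on], {})} for row in left]


def json_size(records):
    """Hypothetical helper: number of characters in the JSON serialization."""
    return len(json.dumps(records))


merged = left_join(groups, people, "person", "name")

merged_size = json_size(merged)                          # one merged dataset
separate_size = json_size(people) + json_size(groups)    # two datasets

print("merged:", merged_size, "separate:", separate_size)
```

For this three-row example the two separate lists come out slightly larger than the merged one, which matches the point about the extra braces. Running the same comparison on your own records shows which way it goes for your data.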

Where transform_lookup can save space is when you use it with the URLs of the two datasets:

people = data.lookup_people.url
groups = data.lookup_groups.url
print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "url": "https://vega.github.io/vega-datasets/data/lookup_groups.csv"
  },
  "encoding": {
    "x": {
      "aggregate": "mean",
      "field": "age",
      "type": "quantitative"
    },
    "y": {
      "field": "group",
      "type": "ordinal"
    }
  },
  "mark": "bar",
  "transform": [
    {
      "from": {
        "data": {
          "url": "https://vega.github.io/vega-datasets/data/lookup_people.csv"
        },
        "fields": [
          "age",
          "height"
        ],
        "key": "name"
      },
      "lookup": "person"
    }
  ]
}
