Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Cubes OLAP Framework - how to work with joins?

I'm trying to use python's olap framework cubes on a very simple database, but I am having some trouble joining tables.

My schema looks like this:

Users table
ID | name

Products table
ID | name | price

Purchases table
ID | user_id | product_id | date

And the cubes model:

{
    'dimensions': [
        {'name': 'user_id'},
        {'name': 'product_id'},
        {'name': 'date'},
    ],
    'cubes': [
        {
            'name': 'purchases',
            'dimensions': ['user_id', 'product_id', 'date'],
            'measures': ['price']
            'mappings': {
                'purchases.user_id': 'users.id',
                'purchases.product_id': 'products.id',
                'purchases.price': 'products.price'
            },
            'joins': [
                {
                    'master': 'purchases.user_id',
                    'detail': 'users.id'
                },
                {
                    'master': 'purchases.product_id',
                    'detail': 'products.id'
                }
            ]
        }
    ]
}

Now I would like to display all the purchases, showing the product's name, user's name and purchase date. I can't seem to find a way to do this. The documentation is a bit scarce.

Thank you

like image 949
user1491915 Avatar asked Nov 05 '12 14:11

user1491915


1 Answers

First let's fix the model a bit. In your schema you have more attributes per dimension: id and name, you might end up having more details in the future. You can add them by specifying attributes as a list: "attriubtes": ["id", "name"]. Note also that the dimension is named as entity product not as a key id_product. The key id_product is just an attribute of the product dimension, as is name or in the future maybe category. Dimension reflects analysts point of view.

For the time being, we ignore the fact that date should be a special dimension and consider date as single-value key, for example a year, not to make things complicated here.

"dimensions": [
    {"name": "user", "attributes": ["id", "name"]},
    {"name": "product", "attributes": ["id", "name"]},
    {"name": "date"}
],

Because we changed names of the dimensions, we have to change them in the cube's dimension list:

"cubes": [
    {
        "name": "purchases",
        "dimensions": ["user", "product", "date"],
        ...

Your schema reflects classic transactional schema, not traditional data warehouse schema. In this case, you have to be explicit, as you were, and mention all necessary mappings. The rule is: if the attribute belongs to a fact table (logical view), then the key is just attribute, such as price, no table specification. If the attribute belongs to a dimension, such as product.id, then the syntax is dimension.attribute. The value of the mappings dictionary is physical table and physical column. See more information about mappings. Mappings for your schema look like:

"mappings": {
    "price": "products.price",
    "product.id": "products.id",
    "product.name": "products.name",
    "user.id": "users.id",
    "user.name": "users.name"
}

You would not have to write mappings if your schema was:

fact purchases
id | date | user_id | product_id | amount

dimension product
id | name | price

dimension user
id | name

In this case you will need only joins, because all dimension attributes are in their respective dimension tables. Note the amount in the fact table, which in your case, as you do not have count of purchased products per purchase, would be the same as price in product.

Here is the updated model for your model:

{
    "dimensions": [
        {"name": "user", "attributes": ["id", "name"]},
        {"name": "product", "attributes": ["id", "name"]},
        {"name": "date"}
    ],
    "cubes": [
        {
            "name": "purchases",
            "dimensions": ["user", "product", "date"],
            "measures": ["price"],
            "mappings": {
                "price": "products.price",
                "product.id": "products.id",
                "product.name": "products.name",
                "user.id": "users.id",
                "user.name": "users.name"
            },
            "joins": [
                {
                    "master": "purchases.user_id",
                    "detail": "users.id"
                },
                {
                    "master": "purchases.product_id",
                    "detail": "products.id"
                }
            ]
        }

    ]
}

You can try the model without writing any Python code, just by using the slicer command. For that you will need slicer.ini configuration file:

[server]
backend: sql
port: 5000
log_level: info
prettyprint: yes

[workspace]
url: sqlite:///data.sqlite

[model]
path: model.json

Change url in [workspace] to point to your database and change path in [model] to point to your model file. Now you can try:

curl "http://localhost:5000/aggregate"

Also try to drill-down:

curl "http://localhost:5000/aggregate?drilldown=product"

If you need any further help, just let me know, I'm the Cubes author.

like image 195
Stiivi Avatar answered Nov 15 '22 14:11

Stiivi