
Query OLAP Mondrian (MDX, XMLA) with a Python interface?

Currently I'm using R + Python with RPY2 to manipulate data and ggplot to create beautiful graphics. I have some data in a PostgreSQL database, and I'm using psycopg2 to query it.

I'm starting a thesis, and in the future I will need an OLAP cube to store my (very big) simulation data: multiple dimensions, aggregation queries, etc.

Is there any best or standard practice for interfacing between Python (and I want Python + R, no jpivot or some other dashboard in Java) and an OLAP engine like Mondrian? I searched on Google for a solution, and I didn't find anything.

I've briefly evaluated SQLAlchemy, and Django-ORM, but they have no MDX or XML/A interface to query an OLAP server (Mondrian or other) ...

Is it possible to write a query in MDX, send it to my OLAP server with psycopg + ODBC, and have the OLAP server return an answer from my simulation data (with no mapping to Python objects, which is OK for me)?

Update 1:

Why am I looking at OLAP + Mondrian technology?

Because University of Laval (GeoSoa department + Thierry Badard) wrote a spatial extension to OLAP, SOLAP, and implemented it in Mondrian as GeoMondrian. That interests me because I'm working on spatial multi-agent-based simulation (~= geosimulation).

The GeoSoa department created an Ajax-based component to communicate with GeoMondrian and visualize its spatial data: SOLAPLAYERS, which can query a Mondrian server through its XMLA servlet.

Problem: it is probably slow for big data manipulation, and it needs Internet access or Apache 2. In short, it is only meant for visualizing data or maps. In my case, I need the raw data to do my own data manipulation and graphics with R: spatial analysis, regression analysis, rank-tail, etc. Here, SOLAP helps me prepare the data for this later, more complex R analysis.

Why Python?

1 - Web access to spatial data -

I'm trying to use a "cool" Python framework, like GeoDjango or MapFish: big community in GIS, open source, uses GeoAlchemy to manipulate spatial queries/data, includes visualisation with JavaScript extensions and OpenLayers, etc.

2 - Local access to spatial data in GIS -

I want to create a plugin in QGIS (open source GIS) to access and visualize data, and the QGIS plugin API is in Python.

3 - Automatic analysis of data -

A user or scientist runs a simulation with grid computing and chooses the automatic analysis (R + ggplot2 + MDX query) they want to run on this data. My goal here is to create a synthetic report of the simulation (graphics, tabular data, etc.).

So, after the simulation, the data go into the OLAP/SOLAP cube, and many Python scripts (created by the user) get data with MDX, manipulate it with R + RPY2, and produce cool output for the scientist on DokuWiki or another community platform.

Problem?

1 - Olap4j, the API core of Mondrian to communicate with an external component, is Java-made :/

2 - SOLAPLAYERS uses Ajax to access data, too slow for me.

3 - SQLAlchemy and GeoAlchemy have no driver connection to a multidimensional database (OLAP).

Solution?

1 - Py4j to access Java object or Java collection in olap4j with Python? Write my own function to access the Java mapped collection? => dangerous and not very easy?...

2 - XMLA with Ajax Mondrian server? It is too slow.

3 - Write my own py-connector to OLAP Mondrian? => Ouch. It's a hard way, I think.

What should I do?

reyman64 asked Sep 25 '10 09:09


3 Answers

I don't know Python, but I am the author of Mondrian/olap4j.

If you can use py4j to access olap4j, great. If not, definitely consider XMLA. It may not be as slow as you think (unless Python's XML parsing is slow). The biggest problem is the complexity of constructing SOAP requests and understanding the responses.
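To give an idea of what constructing the SOAP request involves, here is a minimal sketch using only the Python standard library. It builds an XMLA Execute envelope around an MDX statement and pulls the cell values out of a multidimensional response. The endpoint URL, the catalog name and the exact property list are assumptions for illustration (some servers also expect a DataSourceInfo property), and the helper names are hypothetical, so treat it as a starting point rather than a tested Mondrian client.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"
MDDATASET_NS = "urn:schemas-microsoft-com:xml-analysis:mddataset"


def build_execute_envelope(mdx, catalog):
    """Return the SOAP envelope (as bytes) for an XMLA Execute call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    # ElementTree escapes any special XML characters in the MDX text.
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = mdx
    props = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    plist = ET.SubElement(props, f"{{{XMLA_NS}}}PropertyList")
    # Assumed properties; your server may require others.
    ET.SubElement(plist, f"{{{XMLA_NS}}}Catalog").text = catalog
    ET.SubElement(plist, f"{{{XMLA_NS}}}Format").text = "Multidimensional"
    return ET.tostring(envelope, encoding="utf-8")


def build_request(url, mdx, catalog):
    """Prepare (but do not send) the HTTP POST carrying the envelope."""
    return urllib.request.Request(
        url,
        data=build_execute_envelope(mdx, catalog),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '"urn:schemas-microsoft-com:xml-analysis:Execute"',
        },
    )


def cell_values(response_xml):
    """Collect the text of every <Value> element of an mddataset response."""
    root = ET.fromstring(response_xml)
    return [value.text for value in root.iter(f"{{{MDDATASET_NS}}}Value")]
```

Sending the request would then be something like `urllib.request.urlopen(build_request(url, mdx, catalog)).read()`, with the result passed to `cell_values`. Mapping the cells back to their axis tuples is the part that takes more work.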

Julian

Julian Hyde answered Nov 19 '22 18:11


As you know, Mondrian is a complete OLAP engine written in Java on top of a database like MySQL. So if I understand your question, you want to use Mondrian and wonder how to interface it with Python.

I use Mondrian packaged in a .jar to process MDX queries on the command line and send back JSON. Python calls it directly on the command line.

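import subprocess

# MDX query to run; Mondrian_cli.jar is the packaged jar mentioned above
mdx = (
    "select NON EMPTY Crossjoin({[Measures].[Store Sales]}, "
    "Crossjoin([Time].[1997].Children, [Store].[All Stores].Children)) ON COLUMNS, "
    "[Product].[All Products].Children ON ROWS from [Sales]"
)
# Pass the arguments as a list so the shell never sees the brackets and braces
result = subprocess.check_output(["java", "-jar", "Mondrian_cli.jar", "-q", mdx], text=True)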

And for server use, I package it in a servlet and send MDX with Ajax. The Ajax calls are not a big overhead, and that's why I don't see the need to couple Python and Java rather than just communicate with the Mondrian server.
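Coming back to the command-line route, here is a minimal sketch of how the call could be wrapped so that the JSON comes back as Python objects. The jar name and the -q flag are taken from the example above. The `run_mdx` helper and its `runner` parameter are hypothetical names introduced for illustration, and the shape of the JSON depends entirely on how the jar was written.

```python
import json
import subprocess


def run_mdx(mdx, jar="Mondrian_cli.jar", runner=subprocess.check_output):
    """Run one MDX query through the command-line jar and decode its JSON output.

    `runner` is any callable taking the argument list and returning the
    process output; it defaults to subprocess.check_output and can be
    swapped for a stub when no JVM is available.
    """
    # Arguments go in a list, so no shell is involved and the MDX needs no quoting.
    raw = runner(["java", "-jar", jar, "-q", mdx])
    # json.loads accepts the bytes returned by check_output directly.
    return json.loads(raw)
```

With the jar in the working directory, `run_mdx("select ... from [Sales]")` would return whatever structure the jar prints, ready to hand over to R through RPY2.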

Biovisualize answered Nov 19 '22 18:11


For storing and retrieving very large data cubes, HDF5 works rather well (h5py or PyTables for a Python interface). Your application can then either run on a machine with a local copy of the HDF5 database or use an ad-hoc server solution (still in Python).

I have designed hybrid SQL/HDF5 storage strategies when needed, and they perform rather well.

If you really need the MDX query language:

  • as an ORM (earlier answers on stackoverflow)

  • cubulus (although only a subset of MDX is implemented)

  • run the OLAP server of your choice as a separate server and communicate with it through an ad-hoc interface (might even be XML over HTTP).

lgautier answered Nov 19 '22 17:11