I'd like a way to summarise a database table so that rows sharing a common ID are summarised into one row of output.
My tools are SQLite and Python 2.x.
For example, given the following table of fruit prices at my local supermarkets...
+--------------------+--------------------+--------------------+
|Fruit |Shop |Price |
+--------------------+--------------------+--------------------+
|Apple |Coles |$1.50 |
|Apple |Woolworths |$1.60 |
|Apple |IGA |$1.70 |
|Banana |Coles |$0.50 |
|Banana |Woolworths |$0.60 |
|Banana |IGA |$0.70 |
|Cherry |Coles |$5.00 |
|Date |Coles |$2.00 |
|Date |Woolworths |$2.10 |
|Elderberry |IGA |$10.00 |
+--------------------+--------------------+--------------------+
... I want to produce a summary table showing me the price of each fruit at each supermarket. Blank spaces should be filled by NULLs.
+----------+----------+----------+----------+
|Fruit |Coles |Woolworths|IGA |
+----------+----------+----------+----------+
|Apple |$1.50 |$1.60 |$1.70 |
|Banana |$0.50 |$0.60 |$0.70 |
|Cherry |NULL |$5.00 |NULL |
|Date |$2.00 |$2.10 |NULL |
|Elderberry|NULL |NULL |$10.00 |
+----------+----------+----------+----------+
I believe the literature calls this a "pivot table" or a "pivot query", but apparently SQLite doesn't support PIVOT
. (The solution in that question uses hardcoded LEFT JOIN
s. This doesn't really appeal to me because I don't know the "column" names in advance.)
Right now I do this by iterating through the entire table in Python and accumulating a dict
of dicts
, which is a bit klutzy. I am open to better solutions, either in Python or SQLite, that will give the data in tabular form.
A PivotTable is an interactive way to quickly summarize large amounts of data. You can use a PivotTable to analyze numerical data in detail, and answer unanticipated questions about your data. A PivotTable is especially designed for: Querying large amounts of data in many user-friendly ways.
The pivot table's Sum function totals all the underlying values for each item in the field. The result is the same as using the SUM function on the worksheet to total the values. Blank cells, and cells with text are ignored.
Click anywhere in the PivotTable to show the PivotTable Tools on the ribbon. Click Design, and then click the More button in the PivotTable Styles gallery to see all available styles. Pick the style you want to use. If you don't see a style you like, you can create your own.
The pandas package can handle this very nicely.
>>> import pandas
>>> df=pandas.DataFrame(data, columns=['Fruit', 'Shop', 'Price'])
>>> df.pivot(index='Fruit', columns='Shop', values='Price')
Shop Coles IGA Woolworths
Fruit
Apple 1.5 1.7 1.6
Banana 0.5 0.7 0.6
Cherry 5.0 NaN NaN
Date 2.0 NaN 2.1
Elderberry NaN 10.0 NaN
The documentation: http://pandas.pydata.org/pandas-docs/stable/reshaping.html
Some IPython Notebooks to learn pandas: https://bitbucket.org/hrojas/learn-pandas
Hope that will help.
Regards
Patrick Brockmann
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With