Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I turn a flat file of data into a queryable data source

I generate files, lets call them .dwrf files, which contain a significant amount of data. Currently we export those to .CSV and the resulting files are large (2GB+). I would like to cut out the export process and make the contents of a .dwrf file queryable directly from Excel or other applications.

What I would like to do is write a utility/service - lets call it dwrfMiner - to extract data from the file and pass it on as a datasource and link dwrfMiner to .dwrf files in some way so that Excel recognises it as an external data source.

Any ideas?

like image 848
dwarFish Avatar asked Aug 10 '10 15:08

dwarFish


2 Answers

While writing an ODBC driver for this is probably overkill, if the format of the files you are working with is known in advance and isn't too hard to translate (it sounds like not considering you are already creating CSVs) then using an ODBC DSN sounds like your best bet.

There are a nice selection of ODBC drivers already built in to Windows (.txt, .csv, .mdb, .xl*, .dbf, Paradox .db, etc etc) and you can obtain other drivers from the web for a lot of common formats.

If the size of the existing format you're exporting to is too onerous (CSV) then the logical point to start is a transformation of your data to something more space-conscious that has ODBC support.

Failing that, your last option is the overkill option (Writing an ODBC driver).

like image 56
Kilanash Avatar answered Oct 17 '22 02:10

Kilanash


Excel can query external data souces, but beware that Excel (all versions) have hard-limits on the number of rows they can display, per work-book. I think in Excel 2003 the limit is ~65k. It's higher in other versions.

See my question: reporting tool/viewer for large datasets (and I had much less than > 2GB).

like image 32
FrustratedWithFormsDesigner Avatar answered Oct 17 '22 01:10

FrustratedWithFormsDesigner