Manipulation performance of Sqlite vs CSV file

Tags: sqlite, csv

In terms of data manipulation performance, which is better: SQLite or a CSV file?

asked Nov 19 '16 17:11 by BobyCloud


2 Answers

Unless you're doing something very trivial to the CSV, and only doing it once, SQLite will be faster for runtime, coding time, and maintenance time, and it will be more flexible.

The major advantages of putting the CSV into SQLite are...

  • Query with a known query language.
  • Query with a flexible query language.
  • Take advantage of high performance indexing.
  • Don't have to write and maintain and document and test a bunch of custom query code.

You can look at the costs like this:

SQLite

  • Once...
    • Create the schema.
    • Import the CSV into SQLite (built in: the sqlite3 shell has an .import command).
      • This may require you to write some code to translate the values.
    • [Optional, but recommended] Set up the indexes.
  • For each different query...
    • Do your query in SQL.
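The SQLite steps above can be sketched with Python's standard csv and sqlite3 modules. This is only an illustration under assumptions: the table name people, its columns, the index name idx_people_city, and the sample data are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be an open file.
CSV_TEXT = """name,city,score
alice,Paris,12
bob,Berlin,7
carol,Paris,30
"""


def load_csv(conn, csv_file):
    """One-time setup: create the schema, import the rows, add an index."""
    conn.execute("CREATE TABLE people (name TEXT, city TEXT, score INTEGER)")
    reader = csv.DictReader(csv_file)
    # Translate values while importing: CSV fields always arrive as strings.
    rows = ((r["name"], r["city"], int(r["score"])) for r in reader)
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    # Optional, but recommended: index the column you filter on.
    conn.execute("CREATE INDEX idx_people_city ON people (city)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv(conn, io.StringIO(CSV_TEXT))

# From here on, each different query is just SQL.
paris = conn.execute(
    "SELECT name, score FROM people WHERE city = ? ORDER BY score",
    ("Paris",),
).fetchall()
print(paris)  # [('alice', 12), ('carol', 30)]
```

The setup cost is paid once in load_csv; after that, a new question only needs a new SQL string.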

CSV

  • For each different query...
    • Write special code for your query.
    • Document how to use this special code.
    • Test your special query code.
    • Debug your special query code.
    • Run your special query code which has to...
      • Read the CSV file.
      • Parse the CSV file.
      • (Optional) Index the CSV file.
        • Come up with an indexing scheme.
      • Run your query.

Note that if your query is simple, parsing and running can happen together in a single pass over the file. Something like "find all rows where field 5 is greater than 10".
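For that simple case, a single-pass filter might look like the sketch below. The function name rows_where_field_gt and the sample data are hypothetical, and "field 5" is taken to mean the fifth column (index 4).

```python
import csv
import io


def rows_where_field_gt(csv_file, field_index, threshold):
    """Yield rows whose numeric field exceeds threshold, filtering while parsing."""
    for row in csv.reader(csv_file):
        try:
            value = float(row[field_index])
        except (ValueError, IndexError):
            # Skip the header, short rows, and non-numeric values.
            continue
        if value > threshold:
            yield row


# Hypothetical sample data; in practice pass an open file object.
SAMPLE = "a,b,c,d,e\n1,2,3,4,11\n1,2,3,4,9\n1,2,3,4,10.5\n"

matches = list(rows_where_field_gt(io.StringIO(SAMPLE), 4, 10))
print(matches)  # [['1', '2', '3', '4', '11'], ['1', '2', '3', '4', '10.5']]
```

Even this small function already makes decisions (what to do with headers and bad values) that someone has to document and test, and it still reads the whole file on every run.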


It's easy to forget that even if you use a library to do the CSV parsing, there are coding and maintenance costs to writing special code to query a CSV file. Every query has to be coded, tested, and debugged. Every special case or option has to be coded, tested, and debugged.

Since it's all special stuff you made up, there's no convention to follow. People coming to use your query program have to understand what it does and how it works. If they want to do anything even slightly different, they (or you) have to get into the code, understand it, modify it, test it, debug it, and document it. This will generate a lot of support requests.

In contrast, SQLite requires you to write little or no special code beyond the SQL queries. SQL is a commonly known query language. You can say "this is a SQLite database" and it's very likely people will know what to do. If not, they can go learn SQL, which is generally applicable knowledge, whereas learning your special CSV query program is one-off knowledge.

If people want to run a query you didn't anticipate, they can just write the SQL themselves. You don't need to be bothered, and they don't need to puzzle out a bunch of code.
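As a rough illustration, here is an unanticipated question ("how many people per city, and what is their average score?") answered with nothing but a new SQL string. The people table and its rows are made-up sample data, and run_query is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("alice", "Paris", 12), ("bob", "Berlin", 7), ("carol", "Paris", 30)],
)


def run_query(conn, sql, params=()):
    """Run any SQL the user supplies; no per-query code is needed."""
    return conn.execute(sql, params).fetchall()


# A question nobody planned for, expressed purely in SQL.
per_city = run_query(
    conn,
    "SELECT city, COUNT(*), AVG(score) FROM people GROUP BY city ORDER BY city",
)
print(per_city)  # [('Berlin', 1, 7.0), ('Paris', 2, 21.0)]
```

With a CSV file, the same grouping and averaging would be another piece of custom code to write and test.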

Finally, SQLite's query time on a well-indexed table will be far better than anything you or I are likely to write. SQLite is a database collaborated on by many, many database experts. You're probably not going to outperform the carefully optimized code they've written in C. Even if you can edge out a bit of performance, don't you have better things to do?

answered Nov 02 '22 15:11 by Schwern


One clear advantage is that a CSV file has no built-in index. If you have to work with subsets of a large data set, creating an index on the relevant column of the SQLite table lets those lookups avoid scanning every row.
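One way to check that an index is actually being used is SQLite's EXPLAIN QUERY PLAN. The sketch below assumes a hypothetical readings table and index name idx_readings_sensor; the exact wording of the plan text varies between SQLite versions, so only the presence of the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    ((f"s{i % 100}", float(i)) for i in range(10_000)),
)


def plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT value FROM readings WHERE sensor = ?"

before = plan(conn, query, ("s42",))  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor)")
after = plan(conn, query, ("s42",))   # now the plan mentions the index

print("before:", before)
print("after: ", after)
print("index used:", "idx_readings_sensor" in after)
```

Before the index exists, the plan reports a scan of the whole table; afterwards it reports a search using the index, which is the difference that matters for subsets of a large data set.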

answered Nov 02 '22 15:11 by Rohit Chopra