How to transform a huge CSV into SQLite using Pandas?

I have a huge table (about 60 GB) in the form of a gzip-compressed CSV file. I want to transform it into an SQLite file.

What I do at the moment is the following:

import pandas
import sqlite3
cnx = sqlite3.connect('db.sqlite')
df = pandas.read_csv('db.gz', compression='gzip')
df.to_sql('table_name', cnx)

It works fine for smaller files, but with huge files I run into memory problems. The problem is that pandas reads the whole table into memory (RAM) and only then writes it into the SQLite file.

Is there an elegant solution to this problem?

asked Jan 08 '16 by Roman



2 Answers

I haven't done any work with CSVs of that size, but it sounds like the kind of thing Odo might solve quickly.

I did a cursory check of the docs, and they cover larger-than-memory CSV parsing into SQL databases, specifically calling out SQLite3 as a destination.

Here's the example they publish for parsing a 33 GB text file.

In [1]: dshape = discover(resource('all.csv'))

In [2]: %time t = odo('all.no.header.csv', 'sqlite:///db.db::nyc',
   ...:               dshape=dshape)
CPU times: user 3.09 s, sys: 819 ms, total: 3.91 s
Wall time: 57min 31s
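
Adapted to the file names in your question, a rough sketch might look like this (untested; it assumes odo is installed and recognizes the gzipped CSV by its file extension, so you may need to rename the archive to something like db.csv.gz or decompress it first):

from odo import odo, discover, resource

# Sample the file to infer column names and types without loading it all.
dshape = discover(resource('db.csv.gz'))

# Stream the rows into a table called table_name inside db.sqlite.
odo('db.csv.gz', 'sqlite:///db.sqlite::table_name', dshape=dshape)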
answered by cwcobb


This is going to be problematic with pandas due to its size. Is there any reason you can't use the csv module and just iterate through the file?

Basic idea (untested):

import gzip
import csv
import sqlite3

# Open the gzipped CSV in text mode so csv.reader gets strings, not bytes.
with gzip.open('db.gz', 'rt', newline='') as f, sqlite3.connect('db.sqlite') as cnx:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    c = cnx.cursor()
    # One '?' per column; the target table must already exist.
    c.executemany('insert into table_name values (?,?,...)', reader)
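
Since the number of columns in your file isn't given, here is a hedged variant (untested, file and table names taken from the question) that derives both the CREATE TABLE statement and the placeholders from the header row:

import gzip
import csv
import sqlite3

with gzip.open('db.gz', 'rt', newline='') as f, sqlite3.connect('db.sqlite') as cnx:
    reader = csv.reader(f)
    header = next(reader)  # first row holds the column names
    cols = ', '.join('"%s"' % name for name in header)
    placeholders = ', '.join('?' * len(header))
    cnx.execute('create table if not exists table_name (%s)' % cols)
    # executemany consumes the reader lazily, so rows are streamed rather than loaded at once.
    cnx.executemany('insert into table_name values (%s)' % placeholders, reader)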
answered by AChampion