 

Export a very large SQL file to CSV with Python or R

I have a large SQL file (20 GB) that I would like to convert to CSV. I plan to load the file into Stata for analysis. I have enough RAM to load the entire file (my computer has 32 GB of RAM).

The problem is that the Python solutions I have found online so far (using sqlite3) seem to require more RAM than my current system has in order to:

  • read the SQL
  • write the csv

Here is the code:

import sqlite3
import pandas as pd

# Read the entire table into a single DataFrame, then dump it to CSV
con = sqlite3.connect('mydata.sql')
query = 'select * from mydata'
data = pd.read_sql(query, con)
data.to_csv('export.csv')
con.close()

The SQL file contains about 15 variables, which can be timestamps, strings, or numerical values. Nothing really fancy.

I think one possible solution would be to read the SQL and write the CSV file one line at a time. However, I have no idea how to do that (in either R or Python).

Any help really appreciated!

asked Nov 01 '15 by ℕʘʘḆḽḘ

People also ask

How do I export a large CSV file in Python?

read_csv(chunksize): one way to process a large file is to read it in chunks of reasonable size, each of which is read into memory and processed before the next chunk is read. The chunksize parameter specifies the size of each chunk as a number of lines.
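
A minimal sketch of that pattern (the file name and the per-chunk work are placeholders, not from the question):

import pandas as pd

# Read 100,000 rows at a time instead of loading the whole file
for chunk in pd.read_csv('big_file.csv', chunksize=100000):
    # Replace this with whatever per-chunk processing you need
    print(chunk.shape)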

How do I convert .SQL to CSV in Python?

Usage: just run python mysqldump_to_csv.py followed by the filename of an SQL file. You can specify multiple SQL files, and they will all be concatenated into one CSV file, as in the example below.
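
For example (the file names here are hypothetical, and the script is assumed to write its CSV to standard output, so we redirect it into a file):

python mysqldump_to_csv.py dump1.sql dump2.sql > combined.csv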

Is SQL faster than Python pandas?

SQL is more efficient at querying data but has fewer functions, whereas pandas may lag on large volumes of data but has more functions that let you manipulate data effectively.


1 Answer

You can read the SQL database in batches and write each batch to the file, instead of reading the whole database at once. Credit to How to add pandas data to an existing csv file? for how to append to an existing CSV file.

import sqlite3
import pandas as pd

# Create a connection and get a cursor
connection = sqlite3.connect('mydata.sql')
cursor = connection.cursor()
# Execute the query
cursor.execute('select * from mydata')
# Grab the column names so the CSV gets a proper header
columns = [description[0] for description in cursor.description]

# Open the output file (newline='' avoids blank lines on Windows)
f = open('output.csv', 'w', newline='')
# Get the data in batches
first_batch = True
while True:
    # Read the next batch of rows
    rows = cursor.fetchmany(1000)
    # We are done if there is no more data
    if not rows:
        break
    # Write the batch, emitting the header only once and skipping the index
    df = pd.DataFrame(rows, columns=columns)
    df.to_csv(f, header=first_batch, index=False)
    first_batch = False

# Clean up
f.close()
cursor.close()
connection.close()
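
One caveat (an assumption on my part about the file): sqlite3.connect expects mydata.sql to be an actual SQLite database file. If it is instead a plain-text dump of SQL statements, import it into a database first, e.g. with the sqlite3 command-line shell's .read command, and then point the code above at the resulting database:

sqlite3 mydata.db ".read mydata.sql"

Here mydata.db is a hypothetical name for the new database file.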
answered Sep 29 '22 by Till Hoffmann