
python-mysqldb : How to efficiently get millions/billions of records from database?

  • I have a table from which I need to fetch around 7 million records, and this will grow into the billions (data is added every day)
  • I am using mysql-python to connect to a remote MySQL database

  • I query it like this:

cursor = conn.cursor()
cursor.execute(query)
return cursor

and try to print the rows like this:

sql = 'SELECT * FROM reading;'  # table has ~7 million records
cursor = conn.cursor()
cursor.execute(sql)
for row in cursor:
    print row
  • Printing them takes forever

On server, I see the process is running

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                                                                                                                                     
 3769 mysql     20   0 1120m 276m 5856 S  125  1.7   2218:09 mysqld      

Question: What is an efficient way to query a table with {m,b}illions of records using Python?

Thank you

asked Nov 27 '25 by daydreamer

1 Answer

I would suggest two options:

  1. Dump the required data to a file with `SELECT ... INTO OUTFILE` (or from the mysql console) and work with the file.

  2. You should understand that, by default, MySQL sends the whole result set to the client, and the client merely mimics reading it row by row (the entire result is already in client memory, and the query fails if there is not enough). Alternatively, the result set can be kept on the server side: pass the cursorclass=MySQLdb.cursors.SSCursor parameter to MySQLdb.connect (see http://mysql-python.sourceforge.net/MySQLdb.html for details).
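A minimal sketch of option 2. The streaming helper uses only the standard DB-API (`cursor.execute` / `fetchmany`), so memory stays bounded; the host, credentials, database, and table names are placeholders, not values from the question.

```python
def connect_streaming(host, user, passwd, db):
    """Open a MySQL connection whose cursors stream rows from the server
    instead of buffering the whole result set in client memory.
    All connection parameters here are placeholders."""
    import MySQLdb
    import MySQLdb.cursors
    return MySQLdb.connect(host=host, user=user, passwd=passwd, db=db,
                           cursorclass=MySQLdb.cursors.SSCursor)

def stream_rows(conn, sql, batch_size=1000):
    """Iterate over a (potentially huge) result set in fixed-size batches
    via the DB-API fetchmany(), yielding one row at a time."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            for row in rows:
                yield row
    finally:
        # With SSCursor the result must be exhausted or the cursor closed
        # before the connection can run another query.
        cursor.close()

# Usage (placeholder credentials):
#   conn = connect_streaming('remote-host', 'user', 'secret', 'mydb')
#   for row in stream_rows(conn, 'SELECT * FROM reading'):
#       process(row)
```

Note that an unbuffered cursor holds the result open on the server, so the loop body should be fast and the cursor closed promptly.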

answered Nov 29 '25 by newtover