I would like to load csv files into a database


using sqlalchemy to load csv file into a database

1 Answer

Because of the power of SQLAlchemy, I'm also using it on a project. Its power comes from the object-oriented way of "talking" to a database instead of hardcoding SQL statements that can be a pain to manage. Not to mention, it was also a lot faster in the comparison below (0.091 s versus 3.604 s for the same CSV file).

To answer your question bluntly, yes! Storing data from a CSV into a database using SQLAlchemy is a piece of cake. Here's a full working example (I used SQLAlchemy 1.0.6 and Python 2.7.6):

from numpy import genfromtxt
from time import time
from datetime import datetime
from sqlalchemy import Column, Integer, Float, Date
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def Load_Data(file_name):
    data = genfromtxt(file_name, delimiter=',', skip_header=1, converters={0: lambda s: str(s)})
    return data.tolist()

Base = declarative_base()

class Price_History(Base):
    #Tell SQLAlchemy what the table name is and if there's any table-specific arguments it should know about
    __tablename__ = 'Price_History'
    __table_args__ = {'sqlite_autoincrement': True}
    #tell SQLAlchemy the name of column and its attributes:
    id = Column(Integer, primary_key=True, nullable=False)
    date = Column(Date)
    opn = Column(Float)
    hi = Column(Float)
    lo = Column(Float)
    close = Column(Float)
    vol = Column(Float)

if __name__ == "__main__":
    t = time()

    #Create the database
    engine = create_engine('sqlite:///csv_test.db')
    Base.metadata.create_all(engine)

    #Create the session
    session = sessionmaker()
    session.configure(bind=engine)
    s = session()

    try:
        file_name = "t.csv" #sample CSV file used:  http://www.google.com/finance/historical?q=NYSE%3AT&ei=W4ikVam8LYWjmAGjhoHACw&output=csv
        data = Load_Data(file_name)

        for i in data:
            record = Price_History(**{
                'date' : datetime.strptime(i[0], '%d-%b-%y').date(),
                'opn' : i[1],
                'hi' : i[2],
                'lo' : i[3],
                'close' : i[4],
                'vol' : i[5]
            })
            s.add(record) #Add all the records

        s.commit() #Attempt to commit all the records
    except:
        s.rollback() #Rollback the changes on error
    finally:
        s.close() #Close the connection
    print "Time elapsed: " + str(time() - t) + " s." #0.091s

(Note: this is not necessarily the "best" way to do this, but I think this format is very readable for a beginner; it's also very fast: 0.091s for 251 records inserted!)

I think if you go through it line by line, you'll see what a breeze it is to use. Notice the lack of SQL statements -- hooray! I also took the liberty of using numpy to load the CSV contents in two lines, but it can be done without it if you like.
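For the "without numpy" route, here is a minimal sketch using only the standard library's csv module. The names parse_rows and load_data are made up for illustration, and it assumes Python 3 plus the same six-column layout (date, open, high, low, close, volume) and '%d-%b-%y' date format as the sample file above:

```python
import csv
from datetime import datetime


def parse_rows(lines):
    """Turn an iterable of CSV text lines into (date, opn, hi, lo, close, vol) tuples."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    rows = []
    for line in reader:
        rows.append((
            datetime.strptime(line[0], '%d-%b-%y').date(),
            float(line[1]),
            float(line[2]),
            float(line[3]),
            float(line[4]),
            float(line[5]),
        ))
    return rows


def load_data(file_name):
    """Hypothetical stand-in for Load_Data that does not need numpy."""
    with open(file_name, newline='') as f:
        return parse_rows(f)
```

Each tuple already holds a real date object, so with this loader the strptime call inside the insert loop would no longer be needed.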

If you want to compare against the traditional way of doing it, here's a full working example for reference:

import sqlite3
import time
from numpy import genfromtxt

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d


def Create_DB(db):
    #Create DB and format it as needed
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str

        cursor = conn.cursor()

        cursor.execute("CREATE TABLE [Price_History] ([id] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE, [date] DATE, [opn] FLOAT, [hi] FLOAT, [lo] FLOAT, [close] FLOAT, [vol] INTEGER);")


def Add_Record(db, data):
    #Insert record into table
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str

        cursor = conn.cursor()

        cursor.execute("INSERT INTO Price_History({cols}) VALUES({vals});".format(cols = str(data.keys()).strip('[]'),
                    vals=str([data[i] for i in data]).strip('[]')
                    ))


def Load_Data(file_name):
    #skip_header matches the SQLAlchemy example above (skiprows is the older, deprecated spelling)
    data = genfromtxt(file_name, delimiter=',', skip_header=1, converters={0: lambda s: str(s)})
    return data.tolist()


if __name__ == "__main__":
    t = time.time()

    db = 'csv_test_sql.db' #Database filename
    file_name = "t.csv" #sample CSV file used:  http://www.google.com/finance/historical?q=NYSE%3AT&ei=W4ikVam8LYWjmAGjhoHACw&output=csv

    data = Load_Data(file_name) #Get data from CSV

    Create_DB(db) #Create DB

    #For every record, format and insert to table
    for i in data:
        record = {
                'date' : i[0],
                'opn' : i[1],
                'hi' : i[2],
                'lo' : i[3],
                'close' : i[4],
                'vol' : i[5]
            }
        Add_Record(db, record)

    print "Time elapsed: " + str(time.time() - t) + " s." #3.604s

(Note: even in the "old" way, this is by no means the best way to do this, but it's very readable and a "1-to-1" translation from the SQLAlchemy way vs. the "old" way.)
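As one possible tidier version of the "old" way, here is a sketch that keeps a single connection open and passes values through ? placeholders with executemany instead of formatting them into the SQL string. The helper name add_records and the sample rows are made up for illustration, and it uses an in-memory database so it can run on its own. Opening a new connection and committing for every record is likely a big part of the 3.6 s above, though I have not timed this variant:

```python
import sqlite3


def add_records(conn, records):
    """Insert many rows on one open connection, using ? placeholders."""
    conn.executemany(
        "INSERT INTO Price_History (date, opn, hi, lo, close, vol) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        records,
    )
    conn.commit()  # one commit for the whole batch


conn = sqlite3.connect(":memory:")  # in-memory DB, only for this sketch
conn.execute(
    "CREATE TABLE Price_History ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, date DATE, "
    "opn FLOAT, hi FLOAT, lo FLOAT, close FLOAT, vol INTEGER)"
)

# Made-up sample rows in the same column order as the CSV
sample = [
    ("14-Jul-15", 35.0, 35.5, 34.9, 35.2, 1000),
    ("15-Jul-15", 35.2, 35.6, 35.0, 35.4, 1200),
]
add_records(conn, sample)
count = conn.execute("SELECT COUNT(*) FROM Price_History").fetchone()[0]
```

Placeholders also sidestep the quoting problems you can run into when values are pasted into the statement as text.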

Notice the SQL statements: one to create the table, the other to insert records. Also, notice that it's a bit more cumbersome to maintain long SQL strings vs. a simple class attribute addition. Liking SQLAlchemy so far?

As for your foreign key inquiry, of course. SQLAlchemy has the power to do this too. Here's an example of what a class attribute would look like with a foreign key assignment (assuming the ForeignKey class has also been imported from the sqlalchemy module):

class Asset_Analysis(Base):
    #Tell SQLAlchemy what the table name is and if there's any table-specific arguments it should know about
    __tablename__ = 'Asset_Analysis'
    __table_args__ = {'sqlite_autoincrement': True}
    #tell SQLAlchemy the name of column and its attributes:
    id = Column(Integer, primary_key=True, nullable=False)
    fid = Column(Integer, ForeignKey('Price_History.id'))

which makes the "fid" column a foreign key pointing to Price_History's id column.
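To see the two tables working together, here is a small self-contained sketch. It assumes a newer SQLAlchemy (1.4 or later, where declarative_base can be imported from sqlalchemy.orm) and an in-memory SQLite database; the trimmed-down columns and the sample value are made up for illustration:

```python
from sqlalchemy import Column, Integer, Float, ForeignKey, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Price_History(Base):
    __tablename__ = 'Price_History'
    id = Column(Integer, primary_key=True)
    close = Column(Float)


class Asset_Analysis(Base):
    __tablename__ = 'Asset_Analysis'
    id = Column(Integer, primary_key=True)
    # fid refers to the id column of the Price_History table
    fid = Column(Integer, ForeignKey('Price_History.id'))


engine = create_engine('sqlite://')  # in-memory database, only for this sketch
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

price = Price_History(close=35.2)  # made-up value
session.add(price)
session.flush()  # sends the INSERT so price.id gets assigned
price_id = price.id

session.add(Asset_Analysis(fid=price_id))
session.commit()

linked_fid = session.query(Asset_Analysis).one().fid
session.close()
```

The flush before creating the Asset_Analysis row is there so the parent's id is known when the child row is built.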

Hope that helps!

195

answered Oct 08 '22 13:10

Manuel J. Diaz


Tags:

python

database

sqlalchemy

alex chan
