 

SQL statements for CSV files in an IPython notebook

I have a tabledata.csv file, and I have been using pandas.read_csv to read it and select specific columns with specific conditions.

For instance, I use the following code to select every "name" where session_id == 1, which works fine in an IPython Notebook on datascientistworkbench:

    import pandas

    df = pandas.read_csv('/resources/data/findhelp/tabledata.csv')
    # Select the "name" column for rows where session_id equals 1
    df.loc[df['session_id'] == 1, 'name']

I just wonder: after I have read the CSV file, is it possible to somehow "switch" it to, or read it as, a SQL database? (I am pretty sure I did not explain this well with the correct terms, sorry about that!) What I want is to use SQL statements in the IPython notebook to choose specific rows with specific conditions, for example something like:

    SELECT `name`, COUNT(DISTINCT `session_id`) FROM tabledata WHERE `session_id` LIKE '100.1%' GROUP BY `session_id` ORDER BY `session_id`

I guess I need to convert the CSV file into some other form first so that I can run SQL statements against it. Many thanks!
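For comparison, the query above maps onto pandas operations clause by clause, so it can also be run without any database. A minimal sketch, assuming a small hypothetical DataFrame in place of tabledata.csv and treating session_id as text so that LIKE '100.1%' becomes a string prefix test:

```python
import pandas as pd

# Hypothetical stand-in for tabledata.csv; session_id is kept as text (assumption)
df = pd.DataFrame({
    "name": ["alice", "alice", "bob", "carol", "dave"],
    "session_id": ["100.1", "100.1", "100.15", "200.3", "100.2"],
})

# WHERE session_id LIKE '100.1%'  ->  prefix match on the string column
matched = df[df["session_id"].str.startswith("100.1")]

# GROUP BY session_id ORDER BY session_id (sort=True orders the group keys)
grouped = matched.groupby("session_id", sort=True)

# SELECT name, COUNT(DISTINCT session_id): take the first name in each group
# and count the distinct session ids within each group
result = pd.DataFrame({
    "name": grouped["name"].first(),
    "n_sessions": grouped["session_id"].nunique(),
}).reset_index()

print(result)
```

Because each group holds a single session_id, the distinct count is 1 for every row, in pandas as in SQL. Using `first` for the non-aggregated name column is one choice; SQLite would return the value from an arbitrary row of the group.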

asked Apr 05 '16 at 15:04 by yingnan liu




1 Answer

Here is a quick primer on pandas and SQL using Python's built-in sqlite3 module. Generally speaking, anything you can do in SQL you can also do in pandas one way or another, but a database is still useful. The first thing you need to do is store the original DataFrame in a SQL database so that you can query it. The steps are listed below.

import sqlite3

import pandas as pd

# Read the CSV into a DataFrame
df = pd.read_csv('/resources/data/findhelp/tabledata.csv')

# Connect to a database; if Any_Database_Name.db does not exist,
# sqlite3 creates it in the current directory
conn = sqlite3.connect("Any_Database_Name.db")

# Store the DataFrame as a table. if_exists='replace' lets the cell be re-run
# without a "table already exists" error, and index=False avoids writing the
# DataFrame index as an extra column
df.to_sql('Some_Table_Name', conn, if_exists='replace', index=False)

# Run a SQL query against the table and load the result into a DataFrame
sql_string = 'SELECT * FROM Some_Table_Name'
df = pd.read_sql(sql_string, conn)
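Putting the steps together with the query from the question, a minimal end-to-end sketch follows. It assumes a small hypothetical DataFrame instead of the real CSV, keeps session_id as text, and uses an in-memory SQLite database (":memory:") so no .db file is created. The query is the one from the question with session_id added to the SELECT list and an alias on the count.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for pd.read_csv(...); session_id kept as text (assumption)
df = pd.DataFrame({
    "name": ["alice", "alice", "bob", "carol", "dave"],
    "session_id": ["100.1", "100.1", "100.15", "200.3", "100.2"],
})

# ":memory:" keeps the whole database in RAM for this connection only
conn = sqlite3.connect(":memory:")
df.to_sql("tabledata", conn, index=False)

sql_string = """
SELECT session_id,
       name,
       COUNT(DISTINCT session_id) AS n_sessions
FROM tabledata
WHERE session_id LIKE '100.1%'
GROUP BY session_id
ORDER BY session_id
"""

# Run the query and get the result back as a DataFrame
result = pd.read_sql_query(sql_string, conn)
conn.close()

print(result)
```

With this sample data only the sessions "100.1" and "100.15" match the LIKE pattern; "100.2" and "200.3" are filtered out by the WHERE clause.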
answered Nov 04 '22 at 00:11 by Sam