Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert psycopg2 DictRow query to Pandas dataframe

I would like to convert a psycopg2 DictRow query to a pandas dataframe, but pandas keeps complaining:

curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print type(data)
print pd.DataFrame(list(data))

However, I always get an error although I specifically passed a list???

<type 'list'>
TypeError: Expected list, got DictRow

The result is the same if I do pd.DataFrame(data) Could someone please help me make this work?

It would also be nice if the column names of the dataframe worked (i.e. extract DictRow and pass them to the dataframe).

Update:
Since I need to process the data, I would like to use the data from the psycopg2 query as is and not the pandas approach, e.g. read_sql_query.

like image 304
n1000 Avatar asked Feb 24 '16 13:02

n1000


3 Answers

Hmm, I eventually found this hacky solution:

print pd.DataFrame([i.copy() for i in data])

The copy() function of the DictRow class will return an actual dictionary. With the list comprehension I create a list of (identical) dictionaries, that Pandas will happily accept.

I am still puzzled why list(data) produced a TypeError. Maybe someone can still enlighten me.

like image 101
n1000 Avatar answered Nov 05 '22 16:11

n1000


UPDATE: pandas.read_sql_query() is a more elegant way to read a SQL query into a dataframe, without the need for psycopg2. See the pandas docs.

I've been having the same issue. The easiest way I found was to convert the DictRow to a numpy array.

import numpy as np
curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print type(data)
print pd.DataFrame(np.array(data))

If you want to get the column names, you can access them as the keys for each row of the DictRow. However, converting to a numpy array doesn't preserve the order. So one (inelegant) way is as follows:

curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print type(data)
colNames = data[0].keys()
print pd.DataFrame([[row[col] for col in colNames] for row in data], columns=colNames)
like image 7
amball Avatar answered Nov 05 '22 15:11

amball


You indeed need to parse out the elements first. You might be confused because the whole result is a list of DictRow elements, but element is not a list. Hence in pandas 0.22.0 even reading from_records wont not work straight away.

This works fine with native types:

inp = [{'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}]
>>> pd.DataFrame(inp)
     a    b
0  1.0  NaN
1  NaN  2.0
2  1.0  NaN
3  NaN  2.0
4  1.0  NaN
5  NaN  2.0

But printing the results of Psycopg2 query is probably the source of confusion (own data):

[[157, 158, 83, 1], [157, 159, 47, 1], [158, 157, 53, 1], [158, 159, 38, 1], [159, 157, 76, 1], [159, 158, 24, 1]] <class 'list'> but in fact the first element [157, 158, 83, 1] <class 'psycopg2.extras.DictRow'>

like image 2
Jan Sila Avatar answered Nov 05 '22 17:11

Jan Sila