Convert psycopg2 DictRow query to Pandas dataframe

Question

I would like to convert a psycopg2 DictRow query to a pandas dataframe, but pandas keeps complaining:

curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print type(data)
print pd.DataFrame(list(data))

However, I always get an error although I specifically passed a list???

<type 'list'>
TypeError: Expected list, got DictRow

The result is the same if I do pd.DataFrame(data) Could someone please help me make this work?

It would also be nice if the column names of the dataframe worked (i.e. extract DictRow and pass them to the dataframe).

Update:
Since I need to process the data, I would like to use the data from the psycopg2 query as is and not the pandas approach, e.g. read_sql_query.

n1000 · Accepted Answer

Hmm, I eventually found this hacky solution:

print pd.DataFrame([i.copy() for i in data])

The copy() function of the DictRow class will return an actual dictionary. With the list comprehension I create a list of (identical) dictionaries, that Pandas will happily accept.

I am still puzzled why list(data) produced a TypeError. Maybe someone can still enlighten me.

amball · Answer

UPDATE: pandas.read_sql_query() is a more elegant way to read a SQL query into a dataframe, without the need for psycopg2. See the pandas docs.

I've been having the same issue. The easiest way I found was to convert the DictRow to a numpy array.

import numpy as np
curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print type(data)
print pd.DataFrame(np.array(data))

If you want to get the column names, you can access them as the keys for each row of the DictRow. However, converting to a numpy array doesn't preserve the order. So one (inelegant) way is as follows:

curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()

print type(data)
colNames = data[0].keys()
print pd.DataFrame([[row[col] for col in colNames] for row in data], columns=colNames)

Jan Sila · Answer

You indeed need to parse out the elements first. You might be confused because the whole result is a list of DictRow elements, but element is not a list. Hence in pandas 0.22.0 even reading from_records wont not work straight away.

This works fine with native types:

inp = [{'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}]
>>> pd.DataFrame(inp)
     a    b
0  1.0  NaN
1  NaN  2.0
2  1.0  NaN
3  NaN  2.0
4  1.0  NaN
5  NaN  2.0

But printing the results of Psycopg2 query is probably the source of confusion (own data):

[[157, 158, 83, 1], [157, 159, 47, 1], [158, 157, 53, 1], [158, 159, 38, 1], [159, 157, 76, 1], [159, 158, 24, 1]] <class 'list'> but in fact the first element [157, 158, 83, 1] <class 'psycopg2.extras.DictRow'>

Convert psycopg2 DictRow query to Pandas dataframe

Tags:

python

pandas

psycopg2

n1000

3 Answers

n1000

amball

Jan Sila

Recent Activity

Donate For Us

Convert psycopg2 DictRow query to Pandas dataframe

Tags:

python

pandas

psycopg2

n1000

3 Answers

n1000

amball

Jan Sila

Related questions

Recent Activity

Donate For Us