I would like to convert a psycopg2 DictRow
query to a pandas dataframe, but pandas keeps complaining:
curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()
print type(data)
print pd.DataFrame(list(data))
However, I always get an error although I specifically passed a list
???
<type 'list'>
TypeError: Expected list, got DictRow
The result is the same if I do pd.DataFrame(data)
Could someone please help me make this work?
It would also be nice if the column names of the dataframe worked (i.e. extract DictRow
and pass them to the dataframe).
Update:
Since I need to process the data, I would like to use the data from the psycopg2 query as is and not the pandas approach, e.g. read_sql_query
.
Hmm, I eventually found this hacky solution:
print pd.DataFrame([i.copy() for i in data])
The copy()
function of the DictRow
class will return an actual dictionary. With the list comprehension I create a list of (identical) dictionaries, that Pandas will happily accept.
I am still puzzled why list(data)
produced a TypeError
. Maybe someone can still enlighten me.
UPDATE: pandas.read_sql_query()
is a more elegant way to read a SQL query into a dataframe, without the need for psycopg2
. See the pandas docs.
I've been having the same issue. The easiest way I found was to convert the DictRow to a numpy array.
import numpy as np
curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()
print type(data)
print pd.DataFrame(np.array(data))
If you want to get the column names, you can access them as the keys for each row of the DictRow
. However, converting to a numpy array doesn't preserve the order. So one (inelegant) way is as follows:
curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT * FROM mytable")
data = curs.fetchall()
print type(data)
colNames = data[0].keys()
print pd.DataFrame([[row[col] for col in colNames] for row in data], columns=colNames)
You indeed need to parse out the elements first. You might be confused because the whole result is a list of DictRow
elements, but element is not a list. Hence in pandas 0.22.0
even reading from_records
wont not work straight away.
This works fine with native types:
inp = [{'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}, {'a': 1}, {'b': 2}]
>>> pd.DataFrame(inp)
a b
0 1.0 NaN
1 NaN 2.0
2 1.0 NaN
3 NaN 2.0
4 1.0 NaN
5 NaN 2.0
But printing the results of Psycopg2 query is probably the source of confusion (own data):
[[157, 158, 83, 1], [157, 159, 47, 1], [158, 157, 53, 1], [158, 159, 38, 1], [159, 157, 76, 1], [159, 158, 24, 1]] <class 'list'>
but in fact the first element [157, 158, 83, 1] <class 'psycopg2.extras.DictRow'>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With