 

pandas read_sql: columns parameter not working when using index_col, returns all columns instead

I'm using pandas.read_sql() to get data from my PostgreSQL database. The SQL query is generated generically and selects many columns, of which I only want specific ones, with one column used as the index. For example, given a table test_table like this:

column1 column2 column3
1       2       3
2       4       6
3       6       9

I tried to use the index_col and columns parameters of pandas.read_sql() to get column1 as the index and column2 as data (ignoring column3). But it always returns the whole table. Writing columns=['column1', 'column2'] doesn't change anything either.

I'm using Python 2.7.6 with pandas 0.17.1. Thanks for any help!

Example Code:

import pandas
import psycopg2
import sqlalchemy


def connect():
    connString = (
        "dbname=test_db "
        "host=localhost "
        "port=5432 "
        "user=postgres "
        "password=password"
    )
    return psycopg2.connect(connString)

engine = sqlalchemy.create_engine(
            'postgresql://',
            creator=connect)
sql = (
    'SELECT '
    'column1, '
    'column2, '
    'column3 '
    'FROM test_table'
)
data = pandas.read_sql(
    sql,
    engine,
    index_col=['column1'],
    columns=['column2'])
print(data)
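To reproduce the behaviour without a PostgreSQL server, here is a minimal sketch that assumes an in-memory SQLite database (via the standard-library sqlite3 module) in place of test_db, holding the same three-column table. It passes a query string together with columns, as in the code above:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the PostgreSQL test_db (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (column1 INTEGER, column2 INTEGER, column3 INTEGER)"
)
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 4, 6), (3, 6, 9)],
)
conn.commit()

sql = "SELECT column1, column2, column3 FROM test_table"

# Same call pattern as in the question: a query string plus columns=...
data = pd.read_sql(sql, conn, index_col="column1", columns=["column2"])

# index_col was applied, but columns was ignored: column3 is still present.
print(list(data.columns))  # ['column2', 'column3']
```

With a query string, index_col takes effect while columns has no visible effect.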
asked Mar 11 '16 at 10:03 by Henhuy





1 Answer

I think the columns argument did not work for you because you passed an SQL statement instead of your table name.

As the pandas documentation for read_sql says:

columns : list, default: None
    List of column names to select from SQL table (only used when reading a table).
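The pandas documentation describes read_sql as routing an SQL query to read_sql_query and a table name to read_sql_table, and only read_sql_table accepts columns. The sketch below calls read_sql_table directly, assuming an in-memory SQLite database reached through SQLAlchemy as a stand-in for the PostgreSQL engine (read_sql_table requires an SQLAlchemy connectable, not a raw DBAPI connection):

```python
import pandas as pd
import sqlalchemy

# In-memory SQLite via SQLAlchemy, standing in for the PostgreSQL engine (assumption).
engine = sqlalchemy.create_engine("sqlite://")

# Create the example table from the question.
pd.DataFrame(
    {"column1": [1, 2, 3], "column2": [2, 4, 6], "column3": [3, 6, 9]}
).to_sql("test_table", engine, index=False)

# Reading by table name: columns is honoured, and index_col is selected on top of it.
data = pd.read_sql_table(
    "test_table", engine, index_col="column1", columns=["column2"]
)
print(list(data.columns))  # ['column2']
```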

So if you pass the table name instead:

pandas.read_sql('test_table', engine, index_col=['column1'], columns=['column2'])

the columns argument should actually work.
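If the query has to stay generic, as in the question, passing a table name is not an option. Below is a minimal sketch of two alternatives, again assuming an in-memory SQLite database as a stand-in for PostgreSQL: select the wanted columns from the resulting DataFrame, or wrap the generic query in an outer SELECT. The subquery alias q is an arbitrary, hypothetical name (PostgreSQL before version 16 requires an alias on a subquery in FROM).

```python
import sqlite3

import pandas as pd

# In-memory SQLite stand-in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (column1 INTEGER, column2 INTEGER, column3 INTEGER)"
)
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 4, 6), (3, 6, 9)],
)
conn.commit()

# The generic query, selecting more columns than needed.
sql = "SELECT column1, column2, column3 FROM test_table"

# Option 1: keep the query as-is and pick the wanted columns in pandas.
data = pd.read_sql(sql, conn, index_col="column1")[["column2"]]

# Option 2: wrap the generic query as a subquery and narrow it in SQL,
# so the unwanted columns are never sent to the client.
narrow_sql = "SELECT column1, column2 FROM ({}) AS q".format(sql)
data_sql = pd.read_sql(narrow_sql, conn, index_col="column1")
```

The first option is the simplest; the second avoids transferring unwanted columns, which may matter when the generic query selects many wide columns.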

answered Oct 17 '22 at 03:10 by mdls
