I'm using the pandas.read_sql() function to get data from my PostgreSQL database.
The SQL query is generated generically and selects many columns, but I only want to get specific columns, using one of them as the index.
Consider an example table test_table like this:
column1 column2 column3
1       2       3
2       4       6
3       6       9
I tried to use the index_col and columns parameters of pandas.read_sql() to get column1 as the index and column2 as data (ignoring column3). But it always returns the whole table. Writing columns=['column1', 'column2'] changes nothing either.
I'm using Python 2.7.6 with pandas 0.17.1. Thanks for your help!
Example Code:
import pandas
import psycopg2
import sqlalchemy
def connect():
    connString = (
        "dbname=test_db "
        "host=localhost "
        "port=5432 "
        "user=postgres "
        "password=password"
    )
    return psycopg2.connect(connString)
engine = sqlalchemy.create_engine(
            'postgresql://',
            creator=connect)
sql = (
    'SELECT '
    'column1, '
    'column2, '
    'column3 '
    'FROM test_table'
)
data = pandas.read_sql(
    sql,
    engine,
    index_col=['column1'],
    columns=['column2'])
print(data)
pandas.read_sql() reads a SQL query or a database table into a DataFrame. It is a convenience wrapper around read_sql_query() and read_sql_table() (kept for backward compatibility): depending on the input, it calls one of these functions internally and returns the result as a DataFrame, a two-dimensional data structure with labeled axes.
To read a SQL table into a DataFrame using only the table name, without writing a query, use read_sql_table(). This function does not support DBAPI connections; it needs a SQLAlchemy connectable such as an engine.
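To make the dispatch concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for PostgreSQL (so it runs without a server) and recreates test_table from the question; the engine URL and the variable names are illustrative only.

```python
import pandas
import sqlalchemy

# Assumption: in-memory SQLite stands in for the PostgreSQL database.
engine = sqlalchemy.create_engine("sqlite://")

# Recreate the example table from the question.
pandas.DataFrame({
    "column1": [1, 2, 3],
    "column2": [2, 4, 6],
    "column3": [3, 6, 9],
}).to_sql("test_table", engine, index=False)

# A SQL string is passed on to read_sql_query().
from_query = pandas.read_sql("SELECT * FROM test_table", engine)

# A bare table name is passed on to read_sql_table().
from_name = pandas.read_sql("test_table", engine)

# Calling read_sql_table() directly should give the same frame.
from_table = pandas.read_sql_table("test_table", engine)

print(from_name.equals(from_table))
```

Which branch is taken depends only on whether the first argument is the name of an existing table, which is why the same call can behave differently for a query and for a table name.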
I think the columns argument did not work for you because you passed a SQL statement instead of a table name.
As the pandas documentation says:
columns : list, default: None List of column names to select from sql table (only used when reading a table).
Therefore, I think if you try:
pandas.read_sql('test_table', engine, index_col=['column1'], columns=['column2'])
the columns argument will actually work.
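The difference can be checked with a small sketch. It assumes an in-memory SQLite engine in place of the PostgreSQL one (purely so it runs without a server), and the names by_name, by_query and subset are made up for illustration. The last step is one possible workaround, not something from the documentation quote above: if the query has to stay generic, the unwanted columns can be dropped after reading.

```python
import pandas
import sqlalchemy

# Assumption: in-memory SQLite stands in for the PostgreSQL database.
engine = sqlalchemy.create_engine("sqlite://")

# Recreate the example table from the question.
pandas.DataFrame({
    "column1": [1, 2, 3],
    "column2": [2, 4, 6],
    "column3": [3, 6, 9],
}).to_sql("test_table", engine, index=False)

# Table name: columns is honoured, and the index column is fetched as well.
by_name = pandas.read_sql(
    "test_table", engine, index_col=["column1"], columns=["column2"])

# SQL query: columns is ignored, so column3 is still returned.
sql = "SELECT column1, column2, column3 FROM test_table"
by_query = pandas.read_sql(
    sql, engine, index_col=["column1"], columns=["column2"])

# Possible workaround for a generic query: select columns afterwards.
subset = by_query[["column2"]]

print(by_name)
print(by_query)
print(subset)
```

Alternatively, listing only column1 and column2 in the SELECT statement avoids fetching column3 at all, if the query can be changed.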