Binding list to params in Pandas read_sql_query with other params

I've been testing various ways to get my code to run. To begin with, I have this list:

member_list = [111,222,333,444,555,...]

I tried to pass it into this query:

query = pd.read_sql_query(
"""
select member_id
    ,yearmonth
from queried_table
where yearmonth between ? and ?
    and member_id in ?
""", db2conn, params = [201601, 201603, member_list])

However, I get an error that says:

'Invalid parameter type. param-index=2 param-type=list', 'HY105'

So I looked around and tried using formatted strings:

query = pd.read_sql_query(
"""
select member_id
    ,yearmonth
from queried_table
where yearmonth between ? and ?
    and member_id in (%s)
""" % ','.join(['?']*len(member_list)), db2conn, params = [201601, 201603, tuple(member_list)])

Now, I get the error:

'The SQL contains 18622 parameter markers, but 3 parameters were supplied', 'HY000'

because it's looking to fill in all the ? placeholders in the formatted string.

So, ultimately, is there a way to expand the list and bind each individual element to its own ?, or is there another method I could use to get this to work?

Btw, I'm using pyodbc as my connector.

Thanks in advance!

Scratch'N'Purr asked Apr 25 '16 12:04


2 Answers

Break this up into three parts to help isolate the problem and improve readability:

  1. Build the SQL string
  2. Set parameter values
  3. Execute pandas.read_sql_query

Build SQL

First, make sure the ? placeholders are generated correctly. Use str.format with str.join and len to emit one ? per element of member_list. The examples below assume member_list has 3 elements.

Example

member_list = (1,2,3)
sql = """select member_id, yearmonth
         from queried_table
         where yearmonth between {0} and {0}
         and member_id in ({1})"""
sql = sql.format('?', ','.join('?' * len(member_list)))
print(sql)

Returns

select member_id, yearmonth
from queried_table
where yearmonth between ? and ?
and member_id in (?,?,?)
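The same step can be wrapped in a small function so the marker count always follows the list length. This is only a sketch: build_in_query is a hypothetical helper name, not part of pandas or pyodbc, and the table and column names are the ones from the question.

```python
def build_in_query(n_members):
    """Return the SQL text with one '?' marker per member id (hypothetical helper)."""
    if n_members < 1:
        # "in ()" with nothing inside the parentheses is not valid SQL
        raise ValueError("need at least one member id")
    placeholders = ",".join(["?"] * n_members)
    return (
        "select member_id, yearmonth\n"
        "from queried_table\n"
        "where yearmonth between ? and ?\n"
        f"and member_id in ({placeholders})"
    )


sql = build_in_query(3)
print(sql)
```

Only the ? markers are formatted into the string here; the actual values still travel separately as bound parameters.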

Set Parameter Values

Now ensure the parameter values are organized into a flat tuple, one value per ? marker and in the same order.

Example

# generator to flatten values of irregular nested sequences,
# modified from answers http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
def flatten(l):
    for el in l:
        try:
            yield from flatten(el)
        except TypeError:
            yield el

params = tuple(flatten((201601, 201603, member_list)))
print(params)

Returns

(201601, 201603, 1, 2, 3)
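When the nesting is only one level deep, as it is in the question, star-unpacking may be a simpler way to get the same flat tuple. A minimal sketch, assuming Python 3.5+ and the same three-element member_list:

```python
member_list = [1, 2, 3]

# Star-unpacking places each member id directly after the two yearmonth bounds.
params = (201601, 201603, *member_list)
print(params)
```

The generator version is still the better fit if the values can be nested more than one level deep.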

Execute

Finally bring the sql and params values together in the read_sql_query call

# pass params by keyword: the third positional argument of read_sql_query is index_col
query = pd.read_sql_query(sql, db2conn, params=params)
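To check the whole approach without a DB2 connection, the three steps can be tried against an in-memory SQLite database, which uses the same ? marker style. This is a sketch under assumptions: the sqlite3 connection stands in for db2conn, the sample rows are made up, an order by is added to make the output deterministic, and conn.execute is used in place of pd.read_sql_query so the example needs nothing beyond the standard library.

```python
import sqlite3

# In-memory SQLite database as a stand-in for db2conn (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table queried_table (member_id integer, yearmonth integer)")
conn.executemany(
    "insert into queried_table values (?, ?)",
    [(1, 201601), (2, 201602), (3, 201604), (4, 201603)],  # made-up sample rows
)

member_list = [1, 2, 3]

# 1. Build the SQL string with one marker per member id.
sql = """select member_id, yearmonth
         from queried_table
         where yearmonth between ? and ?
         and member_id in ({})
         order by member_id""".format(",".join("?" * len(member_list)))

# 2. Flatten the values into one tuple.
params = (201601, 201603, *member_list)

# The marker count must equal the number of supplied values.
assert sql.count("?") == len(params)

# 3. Execute; with pandas this would be pd.read_sql_query(sql, conn, params=params).
rows = conn.execute(sql, params).fetchall()
print(rows)
conn.close()
```

Member 3 falls outside the yearmonth range and member 4 is not in member_list, so only members 1 and 2 come back.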
Bryan answered Sep 20 '22 13:09



WARNING! Although my proposed solution here works, it is prone to SQL injection attacks. Therefore, it should never be used directly in backend code! It is only safe for offline analysis.

If you're using Python 3.6+, you could also use a formatted string literal for your query (cf. https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498)

start, end = 201601, 201603
selected_members = (111, 222, 333, 444, 555)  # must be a tuple of 2+ items: it renders as (111, 222, ...), while a 1-tuple renders as (111,)

query = f"""
    SELECT member_id, yearmonth FROM queried_table
    WHERE yearmonth BETWEEN {start} AND {end}
      AND member_id IN {selected_members}
"""

df = pd.read_sql_query(query, db2conn)
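If the values do get interpolated into the string, one way to narrow the injection risk is to reject anything that is not a plain integer before rendering the IN list. A rough sketch, where render_int_list is a hypothetical helper and not a pandas or pyodbc function; it also avoids the trailing comma a one-element tuple would produce:

```python
def render_int_list(values):
    """Render values as a SQL IN list, accepting plain ints only (hypothetical helper)."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    for v in values:
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(v, int) or isinstance(v, bool):
            raise TypeError(f"not an integer: {v!r}")
    return "(" + ", ".join(str(v) for v in values) + ")"


start, end = 201601, 201603
selected_members = [111]  # a single element renders as (111), not (111,)

query = f"""
    SELECT member_id, yearmonth FROM queried_table
    WHERE yearmonth BETWEEN {int(start)} AND {int(end)}
      AND member_id IN {render_int_list(selected_members)}
"""
print(query)
```

This only covers integer ids, and NumPy integers would need converting with int() first. Binding the values as parameters remains the safer option whenever the driver supports it.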
bluu answered Sep 21 '22 13:09
