I'm writing a query in SQL Server that needs to check whether a number falls within a given set of values (e.g., in the query below, whether DemographicGroupDimID is 1, 2, or 3). After some googling, the only way I found to do that was the following:
DECLARE @adults table (Id int)
INSERT INTO @adults VALUES (1), (2), (3)
SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]
However, when I try to do this dynamically with different sets of values, it fails, like so:
from queries import DB_CREDENTIALS
import pyodbc
import pandas as pd
sql_ = """DECLARE @adults table (Id int)
INSERT INTO @adults VALUES ?
SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]"""
with pyodbc.connect(DB_CREDENTIALS) as cnxn:
df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])
---------------------------------------------------------------------------
DatabaseError Traceback (most recent call last)
<ipython-input-5-4b63847d007f> in <module>()
1 with pyodbc.connect(DB_CREDENTIALS) as cnxn:
----> 2 df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
497 sql, index_col=index_col, params=params,
498 coerce_float=coerce_float, parse_dates=parse_dates,
--> 499 chunksize=chunksize)
500
501 try:
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
1593
1594 args = _convert_params(sql, params)
-> 1595 cursor = self.execute(*args)
1596 columns = [col_desc[0] for col_desc in cursor.description]
1597
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in execute(self, *args, **kwargs)
1570 ex = DatabaseError(
1571 "Execution failed on sql '%s': %s" % (args[0], exc))
-> 1572 raise_with_traceback(ex)
1573
1574 @staticmethod
C:\Users\mburke\AppData\Local\Continuum\Anaconda64\lib\site-packages\pandas\io\sql.pyc in execute(self, *args, **kwargs)
1558 cur.execute(*args, **kwargs)
1559 else:
-> 1560 cur.execute(*args)
1561 return cur
1562 except Exception as exc:
DatabaseError: Execution failed on sql 'DECLARE @adults table (Id int)
INSERT INTO @adults VALUES ?
SELECT [date], [station], [impression] = SUM([impressions]) / COUNT(DISTINCT [datetime] )
FROM
(SELECT [datetime] = DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), [date] = ddt.DateKey, [station] = nd.Name, [impressions] = SUM(naf.Impression)
FROM [Nielsen].[dbo].[NielsenAnalyticsFact] as naf
LEFT JOIN [dbo].[DateDim] AS ddt
ON naf.StartDateDimID = ddt.DateDimID
LEFT JOIN [dbo].NetworkDim as nd
ON naf.NetworkDimID = nd.NetworkDimID
LEFT JOIN [dbo].TimeDim as td
ON naf.QuarterHourDimID = td.TimeDimID
WHERE (naf.NielsenMarketDimID = 1
AND naf.RecordTypeDimID = 2
AND naf.AudienceEstimateTypeDimID = 1
AND naf.DailyOrWeeklyDimID = 1
AND naf.RecordSequenceCodeDimID = 5
AND naf.ViewingTypeDimID = 4
AND naf.QuarterHourDimID IS NOT NULL
AND naf.DemographicGroupDimID < 31
AND nd.Affiliation = 'Cable'
AND naf.NetworkDimID != 1278
AND naf.DemographicGroupDimID in (SELECT Id FROM @adults))
GROUP BY DATEADD(minute,td.Minute,DATEADD(hour,td.NielsenLocalHour,CONVERT(smalldatetime, ddt.DateKey))), nd.Name, ddt.DateKey)
AS grouped_table
GROUP BY [date], [station]
ORDER BY [date], [station]': ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '@P1'. (102) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")
Is this because the DECLARE statement needs to be inside the SELECT statement itself? I'm not sure how pandas handles the pyodbc cursor object, so I'm unsure where this error stems from.
Edit: Note that the parameter I passed in this instance was (30), just to test the simplest case of a single number in the set, which fails. It of course also fails for more complex strings like (1), (2), (3), as in the example above.
If you use prepared statements in your SQL, you can't bind multiple values to a single placeholder/parameter/bind variable.
Beyond that, placeholders can stand in only for literals; you can't use one for a part of the SQL statement that isn't a literal.
In your case you tried to pass ( and ), which are SQL syntax, not literals, as part of a parameter.
Using parameters/prepared statements/bind variables will also protect you from some SQL injection attacks.
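To see the literal-only rule in action without a SQL Server connection, here is a minimal sketch using Python's built-in sqlite3, which uses the same qmark-style placeholders as pyodbc (SQLite is used here purely for illustration; the rule is the same in SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adults (id INTEGER)")

# Valid: the ? sits exactly where a literal value would appear.
conn.execute("INSERT INTO adults VALUES (?)", (30,))

# Invalid: the parentheses are SQL syntax, so a placeholder cannot
# supply them -- the driver rejects the statement when parsing it.
try:
    conn.execute("INSERT INTO adults VALUES ?", ("(30)",))
except sqlite3.OperationalError as e:
    print("rejected:", e)
```

The second execute fails for the same reason your pyodbc call does: the placeholder can only carry the value 30, never the `(30)` text that includes SQL syntax.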
That said, try changing your code as follows:
change
INSERT INTO @adults VALUES ?
to
INSERT INTO @adults VALUES (?)
and
df = pd.read_sql(sql=sql_, con=cnxn, params=['(30)'])
to
df = pd.read_sql(sql=sql_, con=cnxn, params=['30'])
UPDATE:
you can build the placeholder list for your SQL this way (note that T-SQL row value constructors must be comma-separated):
In [9]: vals = [20, 30, 40]
In [10]: ', '.join(['(?)'] * len(vals))
Out[10]: '(?), (?), (?)'
then:
In [14]: sql_ = """DECLARE @adults table (Id int)
   ....: INSERT INTO @adults VALUES {}
   ....:
   ....: SELECT [date],
   ....: """
In [15]: sql_.format(', '.join(['(?)'] * len(vals)))
Out[15]: 'DECLARE @adults table (Id int)\nINSERT INTO @adults VALUES (?), (?), (?)\n\nSELECT [date],\n'
Pay attention to the generated (?), (?), (?)
and finally call your SQL:
df = pd.read_sql(sql=sql_.format(', '.join(['(?)'] * len(vals))), con=cnxn, params=vals)
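The placeholder-building step can also be wrapped in a small helper so the SQL text and the parameter list always stay in sync (a sketch; `values_clause` is a hypothetical name, not part of pandas or pyodbc):

```python
def values_clause(vals):
    """Build one comma-separated row constructor per value, e.g. '(?), (?), (?)'.

    Every value is bound as a parameter, so nothing user-supplied is
    ever spliced into the SQL text itself.
    """
    if not vals:
        raise ValueError("need at least one value for the INSERT")
    return ', '.join(['(?)'] * len(vals))
```

With this helper the call becomes `pd.read_sql(sql=sql_.format(values_clause(vals)), con=cnxn, params=vals)`, and an empty value list fails fast in Python instead of producing malformed SQL.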