
Python Syntax: Subprocess Call PostgreSQL Query, "Error: Only ASCII Characters Allowed"

I'm working with the following code in Python, calling a PostgreSQL query with subprocess:

import subprocess
claimer_name = 'a_name'
startdate = '2014-04-01'
enddate = '2018-04-01' 

data = subprocess.check_output(['/usr/bin/psql -U user_name "SELECT c.asset_id, c.video_id,
c.claim_id, c.claim_date FROM db.claim c JOIN db.claim_history h ON c.claim_id = h.claim_id JOIN
db.users_email e ON LOWER(e.email) = LOWER(h.email) JOIN m.auth_user u ON e.user_id = u.id WHERE
h.list_order = 1 AND c.claim_origin = ‘Descriptive Search’ AND c.claim_date >= \"%s\" AND    
c.claim_date < \"%s\" AND concat(u.first_name, concat(chr(32),
u.last_name)) = \"%s\""' % (startdate, enddate, claimer_name)], shell=True)

How can I escape the single quotes around 'Descriptive Search'? Running this code as-is gives the error Only ASCII characters are allowed in an identifier.

I have tried:

  1. [''Descriptive Search'']
  2. [\'Descriptive Search\']
  3. [""Descriptive Search""]
  4. [concat('Descriptive', concat(chr(32), 'Search'))]

and assigning a variable: i = 'Descriptive Search', and then c.claim_origin = \"%s\".

However, these attempts yield the same ASCII characters error. Using string-formatting works fine for my other variables (startdate, enddate, claimer_name) and I'm stumped as to why it doesn't work for the string 'Descriptive Search'.

Using PostgreSQL 9.3.

Any help or points in the right direction would be great; thanks!

Daniel asked Sep 16 '14 01:09


1 Answer

There are so many things wrong with this.

  • You should be using psycopg2 rather than trying to shell out to psql to talk to the database;

  • Because you're not using a proper database binding, you can't use query parameters (placeholders whose values are passed separately from the SQL text), so you have to handle escaping in literals yourself to avoid SQL injection risks and quoting bugs;

  • When invoking commands via subprocess, avoid using the shell if at all possible. It's another point of possible failure, and completely unnecessary in this case;

  • Long strings should generally be """ quoted in Python to avoid the need to escape nested "s;

  • The expression concat(u.first_name, concat(chr(32), u.last_name)) is needlessly contorted. Just write u.first_name || ' ' || u.last_name or format('%s %s', u.first_name, u.last_name);

  • You're using "double quotes" to quote the literals you substitute in, which is wrong in SQL. Double-quoted text is treated as an identifier, per the documentation, not as a string literal. So c.claim_date >= \"%s\" will fail with an error like column "2014-04-01" does not exist;

  • You're using typographic (curly) single quotes, not apostrophes, when quoting ‘Descriptive Search’. At a guess you've edited the code in a word processor, not a programmer's text editor. You want apostrophes, 'Descriptive Search', when quoting literals in SQL.

Because you used typographic single quote characters (U+2018 and U+2019) instead of apostrophes (U+0027) to quote the literal string Descriptive Search, PostgreSQL didn't recognise it as a literal and tried to parse it as an identifier. However, ‘ isn't a legal character in an unquoted identifier, so it reported the error you show.
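If you want to check a query string for this kind of stray character before sending it anywhere, something like the following may help. It's only a sketch using the standard library; the helper name find_non_ascii is made up for illustration, and the fragment is the relevant part of the query from the question, with the curly quotes written as escapes so they're visible.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# The fragment from the question, with typographic quotes around the literal.
fragment = "c.claim_origin = \u2018Descriptive Search\u2019"

for index, ch, name in find_non_ascii(fragment):
    print(f"position {index}: U+{ord(ch):04X} {name}")
```

The same check on the version written with plain apostrophes finds nothing, which is a quick way to confirm an editor hasn't "smartened" your quotes.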

See the documentation on identifiers and literals.
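On the point about avoiding the shell: if you really must call an external program, passing the arguments as a list without shell=True means each element reaches the child process as-is, with no shell quoting layer in between. The sketch below uses the running Python interpreter as a stand-in for psql so that it runs without a database. The psql argument list at the end is an assumption about how you'd lay it out (-U for the user, -c for a single command) and is not executed.

```python
import subprocess
import sys

# A value containing both kinds of ASCII quote, to show nothing needs escaping.
tricky = "it's a \"quoted\" value"

# Each list element becomes exactly one argument of the child process;
# no shell is involved, so no shell quoting rules apply.
# sys.executable (the running Python interpreter) stands in for psql here.
output = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", tricky],
    universal_newlines=True,
)
print(output.strip())

# Assumed layout of the equivalent psql call (built but not executed here):
# -U selects the database user and -c passes one command string.
query = "SELECT 1"
psql_args = ["/usr/bin/psql", "-U", "user_name", "-c", query]
```

This only removes the shell's quoting layer. Literals inside the SQL text still have to be quoted correctly, which is why a database driver with query parameters remains the better option.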

Here's what you should have done:

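import datetime

import psycopg2

claimer_name = 'a_name'
# Real date objects matching the range in the question; psycopg2 adapts them to SQL dates.
startdate = datetime.date(2014, 4, 1)
enddate = datetime.date(2018, 4, 1)

# Connection settings are given as a libpq-style connection string.
conn = psycopg2.connect("user=user_name")
curs = conn.cursor()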
curs.execute("""
    SELECT 
        c.asset_id,
        c.video_id,
        c.claim_id,
        c.claim_date
    FROM db.claim c 
         JOIN db.claim_history h ON c.claim_id = h.claim_id 
         JOIN db.users_email e ON LOWER(e.email) = LOWER(h.email) 
         JOIN m.auth_user u ON e.user_id = u.id 
    WHERE h.list_order = 1 
      AND c.claim_origin = 'Descriptive Search'
      AND c.claim_date >= %s 
      AND c.claim_date < %s
      AND u.first_name || ' ' || u.last_name = %s
    """, (startdate, enddate, claimer_name)
)
results = curs.fetchall()

Pay particular attention to the fact that I did not use Python's % string-formatting operator above. The %s entries are parameter placeholders that are substituted properly by psycopg2, with the values passed as a separate argument to execute(); see passing parameters to SQL queries.
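To see why passing values separately matters, here is a small runnable sketch that uses the standard library's sqlite3 module as a stand-in, so it needs no PostgreSQL server. That substitution is mine: sqlite3 uses ? as its placeholder where psycopg2 uses %s, and the auth_user table and its rows are invented for the example. The idea is the same in both drivers: a value containing an apostrophe needs no escaping when it is passed as a parameter.

```python
import sqlite3

# In-memory database with a hypothetical table, just for illustration.
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE auth_user (first_name TEXT, last_name TEXT)")
curs.executemany(
    "INSERT INTO auth_user (first_name, last_name) VALUES (?, ?)",
    [("Pat", "O'Brien"), ("Sam", "Smith")],
)

# The apostrophe in the name would break naive string formatting,
# but as a bound parameter it is passed through untouched.
claimer_name = "Pat O'Brien"
curs.execute(
    "SELECT first_name, last_name FROM auth_user "
    "WHERE first_name || ' ' || last_name = ?",
    (claimer_name,),
)
rows = curs.fetchall()
print(rows)
conn.close()
```

Note that the parameters are always passed as a sequence, so a single value still needs the trailing comma: (claimer_name,).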

Craig Ringer answered Oct 21 '22 20:10