I'm working on a SQLAlchemy dialect for Apache Drill and I've run into an issue that I can't quite seem to figure out.
The basic problem is that SQLAlchemy is generating a query like the one below:
SELECT `field1`, `field2`
FROM dfs.test.data.csv LIMIT 100
which fails because data.csv
needs backticks around it as shown below:
SELECT `field1`, `field2`
FROM dfs.test.`data.csv` LIMIT 100
I've defined the various visit_()
functions in the dialect's compiler but these seem to have no effect.
This took some time to figure out, and I thought I'd post the result so that if anyone else runs into this issue, they'll have a point of reference as to how to solve it.
Here is the final working code:
https://github.com/JohnOmernik/sqlalchemy-drill/blob/master/sqlalchemy_drill/base.py
Here is what ultimately solved the issue:
def __init__(self, dialect):
super(DrillIdentifierPreparer, self).__init__(dialect, initial_quote='`', final_quote='`')
def format_drill_table(self, schema, isFile=True):
formatted_schema = ""
num_dots = schema.count(".")
schema = schema.replace('`', '')
# For a file, the last section will be the file extension
schema_parts = schema.split('.')
if isFile and num_dots == 3:
# Case for File + Workspace
plugin = schema_parts[0]
workspace = schema_parts[1]
table = schema_parts[2] + "." + schema_parts[3]
formatted_schema = plugin + ".`" + workspace + "`.`" + table + "`"
elif isFile and num_dots == 2:
# Case for file and no workspace
plugin = schema_parts[0]
formatted_schema = plugin + "." + schema_parts[1] + ".`" + schema_parts[2] + "`"
else:
# Case for non-file plugins or incomplete schema parts
for part in schema_parts:
quoted_part = "`" + part + "`"
if len(formatted_schema) > 0:
formatted_schema += "." + quoted_part
else:
formatted_schema = quoted_part
return formatted_schema
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With