I work at a place where we use jOOQ for SQL query generation in part of the backend code. A lot of code has been written to work with it. On my side of things, I would like to map these features onto Spark and, in particular, generate Spark SQL queries over DataFrames loaded from a set of Parquet files.
Is there any tooling to generate DSL classes from a Parquet (or Spark) schema? I could not find any. Have other approaches been successful here?
Ideally, I would like to generate tables and fields dynamically from a possibly evolving schema.
I know this is a broad question and I will close it if it is deemed out of scope for SO.
jOOQ doesn't officially support Spark, but you have a variety of options to reverse engineer any schema metadata that you have in your Spark database:
JDBCDatabase
Like any other jooq-meta Database implementation, you can use the JDBCDatabase, which reverse engineers anything it can find through the JDBC DatabaseMetaData API, if your JDBC driver supports that.
As of jOOQ version 3.10, there are three different types of "offline" meta data sources that you can use to generate code:
XMLDatabase will generate code from an XML file.
JPADatabase will generate code from JPA-annotated entities.
DDLDatabase will parse DDL file(s) and reverse engineer its output (this probably won't work well for Spark, as its syntax is not officially supported).
Of course, you don't have to generate any code. You can get meta data information directly from your JDBC driver (again through the DatabaseMetaData API), which is abstracted through DSLContext.meta(), or you can supply the schema dynamically to jOOQ using XML content through DSLContext.meta(InformationSchema).
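For the XML-based options, one possible approach is to render the column list of a Spark DataFrame as information_schema-style XML yourself. The following is only a minimal sketch using the plain JDK: the class InformationSchemaXml, its method names, the sample schema, table, and column names, and the "nullable by default" choice are all hypothetical, and the namespace URI passed in main is an assumption that should be checked against the XSD shipped with your jOOQ version. How the column names and types are extracted from Spark is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: renders a flat column list as information_schema-style XML.
 * All names and values here are illustrative assumptions, not a jOOQ API.
 */
final class InformationSchemaXml {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * @param namespace XSD namespace expected by your jOOQ version (assumption: verify it)
     * @param schema    schema name to report
     * @param table     table name, e.g. the name of a registered Spark view
     * @param columns   column name -> SQL data type name, in ordinal order
     */
    static String toXml(String namespace, String schema, String table,
                        Map<String, String> columns) {
        String s = escape(schema);
        String t = escape(table);
        StringBuilder sb = new StringBuilder();
        sb.append("<information_schema xmlns=\"").append(escape(namespace)).append("\">\n");
        sb.append("  <schemata><schema><schema_name>").append(s)
          .append("</schema_name></schema></schemata>\n");
        sb.append("  <tables><table><table_schema>").append(s)
          .append("</table_schema><table_name>").append(t)
          .append("</table_name></table></tables>\n");
        sb.append("  <columns>\n");
        int position = 1;
        for (Map.Entry<String, String> column : columns.entrySet()) {
            sb.append("    <column>")
              .append("<table_schema>").append(s).append("</table_schema>")
              .append("<table_name>").append(t).append("</table_name>")
              .append("<column_name>").append(escape(column.getKey())).append("</column_name>")
              .append("<data_type>").append(escape(column.getValue())).append("</data_type>")
              .append("<ordinal_position>").append(position++).append("</ordinal_position>")
              // Assumption: every column is treated as nullable.
              .append("<is_nullable>true</is_nullable>")
              .append("</column>\n");
        }
        sb.append("  </columns>\n</information_schema>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical columns; in practice these would come from the DataFrame schema.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("event_id", "bigint");
        columns.put("payload", "varchar");
        System.out.println(
            toXml("http://www.jooq.org/xsd/jooq-meta-3.5.4.xsd", "spark", "events", columns));
    }
}
```

The resulting XML could be written to a file for the XMLDatabase, or regenerated whenever the Parquet schema evolves. The mapping from Spark types to SQL data type names that jOOQ understands is not covered by this sketch and would need to be worked out separately.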