
JOOQ generator for Apache Spark parquet dataframes?

I work in a place where we use jOOQ for SQL query generation in part of the backend code. A lot of code has been written to work with it. On my side of things, I would like to map these features onto Spark, and in particular generate Spark SQL queries over dataframes loaded from a set of parquet files.

Is there any tooling to generate DSL classes from a parquet (or Spark) schema? I could not find any. Have other approaches been successful here?

Ideally, I would like to generate tables and fields dynamically from a schema that may evolve over time.

I know this is a broad question and I will close it if it is deemed out of scope for SO.

Michel Lemay asked Sep 01 '25 22:09


1 Answer

jOOQ doesn't officially support Spark, but you have a variety of options to reverse engineer any schema metadata that you have in your Spark database:

Using the JDBCDatabase

Like any other jooq-meta Database implementation, you can use the JDBCDatabase that reverse engineers anything it can find through the JDBC DatabaseMetaData API, if your JDBC driver supports that.

Using files as a meta data source

As of jOOQ version 3.10, there are three different types of "offline" meta data sources that you can use to generate code:

  • The XMLDatabase will generate code from an XML file.
  • The JPADatabase will generate code from JPA-annotated entities.
  • The DDLDatabase will parse DDL file(s) and reverse engineer its output (this probably won't work well for Spark, as its syntax is not officially supported)
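For the XMLDatabase route, one possible approach is to write the XML file yourself from whatever schema description you already have for your parquet data. The following is a minimal sketch using only the JDK, not an official jOOQ or Spark tool. The class name `SparkSchemaToJooqXml`, the schema and table names, and the column-to-type map are all hypothetical, and the map is assumed to have been derived from the Spark schema beforehand. The namespace and element names follow the information_schema layout of the jooq-meta XSD as I remember it, so validate the output against the XSD shipped with your jOOQ version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparkSchemaToJooqXml {

    // Assumed namespace for jOOQ 3.10; adjust to the XSD of your jOOQ version.
    static final String NS = "http://www.jooq.org/xsd/jooq-meta-3.10.0.xsd";

    // Builds an information_schema XML document for one table.
    // 'columns' maps column name -> SQL type name, in column order.
    static String toInformationSchemaXml(String schema, String table,
                                         Map<String, String> columns) {
        String s = escape(schema);
        String t = escape(table);
        StringBuilder xml = new StringBuilder();
        xml.append("<information_schema xmlns=\"").append(NS).append("\">\n");
        xml.append("  <schemata>\n")
           .append("    <schema><schema_name>").append(s).append("</schema_name></schema>\n")
           .append("  </schemata>\n");
        xml.append("  <tables>\n")
           .append("    <table><table_schema>").append(s).append("</table_schema>")
           .append("<table_name>").append(t).append("</table_name></table>\n")
           .append("  </tables>\n");
        xml.append("  <columns>\n");
        int position = 1; // ordinal positions are 1-based
        for (Map.Entry<String, String> column : columns.entrySet()) {
            xml.append("    <column>")
               .append("<table_schema>").append(s).append("</table_schema>")
               .append("<table_name>").append(t).append("</table_name>")
               .append("<column_name>").append(escape(column.getKey())).append("</column_name>")
               .append("<data_type>").append(escape(column.getValue())).append("</data_type>")
               .append("<ordinal_position>").append(position++).append("</ordinal_position>")
               // Parquet columns are treated as nullable here; this is an assumption.
               .append("<is_nullable>true</is_nullable>")
               .append("</column>\n");
        }
        xml.append("  </columns>\n");
        xml.append("</information_schema>\n");
        return xml.toString();
    }

    // Escapes the characters that are unsafe inside XML element text.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Hypothetical columns, e.g. collected from a dataframe's schema.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("user_id", "BIGINT");
        columns.put("country", "VARCHAR");
        System.out.print(toInformationSchemaXml("spark", "events", columns));
    }
}
```

Regenerating this file whenever the parquet schema changes, and then re-running the jOOQ code generator against it, would be one way to keep generated tables and fields in step with an evolving schema.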

Not using the code generator

Of course, you don't have to generate any code. You can get meta data information directly from your JDBC driver (again through the DatabaseMetaData API), which is abstracted through DSLContext.meta(), or you can supply the schema to jOOQ dynamically as XML content through DSLContext.meta(InformationSchema).

Lukas Eder answered Sep 05 '25 09:09