I’m looking into several “transactional data lake” technologies such as Apache Hudi, Delta Lake, and AWS Lake Formation Governed Tables.
Except for the latter, I can’t see how these would work in a multi-cluster environment. I’m baselining against S3 for storage, and I want to incrementally alter my data lake, where I may have many clusters all reading from and writing to the lake at any given time. Is this possible/supported? It seems like the compaction and transaction processes run on-cluster, so you cannot manage a transactional data lake with these platforms from multiple disparate sources. Or am I mistaken?
Any anecdotes or performance limitations you’ve found would be appreciated!
You can enable a config for multiple writers on Apache Hudi and then use a lock provider as described here: https://hudi.apache.org/docs/concurrency_control#enabling-multi-writing
Example configuration keys for the AWS DynamoDB-based lock provider (the last three must be set to your DynamoDB table name, partition key, and region):
hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
hoodie.write.lock.dynamodb.table
hoodie.write.lock.dynamodb.partition_key
hoodie.write.lock.dynamodb.region
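To make that concrete, here is a minimal PySpark sketch of a multi-writer-safe Hudi write, assuming the Hudi Spark bundle and the hudi-aws module are on the classpath; the table name, S3 path, DynamoDB table, partition key, and region are hypothetical, and the concurrency settings follow the docs page linked above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-multi-writer").getOrCreate()

# Toy data; the record key and precombine fields are hypothetical.
df = spark.createDataFrame([(1, "a", "2024-01-01")], ["id", "value", "ts"])

hudi_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    # Multi-writer settings from the Hudi concurrency-control docs:
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    # DynamoDB-based lock provider (table/key/region values are hypothetical):
    "hoodie.write.lock.provider":
        "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
    "hoodie.write.lock.dynamodb.table": "hudi-locks",
    "hoodie.write.lock.dynamodb.partition_key": "events",
    "hoodie.write.lock.dynamodb.region": "us-east-1",
}

(df.write
   .format("hudi")
   .options(**hudi_options)
   .mode("append")
   .save("s3a://my-bucket/hudi/events"))  # hypothetical bucket/path
```

Every cluster or job writing to the same table would point at the same lock table, so commits from disparate writers are serialized through optimistic concurrency control.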
Delta Lake has a warning in the documentation that multiple writers may result in data loss: https://docs.delta.io/latest/delta-storage.html#amazon-s3
Concurrent writes to the same Delta table from multiple Spark drivers can lead to data loss.
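To show the scope of that warning, here is a minimal sketch (bucket and path are hypothetical) of the single-driver write pattern that Delta's S3 support assumes; routing all writes to a given table through one Spark driver like this is what avoids the data-loss scenario the docs describe:

```python
from pyspark.sql import SparkSession

# A single Spark driver writing to a Delta table on S3. Delta's S3 guarantees
# assume that all concurrent writes to this table go through one driver.
spark = (
    SparkSession.builder
    .appName("delta-s3-single-writer")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

(df.write
   .format("delta")
   .mode("append")
   .save("s3a://my-bucket/tables/events"))  # hypothetical bucket/path
```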
There is also a blog post discussing common pitfalls in Lakehouse concurrency control that you may find interesting.