I am trying to transfer data between two databases with similar table structures using NiFi. Example of the data structure:
User: {varchar name, integer id}.
There are no "Maximum-value Columns", so it is impossible to determine whether there is new data or not. So each time I take a "snapshot" of the full table content. The problem is that it is unclear whether a particular record should be inserted or updated in the target database.
I created two branches of processors: one with inserts and one with updates. The insert works only for new records and the update only for existing ones. But (!) the PutSQL processor works on a batch of flow files. For example, say the batch size is 100 and the processors run once a day. Assume there were 98 records yesterday; they were inserted. Today there are 200 records (98 from yesterday and 102 new). In this flow, if NiFi takes the first 100 records as one batch and tries either to update them or to insert them, both actions will fail: the first 98 records should be updated while the last 2 should be inserted.
How can I solve this issue? I know it is possible to use a batch size of 1, but that works too slowly.
I recommend solving this in your SQL statements, since NiFi will not know the prior status of the records. A MERGE statement would be ideal if your database supports it (Oracle and SQL Server do; MySQL has no MERGE but offers INSERT ... ON DUPLICATE KEY UPDATE for the same purpose). Otherwise, you can craft both an INSERT and an UPDATE for each record in the source table: the UPDATE only affects the user if it already exists in the target table, and the INSERT is conditional on the user not existing yet.
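As a rough illustration of the second approach, here is a minimal sketch in Python using the standard sqlite3 module with an in-memory database. The table name users, the function sync_users, and the sample rows are hypothetical and chosen only for the demo; it assumes id uniquely identifies a record. It is not a NiFi configuration, only a way to show that the pair of conditional statements is safe to run for every record in a mixed batch.

```python
import sqlite3


def sync_users(conn, rows):
    """Apply a full snapshot of (id, name) rows to the target table.

    Hypothetical helper: every row gets both statements, so it does not
    matter whether the row already exists in the target.
    """
    params = [{"id": user_id, "name": name} for user_id, name in rows]
    with conn:  # one transaction for the whole batch
        # Changes only rows that already exist; matches nothing for new ids.
        conn.executemany(
            "UPDATE users SET name = :name WHERE id = :id", params
        )
        # Adds only rows that are still missing from the target.
        conn.executemany(
            "INSERT INTO users (id, name) "
            "SELECT :id, :name "
            "WHERE NOT EXISTS (SELECT 1 FROM users WHERE id = :id)",
            params,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR)")

# Day 1: the target is empty, so both rows are inserted.
sync_users(conn, [(1, "alice"), (2, "bob")])
# Day 2: a mixed batch of one unchanged, one changed, and one new row.
sync_users(conn, [(1, "alice"), (2, "robert"), (3, "carol")])

result = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(result)
```

Because neither statement fails when a row is in the "wrong" state, existing and new records can share a batch, so the batch size does not have to drop to 1. The exact syntax for the conditional INSERT varies between databases, so treat the SQL above as a starting point to adapt.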