How to handle duplicates while exporting data to mysql sever. I am looking for UPSERT solution.



  • Official comment
    Joel Stewart

    Hi Amar, thanks for the question. There is not a specific way to activate the "upsert" behavior you have described in a single method, but there are some broader ways to accomplish this. 

    One method is to utilize Datameer's ability to purge the existing records and then re-insert all the rows directly. To perform this, simply select the "Overwrite Table" option instead of the "Append Records" option in the record replacement policy. This has the side effect of always exporting all the records, so the export itself may take longer to write more batches to the table at scale. If the size is relatively small, this will be negligible. 

    An alternative method is to have Datameer perform a right-outer-join on the current table state (through a Data Link) with the original workbook results (perform this in a downstream workbook, I'd recommend). The result of this right-outer-join is exactly the expected results from the "upsert" function. Then you could export this result with the "Append Records" option. This method does assume that the records are not changing in an outside system between the join and export steps. 

    I hope this helps! 

    Comment actions Permalink
  • Amarnath Reddy Athuri

    Thanks Joel for your quick reply. The first option to "Overwrite Table" will not help as I do a incremental load to mysql server, so I need to retain the records in mysql server and export the records which are not there already. 

    I thought "Drop Record" in error handling of export job should work as it ideally should drop only records that are rejected by mysql server due to primary key violation , but all the records of the export are being dropped whenever few duplicates are present in the export.




    Comment actions Permalink
  • Joel Stewart

    Thanks for the update, "Drop Record" for JDBC Export Jobs essentially becomes "Drop Batch" when the SQL server rejects it due to a missed constraint. For example, a primary key violation would cause the entire SQL batch to be rejected for any single mistake record violation. 

    You're correct that the "Overwrite Table" does not perform an incremental update, but if you have the full set of records in your Datameer Workbook's Sheet results, this would ensure that you have all the records written as you expect. If that's not sufficient, the right-outer-join method is the best consideration. 

    Comment actions Permalink

Please sign in to leave a comment.