Problem
Users of Hive tables with Parquet storage see job errors during processing after the Hive schema changes.
Cause
This is a known issue: schema changes to Hive tables that use Parquet storage are not supported.
Solution
Currently, when the schema of a Hive table that uses Parquet storage changes, the user must recreate any Data Link or Import Job associated with that table.
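To know when a Data Link or Import Job has to be recreated, it can help to compare the table's column list before and after a change. The following is a minimal sketch, not a Datameer or Hive API: the function names `diff_schema` and `needs_recreate` are hypothetical, and it assumes you have already captured each schema as a list of (column name, Hive type) pairs, for example by saving the output of a `DESCRIBE` statement.

```python
from typing import Dict, List, Tuple

# A column is a (name, Hive type) pair, e.g. ("id", "bigint").
Column = Tuple[str, str]


def diff_schema(old: List[Column], new: List[Column]) -> Dict[str, list]:
    """Report added, removed, and retyped columns between two snapshots."""
    old_types = dict(old)
    new_types = dict(new)
    added = [name for name, _ in new if name not in old_types]
    removed = [name for name, _ in old if name not in new_types]
    # Columns present in both snapshots whose type differs.
    retyped = [
        (name, old_types[name], new_types[name])
        for name, _ in old
        if name in new_types and old_types[name] != new_types[name]
    ]
    return {"added": added, "removed": removed, "retyped": retyped}


def needs_recreate(old: List[Column], new: List[Column]) -> bool:
    """Treat any difference, including column order, as a schema change."""
    return list(old) != list(new)


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after a table change.
    before = [("id", "bigint"), ("name", "string")]
    after = [("id", "bigint"), ("name", "string"), ("email", "string")]
    print(diff_schema(before, after))
    print(needs_recreate(before, after))
```

If `needs_recreate` returns true, the Data Links or Import Jobs that read the table would need to be recreated as described above.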
Datameer recommended best practices:
Use Data Links instead of Import Jobs. Data Links avoid copying data and limit the amount of data that must be migrated when the schema of those tables changes, which reduces the chance that a small change is disruptive.