Job Failure With Schema Changes In Hive When Using Parquet for Storage

Updated: July 08, 2016 10:45

Problem

Users of Hive tables with Parquet storage see job errors during processing after the Hive schema changes.

Cause

This is a known issue: schema changes to Hive tables that use Parquet storage are not currently supported.

Solution

Currently, when the schema of a Hive table that uses Parquet storage changes, the user needs to recreate any Data Link or Import Job associated with that table.
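One way to notice such a change before a job fails is to compare the table's column list as captured before and after the change (for example, copied from the output of a Hive DESCRIBE statement). The following is a minimal Python sketch of that comparison, not part of Datameer or Hive. The function names diff_schema and needs_recreate and the (name, type) tuple layout are hypothetical, and the sketch conservatively assumes that any difference, including a change in column order, means the Data Link or Import Job should be recreated.

```python
def diff_schema(old, new):
    """Compare two column lists of (name, type) pairs and report the differences.

    Both arguments are lists such as [("id", "int"), ("name", "string")].
    This layout is an assumption made for illustration only.
    """
    old_types = dict(old)
    new_types = dict(new)

    added = [name for name, _ in new if name not in old_types]
    removed = [name for name, _ in old if name not in new_types]
    retyped = [name for name, col_type in new
               if name in old_types and old_types[name] != col_type]

    # Same set of columns but in a different order is also reported.
    reordered = (not added and not removed
                 and [name for name, _ in old] != [name for name, _ in new])

    return {"added": added, "removed": removed,
            "retyped": retyped, "reordered": reordered}


def needs_recreate(old, new):
    """Return True if any schema difference was found (conservative assumption)."""
    diff = diff_schema(old, new)
    return bool(diff["added"] or diff["removed"]
                or diff["retyped"] or diff["reordered"])


if __name__ == "__main__":
    before = [("id", "int"), ("name", "string")]
    after = [("id", "bigint"), ("name", "string"), ("ts", "timestamp")]
    print(diff_schema(before, after))
    print(needs_recreate(before, after))
```

If needs_recreate returns True for a table, the associated Data Link or Import Job is a candidate for recreation as described above.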

Datameer recommended best practices:

Use Data Links instead of Import Jobs for these tables. Data Links avoid copying data and limit the amount of data that needs to be migrated when the schema changes, which minimizes the chance of a small change being disruptive.
