Data parsing error
Need help. The Hive table structure has changed: a variable was dropped. Because of this, the import job defined with the earlier structure is showing a data parsing error, and the workbooks associated with that job are failing. Rescanning the data fixes the parsing, but then I need to change the data types of nearly 300 variables from String to Float, which takes hours. Please let me know how I can use the new structure without having to change the previous definitions.
The Hive table structure is changed in dropping a variable.
Initially I understood this to mean that a column was dropped from the Hive table, but the following makes me think I'm missing something:
Rescanning the data helps to parse but the issue here I need to change datatypes for nearly 300 variables from string to Float which consumes hours.
Did the number of columns change or did they update the data types for the existing columns?
Please let me know how I can use the new structure without having to make changes of the previous definitions.
In most cases, when the source data is altered, manual intervention is required on the import side of Datameer. Datameer has no way of knowing about the changes until the schema is rescanned and the column count and data types are refreshed.
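To answer the question above (did the column count change, or the types?), it can help to compare the old and new column lists directly. The sketch below assumes you have copied the (column name, data type) pairs for both versions, for example from Hive's DESCRIBE output; the function and class names are hypothetical and not part of Datameer or Hive.

```python
from dataclasses import dataclass
from typing import List, Tuple

Column = Tuple[str, str]  # (column_name, data_type), in table order


@dataclass
class SchemaDiff:
    dropped: List[str]
    added: List[str]
    retyped: List[Tuple[str, str, str]]  # (name, old_type, new_type)
    reordered: bool  # True if columns present in both changed relative order


def diff_schemas(old: List[Column], new: List[Column]) -> SchemaDiff:
    """Compare two ordered column lists (hypothetical helper)."""
    # Hive column names are case-insensitive, so compare in lower case.
    old_n = [(n.lower(), t.lower()) for n, t in old]
    new_n = [(n.lower(), t.lower()) for n, t in new]
    old_types = dict(old_n)
    new_types = dict(new_n)
    dropped = [n for n, _ in old_n if n not in new_types]
    added = [n for n, _ in new_n if n not in old_types]
    retyped = [(n, old_types[n], t) for n, t in new_n
               if n in old_types and old_types[n] != t]
    # Order of the columns both versions share, to spot positional shifts.
    shared_old = [n for n, _ in old_n if n in new_types]
    shared_new = [n for n, _ in new_n if n in old_types]
    return SchemaDiff(dropped, added, retyped, shared_old != shared_new)


if __name__ == "__main__":
    # Hypothetical column lists, for illustration only.
    before = [("id", "string"), ("region", "string"), ("sales", "string")]
    after = [("id", "string"), ("sales", "float")]
    print(diff_schemas(before, after))
```

If only `dropped` is non-empty, the table lost a column; a long `retyped` list would point to the data type changes described in the question.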
If cleaning up the downstream effects of altering the Hive table would take a lot of work, would it be possible for your team to simply revert the changes made to the table?
In the case of removing a column, you could add a dummy column with the same name and fill it with NULL values. This would keep the downstream schema intact and allow the existing jobs in Datameer to continue as normal.
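A minimal sketch of the dummy-column idea, assuming the column is re-added with HiveQL's `ALTER TABLE ... ADD COLUMNS`. The helper below only builds the statement text; the table and column names are hypothetical. Note that Hive appends added columns after the existing (non-partition) columns, so if the dropped column was not the last one, whether this keeps the Datameer import working depends on whether the import matches columns by name or by position, which I can't confirm here. How existing rows read the new column also depends on the table's storage format, so check a few rows after the change.

```python
import re

_NAME = r"[A-Za-z_][A-Za-z0-9_]*"
_TABLE = re.compile(rf"^{_NAME}(\.{_NAME})?$")  # table or db.table
_COLUMN = re.compile(rf"^{_NAME}$")


def readd_column_ddl(table: str, column: str, hive_type: str = "STRING") -> str:
    """Build a HiveQL statement that re-adds a dropped column (hypothetical helper)."""
    # Reject anything that is not a plain identifier before pasting it into SQL.
    if not _TABLE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not _COLUMN.match(column):
        raise ValueError(f"unexpected column name: {column!r}")
    return f"ALTER TABLE {table} ADD COLUMNS ({column} {hive_type})"


if __name__ == "__main__":
    # "sales_db.orders" and "region" are placeholder names.
    print(readd_column_ddl("sales_db.orders", "region"))
```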
In the case of reworking the data types, you will need to decide at which stage (the import job or the Workbook) to apply the changes. For example, if the import defines all of the columns as STRING, it should continue to work as you would expect, and you can then convert the data types within the Workbook. If the import itself is failing, you will have to make the adjustment there. Keep in mind, though, that if functions in a downstream Workbook expect a particular data type in a particular column, those functions will start to fail once the new schema has been read.
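With nearly 300 columns, one way to shorten the manual work is to first list which STRING columns actually hold numbers, so you know exactly which ones need to become Float. The sketch below scans sample rows (for example, a small CSV export of the table read with Python's `csv.DictReader`) and returns the columns whose non-missing values all parse as floats. The function name and the set of NULL markers are assumptions; `\N` is Hive's default NULL marker in text output. This only identifies the columns; it does not change anything in Datameer.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional

# Values treated as missing rather than non-numeric (assumed set).
NULL_MARKERS = {"", r"\N", "null", "NULL"}


def float_candidates(rows: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Return columns whose non-missing sample values all parse as float."""
    numeric: Dict[str, bool] = {}    # insertion order = first-seen column order
    has_value: Dict[str, bool] = {}  # skip columns that are entirely empty
    for row in rows:
        for name, value in row.items():
            numeric.setdefault(name, True)
            has_value.setdefault(name, False)
            if value is None or value.strip() in NULL_MARKERS:
                continue
            has_value[name] = True
            try:
                float(value)
            except ValueError:
                numeric[name] = False
    return [n for n, ok in numeric.items() if ok and has_value[n]]


if __name__ == "__main__":
    # Hypothetical sample export.
    sample = "id,region,sales,units\nA1,north,12.5,3\nA2,south,\\N,4\n"
    print(float_candidates(csv.DictReader(io.StringIO(sample))))
```

A sample only proves that the values seen so far are numeric, so it is worth running this over a reasonably large extract before switching 300 columns to Float.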
It's very important to keep a consistent schema for long-running data sources. When changes are made, they can have a large impact on the downstream consumers of the data.