Data parsing error

  • Official comment
    Brian Junio

    Ganesh,

    Good morning!

    "The Hive table structure is changed in dropping a variable."

    Initially I understood this to mean that the Hive table had dropped a column, but the quote below makes me think that I'm missing something.

    "Rescanning the data helps to parse but the issue here I need to change datatypes for nearly 300 variables from string to Float which consumes hours."

    Did the number of columns change, or were the data types of the existing columns updated?

    "Please let me know how I can use the new structure without having to make changes of the previous definitions."

    In most cases, when the source data is altered, manual intervention is required on the import side of Datameer. Datameer has no way of understanding the changes without rescanning the schema and refreshing the column count and data types.
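
    Before rescanning, it can help to see exactly what changed. The following is a minimal Python sketch, not a Datameer or Hive feature: it assumes you have written down the old and new schemas by hand (for example from the output of DESCRIBE in Hive) as column-to-type mappings. The function name `diff_schemas` and the sample columns are hypothetical.

```python
from typing import Dict, List, Tuple

# Column name -> Hive type name, e.g. {"price": "string"}
Schema = Dict[str, str]


def diff_schemas(
    old: Schema, new: Schema
) -> Tuple[List[str], List[str], Dict[str, Tuple[str, str]]]:
    """Return (dropped, added, retyped) between two column -> type mappings."""
    dropped = sorted(c for c in old if c not in new)
    added = sorted(c for c in new if c not in old)
    # Columns present in both schemas whose type name differs (case-insensitive).
    retyped = {
        c: (old[c], new[c])
        for c in old
        if c in new and old[c].lower() != new[c].lower()
    }
    return dropped, added, retyped


# Hypothetical schemas, for illustration only.
old_schema = {"id": "int", "price": "string", "region": "string"}
new_schema = {"id": "int", "price": "float"}

dropped, added, retyped = diff_schemas(old_schema, new_schema)
print("dropped:", dropped)
print("added:", added)
print("retyped:", retyped)
```

    A non-empty "dropped" list points to the dummy-column approach, while a non-empty "retyped" mapping points to the data type rework.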

    If the workload to clean up the downstream implications of altering the Hive table is large, would it be possible for your team to simply revert the changes made to the table?

    If a column was removed, you could add a dummy column with the same name back to the Hive table and fill it with null values. This would keep the downstream schema intact and allow the existing jobs in Datameer to continue as normal.
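
    As a rough sketch of the dummy-column idea, the Python helper below builds a HiveQL ALTER TABLE ... ADD COLUMNS statement for the missing columns. The helper name, the table name `sales_db.orders`, and the column `region` are hypothetical; substitute your own. It only prints the statement, so you can review it before running it in Hive.

```python
from typing import Dict


def add_columns_statement(table: str, missing: Dict[str, str]) -> str:
    """Build a HiveQL ALTER TABLE ... ADD COLUMNS statement.

    `missing` maps each column name to add to its Hive type.
    """
    if not missing:
        raise ValueError("no columns to add")
    # Backticks quote the column names in case they clash with reserved words.
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in missing.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"


# Hypothetical table and column, for illustration only.
stmt = add_columns_statement("sales_db.orders", {"region": "string"})
print(stmt)
```

    Note that ADD COLUMNS appends the new column after the existing columns, so its position may differ from the original one. Existing rows would normally read as NULL for the added column, but please verify that against your table's storage format.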

    If the data types were reworked, you will need to pick one place in the pipeline to make the update and work from there. For example, if the import of this data has all of the types as STRING, the import should continue to work as you would expect, and you can then adjust the data types within the Workbook. If the import is failing, you will have to adjust the process at that point. Keep in mind that if functions in the downstream Workbook depend on a particular column having a certain data type, those functions will begin to fail once the new schema has been read.

    It's very important to keep a consistent schema for long-running data sources. When changes are made, they can have a large impact on the downstream consumers of the data.

    Cheers,

    Brian
