Goal
During export into a Hive table, a CRLF inside a string column jumbles the records that follow it, because each new line is treated as an individual record. Since these records don't match the structure of the target table, they are dropped from the export. Learn how to have all records exported into the Hive table.
Background
According to HIVE-1898 and HIVE-11785, intermediate tables that use LazySimpleSerDe can't handle new lines (NL) and carriage returns (CR) in text.
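The effect can be reproduced outside of Hive. The following is a minimal Python sketch, not Hive or Datameer code: the records, the two-column layout, and the variable names are made up for illustration, and the Ctrl-A field delimiter is assumed to match the Hive text-file default. It shows how a line-oriented reader turns one record with an embedded CRLF into two rows, one of which no longer matches the table structure.

```python
# Hypothetical two-column records (id, comment); the second comment contains a CRLF.
records = [
    ("1", "plain text"),
    ("2", "first line\r\nsecond line"),
    ("3", "more text"),
]

FIELD_DELIMITER = "\x01"  # Ctrl-A, assumed here as the field delimiter
ROW_DELIMITER = "\n"      # rows are separated by a new line

# Serialize the records the way a delimited text file stores them.
payload = ROW_DELIMITER.join(FIELD_DELIMITER.join(r) for r in records)

# A line-oriented reader splits on the row delimiter only,
# so the embedded new line starts an extra row.
rows = [line.split(FIELD_DELIMITER) for line in payload.split(ROW_DELIMITER)]

# Rows whose column count differs from the table structure.
malformed = [row for row in rows if len(row) != 2]

print(len(records), "records written,", len(rows), "rows read")
print("malformed rows:", malformed)
```

Three records go in, four rows come out, and the fragment after the line break has only one column.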
Learn
You can use any of the following options based on your environment and needs:
- Clean up the input data (strings) before import.
- Clean up the data (strings) within Datameer. For example, remove the line breaks from text. This is a good step to prepare the data for further processing, analysis and text mining.
- Escape the characters.
- Encode the data differently, for example using URL_ENCODE or ENC_BASE64. It can be converted back later using URL_DECODE or DEC_BASE64.
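The options above can be sketched in Python for the case where the input data is cleaned up before import. This is only an illustration under assumptions: the helper names are hypothetical, and the functions use the Python standard library as stand-ins. They are not the Datameer functions URL_ENCODE, ENC_BASE64, URL_DECODE, or DEC_BASE64, whose exact output may differ.

```python
import base64
import re
from urllib.parse import quote, unquote


def strip_line_breaks(text, replacement=" "):
    """Clean up: replace each run of CR/LF characters with one replacement string."""
    return re.sub(r"[\r\n]+", replacement, text)


def escape_line_breaks(text):
    """Escape: backslashes first, then CR and LF, so the result stays reversible."""
    return text.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")


def url_encode(text):
    """Encode: percent-encode every reserved character, including CR and LF."""
    return quote(text, safe="")


def url_decode(text):
    return unquote(text)


def base64_encode(text):
    """Encode: Base64 output contains no CR or LF characters."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def base64_decode(text):
    return base64.b64decode(text).decode("utf-8")


sample = "first line\r\nsecond line"

cleaned = strip_line_breaks(sample)
escaped = escape_line_breaks(sample)
url_encoded = url_encode(sample)
b64_encoded = base64_encode(sample)

# None of the prepared values contains a raw line break any more.
for value in (cleaned, escaped, url_encoded, b64_encoded):
    assert "\r" not in value and "\n" not in value

# The two encodings can be converted back to the original text.
assert url_decode(url_encoded) == sample
assert base64_decode(b64_encoded) == sample
```

Removing the line breaks loses the original formatting, while escaping and encoding keep it recoverable at the cost of a decoding step after the export.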