Symptom
Special characters (e.g. äöüß) in text fields are not being imported correctly. When a user is setting up a new Import Job or Data Link, Datameer displays these special characters correctly during the Data Preview step.
There are no problems when the Import Job/Data Link is executed with Local Execution. However, when the Import Job/Data Link is executed on the cluster and adds data into a Workbook, these characters are no longer displayed correctly. They are shown as ��, which indicates characters that fall outside the range of the configured character encoding.
Cause
The character encoding specified in the Java configuration of the Hadoop services running on the cluster's Data Nodes is incorrect.
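The effect of such a mismatch can be reproduced outside the cluster. The following is a minimal sketch in Python, not Datameer or Hadoop code: it encodes the sample text as UTF-8 and then decodes it with a narrower encoding. ASCII is only an assumed stand-in here for whatever default encoding the JVMs on the Data Nodes actually use.

```python
# Text containing special characters, as it appears in the source data.
text = "äöüß"

# The bytes as written with the intended encoding (UTF-8).
raw = text.encode("utf-8")

# Decoding with the matching encoding round-trips cleanly.
decoded_ok = raw.decode("utf-8")

# Decoding with a mismatched encoding (ASCII is an assumed stand-in for a
# misconfigured default) replaces every undecodable byte with U+FFFD.
decoded_bad = raw.decode("ascii", errors="replace")

print(decoded_ok)
print(decoded_bad)
```

The second line printed consists only of replacement characters, which matches the symptom seen in the Workbook.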
Solution
- Open the /<Datameer installation folder>/etc/das.env.sh file and check the character encoding configured by the -Dfile.encoding Java option. The default value is -Dfile.encoding=utf-8.
- Append this value to the existing tez.task.launch.cmd-opts property and add the property to the affected Import Job/Data Link:
tez.task.launch.cmd-opts=<current value> -Dfile.encoding=utf-8
For example:
tez.task.launch.cmd-opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dfile.encoding=utf-8
- Rerun the job.
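If the property value has to be assembled for several jobs, the step of appending the option can be sketched as below. This is only an illustrative helper, assuming Python is available; the function name with_file_encoding is hypothetical and not part of Datameer or Tez. It replaces any existing -Dfile.encoding option so that the flag appears exactly once.

```python
def with_file_encoding(current_opts: str, encoding: str = "utf-8") -> str:
    """Return current_opts with -Dfile.encoding set exactly once (hypothetical helper)."""
    flag = "-Dfile.encoding="
    # Drop any existing file.encoding option so the new value is not duplicated.
    kept = [opt for opt in current_opts.split() if not opt.startswith(flag)]
    kept.append(flag + encoding)
    return " ".join(kept)


# Current value taken from the example above.
current = (
    "-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps "
    "-XX:+UseNUMA -XX:+UseParallelGC"
)
print("tez.task.launch.cmd-opts=" + with_file_encoding(current))
```

The encoding argument should be set to the value found in das.env.sh if it differs from the default.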