Special characters (e.g. äöüß) in text fields are not being imported correctly. When a user configures a new Import Job or Data Link, Datameer displays these special characters correctly during the Data Preview step.
There are no problems when the Import Job/Data Link runs with Local Execution. However, when it runs on the cluster and writes data into a Workbook, these characters are no longer displayed correctly. Instead they appear as ��, the replacement characters shown for bytes that fall outside the character range of the configured character encoding.
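The �� symptom is what typically happens when text is written as UTF-8 but read back with a narrower encoding. The following Java sketch (class and method names are illustrative, not part of Datameer or Hadoop) encodes äöüß as UTF-8 and decodes the bytes once as UTF-8 and once as US-ASCII. Each of the four characters takes two bytes in UTF-8, and new String(byte[], Charset) substitutes U+FFFD for every byte it cannot decode, so the mismatched read yields eight replacement characters instead of four letters. US-ASCII is only an assumed stand-in for whatever encoding the cluster JVMs were actually using.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text with one charset and decodes the bytes with another,
    // mimicking a writer and a reader that disagree on the encoding.
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        // new String(byte[], Charset) substitutes U+FFFD for bytes it cannot decode.
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // The sample text from the article, spelled with Unicode escapes so the
        // encoding of this source file cannot interfere.
        String original = "\u00e4\u00f6\u00fc\u00df";

        String sameEncoding = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);

        System.out.println("UTF-8 -> UTF-8:    " + sameEncoding.equals(original));
        // Each character is two bytes in UTF-8; each non-ASCII byte becomes one U+FFFD.
        System.out.println("UTF-8 -> US-ASCII: " + mismatched.length() + " chars, all U+FFFD: "
                + mismatched.chars().allMatch(c -> c == 0xFFFD));
    }
}
```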
The character encoding specified in the Java options of the Hadoop services running on the cluster's Data Nodes is incorrect.
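To see which encoding a JVM falls back to when no charset is passed explicitly, a small diagnostic like the one below could be launched with the same Java options as the Hadoop services on a Data Node. It is a hypothetical helper, not a Datameer or Hadoop tool, and it only reports on the JVM it runs in, so running it on a workstation says nothing about the cluster.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // Reports the file.encoding system property and the resulting default
    // charset, which is used whenever bytes are decoded without an explicit charset.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Comparing java EncodingCheck with java -Dfile.encoding=utf-8 EncodingCheck shows the effect of the option. On Java 18 and later the default charset is UTF-8 unless file.encoding is set to COMPAT (JEP 400), so a mismatch is most likely on older JVMs.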
- Open the /<Datameer installation folder>/etc/das.env.sh file and check the character encoding configured by the -Dfile.encoding Java option. The default value is
- Add this value to the existing tez.task.launch.cmd-opts property and set the property on the affected Import Job/Data Link:
tez.task.launch.cmd-opts=<current value> -Dfile.encoding=utf-8
For example:
tez.task.launch.cmd-opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dfile.encoding=utf-8
- Rerun the job.
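The property edit in the steps above can also be scripted. The Java sketch below (a hypothetical helper; all names are illustrative) pulls the first -Dfile.encoding value out of a text such as das.env.sh and appends it to an existing tez.task.launch.cmd-opts value. It assumes the option appears literally as -Dfile.encoding=<name> and falls back to utf-8 when none is found. Unlike the manual step, it also removes any -Dfile.encoding token already present in the current options, so the result carries a single, unambiguous setting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TezEncodingOpts {

    // Matches a "-Dfile.encoding=<name>" token; the name ends at whitespace or a quote.
    private static final Pattern FILE_ENCODING =
            Pattern.compile("-Dfile\\.encoding=([^\\s\"']+)");

    // Returns the first -Dfile.encoding value found in the text, if any.
    public static Optional<String> findEncoding(String text) {
        Matcher m = FILE_ENCODING.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Removes any existing -Dfile.encoding token from the current options,
    // normalizes whitespace, and appends -Dfile.encoding=<encoding> at the end.
    public static String withEncoding(String currentOpts, String encoding) {
        String cleaned = FILE_ENCODING.matcher(currentOpts).replaceAll("")
                .trim()
                .replaceAll("\\s+", " ");
        String option = "-Dfile.encoding=" + encoding;
        return cleaned.isEmpty() ? option : cleaned + " " + option;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample of Java options; pass the path to das.env.sh
        // as the first argument to scan the real file instead.
        String source = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "-Xmx1g -Dfile.encoding=utf-8";
        // Assumption: fall back to utf-8 if no -Dfile.encoding option is found.
        String encoding = findEncoding(source).orElse("utf-8");

        // Current tez.task.launch.cmd-opts value, taken from the example above.
        String current = "-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps"
                + " -XX:+UseNUMA -XX:+UseParallelGC";
        System.out.println("tez.task.launch.cmd-opts=" + withEncoding(current, encoding));
    }
}
```

Running java TezEncodingOpts /<Datameer installation folder>/etc/das.env.sh scans the real file; with no argument the built-in sample is used. For the example GC options, withEncoding produces the same tez.task.launch.cmd-opts value as the manual edit.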