Goal
A Tableau Export Job creates several intermediate files while running. How can the location of these files be specified by an Administrator?
Learn
Phases of the Datameer -> Tableau export process:
- An Export Job starts.
- A single-container cluster application starts to read the source data (Parquet) from the Datameer private folder in HDFS.
- The target TDE file is created in the DataNode's tmp/ folder. At this point the file is about 30 KB, because it contains only metadata.
- The directory <id>/dataengine/temp/ is created in the DataNode's tmp/ folder.
- Intermediate TDE files created by the Tableau API start to accumulate under this location.
- As soon as the Parquet -> TDE transformation is completed, all intermediate files from <id>/dataengine/temp/ are merged into a single file and written to the target TDE file created in step 3.
- Clean up of the <id>/dataengine/temp/ folder.
- The target TDE file is uploaded to Tableau via the REST API.
- Clean up of the target TDE file.
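The file lifecycle described in the phases above can be replayed locally with a small simulation. This is only an illustrative sketch: it does not call Datameer or the Tableau API, and the function name simulate_export, the file names export.tde and part-NNNNN, and the job id are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_export(tmp_root: Path, job_id: str, chunks: list[bytes]) -> list[str]:
    """Replay the file lifecycle of the export phases and return an event log."""
    events = []

    # The target file is created first and holds almost nothing yet.
    target = tmp_root / "export.tde"  # hypothetical file name
    target.write_bytes(b"")
    events.append("target created")

    # <id>/dataengine/temp/ is created and intermediate files pile up in it.
    temp_dir = tmp_root / job_id / "dataengine" / "temp"
    temp_dir.mkdir(parents=True)
    for index, chunk in enumerate(chunks):
        (temp_dir / f"part-{index:05d}").write_bytes(chunk)
    events.append(f"{len(chunks)} intermediate files written")

    # The intermediate files are merged into the target file.
    with target.open("wb") as merged:
        for part in sorted(temp_dir.iterdir()):
            merged.write(part.read_bytes())
    events.append(f"merged {target.stat().st_size} bytes into target")

    # The intermediate folder is cleaned up.
    shutil.rmtree(temp_dir)
    events.append("intermediate folder removed")

    # The upload via REST API would happen here; it is omitted in this sketch.
    events.append("upload skipped")

    # The target file is cleaned up.
    target.unlink()
    events.append("target removed")
    return events


with tempfile.TemporaryDirectory() as root:
    for event in simulate_export(Path(root), "job-1", [b"abc", b"de"]):
        print(event)
```

In the simulation the intermediate files and the target file share one root folder, which corresponds to the default behavior where both end up under the DataNode's tmp/ folder.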
Target TDE/Hyper file location
As of Datameer 7.4.x (DAP-37284), it is possible to choose the location where Datameer stores the Tableau export file during an Export Job's execution by using the custom property das.tde-export.temp-location. It can be set globally in the Hadoop Cluster section of the Admin Tab or individually per Export Job.
Intermediate files location
The Tableau API uses the environment variable TAB_SDK_TMPDIR, which configures where the service directory /../id/dataengine/ is created. By default, this variable points to the /tmp location of the DataNode's operating system.
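Before pointing TAB_SDK_TMPDIR at a different location, it may help to verify the candidate directory on a node. The sketch below assumes that the directory has to exist and be writable for the user running the job containers, which this article does not state explicitly; the helper name check_tmp_dir is hypothetical.

```python
import os
import shutil


def check_tmp_dir(path: str) -> dict:
    """Report whether a directory exists, is writable, and how much space is free."""
    exists = os.path.isdir(path)
    return {
        "exists": exists,
        # Creating files inside a directory needs write and search permission.
        "writable": exists and os.access(path, os.W_OK | os.X_OK),
        "free_bytes": shutil.disk_usage(path).free if exists else 0,
    }


# Falls back to /tmp, the default location named above, when the variable is unset.
print(check_tmp_dir(os.environ.get("TAB_SDK_TMPDIR", "/tmp")))
```

Running it on each DataNode under the relevant user account gives a quick view of whether the location is usable there.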
This variable can be set by passing it with a value through the Tez job configuration properties, as follows:
tez.am.launch.env=TAB_SDK_TMPDIR=/path/,LD_LIBRARY_PATH=/path
tez.task.launch.env=TAB_SDK_TMPDIR=/path/,LD_LIBRARY_PATH=/path
These Tez parameters can be set globally in the Hadoop Cluster section of the Admin Tab or individually per Export Job.
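Both Tez properties take the same comma-separated list of NAME=value pairs, so the two lines can be generated from one set of values to keep them in sync. The following is a minimal sketch; the helper names launch_env_value and tez_properties and the path /data/tableau_tmp/ are hypothetical.

```python
def launch_env_value(env: dict[str, str]) -> str:
    """Join environment variables as NAME=value pairs separated by commas."""
    return ",".join(f"{name}={value}" for name, value in env.items())


def tez_properties(tmp_dir: str, ld_library_path: str) -> list[str]:
    """Build the two Tez property lines with identical environment settings."""
    value = launch_env_value(
        {"TAB_SDK_TMPDIR": tmp_dir, "LD_LIBRARY_PATH": ld_library_path}
    )
    return [f"tez.am.launch.env={value}", f"tez.task.launch.env={value}"]


# /data/tableau_tmp/ is an example path only; /path is the placeholder used above.
for line in tez_properties("/data/tableau_tmp/", "/path"):
    print(line)
```

The printed lines can then be pasted into the custom properties of the Hadoop Cluster section or of a single Export Job.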
If intermediate Tableau export data must be stored separately for every Datameer user, the path can use the variable das.job.execution.username, which resolves to the user ID of the job owner.
das.tde-export.temp-location=/path/to/location/${das.job.execution.username}/
tez.am.launch.env=TAB_SDK_TMPDIR=/path/to/location/usercache/${das.job.execution.username}/,LD_LIBRARY_PATH=/path/
tez.task.launch.env=TAB_SDK_TMPDIR=/path/to/location/usercache/${das.job.execution.username}/,LD_LIBRARY_PATH=/usr/path/
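To preview what such a property would look like for a particular job owner, the placeholder can be substituted by hand. This sketch only mimics the substitution with plain string replacement; the actual resolution is done by Datameer, and the function name resolve_username and the user "alice" are hypothetical.

```python
PLACEHOLDER = "${das.job.execution.username}"


def resolve_username(template: str, username: str) -> str:
    """Replace the username placeholder the way the resolved property would read."""
    return template.replace(PLACEHOLDER, username)


templates = [
    "das.tde-export.temp-location=/path/to/location/${das.job.execution.username}/",
    "tez.am.launch.env=TAB_SDK_TMPDIR=/path/to/location/usercache/"
    "${das.job.execution.username}/,LD_LIBRARY_PATH=/path/",
]

for template in templates:
    print(resolve_username(template, "alice"))
```

Each job owner ends up with a separate subfolder under the configured location, so the parent path is the part an Administrator needs to provision.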