Problem
When comparing execution times for artifacts run as Tez jobs versus SmallJob jobs, I consistently see the Tez jobs complete in less time.
I'd like to force all of my SmallJob executions to run as Tez executions.
Cause
This discrepancy in execution time is expected. The key difference is that a SmallJob execution uses limited cluster resources, while a Tez job leverages as many resources as possible to return results quickly.
If it is acceptable for the job to run more slowly, SmallJob uses fewer resources but takes longer to process.
If speed is the priority, Tez is the way to go.
Workaround
A threshold determines whether a job is executed in the SmallJob or Tez framework:
das-job.properties:das.sparksx.small.max-uncompressed-size=1g
To force the job to execute in Tez, apply the following parameter:
das.sparksx.small.execution-framework=Tez
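To make the interaction between the two settings easier to reason about, here is a minimal Python sketch of how they might combine. It is an illustrative model only, not the product's actual selection logic. The function names parse_size and choose_framework are hypothetical, and the sketch assumes that the threshold is compared against a job's uncompressed input size, that "1g" means 1024^3 bytes, that the comparison is inclusive, and that the execution-framework parameter takes precedence over the threshold when it is set.

```python
# Illustrative model only; the real selection logic is internal to the product.
# Assumption: size suffixes are binary units (1g = 1024**3 bytes).
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a size string such as '1g' or '512m' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def choose_framework(uncompressed_bytes, props):
    """Return the framework a job would use under the assumed rules."""
    # Assumption: an explicit execution-framework setting overrides the threshold.
    forced = props.get("das.sparksx.small.execution-framework")
    if forced:
        return forced
    # Assumption: jobs at or below the threshold run as SmallJob, larger ones as Tez.
    limit = parse_size(props.get("das.sparksx.small.max-uncompressed-size", "1g"))
    return "SmallJob" if uncompressed_bytes <= limit else "Tez"
```

Under these assumptions, a 500 MB job with only the threshold configured would be modeled as SmallJob, and the same job would be modeled as Tez once das.sparksx.small.execution-framework=Tez is applied.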