Problem
The cluster reports KilledTaskAttempts during job execution. Even if this does not cause the job to fail, it can significantly degrade performance and increase run time.
Cause
This behaviour can have several causes, so check the ResourceManager application logs to identify the actual one.
If the containers were stopped with exit status -102, they were preempted by the framework: the ResourceManager's scheduler reclaimed their resources because queues with higher priority requested them (YARN preemption).
Exit status code (from org.apache.hadoop.yarn.api.records.ContainerExitStatus):
Containers preempted by the framework: public static final int PREEMPTED = -102;
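The PREEMPTED constant belongs to Hadoop's ContainerExitStatus class, which also defines other negative codes. The sketch below is a minimal, self-contained illustration of telling preemption apart from related codes such as -105 (stopped at the Application Master's request). It copies a subset of the constant values so it runs without Hadoop on the classpath; the class ExitStatusInfo and its methods are hypothetical names. In real Application Master code you would compare against ContainerExitStatus.PREEMPTED directly.

```java
import java.util.Map;

/** Hypothetical helper: maps a YARN container exit status to a readable name. */
final class ExitStatusInfo {
    // Subset of values copied from org.apache.hadoop.yarn.api.records.ContainerExitStatus
    // so that this sketch compiles without Hadoop on the classpath.
    static final int SUCCESS = 0;
    static final int ABORTED = -100;
    static final int DISKS_FAILED = -101;
    static final int PREEMPTED = -102;
    static final int KILLED_BY_APPMASTER = -105;
    static final int KILLED_BY_RESOURCEMANAGER = -106;

    private static final Map<Integer, String> NAMES = Map.of(
        SUCCESS, "SUCCESS",
        ABORTED, "ABORTED",
        DISKS_FAILED, "DISKS_FAILED",
        PREEMPTED, "PREEMPTED",
        KILLED_BY_APPMASTER, "KILLED_BY_APPMASTER",
        KILLED_BY_RESOURCEMANAGER, "KILLED_BY_RESOURCEMANAGER");

    /** Returns the constant name for a known status, or "OTHER(n)" for anything else. */
    static String describe(int exitStatus) {
        return NAMES.getOrDefault(exitStatus, "OTHER(" + exitStatus + ")");
    }

    /** True only when the framework preempted the container (YARN preemption). */
    static boolean isPreempted(int exitStatus) {
        return exitStatus == PREEMPTED;
    }

    public static void main(String[] args) {
        System.out.println(describe(-102)); // PREEMPTED
        System.out.println(describe(-105)); // KILLED_BY_APPMASTER
        System.out.println(describe(143));  // OTHER(143)
    }
}
```

A Map lookup keeps the unknown-code case explicit instead of silently treating every negative status as preemption.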
YARN application log snippet (a Tez timeline event for the stopped container; note "exitStatus":-102):
{"entity":"tez_container_e129_1516217554921_30814_01_000428","entitytype":"TEZ_CONTAINER_ID","relatedEntities":[{"entity":"appattempt_1516217554921_30814_000001","entitytype":"TEZ_APPLICATION_ATTEMPT"},
{"entity":"container_e129_1516217554921_30814_01_000428","entitytype":"containerId"}],"events":[{"ts":1516269921084,"eventtype":"CONTAINER_STOPPED"}],"otherinfo":{"exitStatus":-102}}
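When a log contains many such events, searching for the -102 status by hand is tedious. The following is a minimal sketch, assuming each timeline event sits on a single line (the snippet above is wrapped for display) and that only the Tez container entity and the "exitStatus" field matter. It uses plain regular expressions rather than a JSON library. The class PreemptionLogScan is a hypothetical name, and the sample event in main is abbreviated from the snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists Tez containers whose timeline event reports exitStatus -102. */
final class PreemptionLogScan {
    private static final int PREEMPTED = -102; // value of ContainerExitStatus.PREEMPTED

    // Top-level Tez container entity, e.g. "tez_container_e129_..._000428".
    private static final Pattern ENTITY =
        Pattern.compile("\"entity\":\"(tez_container_[^\"]+)\"");
    // Exit status recorded under "otherinfo" in CONTAINER_STOPPED events.
    private static final Pattern EXIT_STATUS =
        Pattern.compile("\"exitStatus\":(-?\\d+)");

    /** Assumes one event per line; returns the IDs of containers preempted by YARN. */
    static List<String> preemptedContainers(List<String> eventLines) {
        List<String> preempted = new ArrayList<>();
        for (String line : eventLines) {
            Matcher entity = ENTITY.matcher(line);
            Matcher exit = EXIT_STATUS.matcher(line);
            if (entity.find() && exit.find()
                    && Integer.parseInt(exit.group(1)) == PREEMPTED) {
                preempted.add(entity.group(1));
            }
        }
        return preempted;
    }

    public static void main(String[] args) {
        // Abbreviated version of the log snippet (relatedEntities omitted).
        String event = "{\"entity\":\"tez_container_e129_1516217554921_30814_01_000428\","
            + "\"entitytype\":\"TEZ_CONTAINER_ID\","
            + "\"events\":[{\"ts\":1516269921084,\"eventtype\":\"CONTAINER_STOPPED\"}],"
            + "\"otherinfo\":{\"exitStatus\":-102}}";
        System.out.println(preemptedContainers(List.of(event)));
    }
}
```

To scan a real log file, the lines could come from java.nio.file.Files.readAllLines. A count of preempted containers per job is usually enough to confirm whether preemption explains the KilledTaskAttempts.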
Solution
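Review the ResourceManager configuration and adjust the preemption conditions, for example whether the scheduler monitor is enabled (yarn.resourcemanager.scheduler.monitor.enable), the yarn.resourcemanager.monitor.capacity.preemption.* tuning properties, the queue capacities, or per-queue exemptions (yarn.scheduler.capacity.<queue-path>.disable_preemption).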
For background, see: Resource-Preemption in YARN's Capacity Scheduler.
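As a starting point for that review, the sketch below reads a Hadoop-style configuration file (the <configuration>/<property> XML format used by yarn-site.xml and capacity-scheduler.xml) with the JDK's DOM parser and lists the preemption-related settings. The class and method names are hypothetical, the queue path root.etl in the example is made up, and filtering keys by the substrings "preemption" and "scheduler.monitor" is a heuristic, not a complete list of everything that influences preemption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: summarizes preemption-related settings in a Hadoop config file. */
final class PreemptionConfigReview {

    /** Parses <configuration><property><name/><value/></property>...</configuration> into a map. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                Node name = property.getElementsByTagName("name").item(0);
                Node value = property.getElementsByTagName("value").item(0);
                if (name != null && value != null) { // skip incomplete entries
                    props.put(name.getTextContent().trim(), value.getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid Hadoop configuration XML", e);
        }
    }

    /** True if the scheduler monitor is on; yarn-default.xml ships it as false. */
    static boolean monitorEnabled(Map<String, String> props) {
        return Boolean.parseBoolean(
            props.getOrDefault("yarn.resourcemanager.scheduler.monitor.enable", "false"));
    }

    /** Keeps only the keys that look preemption-related (substring heuristic). */
    static Map<String, String> preemptionSettings(Map<String, String> props) {
        Map<String, String> selected = new LinkedHashMap<>();
        props.forEach((key, value) -> {
            if (key.contains("preemption") || key.contains("scheduler.monitor")) {
                selected.put(key, value);
            }
        });
        return selected;
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
            + "<property><name>yarn.resourcemanager.scheduler.monitor.enable</name><value>true</value></property>"
            + "<property><name>yarn.scheduler.capacity.root.etl.disable_preemption</name><value>true</value></property>"
            + "<property><name>yarn.nodemanager.resource.memory-mb</name><value>8192</value></property>"
            + "</configuration>";
        Map<String, String> props = readProperties(xml);
        System.out.println("scheduler monitor enabled: " + monitorEnabled(props));
        preemptionSettings(props).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With the Capacity Scheduler, containers are only preempted when the scheduler monitor is enabled, so that flag is the first thing to check; the per-queue disable_preemption entries then show which queues are exempt.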