According to the Apache Hadoop documentation, MapReduce jobs write their history files to the
.../history/done_intermediate/ directory in HDFS. This location is configured in
mapred-site.xml via the mapreduce.jobhistory.intermediate-done-dir property.
After a MapReduce job completes, its logs are written to HDFS under this directory. The history server periodically scans the intermediate directory and moves any newly available logs to the directory specified by the
mapreduce.jobhistory.done-dir parameter in
mapred-site.xml. From this location, the history server picks up the logs and displays them in the history server UI.
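As a rough illustration of how these two directory settings sit in mapred-site.xml, the sketch below parses a Hadoop-style configuration document with Python's standard library and looks up both properties. The sample paths (/mr-history/tmp and /mr-history/done) are hypothetical placeholders rather than defaults, read_properties is a helper invented for this example, and the intermediate property name is taken from Hadoop's mapred-default.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapred-site.xml content. The two paths are placeholders
# chosen for illustration; they are not Hadoop defaults.
SAMPLE_MAPRED_SITE = """<configuration>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_properties(SAMPLE_MAPRED_SITE)
# Where completed jobs first write their history files.
print(props["mapreduce.jobhistory.intermediate-done-dir"])
# Where the history server moves them and serves them from.
print(props["mapreduce.jobhistory.done-dir"])
```

On a real cluster the same function could be pointed at the contents of the actual mapred-site.xml. Properties that are not set there fall back to the values in mapred-default.xml, which this sketch does not handle.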
The MapReduce job history retention policy is controlled by the following properties.
mapreduce.jobhistory.cleaner.enable - Whether the job history cleaner is enabled (true / false). Default value is true.
mapreduce.jobhistory.cleaner.interval-ms - How often the job history cleaner checks for files to delete, in milliseconds. Defaults to 86400000 (one day). Files are only deleted if they are older than mapreduce.jobhistory.max-age-ms.
mapreduce.jobhistory.max-age-ms - Job history files older than this many milliseconds will be deleted when the history cleaner runs. Defaults to 604800000 (one week).
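To make the millisecond values above easier to reason about, here is a minimal sketch of the retention rule as described. It converts the two defaults to days and checks whether a file would be old enough to delete. The function names are hypothetical, and comparing a file's modification time against the current time is an assumption about how age is measured, not a statement of how the history server's cleaner is implemented.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Default values quoted in the property descriptions above.
DEFAULT_CLEANER_INTERVAL_MS = 86_400_000   # mapreduce.jobhistory.cleaner.interval-ms
DEFAULT_MAX_AGE_MS = 604_800_000           # mapreduce.jobhistory.max-age-ms


def ms_to_days(ms):
    """Convert a millisecond duration to days."""
    return ms / MS_PER_DAY


def is_eligible_for_deletion(file_mtime_ms, now_ms, max_age_ms=DEFAULT_MAX_AGE_MS):
    """Return True if the file is older than max_age_ms.

    Assumption: age is measured from the file's modification time.
    """
    return (now_ms - file_mtime_ms) > max_age_ms


print(ms_to_days(DEFAULT_CLEANER_INTERVAL_MS))  # 1.0
print(ms_to_days(DEFAULT_MAX_AGE_MS))           # 7.0

now_ms = 10 * MS_PER_DAY
print(is_eligible_for_deletion(2 * MS_PER_DAY, now_ms))  # True: 8 days old
print(is_eligible_for_deletion(5 * MS_PER_DAY, now_ms))  # False: 5 days old
```

With the default settings, the cleaner runs once a day, so a file that crosses the one-week threshold may remain for up to roughly one more day before the next run removes it.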