Problem
Jobs take a long time to run, and the following message appears in job.log
and in the YARN application log:
INFO [1970-01-01 00:00:00.001] [MrPlanRunnerV2] (MapRFileSystem.java:1086) - Cannot rename across volumes, falling back on copy/delete semantics
This message often repeats many times, and the copy/delete operations it reports account for a significant share of the job's run time.
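To gauge how often the fallback happens, you can count the lines that carry this message. The following is a minimal sketch that assumes the log is a plain-text file readable on the local machine. The function name `count_cross_volume_renames` is hypothetical; the marker string is copied from the message quoted above.

```python
import sys

# Substring copied from the MapRFileSystem message quoted above.
MARKER = "Cannot rename across volumes, falling back on copy/delete semantics"


def count_cross_volume_renames(log_path: str) -> int:
    """Return how many lines in log_path contain the cross-volume rename message."""
    count = 0
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if MARKER in line:
                count += 1
    return count


if __name__ == "__main__":
    # Usage: python count_renames.py job.log
    for path in sys.argv[1:]:
        print(f"{path}: {count_cross_volume_renames(path)}")
```

A high count on its own does not reveal the cost. Each occurrence can hide a copy of anything from a few bytes to many gigabytes.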
Cause
MapR allows multiple volumes to be created within the MapR-FS file system. Each volume has its own namespace container, which allows a larger total number of files in distributed storage. However, because metadata and data are separated per volume, MapR does not support move or rename operations from one volume to another. Instead, the data must be read, copied to the new volume, and then deleted from the original volume. This often happens on the same data node and the same disk subsystem, which places a heavy load on I/O. This is known as a Read/Copy/Delete operation.
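The fallback pattern can be illustrated with a local-filesystem analogy. The sketch below is not MapR's implementation. It relies on the fact that Python's `os.rename` raises `OSError` with `errno.EXDEV` when the source and target sit on different filesystems, which is roughly comparable to MapR refusing a rename across volumes. The function name `rename_or_copy_delete` is hypothetical.

```python
import errno
import os
import shutil


def rename_or_copy_delete(src: str, dst: str) -> str:
    """Move src to dst, falling back on copy/delete if a direct rename is impossible.

    Local-filesystem analogy only. Returns which path was taken.
    """
    try:
        # Cheap metadata-only operation: no file data is read or written.
        os.rename(src, dst)
        return "rename"
    except OSError as exc:
        # Only a cross-device (EXDEV) failure triggers the fallback.
        if exc.errno != errno.EXDEV:
            raise
    # Fallback: every byte is read from src and written to dst, then src is deleted.
    # The cost of this branch grows with the amount of data, not the number of files.
    if os.path.isdir(src):
        shutil.copytree(src, dst)
        shutil.rmtree(src)
    else:
        shutil.copy2(src, dst)
        os.remove(src)
    return "copy/delete"
```

Python's standard `shutil.move` applies the same fallback internally. The point is the asymmetry: the rename branch touches only metadata, while the fallback branch rewrites all of the data.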
A separate volume had been created for the workbook in question because the inode count in the primary volume was high. Because this workbook generated a large amount of data, the Read/Copy/Delete operation took several hours.
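A back-of-envelope estimate shows why a large amount of data turns into hours. The sketch below is a deliberately simplified model and makes several assumptions. It assumes that the same disks both read and write every byte, so each byte costs two disk transfers. It ignores network transfer, replication, delete cost, and per-file overhead. The throughput figure is an input you supply, not a measured MapR value.

```python
def estimate_copy_delete_hours(total_bytes: float, disk_mb_per_s: float) -> float:
    """Rough lower bound on Read/Copy/Delete time when source and target share disks.

    Assumes every byte is read once and written once on the same disks
    (two transfers per byte) at an aggregate rate of disk_mb_per_s.
    """
    bytes_per_s = disk_mb_per_s * 1_000_000
    seconds = 2 * total_bytes / bytes_per_s
    return seconds / 3600


# Hypothetical inputs: 2 TB of workbook output vs. 5 GB of small files,
# both at an assumed 200 MB/s aggregate disk throughput.
print(round(estimate_copy_delete_hours(2e12, 200), 1))  # about 5.6 hours
print(round(estimate_copy_delete_hours(5e9, 200) * 3600))  # about 50 seconds
```

Under this model, the number of files does not appear in the formula. Only the total bytes do, and that is why a few very large files can dominate the run time.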
For more information on MapR Volumes, refer to: MapR - Introduction to Volumes
Solution
A high inode count is caused by a large number of files, not by file size. The workbook in question created only a small number of very large files, so it did not significantly affect the inode count. The data was therefore copied from the separate volume back to the primary volume with DistCp, which avoids the Read/Copy/Delete operation.
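The copy back to the primary volume can be scripted around the standard `hadoop distcp <source> <target>` command. The following is a minimal sketch. The helper names are hypothetical, as are any source and target paths you pass in. The source and target paths for your cluster are not given in this article. DistCp copies data and does not delete the source.

```python
import shlex
import subprocess
from typing import List


def build_distcp_command(src: str, dst: str, update: bool = False) -> List[str]:
    """Build a `hadoop distcp` argument list that copies src to dst."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing from, or differ at, the target.
        cmd.append("-update")
    cmd += [src, dst]
    return cmd


def run_distcp(src: str, dst: str, update: bool = False) -> None:
    """Run DistCp and raise CalledProcessError if it exits with a non-zero status."""
    cmd = build_distcp_command(src, dst, update)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing an argument list to `subprocess.run`, rather than a shell string, avoids quoting problems with unusual path names.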
When separating Datameer's private folder into multiple volumes, move only artifacts that generate a large number of files but are relatively small in total size. That way, whenever a Read/Copy/Delete is performed, it involves only a small amount of data. The less data that has to be copied, the faster the Read/Copy/Delete completes.
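The selection rule above can be expressed as a simple filter. The sketch below is hypothetical throughout. It assumes that you have already gathered the file count and total size for each candidate directory by some means, and that you choose the thresholds yourself. The `Artifact` type and the paths are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Artifact:
    path: str          # hypothetical directory under Datameer's private folder
    file_count: int    # number of files (roughly, inodes consumed)
    total_bytes: int   # total size of all files in the directory


def volume_candidates(artifacts: List[Artifact], min_files: int,
                      max_total_bytes: int) -> List[str]:
    """Pick artifacts worth isolating in their own volume.

    Good candidates consume many inodes (high file_count) but hold little data
    (low total_bytes), so a later cross-volume copy/delete stays cheap.
    """
    picked = [a for a in artifacts
              if a.file_count >= min_files and a.total_bytes <= max_total_bytes]
    # List the artifacts that free up the most inodes first.
    picked.sort(key=lambda a: a.file_count, reverse=True)
    return [a.path for a in picked]


artifacts = [
    Artifact("jobs/many-small", 200_000, 5_000_000_000),
    Artifact("jobs/few-large", 40, 2_000_000_000_000),
    Artifact("jobs/medium", 300_000, 10_000_000_000),
]
print(volume_candidates(artifacts, min_files=100_000, max_total_bytes=50_000_000_000))
```

In this example, the workbook-like artifact with a few very large files is excluded. Moving it would save almost no inodes and would make every later Read/Copy/Delete expensive.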