After splitting the Datameer private folder into multiple volumes to handle an increasing inode count in MapRFS, the following message appears many times in the job log, and jobs run significantly longer than expected:
Cannot rename across volumes, falling back on copy/delete semantics
Multi-volume filesystem configurations are not supported. Datameer expects that the private folder will exist on a single volume for performance reasons.
Normally, moving or renaming a file requires only a metadata update in the namespace container. In a multi-volume configuration, each volume has its own namespace container dedicated to that volume's partition of space. This means that when job finalization occurs and job artifacts are moved from the temporary staging area to their final location on disk, all of the data must be copied from one volume's data partition into the other volume's data partition. Depending on the quantity of data, this can significantly delay job finalization.
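The difference between a metadata-only rename and copy/delete semantics can be illustrated with a minimal Python sketch. This is an analogy using a local filesystem, not Datameer or MapRFS code: the function name move_with_fallback and its rename parameter are hypothetical, and the cross-device error (EXDEV) stands in for the "Cannot rename across volumes" condition.

```python
import errno
import os
import shutil


def move_with_fallback(src, dst, rename=os.rename):
    """Move src to dst and report which strategy was used.

    Hypothetical helper for illustration only. The `rename` parameter
    exists so that a cross-volume failure can be simulated.
    """
    try:
        # Fast path: within one volume this is a metadata-only update,
        # so its cost does not depend on the file size.
        rename(src, dst)
        return "rename"
    except OSError as exc:
        # EXDEV means source and destination are on different devices.
        # Any other error is a real failure and is re-raised.
        if exc.errno != errno.EXDEV:
            raise
    # Slow path: every byte is copied to the destination and the
    # source is then deleted, so the cost grows with the data size.
    shutil.copy2(src, dst)
    os.remove(src)
    return "copy/delete"
```

In this sketch the fast path takes roughly the same time for any file, while the fallback path has to read and write the full contents, which is why large job artifacts make finalization slow when the staging area and the final location sit on different volumes.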
In Datameer 6.3 and higher, data objects are not written into a staging area. For successful jobs, this should dramatically reduce the impact of the copy/delete semantics caused by a multi-volume configuration.
Note that failing jobs may still incur significant performance penalties during job finalization because their log files can be large.