The Datameer Conductor crashed and cannot be restarted.
The final message in the stderrout.log file indicates that the Web Application Context fails to finish initializing:
2017-04-03 10:08:41.654:INFO:/:main: Initializing Spring root WebApplicationContext
This message should be followed by other messages and should not be the last entry in stderrout.log.
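One way to check for this symptom is sketched below. The helper name `last_entry_is_spring_init` is hypothetical, and the location of stderrout.log depends on your installation, so the path is passed in as an argument.

```shell
# Hypothetical helper: succeeds (exit status 0) when the last non-empty
# line of the given log is the Spring initialization message.
last_entry_is_spring_init() {
  log_file="$1"
  # Drop blank lines, then keep only the final entry.
  last_line=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 1)
  case "$last_line" in
    *"Initializing Spring root WebApplicationContext"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `last_entry_is_spring_init stderrout.log && echo "startup appears stuck"` prints the warning only when the log ends on that message. If the Conductor is still in the middle of a normal startup, the message can briefly be the last entry, so repeat the check after a short wait.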
A jstack trace revealed that file locks in MapR FS within the Datameer private temp folder were left open. Part of the Web Application Context initialization process clears the temp folder of contents. Because of the file locks, this action couldn't be completed.
No errors are logged because the Hadoop API does not return an error or throw an exception when it encounters a file lock. Instead, it waits for the lock to be released and then proceeds. Depending on why the file is locked, this delay can range from a few minutes to several hours.
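To look for the same pattern, you can capture a thread dump of the Conductor's JVM with `jstack <pid>` and inspect the thread that is performing the initialization. The log entry above names that thread as `main`. The following is a minimal sketch, assuming the dump has been saved to a file; the helper name `main_thread_stack` is hypothetical.

```shell
# Hypothetical helper: print the stack of the "main" thread from a saved
# jstack dump. Thread blocks in a dump are separated by blank lines.
main_thread_stack() {
  dump_file="$1"
  awk '
    /^"main"/ { printing = 1 }   # the thread header line starts the block
    printing && /^$/ { exit }    # the first blank line ends the block
    printing { print }
  ' "$dump_file"
}
```

For example, run `jstack <pid> > conductor.jstack` followed by `main_thread_stack conductor.jstack`, where `<pid>` is the process ID of the Conductor's JVM. Stack frames that sit in file system calls across several successive dumps suggest that the thread is waiting on the temp folder.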
Rename the temp folder in MapR FS to a new name, and create a replacement temp folder with 770 permissions.
For example, assume the Datameer private folder is at /user/datameer:
- If the Datameer Conductor is running, stop it with: ./conductor.sh stop
- Rename the /user/datameer/temp folder: hadoop fs -mv /user/datameer/temp /user/datameer/temp_old
- Create a new temp folder: hadoop fs -mkdir /user/datameer/temp
- Set permissions on the new folder: hadoop fs -chmod 770 /user/datameer/temp
- Restart the Datameer Conductor with: ./conductor.sh start
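The steps above can be collected into a small script. This is a sketch under stated assumptions, not a supported tool: the function names `run` and `replace_temp_folder` and the `DRY_RUN` variable are hypothetical, and the script assumes it is run from the directory that contains conductor.sh by a user with permission to modify the private folder.

```shell
# Echo each command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Hypothetical wrapper around the manual steps.
# $1: Datameer private folder, for example /user/datameer
# $2: name for the renamed temp folder (defaults to temp_old)
replace_temp_folder() {
  private_folder="$1"
  backup_name="${2:-temp_old}"

  # The Conductor may already be stopped, so a failure here is not fatal.
  run ./conductor.sh stop || echo "note: stop returned non-zero; continuing"

  # Abort on the first failing file system command.
  run hadoop fs -mv "$private_folder/temp" "$private_folder/$backup_name" || return 1
  run hadoop fs -mkdir "$private_folder/temp" || return 1
  run hadoop fs -chmod 770 "$private_folder/temp" || return 1

  run ./conductor.sh start
}
```

Setting `DRY_RUN=1` before calling `replace_temp_folder /user/datameer` prints the five commands without executing them, which makes it easy to review the paths first.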
The root cause in this scenario was a mismatch between the MapR C files embedded in Datameer and the files in use by MapR. The mismatch was introduced by a patch to MapR. To find the embedded files, search the Datameer home folder for all files that contain 'mapr' in their file name. Replace them with the updated files from the MapR patch, then restart the Datameer Conductor.
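The file search can be sketched as follows. The helper name `list_mapr_files` is hypothetical, and the Datameer home folder is passed in as an argument because its location varies by installation.

```shell
# Hypothetical helper: list every regular file under the Datameer home
# folder whose file name contains 'mapr', in sorted order.
list_mapr_files() {
  datameer_home="$1"
  find "$datameer_home" -type f -name '*mapr*' | sort
}
```

The match is case-sensitive. If your installation uses mixed-case file names, `-iname` (supported by GNU and BSD find) can be used in place of `-name`. Review the resulting list against the contents of the MapR patch before replacing anything.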