Goal
Datameer acts as a job compiler: it compiles jobs and sends them to the cluster for execution. Low I/O throughput on the cluster can therefore significantly degrade Datameer's performance, especially for heavy jobs that write large volumes of intermediate and final results to HDFS.
The TestDFSIO benchmark measures HDFS I/O (read/write) performance by running a MapReduce job that reads and writes files in parallel. A working MapReduce setup is therefore required to run it.
Learn
To measure I/O (read/write) performance with TestDFSIO, perform the following steps:
- Log in to a DataNode or a Hadoop client node and locate the hadoop-mapreduce-client-jobclient-*-tests.jar file:
find / -name "hadoop-mapreduce-client-jobclient-*-tests.jar"
- Run 3-5 TestDFSIO jobs with different parameters (varying the number and size of the written files). Replace <version> with the version in the JAR file name found in the previous step:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -write -nrFiles 10 -size 25MB
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -write -nrFiles 10 -size 50MB
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -write -nrFiles 10 -size 100MB
- Check the results. Example TestDFSIO output:
17/11/13 17:17:24 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/11/13 17:17:24 INFO fs.TestDFSIO: Date & time: Mon Nov 13 17:17:24 UTC 2017
17/11/13 17:17:24 INFO fs.TestDFSIO: Number of files: 10
17/11/13 17:17:24 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
17/11/13 17:17:24 INFO fs.TestDFSIO: Throughput mb/sec: 14.895138226882745
17/11/13 17:17:24 INFO fs.TestDFSIO: Average IO rate mb/sec: 17.42253303527832
17/11/13 17:17:24 INFO fs.TestDFSIO: IO rate std deviation: 6.712244005380064
17/11/13 17:17:24 INFO fs.TestDFSIO: Test exec time sec: 65.527
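If you run the benchmark repeatedly, the JAR lookup from the first step can be wrapped in a small helper so the path is found once and reused. This is a minimal shell sketch; the function name find_dfsio_jar and the variable DFSIO_JAR are hypothetical, and it assumes the JAR sits somewhere below the directory you pass in (default: /). If several versions are installed, it simply returns the first match, so check the result.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first TestDFSIO tests JAR found below a directory.
# Permission errors from find are discarded; nothing is printed if no JAR exists.
find_dfsio_jar() {
  find "${1:-/}" -type f -name 'hadoop-mapreduce-client-jobclient-*-tests.jar' 2>/dev/null | head -n 1
}

# Usage (searching /usr/lib is much faster than searching the whole filesystem):
#   DFSIO_JAR="$(find_dfsio_jar /usr/lib)"
#   echo "$DFSIO_JAR"
```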
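The 3-5 benchmark runs can also be scripted. The sketch below assumes hadoop is on the PATH and that DFSIO_JAR (a hypothetical variable) holds the path to the tests JAR; if it is unset, it falls back to the placeholder path from the commands above, which you must edit. Each -write run is followed by a -read run of the same size, because the TestDFSIO read test reads the files produced by the preceding write test. Setting DRY_RUN=1 only prints the commands.

```bash
#!/usr/bin/env bash
# Placeholder path: edit <version>, or set DFSIO_JAR before sourcing this script.
DFSIO_JAR="${DFSIO_JAR:-/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar}"

# Hypothetical helper: run_dfsio_suite NR_FILES SIZE [SIZE ...]
# For every size, run a write test and then a read test with the same parameters.
run_dfsio_suite() {
  local nr_files="$1"
  shift
  local size mode
  for size in "$@"; do
    # The read pass reads the files created by the write pass just before it.
    for mode in -write -read; do
      if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed.
        echo "hadoop jar $DFSIO_JAR TestDFSIO $mode -nrFiles $nr_files -size $size"
      else
        hadoop jar "$DFSIO_JAR" TestDFSIO "$mode" -nrFiles "$nr_files" -size "$size" || return 1
      fi
    done
  done
}
```

For example, DRY_RUN=1 run_dfsio_suite 10 25MB 50MB 100MB lists the six commands, and the same call without DRY_RUN executes them. When you are done, hadoop jar <jar> TestDFSIO -clean removes the benchmark data under /benchmarks/TestDFSIO.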
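To compare several runs, you can pull the headline figures out of the saved output. The sketch below assumes the console output, one log entry per line as TestDFSIO prints it, was captured to a file (for example by appending 2>&1 | tee dfsio.log to the command; the file name is hypothetical). It also prints a rough aggregate estimate, Throughput multiplied by the number of files. This is only meaningful if all map tasks ran at the same time, since the Throughput figure reflects the rate of individual tasks rather than of the cluster as a whole.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize_dfsio LOGFILE
# Fields are split on ": ", so the metric value is always the last field.
summarize_dfsio() {
  awk -F': ' '
    /Number of files/          { n   = $NF }
    /Throughput mb\/sec/       { tp  = $NF }
    /Average IO rate mb\/sec/  { avg = $NF }
    /Test exec time sec/       { t   = $NF }
    END {
      # Rough estimate only: assumes all n map tasks ran concurrently.
      printf "files=%d throughput_mb_s=%.2f avg_io_rate_mb_s=%.2f exec_time_s=%.1f est_aggregate_mb_s=%.2f\n", n, tp, avg, t, tp * n
    }' "$1"
}
```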
Possible issues
TestDFSIO sometimes fails with the following error:
java.io.FileNotFoundException: File does not exist: /benchmarks/TestDFSIO/io_write/part-00000
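This is most likely caused by the cluster's output compression settings: when job output compression is enabled, the reducer writes a compressed result file (for example part-00000.deflate) instead of the plain part-00000 file that TestDFSIO expects. Passing the property -D mapred.output.compress=false to TestDFSIO disables output compression for the benchmark job and should fix this: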
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -D mapred.output.compress=false -write -nrFiles 10 -size 25MB
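To confirm this cause before rerunning, you can list the job's output directory and check whether the result file carries a compression extension. This is a minimal sketch, assuming the hdfs command is on the PATH and the directory from the error message still exists; check_dfsio_output is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_dfsio_output [HDFS_DIR]
# Reports whether the TestDFSIO result file is plain, compressed, or missing.
check_dfsio_output() {
  local dir="${1:-/benchmarks/TestDFSIO/io_write}"
  local listing part
  listing="$(hdfs dfs -ls "$dir")" || return 1
  # The last column of each "hdfs dfs -ls" line is the file path.
  part="$(printf '%s\n' "$listing" | awk '{ print $NF }' | grep '/part-00000' | head -n 1)"
  case "$part" in
    "")           echo "missing: no part-00000* file in $dir" ;;
    */part-00000) echo "ok: $part" ;;
    *)            echo "compressed: $part" ;;
  esac
}
```

A result such as compressed: /benchmarks/TestDFSIO/io_write/part-00000.deflate points to output compression. On Hadoop 2.x and later, mapred.output.compress is a deprecated alias of mapreduce.output.fileoutputformat.compress, so either name can be passed with -D.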