Problem
After Datameer has been running without error for an extended period, the following warning is written to the conductor.log file:
[anonymous] WARN [2014-01-01 00:00:00.000] [LeaseRenewer:datameer@hadoop-name-node.datameer.com:8020] (LeaseRenewer.java:458) - Failed to renew lease for [DFSClient_NONMAPREDUCE_-451590243_47] for 3222 seconds. Will retry shortly ...
java.io.IOException: Failed on local exception: java.io.IOException: Too many open files; Host Details : local host is: "datameer-app-host/10.0.0.123"; destination host is: "hadoop-name-node.datameer.com":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at $Proxy97.renewLease(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor258.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at $Proxy97.renewLease(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:499)
    at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:713)
    at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
    at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
    at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
    at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.IOUtil.initPipe(Native Method)
    at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:409)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:325)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
    ... 16 more
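To confirm that this symptom is present, search the conductor log for the lease-renewal warning. A minimal check, assuming the default log location under the Datameer installation directory (adjust the path for your environment):

# Count occurrences of the lease-renewal warning in the conductor log
grep -c "Failed to renew lease" /opt/datameer/logs/conductor.log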
An administrator on the Datameer application server can count the open network connections in a CLOSE_WAIT state by running "lsof | grep -c CLOSE_WAIT". In affected environments, this count increases steadily over time until the open-file limit is reached for the user that started the conductor.sh process.
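To watch the leak develop, the CLOSE_WAIT count can be sampled periodically and compared against that user's open-file limit. A minimal monitoring sketch, assuming the application runs as a user named datameer and a 60-second sampling interval (both are assumptions; adjust for your environment):

#!/bin/bash
# Sample the CLOSE_WAIT count for the Datameer user every minute and
# print it next to that user's open-file limit for comparison.
# (Run as root so that su and lsof -u can inspect the other user.)
limit=$(su - datameer -c 'ulimit -n')
while true; do
    count=$(lsof -u datameer 2>/dev/null | grep -c CLOSE_WAIT)
    echo "$(date '+%F %T') CLOSE_WAIT=$count limit=$limit"
    sleep 60
done

Once the count approaches the limit, the lease-renewal failures shown above begin to appear.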
Cause
The cause appears to be HDFS-5671, "Fix socket leak in DFSInputStream#getBlockReader": when DFSInputStream#getBlockReader hits an exception, the underlying socket is not closed, so each failure leaks a connection that lingers in the CLOSE_WAIT state until the process exhausts its file descriptors.
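It can be useful to corroborate this by checking which remote endpoints the stuck sockets point at; with a leak in the HDFS client they typically target the cluster's NameNode and DataNode ports. A hedged diagnostic, assuming a Linux lsof build that supports TCP state filtering:

# Group CLOSE_WAIT sockets by remote endpoint (-nP prints numeric
# addresses/ports; the split isolates the remote side of each connection)
lsof -nP -iTCP -sTCP:CLOSE_WAIT | awk 'NR>1 {split($9, a, "->"); print a[2]}' | sort | uniq -c | sort -rn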
Solution
To work around this issue, restart the Datameer application; this closes all of the leaked CLOSE_WAIT socket connections.
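A minimal restart sketch, assuming a default /opt/datameer installation path and a service owner named datameer (both are assumptions; follow your site's standard restart procedure):

# Restart the conductor as the user that owns it
su - datameer -c '/opt/datameer/bin/conductor.sh stop'
su - datameer -c '/opt/datameer/bin/conductor.sh start'

# Afterwards, verify that the leaked sockets are gone
lsof | grep -c CLOSE_WAIT

Note that restarting only resets the count; the leak resumes afterwards, so the restart may need to be repeated until a permanent fix is in place.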
To resolve this issue permanently, please contact Datameer Support and reference internal issue DAP-21185.