异常日志的信息

java.util.concurrent.ExecutionException: java.net.ConnectException: Call From node1/192.168.245.210
 to node2:8020 failed on connection exception: java.net.ConnectException: Connection refused; For m
ore details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTail
er.java:352)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditL
ogTailer.java:411)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(E
ditLogTailer.java:380)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLo
gTailer.java:397)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogT
ailer.java:393)
Caused by: java.net.ConnectException: Call From node1/192.168.245.210 to node2:8020 failed on conne
ction exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.
apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorI
mpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
	at org.apache.hadoop.ipc.Client.call(Client.java:1437)
	at org.apache.hadoop.ipc.Client.call(Client.java:1347)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy19.rollEditLog(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProto
colTranslatorPB.java:150)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:337)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:334)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditL
ogTailer.java:479)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
	at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:409)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
	at org.apache.hadoop.ipc.Client.call(Client.java:1383)
	... 12 more

解决

修改 hdfs-site.xml
添加如下属性

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>

原理分析

active节点所在的namenode down掉后,standby节点会连接down掉的namenode,如果失败就一直连到指定的次数后放弃,然后。。。就没有然后了
加入shell(/bin/true)后,就是可以直接执行自定义脚本,让standyby节点成为active

Logo

魔乐社区(Modelers.cn) 是一个中立、公益的人工智能社区,提供人工智能工具、模型、数据的托管、展示与应用协同服务,为人工智能开发及爱好者搭建开放的学习交流平台。社区通过理事会方式运作,由全产业链共同建设、共同运营、共同享有,推动国产AI生态繁荣发展。

更多推荐