IDEA cannot reach the DataNode on a remote VM to read data
Source: 4-16 Feature development: the full process of landing ETL data into HBase

weixin_慕妹8043461
2020-06-06
Hi PK,
My problem is that when I try to access HDFS in the cloud from IDEA, I get the following error:
20/06/05 16:59:14 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 7978 bytes)
20/06/05 16:59:14 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/06/05 16:59:14 INFO rdd.WholeTextFileRDD: Input split: Paths:/test/test-access.log:0+2715
20/06/05 17:00:14 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.31.69.xxx:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3590)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:849)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:764)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
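After searching online, I found the cause: the VM and the machine running IDEA talk to each other over the public IP, but when the client tries to fetch data from the HDFS DataNode, the NameNode hands the IDEA machine the DataNode's internal IP, which cannot be reached from outside. Here is what I tried.
Following some posts, I changed the configuration in hdfs-site.xml accordingly: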
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
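As I understand it, this makes HDFS on the VM hand the IDEA machine a hostname instead of an IP. The host machine also has the mapping configured:
18.207.78.xxx hadoop000 //(public IP, reachable from IDEA)
Mapping configured on the VM:
172.31.69.xxx hadoop000 //(internal IP)
Since this is a pseudo-distributed setup that follows your video, the DataNode's hostname is also hadoop000.
The run still fails with the same error: IDEA still tries to connect to the internal IP and times out. I hope you can take a look.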
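Before changing more HDFS settings, it may help to confirm from the IDEA machine that hadoop000 resolves to the public IP and that the DataNode port is reachable at all. The following is a minimal sketch using only java.net; the class name DataNodeProbe and the 5-second timeout are assumptions made for illustration, while hadoop000 and port 50010 are taken from the log above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class DataNodeProbe {

    /** Returns the IP this machine resolves the host name to, or null if it cannot be resolved. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults come from the thread; override them on the command line if needed.
        String host = args.length > 0 ? args[0] : "hadoop000";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;

        System.out.println(host + " resolves to " + resolve(host));
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```

If the name resolves to the internal 172.31.x.x address, the hosts mapping on the IDEA machine is probably not being picked up. If it resolves to the public IP but the port is unreachable, a firewall or cloud security group rule for port 50010 is a likely suspect. A successful probe only shows that the network path is open; the HDFS client still has to be told to use hostnames.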
2 answers
weixin_慕妹8043461
Asker
2020-06-07
Updated error:
20/06/07 10:33:14 INFO datasources.FileScanRDD: Reading File path: hdfs://hadoop000:8020/test/drivers.csv, range: 0-1997, partition values: [empty row]
20/06/07 10:33:14 INFO codegen.CodeGenerator: Code generated in 7.95912 ms
20/06/07 10:34:15 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.31.69.197:50010] (the client still receives the internal IP and tries to connect to it)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3590)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:849)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:764)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:963)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
(too long, some lines omitted)
20/06/07 10:34:15 WARN hdfs.DFSClient: Failed to connect to /172.31.69.197:50010 for block BP-536449180-172.31.69.197-1591536838863:blk_1073741825_1001, add to deadNodes and continue.
(some lines omitted)
20/06/07 10:34:15 INFO hdfs.DFSClient: Could not obtain BP-536449180-172.31.69.197-1591536838863:blk_1073741825_1001 from any node: No live nodes contain current block Block locations: DatanodeInfoWithStorage[172.31.69.197:50010,DS-84a95ba2-e53a-41b2-8224-0d38ad969b24,DISK] Dead nodes: DatanodeInfoWithStorage[172.31.69.197:50010,DS-84a95ba2-e53a-41b2-8224-0d38ad969b24,DISK]. Will get new block locations from namenode and retry...
20/06/07 10:34:15 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 892.3526112545212 msec.
Michael_PK
2020-06-06
You are on a cloud host, right? Then your code needs to set dfs.client.use.datanode.hostname to true on the Configuration.
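The reasoning behind this answer is that hdfs-site.xml on the VM is read by the Hadoop daemons, not by the client running inside IDEA, so the client-side property has to be set in the client's own Configuration. A minimal sketch of what that could look like with the plain Hadoop FileSystem API follows; the class name ReadFromCloudHdfs is hypothetical, the URI and path are the ones from the updated log, and the Hadoop client libraries are assumed to be on the project classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ReadFromCloudHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask the client to connect to DataNodes by hostname instead of the
        // internal IP that the NameNode reports.
        conf.set("dfs.client.use.datanode.hostname", "true");

        // NameNode address and file path as they appear in the log above.
        try (FileSystem fs = FileSystem.get(new URI("hdfs://hadoop000:8020"), conf);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     fs.open(new Path("/test/drivers.csv")), StandardCharsets.UTF_8))) {
            // Reading a line forces a real connection to the DataNode.
            System.out.println(reader.readLine());
        }
    }
}
```

In a Spark job the same property can be set on the context's Hadoop configuration before any read, for example `spark.sparkContext().hadoopConfiguration().set("dfs.client.use.datanode.hostname", "true")`, or passed as `spark.hadoop.dfs.client.use.datanode.hostname` in the Spark configuration. Either way, the hosts mapping from hadoop000 to the public IP on the IDEA machine is still needed.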