Using Hadoop 2.x on Ubuntu, Part 12: Combining HDFS Cluster HA (QJM) with Federation


A solution for scalability and fault tolerance

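The Federation cluster is now in place, which lets Hadoop scale out to a large cluster. Each individual namenode server, however, is still a single point of failure, so HA with QJM is needed to give it automatic failover.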

I already have two servers, namenode1 and namenode2, each managing its own namespace. I now treat them as the active machines and clone each one into a new virtual machine that serves as its standby machine.

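QJM also needs at least 3 JournalNodes, so that a majority survives the loss of one. To save machines, datanode1, datanode2 and datanode3 double as the JournalNodes for namenode1. I then create three more datanode servers, which also act as the JournalNodes for namenode2.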
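The JournalNode quorum is handed to the namenodes as a single URI in `dfs.namenode.shared.edits.dir`, in the form `qjournal://host1:port;host2:port;host3:port/journalId`. The helper below is only a sketch of how that value could be assembled for the layout described above. It assumes the default JournalNode port 8485, and the journal ID `ns1` in the example is a hypothetical nameservice name, not one taken from my configuration.

```shell
# Build the qjournal:// URI used as the value of dfs.namenode.shared.edits.dir.
# Arguments: the journal ID, then one or more JournalNode hostnames.
# Port 8485 is assumed (the default JournalNode RPC port).
qjournal_uri() {
  local journal_id="$1"; shift
  local port=8485
  local hosts=""
  local h
  for h in "$@"; do
    # Join the host:port pairs with ';' as the URI format requires.
    hosts="${hosts:+${hosts};}${h}:${port}"
  done
  printf 'qjournal://%s/%s\n' "$hosts" "$journal_id"
}

# Example (ns1 is a hypothetical journal ID):
#   qjournal_uri ns1 datanode1 datanode2 datanode3
```

The printed string would go into the `<value>` of that property in hdfs-site.xml for the corresponding nameservice.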

Architecture diagram:




Configuration

Adding 3 datanodes to the federation

Clone a virtual machine from datanode1, copy it to another physical host, and once it is set up there, clone it two more times.

Once that was done, I noticed something odd: each namenode can see only 3 datanode servers, and the set it sees is not the same every time.

hduser@namenode1:~$ hdfs dfsadmin -printTopology  
Rack: /168/1
   192.168.1.73:50010 (datanode1)
   192.168.1.74:50010 (datanode2)
   192.168.1.75:50010 (datanode3)

hduser@namenode1:~$ hdfs dfsadmin -printTopology  
Rack: /168/1
   192.168.1.74:50010 (datanode2)
   192.168.1.75:50010 (datanode3)
   192.168.1.78:50010 (datanode6)

The view from namenode2 can differ from the one on namenode1:

hduser@namenode2:~$ hdfs dfsadmin -printTopology  
Rack: /168/1
   192.168.1.74:50010 (datanode2)
   192.168.1.75:50010 (datanode3)
   192.168.1.78:50010 (datanode6)

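This may be an HDFS design issue. It does not look like the datanodes failed to start, because I checked the logs and found no obvious error messages. I am noting it here and will investigate later.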
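To make the difference between runs easier to see, the `-printTopology` output can be reduced to a list of hostnames and compared against the datanodes that should be there. This is a minimal sketch: the function names are my own, and the hostnames datanode4 and datanode5 in the example are assumed from the naming pattern, since only datanode6 appears in the output above.

```shell
# Print the sorted datanode hostnames found in `hdfs dfsadmin -printTopology`
# output read from stdin. Node lines look like:
#    192.168.1.73:50010 (datanode1)
topology_hosts() {
  awk '/^[[:space:]]+[0-9.]+:[0-9]+ \(/ {
         host = $2
         gsub(/[()]/, "", host)   # strip the surrounding parentheses
         print host
       }' | sort -u
}

# Print every expected hostname (given as arguments) that is absent from the
# topology output on stdin.
missing_hosts() {
  local seen h
  seen="$(topology_hosts)"
  for h in "$@"; do
    printf '%s\n' "$seen" | grep -qxF "$h" || printf '%s\n' "$h"
  done
}

# Example (datanode4 and datanode5 are assumed names):
#   hdfs dfsadmin -printTopology | \
#     missing_hosts datanode1 datanode2 datanode3 datanode4 datanode5 datanode6
```

Running this a few times in a row on each namenode shows which nodes drop in and out of the list.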

I also specifically checked hdfs-site.xml, which is set up to allow every datanode to connect:

  <property>
    <name>dfs.hosts</name>
    <value>/usr/local/hadoop/etc/hadoop/datanode-allow-list</value>
  </property>

That file is empty, so no datanode is being filtered out by the allow list.
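One more thing that may be worth ruling out, given that the new datanodes are clones of datanode1: a cloned data directory carries the same `storageID` in its `current/VERSION` file, and a namenode may then treat the clones as one datanode that keeps re-registering. This is an assumption on my part and not something I have confirmed for this cluster. The sketch below takes "hostname storageID" pairs, collected by hand from each datanode, and reports any ID shared by more than one host. The IDs in the example comment are made-up placeholders.

```shell
# Read "hostname storageID" pairs on stdin and print each storageID that is
# reported by more than one host, followed by the hosts that share it.
duplicate_storage_ids() {
  awk 'NF >= 2 {
         count[$2]++
         hosts[$2] = (hosts[$2] == "" ? $1 : hosts[$2] "," $1)
       }
       END {
         for (id in count)
           if (count[id] > 1)
             print id, hosts[id]
       }' | sort
}

# To collect the input, run on each datanode, replacing <data-dir> with the
# directory configured in dfs.datanode.data.dir:
#   grep '^storageID=' <data-dir>/current/VERSION
#
# Example with placeholder IDs:
#   printf 'datanode1 DS-aaa\ndatanode6 DS-aaa\n' | duplicate_storage_ids
```

An empty result would mean the storage IDs are all distinct and the cause lies elsewhere.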


The same thing shows up with -report, which also lists only 3 datanode servers.

hduser@namenode1:~$ hdfs dfsadmin -report
Configured Capacity: 295283847168 (275.00 GB)
Present Capacity: 267733209088 (249.35 GB)
DFS Remaining: 267733045248 (249.35 GB)
DFS Used: 163840 (160 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 192.168.1.74:50010 (datanode2)
Hostname: datanode2
Rack: /168/1
Decommission Status : Normal
Configured Capacity: 98427949056 (91.67 GB)
DFS Used: 53248 (52 KB)
Non DFS Used: 9290870784 (8.65 GB)
DFS Remaining: 89137025024 (83.02 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.56%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Mar 18 14:51:05 UTC 2014


Name: 192.168.1.78:50010 (datanode6)
Hostname: datanode6
Rack: /168/1
Decommission Status : Normal
Configured Capacity: 98427949056 (91.67 GB)
DFS Used: 53248 (52 KB)
Non DFS Used: 9129762816 (8.50 GB)
DFS Remaining: 89298132992 (83.17 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Mar 18 14:47:10 UTC 2014


Name: 192.168.1.75:50010 (datanode3)
Hostname: datanode3
Rack: /168/1
Decommission Status : Normal
Configured Capacity: 98427949056 (91.67 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 9130004480 (8.50 GB)
DFS Remaining: 89297887232 (83.17 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Mar 18 14:51:05 UTC 2014
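The full report is long, so when comparing nodes it helps to keep only the hostname and the last contact time. In the output above, for instance, datanode6 was last heard from at 14:47:10 while the other two reported at 14:51:05. The function below is a small sketch for that reduction, and its name is my own.

```shell
# Summarize `hdfs dfsadmin -report` output from stdin as one line per datanode:
# "<hostname><TAB><last contact>".
report_summary() {
  awk -F': ' '
    /^Hostname: /     { host = $2 }
    /^Last contact: / { print host "\t" $2 }
  '
}

# Example:
#   hdfs dfsadmin -report | report_summary
```

A node whose last contact lags well behind the others is a good candidate for a closer look at its datanode log.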



To be continued...

