Foreword#
This experiment uses virtual machines to deploy a fully distributed (Fully Distributed Mode) cluster with Hadoop HA, HBase HA, and YARN HA.
The exact equipment list used for this experiment:
- 2 computers with Oracle VM VirtualBox installed; on one of them, a virtual machine running Ubuntu Server Linux was created;
- 1 Raspberry Pi 3 Model B;
- 1 router;
- 3 Ethernet cables connecting the 3 devices to the router.
The equipment does not have to match exactly; adapt this article to your own environment. For example:
- If a single computer is powerful enough to run the required number of virtual machines (here each of the 2 computers runs 3 virtual machines), you do not need 2 or more computers;
- Adding a Raspberry Pi to the cluster is optional;
- VirtualBox is not the only virtualization software you can use;
- A wireless router can provide a cable-free network instead;
- ……
How to determine what you actually need:
- Take stock of the devices at hand and estimate their performance (to judge how many virtual machines each can run comfortably);
- Plan which "machines" (bare metal or virtual) will host ZooKeeper, Hadoop, HBase, and YARN;
- Work out how many virtual machines are required and assign them to hosts according to each device's performance.
💡 Note that if more than 1 device is involved, all devices must be on the same LAN.
The diagram below shows the services deployed on each machine in this distributed cluster:
```mermaid
graph TD
    subgraph fa:fa-desktop PC_1
        H1[fa:fa-server H1<br>NameNode<br>zkfc]
        H2[fa:fa-server H2<br>DataNode<br>HMaster<br>ResourceManager]
        H6[fa:fa-server H6<br>DataNode<br>HRegionServer<br>QuorumPeerMain]
    end
    subgraph fa:fa-desktop PC_2
        H4[fa:fa-server H4<br>NameNode<br>zkfc]
        H5[fa:fa-server H5<br>DataNode<br>HMaster<br>ResourceManager]
        H7[fa:fa-server H7<br>DataNode<br>HRegionServer<br>QuorumPeerMain]
    end
    subgraph fa:fa-desktop Raspberry_Pi
        H3[fa:fa-server H3<br>DataNode<br>HRegionServer<br>QuorumPeerMain]
    end
```
Configure the base virtual machine#
Start one of the virtual machines.
Install apt-transport-https and ca-certificates so that HTTPS package sources can be used.
```bash
sudo apt install apt-transport-https ca-certificates
```
Back up sources.list.
```bash
sudo mv /etc/apt/sources.list /etc/apt/sources.list.bak
```
Edit sources.list.
```bash
sudo nano /etc/apt/sources.list
```
Use the National Center for High-performance Computing (NCHC) Free Software Lab mirror as the Ubuntu package source.
```
deb http://free.nchc.org.tw/ubuntu/ focal main restricted universe multiverse
## deb-src http://free.nchc.org.tw/ubuntu/ focal main restricted universe multiverse
deb http://free.nchc.org.tw/ubuntu/ focal-updates main restricted universe multiverse
## deb-src http://free.nchc.org.tw/ubuntu/ focal-updates main restricted universe multiverse
deb http://free.nchc.org.tw/ubuntu/ focal-backports main restricted universe multiverse
## deb-src http://free.nchc.org.tw/ubuntu/ focal-backports main restricted universe multiverse
deb http://free.nchc.org.tw/ubuntu/ focal-security main restricted universe multiverse
## deb-src http://free.nchc.org.tw/ubuntu/ focal-security main restricted universe multiverse
## deb http://free.nchc.org.tw/ubuntu/ focal-proposed main restricted universe multiverse
## deb-src http://free.nchc.org.tw/ubuntu/ focal-proposed main restricted universe multiverse
```
Update the package sources.
```bash
sudo apt update
sudo apt upgrade
```
Install openjdk-8-jdk.
```bash
sudo apt install openjdk-8-jdk
```
Download the Hadoop binary tarball; do the same for HBase and ZooKeeper.
```bash
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
```
Extract it into the current directory; again, do the same for HBase and ZooKeeper.
```bash
tar xzf hadoop-3.3.1.tar.gz
```
Edit .bashrc and add the relevant environment variables.
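The original does not show the command used to open the file; presumably it is edited the same way as the other files in this guide, for example:
```bash
nano ~/.bashrc
```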
```bash
#Hadoop Related Options
export HADOOP_HOME=/home/hadoop/hadoop-3.3.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native -Djava.net.preferIPv4Stack=true"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
#HBase Related Options
export HBASE_HOME=/home/hadoop/hbase-2.4.6
export PATH=$PATH:$HBASE_HOME/sbin:$HBASE_HOME/bin
#ZooKeeper Related Options
export ZOOKEEPER_HOME=/home/hadoop/apache-zookeeper-3.7.0-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
```
💡 In the examples, hadoop is the username; change it to match your own setup. The same applies throughout the rest of this article.
Apply the changes to .bashrc.
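The command itself is not reproduced in the original; the usual way is to re-source the file (or log out and back in):
```bash
source ~/.bashrc
```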
Modify the sshd_config settings.
```bash
sudo nano /etc/ssh/sshd_config
```
```
Port 22
ListenAddress 0.0.0.0
PermitRootLogin yes
PasswordAuthentication yes
X11Forwarding no
```
Restart the ssh service for the changes to take effect.
```bash
sudo service ssh restart
```
Clone the virtual machines#
Name the virtual machine configured above H1, then clone it to create H2 through H7 (in VirtualBox choose Full clone and Generate new MAC addresses for all network adapters), and modify the corresponding /etc/hostname and /etc/hosts on each clone.
Configure the network#
Configure each virtual machine with a static IP.
```bash
sudo nano /etc/netplan/00-installer-config.yaml
```
```yaml
network:
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: false
      addresses: [192.168.0.201/24]
      gateway4: 192.168.0.1
      nameservers:
        addresses: [8.8.8.8]
      optional: true
  version: 2
```
💡 enp0s3 is the name of the physical network interface; change it to match your own setup (you can find it with ip addr). gateway4 is the default gateway and should match the default gateway configured on the host machine.
In this cluster, the 7 virtual machines H1 through H7 are given static IPs 192.168.0.201 through 192.168.0.207 respectively, so the change above has to be made 7 times.
Apply the change so it takes effect.
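The apply command is not shown in the original; with netplan it is presumably:
```bash
sudo netplan apply
```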
Check whether it has taken effect; the output below confirms that it has.
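The original does not show the command used for the check; judging from the output that follows, it is presumably ip addr:
```bash
ip addr
```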
```
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:9a:bf:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.201/24 brd 192.168.0.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe9a:bfe9/64 scope link
       valid_lft forever preferred_lft forever
```
Modify /etc/hostname.
```bash
sudo nano /etc/hostname
```
Its content is as follows (this is the example for H1).
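The file content is not shown in the original; it is presumably just the machine's hostname, i.e. on H1:
```
h1
```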
Modify /etc/hosts.
Below is the example for H1: comment out the line that originally mapped the machine's own hostname (the 127.0.1.1 entry). That commented-out line is the part each virtual machine needs to change for itself.
```
127.0.0.1 localhost
## 127.0.1.1 h1
192.168.0.201 h1
192.168.0.202 h2
192.168.0.203 h3
192.168.0.204 h4
192.168.0.205 h5
192.168.0.206 h6
192.168.0.207 h7
## The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
```
Configure SSH#
Generate an ssh key.
```bash
ssh-keygen -t rsa -P ""
```
Append the public key to ~/.ssh/authorized_keys.
```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
Have every virtual machine in the cluster copy its ssh key to every other virtual machine.
```bash
ssh-copy-id hadoop@h1 \
&& ssh-copy-id hadoop@h2 \
&& ssh-copy-id hadoop@h3 \
&& ssh-copy-id hadoop@h4 \
&& ssh-copy-id hadoop@h5 \
&& ssh-copy-id hadoop@h6 \
&& ssh-copy-id hadoop@h7
```
Below is part of the output from copying H1's key to the other machines.
```
hadoop@h1:~$ ssh-copy-id hadoop@h1 && ssh-copy-id hadoop@h2 &&ssh-copy-id hadoop@h3 && ssh-copy-id hadoop@h4 && ssh-copy-id hadoop@h5 && ssh-copy-id hadoop@h6 && ssh-copy-id hadoop@h7
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'h2 (192.168.0.202)' can't be established.
ECDSA key fingerprint is SHA256:C6ydAa+dfI5lcMJkMUucz60WE7p3eFLIs7fWZrTYfDE.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@h2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@h2'"
and check to make sure that only the key(s) you wanted were added.
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'h3 (192.168.0.203)' can't be established.
ECDSA key fingerprint is SHA256:OVEZc5ls6hhBFNgqmZxT/EjubDKr8oyoqwE4Wtvsk+k.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@h3's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@h3'"
and check to make sure that only the key(s) you wanted were added.
```
Build the ZooKeeper cluster#
Switch to H3.
Copy the ZooKeeper sample configuration file and edit it.
```bash
cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg && nano $ZOOKEEPER_HOME/conf/zoo.cfg
```
```
dataDir=/home/hadoop/tmp/zookeeper
server.3=h3:2888:3888
server.6=h6:2888:3888
server.7=h7:2888:3888
```
💡 The cluster entries use the format server.<id>=<hostname>:2888:3888, where id is a unique number for each machine and hostname is that machine's hostname. In :2888:3888, the first port is the one Followers use to communicate with the Leader, i.e. the internal quorum port (default 2888); the second is the leader-election port (default 3888).
On each machine, create the ZooKeeper dataDir and write the machine's id into a new myid file inside it. Below are the commands for H3, H6, and H7 respectively.
```bash
mkdir -p /home/hadoop/tmp/zookeeper && echo 3 > /home/hadoop/tmp/zookeeper/myid
```
```bash
mkdir -p /home/hadoop/tmp/zookeeper && echo 6 > /home/hadoop/tmp/zookeeper/myid
```
```bash
mkdir -p /home/hadoop/tmp/zookeeper && echo 7 > /home/hadoop/tmp/zookeeper/myid
```
Copy the ZooKeeper configuration just completed on H3 to H6 and H7.
```bash
scp -r $ZOOKEEPER_HOME/conf/* h6:$ZOOKEEPER_HOME/conf && scp -r $ZOOKEEPER_HOME/conf/* h7:$ZOOKEEPER_HOME/conf
```
Start the ZooKeeper service on the 3 machines, i.e. run the command below on H3, H6, and H7.
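The command is not reproduced in the original; judging from the zkServer.sh status calls shown later, it is presumably the standard start command:
```bash
zkServer.sh start
```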
Check the cluster status on the 3 machines. A ZooKeeper ensemble only works once more than half of its nodes are up, so run the start command on all 3 machines first; only then will the status check report a healthy cluster.
The output below indicates a normal start.
```
hadoop@h3:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```
```
hadoop@h6:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
```
```
hadoop@h7:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```
💡 You can see that H6 is currently elected as the Leader.
Run jps to list the relevant services that are currently running.
```
hadoop@h3:~$ jps
2499 Jps
2378 QuorumPeerMain
```
```
hadoop@h6:~$ jps
3364 Jps
3279 QuorumPeerMain
```
```
hadoop@h7:~$ jps
3511 QuorumPeerMain
3599 Jps
```
Stop the ZooKeeper service on the 3 machines, i.e. run the command below on H3, H6, and H7.
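Again the command is not reproduced in the original; presumably:
```bash
zkServer.sh stop
```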
Build the Hadoop cluster#
Switch to H1.
Modify hadoop-env.sh.
```bash
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
```
Set JAVA_HOME.
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
💡 If the cluster contains ARM devices such as a Raspberry Pi, install the JDK built for that architecture and point JAVA_HOME at it accordingly, for example:
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64
```
Modify hdfs-site.xml.
```bash
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```
```xml
<configuration>
  <!-- Number of block replicas -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Disable permission checks to simplify development and debugging -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <!-- NameService name; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- The 2 NameNodes -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>h1:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>h1:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>h4:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>h4:50070</value>
  </property>
  <!-- Shared edits directory; the JournalNodes are deployed on the same machines as ZooKeeper -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://h3:8485;h6:8485;h7:8485/ns1</value>
  </property>
  <!-- Where each JournalNode stores edits data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/tmp/journaldata</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover proxy provider class used by clients -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- The sshfence method requires passwordless ssh -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- Timeout for the sshfence method, in ms -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <!-- Directory that stores the FsImage -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/namenode</value>
  </property>
  <!-- Directory that stores the HDFS data blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/datanode</value>
  </property>
  <property>
    <name>heartbeat.recheck.interval</name>
    <value>2000</value>
  </property>
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>1</value>
  </property>
</configuration>
```
💡 Split-brain is the situation in which 2 NameNodes serve requests at the same time. Two fencing methods are available to guard against it:
- sshfence: log in to the failed NameNode remotely and kill it (if the remote ssh port is not 22, configure it in dfs.ha.fencing.methods as sshfence(username:port));
- shell: the fallback used when the remote login times out or gets no response; a custom script can be placed inside the parentheses, e.g. shell(/bin/true), which simply returns true so the failover proceeds.
Modify core-site.xml.
```bash
nano $HADOOP_HOME/etc/hadoop/core-site.xml
```
```xml
<configuration>
  <!-- Set the HDFS NameService to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <!-- Hadoop working directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp/hadoop</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>h3:2181,h6:2181,h7:2181</value>
  </property>
  <!-- Maximum number of client connection retries -->
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>100</value>
  </property>
  <!-- Interval between 2 reconnection attempts, in ms -->
  <property>
    <name>ipc.client.connect.retry.interval</name>
    <value>5000</value>
  </property>
</configuration>
```
Modify mapred-site.xml.
```bash
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
```
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop-3.3.1</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop-3.3.1</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop-3.3.1</value>
  </property>
</configuration>
```
Modify yarn-site.xml.
```bash
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
```
```xml
<configuration>
  <!-- Enable ResourceManager high availability -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- ResourceManager cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Logical names of the ResourceManagers in this group -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- RM1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>h2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>h2:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>h2:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>h2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>h2:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>h2:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>h2:23142</value>
  </property>
  <!-- RM2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>h5</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>h5:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>h5:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>h5:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>h5:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>h5:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>h5:23142</value>
  </property>
  <!-- ZooKeeper cluster addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>h3:2181,h6:2181,h7:2181</value>
  </property>
  <!-- Reducers fetch data via mapreduce_shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
Configure workers.
```bash
nano $HADOOP_HOME/etc/hadoop/workers
```
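The file content is not shown in the original. Judging from the deployment diagram, the DataNodes run on H2, H3, H5, H6, and H7, so the workers file presumably lists those hosts:
```
h2
h3
h5
h6
h7
```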
Create the directories required by the configuration files above.
```bash
mkdir -p /home/hadoop/tmp/journaldata /home/hadoop/tmp/hadoop
```
Copy the Hadoop configuration just completed on H1 to the other 6 machines.
```bash
scp -r $HADOOP_HOME/etc/hadoop/* h2:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h3:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h4:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h5:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h6:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h7:$HADOOP_HOME/etc/hadoop
```
Switch to H4 and create the directory where its NameNode will store the FsImage.
```bash
mkdir -p /home/hadoop/data/namenode
```
Start HDFS HA#
Start the ZooKeeper service on H3, H6, and H7, i.e. run the command below on each of them.
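As before, the command is not reproduced in the original; presumably:
```bash
zkServer.sh start
```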
Check the cluster status on H3, H6, and H7.
The output below indicates a normal start.
```
hadoop@h3:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```
```
hadoop@h6:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
```
```
hadoop@h7:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```
💡 You can see that H6 is still the elected Leader.
Start the JournalNode service on H2, H3, H5, H6, and H7, i.e. run the command below on each of them.
```bash
hdfs --daemon start journalnode
```
💡 The JournalNodes only need to be started manually the first time the Hadoop cluster is brought up; this step is not required afterwards.
Run jps to list the relevant services that are currently running.
```
hadoop@h2:~$ jps
4631 Jps
4588 JournalNode
```
```
hadoop@h3:~$ jps
2864 JournalNode
2729 QuorumPeerMain
2938 Jps
```
```
hadoop@h5:~$ jps
4371 Jps
4325 JournalNode
```
```
hadoop@h6:~$ jps
3474 QuorumPeerMain
3594 JournalNode
3644 Jps
```
```
hadoop@h7:~$ jps
3890 Jps
3848 JournalNode
3736 QuorumPeerMain
```
Switch to H1 and format HDFS.
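The format command is not shown in the original; it is presumably the standard one:
```bash
hdfs namenode -format
```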
Copy the FsImage of H1's NameNode to H4's NameNode so that the two start out identical.
```bash
scp -r /home/hadoop/data/namenode/* h4:/home/hadoop/data/namenode
```
Format the ZKFC znode in ZooKeeper.
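Presumably with the standard command:
```bash
hdfs zkfc -formatZK
```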
Start HDFS.
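The start command is not shown; presumably the standard start script:
```bash
start-dfs.sh
```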
Run jps to list the relevant services that are currently running.
```
hadoop@h1:~$ jps
22368 Jps
21960 NameNode
22317 DFSZKFailoverController
```
The report shows that all 5 DataNodes are healthy.
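The report command is not shown in the original; judging from the output that follows, it is presumably:
```bash
hdfs dfsadmin -report
```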
```
Configured Capacity: 101142011904 (94.20 GB)
Present Capacity: 59280982016 (55.21 GB)
DFS Remaining: 59280859136 (55.21 GB)
DFS Used: 122880 (120 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (5):
Name: 192.168.0.202:9866 (h2)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7673147392 (7.15 GB)
DFS Remaining: 1771356160 (1.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.76%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:04 UTC 2021
Last Block Report: Mon Oct 25 02:15:29 UTC 2021
Num of Blocks: 0
Name: 192.168.0.203:9866 (h3)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 61257113600 (57.05 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6547861504 (6.10 GB)
DFS Remaining: 52175802368 (48.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.18%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:06 UTC 2021
Last Block Report: Mon Oct 25 02:15:54 UTC 2021
Num of Blocks: 0
Name: 192.168.0.205:9866 (h5)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7665070080 (7.14 GB)
DFS Remaining: 1779433472 (1.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.85%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:06 UTC 2021
Last Block Report: Mon Oct 25 02:15:33 UTC 2021
Num of Blocks: 0
Name: 192.168.0.206:9866 (h6)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7667515392 (7.14 GB)
DFS Remaining: 1776988160 (1.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:05 UTC 2021
Last Block Report: Mon Oct 25 02:15:38 UTC 2021
Num of Blocks: 0
Name: 192.168.0.207:9866 (h7)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7667224576 (7.14 GB)
DFS Remaining: 1777278976 (1.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:07 UTC 2021
Last Block Report: Mon Oct 25 02:15:31 UTC 2021
Num of Blocks: 0
```
Test HDFS HA#
Visit http://192.168.0.201:50070/ and you can see 'h1:9000' (active).
Stop the NameNode on H1.
```bash
hdfs --daemon stop namenode
```
Now visit http://192.168.0.204:50070/ and you can see 'h4:9000' (active).
Start the NameNode on H1 again.
```bash
hdfs --daemon start namenode
```
Now visit http://192.168.0.201:50070/ and you can see 'h1:9000' (standby).
Start YARN#
Start the YARN services for H2 and H5, i.e. run the command below on H2.
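The command is not reproduced in the original; presumably the standard start script, which brings up the ResourceManagers and NodeManagers defined in the configuration:
```bash
start-yarn.sh
```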
Run jps to list the relevant services that are currently running.
```
hadoop@h2:~$ jps
5203 JournalNode
6691 NodeManager
6534 ResourceManager
6838 Jps
5692 DataNode
```
```
hadoop@h5:~$ jps
6099 Jps
5395 DataNode
4921 JournalNode
5659 ResourceManager
5789 NodeManager
```
Test YARN HA#
Visit http://192.168.0.202:8188/ and you can see ResourceManager HA state: active.
Stop the ResourceManager on H2.
```bash
yarn --daemon stop resourcemanager
```
Now visit http://192.168.0.205:8188/ and you can see ResourceManager HA state: active.
Start the ResourceManager on H2 again.
```bash
yarn --daemon start resourcemanager
```
Now visit http://192.168.0.202:8188/ and you can see ResourceManager HA state: standby.
Test MapReduce#
Switch to H2 and run the pi example.
```bash
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 1 1
```
```
hadoop@h2:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 1 1
Number of Maps = 1
Samples per Map = 1
Wrote input for Map #0
Starting Job
2021-10-25 07:08:04,762 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2021-10-25 07:08:04,977 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1635145651320_0001
2021-10-25 07:08:05,934 INFO input.FileInputFormat: Total input files to process : 1
2021-10-25 07:08:06,626 INFO mapreduce.JobSubmitter: number of splits:1
2021-10-25 07:08:06,973 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1635145651320_0001
2021-10-25 07:08:06,975 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-10-25 07:08:07,341 INFO conf.Configuration: resource-types.xml not found
2021-10-25 07:08:07,342 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-10-25 07:08:07,991 INFO impl.YarnClientImpl: Submitted application application_1635145651320_0001
2021-10-25 07:08:08,048 INFO mapreduce.Job: The url to track the job: http://h5:8188/proxy/application_1635145651320_0001/
2021-10-25 07:08:08,051 INFO mapreduce.Job: Running job: job_1635145651320_0001
2021-10-25 07:08:25,438 INFO mapreduce.Job: Job job_1635145651320_0001 running in uber mode : false
2021-10-25 07:08:25,439 INFO mapreduce.Job: map 0% reduce 0%
2021-10-25 07:08:34,585 INFO mapreduce.Job: map 100% reduce 0%
2021-10-25 07:08:50,737 INFO mapreduce.Job: map 100% reduce 100%
2021-10-25 07:08:52,774 INFO mapreduce.Job: Job job_1635145651320_0001 completed successfully
2021-10-25 07:08:52,993 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=28
FILE: Number of bytes written=555793
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=255
HDFS: Number of bytes written=215
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5590
Total time spent by all reduces in occupied slots (ms)=14233
Total time spent by all map tasks (ms)=5590
Total time spent by all reduce tasks (ms)=14233
Total vcore-milliseconds taken by all map tasks=5590
Total vcore-milliseconds taken by all reduce tasks=14233
Total megabyte-milliseconds taken by all map tasks=5724160
Total megabyte-milliseconds taken by all reduce tasks=14574592
Map-Reduce Framework
Map input records=1
Map output records=2
Map output bytes=18
Map output materialized bytes=28
Input split bytes=137
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=28
Reduce input records=2
Reduce output records=0
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=220
CPU time spent (ms)=1050
Physical memory (bytes) snapshot=302690304
Virtual memory (bytes) snapshot=4986265600
Total committed heap usage (bytes)=138096640
Peak Map Physical memory (bytes)=202854400
Peak Map Virtual memory (bytes)=2490499072
Peak Reduce Physical memory (bytes)=99835904
Peak Reduce Virtual memory (bytes)=2495766528
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=118
File Output Format Counters
Bytes Written=97
Job Finished in 48.784 seconds
Estimated value of Pi is 4.00000000000000000000
```
Build the HBase HA cluster#
Switch to H2.
Modify hbase-env.sh.
```bash
nano $HBASE_HOME/conf/hbase-env.sh
```
Set JAVA_HOME.
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
💡 Again, if the cluster contains ARM devices such as a Raspberry Pi, install the JDK built for that architecture and point JAVA_HOME at it accordingly, for example:
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64
```
Do not use the ZooKeeper bundled with HBase.
```bash
export HBASE_MANAGES_ZK=false
```
Modify hbase-site.xml.
```bash
nano $HBASE_HOME/conf/hbase-site.xml
```
```xml
<configuration>
  <!-- Path where HBase stores its data on HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns1/hbase</value>
  </property>
  <!-- Run HBase in distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>h3:2181,h6:2181,h7:2181</value>
  </property>
  <!-- Maximum clock skew between HMaster and HRegionServer -->
  <property>
    <name>hbase.master.maxclockskew</name>
    <value>180000</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>filesystem</value>
  </property>
</configuration>
```
Modify regionservers.
```bash
nano $HBASE_HOME/conf/regionservers
```
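The file content is not shown in the original. Judging from the deployment diagram, the HRegionServers run on H3, H6, and H7, so the regionservers file presumably lists those hosts:
```
h3
h6
h7
```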
Copy core-site.xml and hdfs-site.xml from H1 to H2.
```bash
scp -r $HADOOP_HOME/etc/hadoop/core-site.xml h2:$HBASE_HOME/conf \
&& scp -r $HADOOP_HOME/etc/hadoop/hdfs-site.xml h2:$HBASE_HOME/conf
```
Copy the HBase configuration just completed on H2 to the other 4 machines.
```bash
scp -r $HBASE_HOME/conf/* h3:$HBASE_HOME/conf \
&& scp -r $HBASE_HOME/conf/* h5:$HBASE_HOME/conf \
&& scp -r $HBASE_HOME/conf/* h6:$HBASE_HOME/conf \
&& scp -r $HBASE_HOME/conf/* h7:$HBASE_HOME/conf
```
Start HBase HA#
Switch to H1.
Start HDFS.
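As before, the command is presumably:
```bash
start-dfs.sh
```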
Run jps to list the relevant services that are currently running.
```
hadoop@h1:~$ jps
3783 NameNode
4169 Jps
4138 DFSZKFailoverController
```
Switch to H2 and start the HMaster, then switch to H5 and start the HMaster there as well, running the same command on each machine.
```bash
hbase-daemon.sh start master
```
Run jps to list the relevant services that are currently running.
```
hadoop@h2:~$ jps
1052 JournalNode
7901 DataNode
8125 HMaster
8847 Jps
```
```
hadoop@h3:~$ jps
6368 Jps
6210 HRegionServer
2660 QuorumPeerMain
5717 DataNode
5846 JournalNode
```
```
hadoop@h5:~$ jps
6336 Jps
1058 JournalNode
6117 HMaster
5980 DataNode
```
```
hadoop@h6:~$ jps
4722 Jps
4408 JournalNode
4248 DataNode
4570 HRegionServer
1039 QuorumPeerMain
```
```
hadoop@h7:~$ jps
4402 DataNode
4563 JournalNode
4726 HRegionServer
1031 QuorumPeerMain
5000 Jps
```
Test HBase HA#
Visit http://192.168.0.202:16010/ and you can see Current Active Master: h2.
Stop the HMaster on H2.
```bash
hbase-daemon.sh stop master
```
Now visit http://192.168.0.205:16010/ and you can see Current Active Master: h5.
Start the HMaster on H2 again.
```bash
hbase-daemon.sh start master
```
Now visit http://192.168.0.202:16010/ and you can see Current Active Master: h5, with h2 listed as a Backup Master.
All done#
The end of this walkthrough is only the beginning of the learning. Keep smiling and keep improving 😊