Foreword#
This lab uses virtual machines to deploy Hadoop HA, HBase HA, and YARN HA as a fully distributed (Fully Distributed Mode) cluster.
The exact set of devices used in this lab:
- 2 computers with Oracle VM VirtualBox installed, one of which hosts a virtual machine running Ubuntu Server Linux;
- 1 Raspberry Pi 3 Model B;
- 1 router;
- 3 network cables connecting the 3 devices to the router.
Your hardware does not need to match exactly; treat this article as a reference for your own environment. For example:
- if a single computer is powerful enough to run the required number of virtual machines (here each of the 2 computers ran 3 VMs), you do not need 2 or more computers;
- adding a Raspberry Pi to the cluster is optional;
- VirtualBox is not the only virtualization software you can use;
- a wireless router can provide a cable-free network;
- and so on.
How to work out what you actually need:
- take stock of the devices at hand and estimate their performance (to judge how many VMs each one can run comfortably);
- plan which "machines" (bare metal or VMs) will host ZooKeeper, Hadoop, HBase, and YARN;
- determine the number of VMs required and assign VMs to each host according to its capacity.
💡 Note that if more than one device is involved, all devices must be on the same LAN.
The diagram below shows which services run on each machine in the cluster:
```mermaid
graph TD
    subgraph fa:fa-desktop PC_1
        H1[fa:fa-server H1<br/>NameNode<br/>zkfc]
        H2[fa:fa-server H2<br/>DataNode<br/>HMaster<br/>ResourceManager]
        H6[fa:fa-server H6<br/>DataNode<br/>HRegionServer<br/>QuorumPeerMain]
    end
    subgraph fa:fa-desktop PC_2
        H4[fa:fa-server H4<br/>NameNode<br/>zkfc]
        H5[fa:fa-server H5<br/>DataNode<br/>HMaster<br/>ResourceManager]
        H7[fa:fa-server H7<br/>DataNode<br/>HRegionServer<br/>QuorumPeerMain]
    end
    subgraph fa:fa-desktop Raspberry_Pi
        H3[fa:fa-server H3<br/>DataNode<br/>HRegionServer<br/>QuorumPeerMain]
    end
```

Start one of the virtual machines.
Install apt-transport-https and ca-certificates so that HTTPS package sources can be used.
```bash
sudo apt install apt-transport-https ca-certificates
```
Back up `sources.list`.
```bash
sudo mv /etc/apt/sources.list /etc/apt/sources.list.bak
```
Edit `sources.list`.
```bash
sudo nano /etc/apt/sources.list
```
Use the Ubuntu mirror provided by the Tsinghua University Open Source Software Mirror (TUNA).
```
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
```
Update the package index and upgrade installed packages.
```bash
sudo apt update
sudo apt upgrade
```
Install openjdk-8-jdk.
```bash
sudo apt install openjdk-8-jdk
```
Download the Hadoop binary tarball; do the same for HBase and ZooKeeper.
```bash
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
```
Extract it into the current directory; again, do the same for HBase and ZooKeeper.
```bash
tar xzf hadoop-3.3.1.tar.gz
```
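The HBase and ZooKeeper downloads are not listed; assuming the versions referenced later (HBase 2.4.6 and ZooKeeper 3.7.0), fetching and unpacking them would look roughly like this (older releases tend to be available only from archive.apache.org):

```bash
# assumed download locations for the versions used in this lab
wget https://archive.apache.org/dist/hbase/2.4.6/hbase-2.4.6-bin.tar.gz
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz
tar xzf hbase-2.4.6-bin.tar.gz
tar xzf apache-zookeeper-3.7.0-bin.tar.gz
```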
Edit `.bashrc` and add the relevant environment variables.
```bash
#Hadoop Related Options
export HADOOP_HOME=/home/hadoop/hadoop-3.3.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native -Djava.net.preferIPv4Stack=true"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
#HBase Related Options
export HBASE_HOME=/home/hadoop/hbase-2.4.6
export PATH=$PATH:$HBASE_HOME/sbin:$HBASE_HOME/bin
#ZooKeeper Related Options
export ZOOKEEPER_HOME=/home/hadoop/apache-zookeeper-3.7.0-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
```
💡 In these examples `hadoop` is the username; change it to match your own setup. The same applies below.
Apply the changes made to `.bashrc`.
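A typical way to reload the profile in the current shell:

```bash
source ~/.bashrc
```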
Edit the `sshd_config` settings.
```bash
sudo nano /etc/ssh/sshd_config
```
```
Port 22
ListenAddress 0.0.0.0
PermitRootLogin yes
PasswordAuthentication yes
X11Forwarding no
```
Restart the ssh service for the changes to take effect.
```bash
sudo service ssh restart
```
Cloning the Virtual Machines#
Name the VM configured above H1, clone it to create H2 through H7 (in VirtualBox choose Full clone and Generate new MAC addresses for all network adapters), and adjust `/etc/hostname` and `/etc/hosts` on each clone accordingly.
Configuring the Network#
Configure each VM with a static IP.
```bash
sudo nano /etc/netplan/00-installer-config.yaml
```
```yaml
network:
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: false
      addresses: [192.168.0.201/24]
      gateway4: 192.168.0.1
      nameservers:
        addresses: [8.8.8.8]
      optional: true
  version: 2
```
💡 `enp0s3` is the name of the network interface; adjust it to your machine (you can look it up with `ip addr`). `gateway4` is the default gateway and should match the default gateway of the host machine.
In this cluster the 7 VMs H1 through H7 get static IPs 192.168.0.201 through 192.168.0.207, so the change above has to be made 7 times.
Apply the changes, then check whether they took effect. The output below shows that they did.
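Applying the netplan configuration and inspecting the interface would typically be:

```bash
sudo netplan apply
ip addr
```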
```
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:9a:bf:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.201/24 brd 192.168.0.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe9a:bfe9/64 scope link
       valid_lft forever preferred_lft forever
```
Edit `/etc/hostname`.
```bash
sudo nano /etc/hostname
```
Its content is shown below; this is the example for H1.
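For H1 the file would presumably contain nothing more than the hostname:

```
h1
```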
Edit `/etc/hosts`. The following is the example for H1: comment out the line that originally mapped the hostname (here `127.0.1.1 h1`); that hostname line is the part that differs from VM to VM.
```
127.0.0.1 localhost
# 127.0.1.1 h1
192.168.0.201 h1
192.168.0.202 h2
192.168.0.203 h3
192.168.0.204 h4
192.168.0.205 h5
192.168.0.206 h6
192.168.0.207 h7

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
```
Configuring SSH#
Generate an ssh key.
```bash
ssh-keygen -t rsa -P ""
```
Append the public key to `~/.ssh/authorized_keys`.
```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
Have every VM in the cluster copy its ssh key to all the other VMs.
```bash
ssh-copy-id hadoop@h1 \
&& ssh-copy-id hadoop@h2 \
&& ssh-copy-id hadoop@h3 \
&& ssh-copy-id hadoop@h4 \
&& ssh-copy-id hadoop@h5 \
&& ssh-copy-id hadoop@h6 \
&& ssh-copy-id hadoop@h7
```
Below is part of the output from copying H1's key.
```
hadoop@h1:~$ ssh-copy-id hadoop@h1 && ssh-copy-id hadoop@h2 &&ssh-copy-id hadoop@h3 && ssh-copy-id hadoop@h4 && ssh-copy-id hadoop@h5 && ssh-copy-id hadoop@h6 && ssh-copy-id hadoop@h7
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'h2 (192.168.0.202)' can't be established.
ECDSA key fingerprint is SHA256:C6ydAa+dfI5lcMJkMUucz60WE7p3eFLIs7fWZrTYfDE.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@h2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@h2'"
and check to make sure that only the key(s) you wanted were added.
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'h3 (192.168.0.203)' can't be established.
ECDSA key fingerprint is SHA256:OVEZc5ls6hhBFNgqmZxT/EjubDKr8oyoqwE4Wtvsk+k.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@h3's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@h3'"
and check to make sure that only the key(s) you wanted were added.
```
Setting Up the ZooKeeper Cluster#
Switch to H3.
Copy the ZooKeeper sample configuration file and edit it.
```bash
cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg && nano $ZOOKEEPER_HOME/conf/zoo.cfg
```
```
dataDir=/home/hadoop/tmp/zookeeper
server.3=h3:2888:3888
server.6=h6:2888:3888
server.7=h7:2888:3888
```
💡 Cluster entries use the format `server.<id>=<hostname>:2888:3888`, where `id` is a number unique to each machine and `hostname` is that machine's hostname. In `:2888:3888`, the first port is used by Followers to talk to the Leader, i.e. internal server communication (default 2888); the second is the election port (default 3888).
On each machine, create the ZooKeeper dataDir and a `myid` file inside it containing that machine's id. Below are the commands for H3, H6, and H7 respectively.
```bash
mkdir -p /home/hadoop/tmp/zookeeper && echo 3 > /home/hadoop/tmp/zookeeper/myid
```

```bash
mkdir -p /home/hadoop/tmp/zookeeper && echo 6 > /home/hadoop/tmp/zookeeper/myid
```

```bash
mkdir -p /home/hadoop/tmp/zookeeper && echo 7 > /home/hadoop/tmp/zookeeper/myid
```
Copy the ZooKeeper configuration just finished on H3 to H6 and H7.
```bash
scp -r $ZOOKEEPER_HOME/conf/* h6:$ZOOKEEPER_HOME/conf && scp -r $ZOOKEEPER_HOME/conf/* h7:$ZOOKEEPER_HOME/conf
```
Start the ZooKeeper service on the 3 machines, i.e. run the start command on H3, H6, and H7 (see the sketch below).
Then check the cluster state on all 3 machines; the cluster only works once more than half of its nodes are up, so run the start command on all 3 machines before expecting a healthy status.
The output below indicates a normal start.
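With the ZooKeeper bin directory on the PATH (as set in `.bashrc`), starting and checking each node would be:

```bash
zkServer.sh start    # run on H3, H6, and H7
zkServer.sh status   # then check the role of each node
```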
```
hadoop@h3:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```

```
hadoop@h6:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
```

```
hadoop@h7:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```
💡 As you can see, H6 is currently elected as the Leader.
Run `jps` to list the relevant services currently running.
```
hadoop@h3:~$ jps
2499 Jps
2378 QuorumPeerMain
```

```
hadoop@h6:~$ jps
3364 Jps
3279 QuorumPeerMain
```

```
hadoop@h7:~$ jps
3511 QuorumPeerMain
3599 Jps
```
Stop the ZooKeeper service on the 3 machines, i.e. run the command below on H3, H6, and H7.
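On each of the three machines the stop command would be:

```bash
zkServer.sh stop
```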
Setting Up the Hadoop Cluster#
Switch to H1.
Edit `hadoop-env.sh`.
```bash
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
```
Specify `JAVA_HOME`.
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
💡 If the cluster includes ARM devices such as a Raspberry Pi, install the matching JDK there and point `JAVA_HOME` at it, e.g.
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64
```
Edit `hdfs-site.xml`.
```bash
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```
```xml
<configuration>
  <!-- Number of replicas -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Disable permission checks to make development and debugging easier -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <!-- NameService name; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- The two NameNodes -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>h1:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>h1:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>h4:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>h4:50070</value>
  </property>
  <!-- JournalNodes depend on ZooKeeper and run on the ZooKeeper machines -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://h3:8485;h6:8485;h7:8485/ns1</value>
  </property>
  <!-- Where JournalNodes keep edits data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/tmp/journaldata</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover implementation: the client failover proxy provider class -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- sshfence requires passwordless ssh -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connection timeout, in ms -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <!-- Directory that holds the FsImage -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/namenode</value>
  </property>
  <!-- Directory that holds HDFS data files -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/datanode</value>
  </property>
  <property>
    <name>heartbeat.recheck.interval</name>
    <value>2000</value>
  </property>
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>1</value>
  </property>
</configuration>
```
💡 Split brain: the situation where 2 NameNodes are active and serving clients at the same time. Two fencing methods are available to prevent it:
- sshfence: log in to the misbehaving NameNode remotely and kill it (if the remote ssh port is not 22, configure it as `sshfence(username:port)` in dfs.ha.fencing.methods);
- shell: the fallback when the remote login times out or gets no response; a custom script can be given in the parentheses, e.g. `shell(/bin/true)`, which simply returns true so the failover proceeds immediately.
Edit `core-site.xml`.
```bash
nano $HADOOP_HOME/etc/hadoop/core-site.xml
```
```xml
<configuration>
  <!-- Set the HDFS NameService to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <!-- Hadoop working directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp/hadoop</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>h3:2181,h6:2181,h7:2181</value>
  </property>
  <!-- Maximum number of client connection retries -->
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>100</value>
  </property>
  <!-- Interval between two reconnection attempts, in ms -->
  <property>
    <name>ipc.client.connect.retry.interval</name>
    <value>5000</value>
  </property>
</configuration>
```
Edit `mapred-site.xml`.
```bash
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
```
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop-3.3.1</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop-3.3.1</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop-3.3.1</value>
  </property>
</configuration>
```
Edit `yarn-site.xml`.
```bash
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
```
```xml
<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- ResourceManager cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Logical names of the ResourceManagers in the cluster -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- RM1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>h2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>h2:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>h2:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>h2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>h2:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>h2:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>h2:23142</value>
  </property>
  <!-- RM2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>h5</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>h5:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>h5:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>h5:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>h5:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>h5:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>h5:23142</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>h3:2181,h6:2181,h7:2181</value>
  </property>
  <!-- Reducers fetch data via mapreduce_shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
Configure `workers`.
```bash
nano $HADOOP_HOME/etc/hadoop/workers
```
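The file's contents are not shown; going by the deployment diagram (DataNodes on H2, H3, H5, H6, and H7), `workers` would presumably list:

```
h2
h3
h5
h6
h7
```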
Create the directories required by the configuration files above.
```bash
mkdir -p /home/hadoop/tmp/journaldata /home/hadoop/tmp/hadoop
```
Copy the Hadoop configuration just finished on H1 to the other 6 machines.
```bash
scp -r $HADOOP_HOME/etc/hadoop/* h2:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h3:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h4:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h5:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h6:$HADOOP_HOME/etc/hadoop \
&& scp -r $HADOOP_HOME/etc/hadoop/* h7:$HADOOP_HOME/etc/hadoop
```
Switch to H4 and create the directory where its NameNode will store the FsImage.
```bash
mkdir -p /home/hadoop/data/namenode
```
Starting HDFS HA#
Start ZooKeeper on H3, H6, and H7, i.e. run the same start command as before on those machines.
Check the cluster state on H3, H6, and H7. The output below indicates a normal start.
```
hadoop@h3:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```

```
hadoop@h6:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
```

```
hadoop@h7:~$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
```
💡 H6 is still the elected Leader.
Start the JournalNode service on H2, H3, H5, H6, and H7, i.e. run the command below on those machines.
```bash
hdfs --daemon start journalnode
```
💡 JournalNodes only need to be started by hand the first time the Hadoop cluster is brought up; this step is not needed afterwards.
Run `jps` to list the relevant services currently running.
```
hadoop@h2:~$ jps
4631 Jps
4588 JournalNode
```

```
hadoop@h3:~$ jps
2864 JournalNode
2729 QuorumPeerMain
2938 Jps
```

```
hadoop@h5:~$ jps
4371 Jps
4325 JournalNode
```

```
hadoop@h6:~$ jps
3474 QuorumPeerMain
3594 JournalNode
3644 Jps
```

```
hadoop@h7:~$ jps
3890 Jps
3848 JournalNode
3736 QuorumPeerMain
```
Switch to H1 and format HDFS.
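On H1 the format command would be:

```bash
hdfs namenode -format
```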
Copy the FsImage from H1's NameNode to H4's NameNode so that the two are identical.
```bash
scp -r /home/hadoop/data/namenode/* h4:/home/hadoop/data/namenode
```
Format the ZKFC state in ZooKeeper, then start HDFS.
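On H1 these two steps would typically be:

```bash
hdfs zkfc -formatZK   # initialize the HA state in ZooKeeper
start-dfs.sh          # start the HDFS daemons (NameNodes, DataNodes, JournalNodes, ZKFCs)
```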
Run `jps` to list the relevant services currently running.
```
hadoop@h1:~$ jps
22368 Jps
21960 NameNode
22317 DFSZKFailoverController
```
The HDFS report shows all 5 DataNodes in a healthy state.
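The report below comes from the HDFS admin report command:

```bash
hdfs dfsadmin -report
```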
```
Configured Capacity: 101142011904 (94.20 GB)
Present Capacity: 59280982016 (55.21 GB)
DFS Remaining: 59280859136 (55.21 GB)
DFS Used: 122880 (120 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (5):
Name: 192.168.0.202:9866 (h2)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7673147392 (7.15 GB)
DFS Remaining: 1771356160 (1.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.76%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:04 UTC 2021
Last Block Report: Mon Oct 25 02:15:29 UTC 2021
Num of Blocks: 0
Name: 192.168.0.203:9866 (h3)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 61257113600 (57.05 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6547861504 (6.10 GB)
DFS Remaining: 52175802368 (48.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.18%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:06 UTC 2021
Last Block Report: Mon Oct 25 02:15:54 UTC 2021
Num of Blocks: 0
Name: 192.168.0.205:9866 (h5)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7665070080 (7.14 GB)
DFS Remaining: 1779433472 (1.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.85%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:06 UTC 2021
Last Block Report: Mon Oct 25 02:15:33 UTC 2021
Num of Blocks: 0
Name: 192.168.0.206:9866 (h6)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7667515392 (7.14 GB)
DFS Remaining: 1776988160 (1.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:05 UTC 2021
Last Block Report: Mon Oct 25 02:15:38 UTC 2021
Num of Blocks: 0
Name: 192.168.0.207:9866 (h7)
Hostname: ip6-localhost
Decommission Status : Normal
Configured Capacity: 9971224576 (9.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7667224576 (7.14 GB)
DFS Remaining: 1777278976 (1.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Oct 25 02:17:07 UTC 2021
Last Block Report: Mon Oct 25 02:15:31 UTC 2021
Num of Blocks: 0
```
Testing HDFS HA#
Visiting http://192.168.0.201:50070/ shows 'h1:9000' (active).
Stop the NameNode on H1.
```bash
hdfs --daemon stop namenode
```
Now visiting http://192.168.0.204:50070/ shows 'h4:9000' (active).
Start the NameNode on H1 again.
```bash
hdfs --daemon start namenode
```
Now visiting http://192.168.0.201:50070/ shows 'h1:9000' (standby).
Starting YARN#
Start the YARN services for H2 and H5, i.e. run the start command on H2 (see the sketch below).
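On H2 this would presumably be the standard start script, which with this HA configuration brings up both ResourceManagers as well as the NodeManagers listed in `workers`:

```bash
start-yarn.sh
```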
Run `jps` to list the relevant services currently running.
```
hadoop@h2:~$ jps
5203 JournalNode
6691 NodeManager
6534 ResourceManager
6838 Jps
5692 DataNode
```

```
hadoop@h5:~$ jps
6099 Jps
5395 DataNode
4921 JournalNode
5659 ResourceManager
5789 NodeManager
```
Testing YARN HA#
Visiting http://192.168.0.202:8188/ shows ResourceManager HA state: active.
Stop the ResourceManager on H2.
```bash
yarn --daemon stop resourcemanager
```
Now visiting http://192.168.0.205:8188/ shows ResourceManager HA state: active.
Start the ResourceManager on H2 again.
```bash
yarn --daemon start resourcemanager
```
Now visiting http://192.168.0.202:8188/ shows ResourceManager HA state: standby.
Testing MapReduce#
Switch to H2 and run the bundled pi example.
```bash
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 1 1
```
```
hadoop@h2:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 1 1
Number of Maps = 1
Samples per Map = 1
Wrote input for Map #0
Starting Job
2021-10-25 07:08:04,762 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2021-10-25 07:08:04,977 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1635145651320_0001
2021-10-25 07:08:05,934 INFO input.FileInputFormat: Total input files to process : 1
2021-10-25 07:08:06,626 INFO mapreduce.JobSubmitter: number of splits:1
2021-10-25 07:08:06,973 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1635145651320_0001
2021-10-25 07:08:06,975 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-10-25 07:08:07,341 INFO conf.Configuration: resource-types.xml not found
2021-10-25 07:08:07,342 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-10-25 07:08:07,991 INFO impl.YarnClientImpl: Submitted application application_1635145651320_0001
2021-10-25 07:08:08,048 INFO mapreduce.Job: The url to track the job: http://h5:8188/proxy/application_1635145651320_0001/
2021-10-25 07:08:08,051 INFO mapreduce.Job: Running job: job_1635145651320_0001
2021-10-25 07:08:25,438 INFO mapreduce.Job: Job job_1635145651320_0001 running in uber mode : false
2021-10-25 07:08:25,439 INFO mapreduce.Job: map 0% reduce 0%
2021-10-25 07:08:34,585 INFO mapreduce.Job: map 100% reduce 0%
2021-10-25 07:08:50,737 INFO mapreduce.Job: map 100% reduce 100%
2021-10-25 07:08:52,774 INFO mapreduce.Job: Job job_1635145651320_0001 completed successfully
2021-10-25 07:08:52,993 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=28
FILE: Number of bytes written=555793
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=255
HDFS: Number of bytes written=215
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5590
Total time spent by all reduces in occupied slots (ms)=14233
Total time spent by all map tasks (ms)=5590
Total time spent by all reduce tasks (ms)=14233
Total vcore-milliseconds taken by all map tasks=5590
Total vcore-milliseconds taken by all reduce tasks=14233
Total megabyte-milliseconds taken by all map tasks=5724160
Total megabyte-milliseconds taken by all reduce tasks=14574592
Map-Reduce Framework
Map input records=1
Map output records=2
Map output bytes=18
Map output materialized bytes=28
Input split bytes=137
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=28
Reduce input records=2
Reduce output records=0
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=220
CPU time spent (ms)=1050
Physical memory (bytes) snapshot=302690304
Virtual memory (bytes) snapshot=4986265600
Total committed heap usage (bytes)=138096640
Peak Map Physical memory (bytes)=202854400
Peak Map Virtual memory (bytes)=2490499072
Peak Reduce Physical memory (bytes)=99835904
Peak Reduce Virtual memory (bytes)=2495766528
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=118
File Output Format Counters
Bytes Written=97
Job Finished in 48.784 seconds
Estimated value of Pi is 4.00000000000000000000
```
Setting Up the HBase HA Cluster#
Switch to H2.
Edit `hbase-env.sh`.
```bash
nano $HBASE_HOME/conf/hbase-env.sh
```
Specify `JAVA_HOME`.
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```
💡 Likewise, if the cluster includes ARM devices such as a Raspberry Pi, install the matching JDK there and point `JAVA_HOME` at it, e.g.
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64
```
Do not use the ZooKeeper that ships with HBase.
```bash
export HBASE_MANAGES_ZK=false
```
Edit `hbase-site.xml`.
```bash
nano $HBASE_HOME/conf/hbase-site.xml
```
```xml
<configuration>
  <!-- Where HBase stores its data on HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns1/hbase</value>
  </property>
  <!-- Run HBase in distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>h3:2181,h6:2181,h7:2181</value>
  </property>
  <!-- Maximum allowed clock skew between HMaster and HRegionServers -->
  <property>
    <name>hbase.master.maxclockskew</name>
    <value>180000</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>filesystem</value>
  </property>
</configuration>
```
Edit `regionservers`.
```bash
nano $HBASE_HOME/conf/regionservers
```
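The file's contents are not shown; going by the deployment diagram (HRegionServers on H3, H6, and H7), `regionservers` would presumably list:

```
h3
h6
h7
```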
Copy `core-site.xml` and `hdfs-site.xml` from H1 into H2's HBase conf directory.
```bash
scp -r $HADOOP_HOME/etc/hadoop/core-site.xml h2:$HBASE_HOME/conf \
&& scp -r $HADOOP_HOME/etc/hadoop/hdfs-site.xml h2:$HBASE_HOME/conf
```
Copy the HBase configuration just finished on H2 to the other 4 machines.
```bash
scp -r $HBASE_HOME/conf/* h3:$HBASE_HOME/conf \
&& scp -r $HBASE_HOME/conf/* h5:$HBASE_HOME/conf \
&& scp -r $HBASE_HOME/conf/* h6:$HBASE_HOME/conf \
&& scp -r $HBASE_HOME/conf/* h7:$HBASE_HOME/conf
```
Starting HBase HA#
Switch to H1.
Start HDFS.
Run `jps` to list the relevant services currently running.
```
hadoop@h1:~$ jps
3783 NameNode
4169 Jps
4138 DFSZKFailoverController
```
Switch to H2 and start an HMaster.
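The command is the same one shown for H5 below:

```bash
hbase-daemon.sh start master
```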
Switch to H5 and start an HMaster there as well.
```bash
hbase-daemon.sh start master
```
Run `jps` to list the relevant services currently running.
```
hadoop@h2:~$ jps
1052 JournalNode
7901 DataNode
8125 HMaster
8847 Jps
```

```
hadoop@h3:~$ jps
6368 Jps
6210 HRegionServer
2660 QuorumPeerMain
5717 DataNode
5846 JournalNode
```

```
hadoop@h5:~$ jps
6336 Jps
1058 JournalNode
6117 HMaster
5980 DataNode
```

```
hadoop@h6:~$ jps
4722 Jps
4408 JournalNode
4248 DataNode
4570 HRegionServer
1039 QuorumPeerMain
```

```
hadoop@h7:~$ jps
4402 DataNode
4563 JournalNode
4726 HRegionServer
1031 QuorumPeerMain
5000 Jps
```
Testing HBase HA#
Visiting http://192.168.0.202:16010/ shows Current Active Master: h2.
Stop the HMaster on H2.
```bash
hbase-daemon.sh stop master
```
Now visiting http://192.168.0.205:16010/ shows Current Active Master: h5.
Start the HMaster on H2 again.
```bash
hbase-daemon.sh start master
```
Now visiting http://192.168.0.202:16010/ shows Current Active Master: h5, with h2 listed as a Backup Master.
Wrapping Up#
The end of this walkthrough is just the beginning of the learning. Keep smiling and keep improving 😊