Notes on Setting Up a Hadoop Cluster on Virtual Machines
1 Design Overview
A fully distributed Hadoop deployment requires multiple real nodes. This installation is done on Linux, so prepare three Linux virtual machines to act as the three nodes.
Operating system and software versions
Hadoop is written in Java, so a Java runtime must be installed before installing Hadoop. Hadoop 2.x runs on Java 7 and 8; Hadoop 3.x requires Java 8 or later. The system and software versions used in this document are:
- Linux OS: CentOS 7.9
- JDK: Java 1.8
- Hadoop: hadoop-2.10.2
Hadoop installation location (the same path is used on every node):
/opt/bigdata/hadoop-2.10.2/
Cluster architecture
Prepare three servers; for example, create three CentOS virtual machines on PVE. The IPs and node roles are assigned as follows:
| Role \ Node | master | slave1/worker1 | slave2/worker2 |
|---|---|---|---|
| IP | 192.168.5.131 | 192.168.5.233 | 192.168.5.248 |
| HDFS daemons | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN daemons | NodeManager | NodeManager | NodeManager, ResourceManager |
2 Install the JDK
Install the JDK (master is shown as the example; the master node and all worker nodes need it):
# yum install java-1.8.0-openjdk.x86_64 -y
Edit /etc/profile and add the JAVA_HOME setting.
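A minimal sketch of the lines to append (the exact JDK directory depends on the installed OpenJDK build; it can be located with readlink -f $(which java), stripping the trailing jre/bin/java; the path below is the one used later in hadoop-env.sh):
# Append to /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64
export PATH=$PATH:$JAVA_HOME/bin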
Run source /etc/profile to make the configuration take effect.
Check the Java version:
java -version
3 Configure Hostnames and Host Mappings
Set the hostnames and update /etc/hosts on all three nodes. First set the hostname; run one of the following on each machine:
hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2
Then add the IP mappings by editing the file with vi:
vi /etc/hosts
# append at the end of /etc/hosts
192.168.5.131 master
192.168.5.233 slave1
192.168.5.248 slave2
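As a quick sanity check (optional), each hostname can be pinged once from every node to confirm the mapping resolves:
ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2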
4 Install Hadoop
Download the release from https://hadoop.apache.org/releases.html
Download the 2.x release: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz
Download command on Linux:
wget https://dlcdn.apache.org/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz
Now configure the Hadoop cluster. Configure the master node first; the worker nodes are then configured by syncing the Hadoop files from master (section 7).
Extract the archive into /opt/bigdata.
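If the target directory does not exist yet, create it first (assumed to be needed on a freshly installed node, and on every node that will receive the files later):
mkdir -p /opt/bigdata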
tar -zxvf hadoop-2.10.2.tar.gz -C /opt/bigdata
Add Hadoop to the environment variables:
vi /etc/profile
##HADOOP_HOME
export HADOOP_HOME=/opt/bigdata/hadoop-2.10.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Reload the environment variables:
source /etc/profile
Check the Hadoop version information; if it is displayed correctly, the installation succeeded:
[root@master ~]# hadoop version
Hadoop 2.10.2
Subversion Unknown -r 965fd380006fa78b2315668fbc7eb432e1d8200f
Compiled by ubuntu on 2022-05-24T22:35Z
Compiled with protoc 2.5.0
From source with checksum d3ab737f7788f05d467784f0a86573fe
This command was run using /opt/bigdata/hadoop-2.10.2/share/hadoop/common/hadoop-common-2.10.2.jar
[root@master ~]#
Hadoop directory layout:
List the Hadoop installation directory:
ll /opt/bigdata/hadoop-2.10.2/
1. bin: commands for operating Hadoop, such as hadoop, hdfs, etc.
2. etc: Hadoop configuration files, such as hdfs-site.xml, core-site.xml, etc.
3. lib: Hadoop native libraries (compression/decompression dependencies)
4. sbin: scripts and commands for starting and stopping the Hadoop cluster
5. share: Hadoop jars, official example jars, documentation, etc.
5 Cluster Configuration
Modify the following configuration files on the master node, then sync them to every worker node:
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
Hadoop cluster configuration = HDFS configuration + MapReduce configuration + YARN configuration
Configure hadoop-env.sh: set the JDK path used by HDFS and the other daemons.
hadoop-env.sh
vi etc/hadoop/hadoop-env.sh
# export JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
core-site.xml
Specify the NameNode address and the runtime data directory:
vi etc/hadoop/core-site.xml
<configuration>
<!-- Optional -->
<!-- Static user for the HDFS web UI (the cluster is run as root here) -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
</property>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:/var/tmp/hadoop/tmp</value>
</property>
</configuration>
hdfs-site.xml
Configure the NameNode web address, the NameNode/DataNode data directories, and the replication factor (placing the SecondaryNameNode is covered in the note after this file):
vi etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>0.0.0.0:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/var/tmp/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/var/tmp/hadoop/dfs/data</value>
</property>
<!-- number of replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
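Note: the cluster plan in section 1 places the SecondaryNameNode on slave2, but nothing in the file above pins it there, so start-dfs.sh launches it on the node where the script is run (the startup log in section 8 shows it on master). A hedged addition to hdfs-site.xml that would place it on slave2 (50090 is the conventional SecondaryNameNode HTTP port in Hadoop 2.x):
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave2:50090</value>
</property>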
mapred-site.xml
Configure the job history server. Logs produced by jobs that ran on YARN cannot be viewed after the jobs finish; to review how applications ran historically, a history server needs to be configured.
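In the Hadoop 2.x distribution this file usually ships only as a template, so it typically has to be copied first (paths relative to the Hadoop installation directory, as in the other commands):
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml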
vi etc/hadoop/mapred-site.xml
<configuration>
<!-- Run MapReduce programs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<!-- History server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
Configure yarn-site.xml: the NodeManager shuffle service, the ResourceManager host, and log aggregation.
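The file is edited in the same etc/hadoop directory as the others:
vi etc/hadoop/yarn-site.xml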
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>slave2</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://192.168.5.131:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
6 Configure Passwordless SSH Between Nodes
Do this on the master node first, then repeat on each worker node. On master: run ssh-keygen -t rsa and accept the defaults to generate the key pair, then copy the public key to each node.
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:gVLLOMv+9bI5Nxw/qTtB9yT9gugNNC8UYtiWx7BkcDw root@master
The key's randomart image is:
+---[RSA 2048]----+
| o==+ |
| +.*E.+ |
| + +oo+ . . |
| . + .= o o |
| o S+ = = . |
| . * o o .|
| . .o B . . |
| . .oo* = |
| . o=+= . |
+----[SHA256]-----+
[root@master ~]#
Then copy the public key to every node; here master's key is copied to slave1 and slave2. On the first login to a new node the system prompts for that node's login password (the root password, since the root user is used here).
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
[root@master temp]# ssh-copy-id slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (192.168.5.248)' can't be established.
ECDSA key fingerprint is SHA256:SD7NPOpvABQCFbgp7e0R7gJH1no+Z3IolZsHp+Yh66g.
ECDSA key fingerprint is MD5:fd:fd:a2:28:ae:9e:5c:09:bf:1a:5f:a7:e0:be:a6:65.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'slave2'"
and check to make sure that only the key(s) you wanted were added.
[root@master temp]#
Test that passwordless login works:
ssh slave2
exit # return to the master node
- Note: repeat the steps above on slave1 and slave2 so that every node can log in to every other node without a password.
The worker nodes' own setup is completed by distributing files from master (section 7).
Configure slaves/workers: in /opt/bigdata/hadoop-2.10.2/etc/hadoop/, edit the slaves file (named workers in Hadoop 3.x), delete localhost, and add:
master
slave1
slave2
The file must not contain any extra spaces or blank lines.
7 Distribute Software and Configuration to All Nodes
After the master node is fully configured, copy the files to the other nodes as shown below; this keeps the software versions and configuration identical on every node. Either scp or rsync can be used.
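rsync behaves like scp here but only transfers differences on repeated runs; a sketch assuming the same paths as the scp commands below (rsync must be installed on both ends):
rsync -av /opt/bigdata/hadoop-2.10.2/ root@slave1:/opt/bigdata/hadoop-2.10.2/
rsync -av /opt/bigdata/hadoop-2.10.2/ root@slave2:/opt/bigdata/hadoop-2.10.2/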
Distribute the JDK ($PWD expands to the absolute path of the current directory):
scp -r java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.x86_64/ root@slave1:$PWD
scp -r java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.x86_64/ root@slave2:$PWD
Distribute Hadoop:
scp -r hadoop-2.10.2 root@slave1:$PWD
scp -r hadoop-2.10.2 root@slave2:$PWD
Distribute /etc/hosts:
scp /etc/hosts root@slave1:/etc/
scp /etc/hosts root@slave2:/etc/
Distribute /etc/profile:
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
After distributing it, run source /etc/profile on each worker node to refresh the local environment.
Distribute /opt/bigdata/hadoop-2.10.2/etc/hadoop/slaves:
scp /opt/bigdata/hadoop-2.10.2/etc/hadoop/slaves root@slave1:/opt/bigdata/hadoop-2.10.2/etc/hadoop/
scp /opt/bigdata/hadoop-2.10.2/etc/hadoop/slaves root@slave2:/opt/bigdata/hadoop-2.10.2/etc/hadoop/
Note: Hadoop 2.x uses the slaves file; Hadoop 3.x uses workers.
8 Start the Cluster
The first time the cluster is started, the NameNode on master must be formatted. Do this only once, before the first start: running it again generates a new ClusterID on the NameNode that no longer matches the ClusterID stored on the DataNodes, and the DataNodes then fail to start.
hdfs namenode -format
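If the NameNode has already been re-formatted and the DataNodes fail with a ClusterID mismatch, one common recovery on a test cluster with no important data is to stop HDFS, clear the data directories configured above on every node, and format again. This deletes all HDFS data; a hedged sketch using the paths from hdfs-site.xml/core-site.xml:
stop-dfs.sh
rm -rf /var/tmp/hadoop/dfs /var/tmp/hadoop/tmp   # run on every node
hdfs namenode -format                            # on master only
start-dfs.sh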
- Start HDFS
start-dfs.sh
- Start YARN
start-yarn.sh
- Check the running processes
jps
[root@master hadoop-2.10.2]# ./sbin/start-dfs.sh
Starting namenodes on [master]
master: namenode running as process 1014. Stop it first.
localhost: starting datanode, logging to /opt/bigdata/hadoop-2.10.2/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:dZyz2xZPzpfSayK73IsRVyLdcyPhe0E7oPdoAVgRUFk.
ECDSA key fingerprint is MD5:89:f5:6c:70:a5:d6:52:6a:bb:a4:7b:dc:41:4d:51:c1.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/bigdata/hadoop-2.10.2/logs/hadoop-root-secondarynamenode-master.out
Use jps on each node to check that the daemons started successfully:
[root@master hadoop-2.10.2]# jps
1586 DataNode
1746 SecondaryNameNode
1014 NameNode
1868 Jps
[root@master hadoop-2.10.2]#
Check the cluster status through the web management UI:
Check Hadoop Status: http://192.168.5.131:50070/dfshealth.html#tab-overview
Start the history server:
[root@master hadoop-2.10.2]# sbin/mr-jobhistory-daemon.sh start historyserver
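If it started successfully, jps should now also list a JobHistoryServer process, and the web UI is reachable at master:19888 as configured in mapred-site.xml:
jps | grep JobHistoryServer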
9 Test the Cluster
Example 1:
Create a words.txt file with some arbitrary content and run a wordcount job to test the cluster.
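Any content works; for example, the following hypothetical input (not the author's original file) would produce the counts shown in the output below:
echo "hadoop hdfs hadoop mapreduce hadoop yarn hadoop hdfs hdfs mapreduce yarn" > words.txt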
hdfs dfs -mkdir /input
hdfs dfs -put words.txt /input/
hadoop jar /opt/bigdata/hadoop-2.10.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /input/ /output/
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
[root@master bg]# hdfs dfs -cat /output/part-r-00000
hadoop 4
hdfs 3
mapreduce 2
yarn 2
[root@master bg]#
10 Common Problems
Error:
Running a hadoop command reports that which cannot be found; the which utility needs to be installed.
[root@master ~]# hadoop version
/opt/bigdata/hadoop-2.10.2/bin/hadoop: line 20: which: command not found
dirname: missing operand
Try 'dirname --help' for more information.
/opt/bigdata/hadoop-2.10.2/bin/hadoop: line 27: /root/../libexec/hadoop-config.sh: No such file or directory
/opt/bigdata/hadoop-2.10.2/bin/hadoop: line 169: exec: : not found
[root@master ~]# yum install which -y
Error:
The Hadoop management UI at http://192.168.5.131:50070/dfshealth.html#tab-overview cannot be reached from a browser.
After ruling out the firewall, check hdfs-site.xml first and change 127.0.0.1 to 0.0.0.0 as follows:
<property>
<name>dfs.namenode.http-address</name>
<value>0.0.0.0:50070</value>
</property>
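If the firewall does turn out to be the cause, on CentOS 7 the port can be opened, or firewalld stopped on a throwaway test cluster (a hedged suggestion; adjust to your security requirements):
firewall-cmd --permanent --add-port=50070/tcp && firewall-cmd --reload
# or, on a test cluster only:
systemctl stop firewalld && systemctl disable firewalld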
Error:
The DataNode processes start normally on every node, but the web UI shows Live Nodes as 0 or 1. Edit /etc/hosts as below and remove the unnecessary IP mappings:
[root@master hadoop]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.5.131 master
192.168.5.233 slave1
192.168.5.248 slave2
Restart the cluster and refresh the page.
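A minimal restart sequence (the scripts are on PATH thanks to the profile settings in section 4):
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh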
Error:
Hadoop 3.x and Hadoop 2.x differ in some default settings when building a distributed cluster; for example, the NameNode web UI port changed from 50070 to 9870. Mind which version the guide you are following targets.