
How to Set Up a Hadoop Cluster on CentOS 7.9 Linux Servers

Notes on configuring a Hadoop cluster on virtual machines.

Contents

1 Deployment Plan
2 Install the JDK
3 Configure Hostnames and Hosts
4 Install Hadoop
5 Configure the Cluster
6 Configure Passwordless SSH Between Nodes
7 Distribute Software and Configuration to All Nodes
8 Start the Cluster
9 Test the Cluster
10 Common Problems

1 Deployment Plan

A fully distributed Hadoop deployment needs multiple real nodes. This installation is done on Linux: prepare three Linux virtual machines to act as the three nodes.

Operating system and software versions

Hadoop is written in Java, so a Java runtime must be installed before Hadoop. Hadoop 2.x runs on Java 7 and Java 8; Hadoop 3.x requires Java 8 or later. Versions used in this article:
- Linux OS: CentOS 7.9
- JDK: Java 1.8
- Hadoop: hadoop-2.10.2

Hadoop installation location (the same path is used on every node):
/opt/bigdata/hadoop-2.10.2/

Cluster architecture

Prepare three servers; for example, create three CentOS virtual machines on PVE. IP addresses and node roles are assigned as follows:

Role / Node      master          slave1/worker1   slave2/worker2
IP               192.168.5.131   192.168.5.233    192.168.5.248
HDFS processes   NameNode        DataNode         SecondaryNameNode
                 DataNode                         DataNode
YARN processes   NodeManager     NodeManager      NodeManager
                                                  ResourceManager


2 Install the JDK

Install the JDK (master shown here; the master node and all worker nodes need it):

# yum install java-1.8.0-openjdk.x86_64 -y 

Edit /etc/profile and add a JAVA_HOME setting; a sketch is shown below.
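A minimal sketch of the lines to append, assuming the OpenJDK path shown later in hadoop-env.sh; the directory name varies with the exact build yum installed (check it with readlink -f /usr/bin/java and drop the trailing jre/bin/java):

# append to /etc/profile -- adjust the JDK path to the version yum actually installed
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64
export PATH=$PATH:$JAVA_HOME/bin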

Run source /etc/profile to apply the change.

Check the Java version:

java -version


3 Configure Hostnames and Hosts

Set the hostnames of the three nodes by running the corresponding command on each machine (the /etc/hosts mapping is added next):

hostnamectl set-hostname master
hostnamectl set-hostname worker1
hostnamectl set-hostname worker2

Add the IP-to-hostname mappings by editing /etc/hosts with vi. (This guide treats slave1/worker1 and slave2/worker2 as the same machines; the commands that follow use slave1 and slave2, so those are the names mapped here.)

vi /etc/hosts

# append to the end of /etc/hosts
192.168.5.131 master
192.168.5.233 slave1
192.168.5.248 slave2
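A quick check on each node that the names resolve to the right addresses:

getent hosts slave1 slave2
ping -c 1 master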


4 Install Hadoop

Download the release from https://hadoop.apache.org/releases.html

Download the 2.x release: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz

Download on Linux:

wget https://dlcdn.apache.org/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz

Now configure the Hadoop cluster. Configure the master node first; the worker nodes are then brought to the same state by syncing master's hadoop files to them (see section 7).

Unpack the archive and install the files under /opt/bigdata:

tar -zxvf hadoop-2.10.2.tar.gz -C /opt/bigdata
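Note: tar -C fails if the target directory is missing; create /opt/bigdata first if it does not exist yet:

mkdir -p /opt/bigdata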

Add Hadoop to the environment variables:

vi /etc/profile

##HADOOP_HOME
export HADOOP_HOME=/opt/bigdata/hadoop-2.10.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Reload the environment variables:

source /etc/profile

Check the Hadoop version; if it is displayed correctly, the installation succeeded:

[root@master ~]# hadoop version
Hadoop 2.10.2
Subversion Unknown -r 965fd380006fa78b2315668fbc7eb432e1d8200f
Compiled by ubuntu on 2022-05-24T22:35Z
Compiled with protoc 2.5.0
From source with checksum d3ab737f7788f05d467784f0a86573fe
This command was run using /opt/bigdata/hadoop-2.10.2/share/hadoop/common/hadoop-common-2.10.2.jar
[root@master ~]#

Hadoop directory layout:

List the Hadoop install directory:

ll /opt/bigdata/hadoop-2.10.2/

1. bin: commands for operating Hadoop, such as hadoop and hdfs
2. etc: Hadoop configuration files, such as hdfs-site.xml and core-site.xml
3. lib: Hadoop native libraries (compression/decompression dependencies)
4. sbin: scripts and commands for starting and stopping the Hadoop cluster
5. share: Hadoop jar files, bundled example jars, documentation, etc.


5 Configure the Cluster

On the master node, modify the following core files; when done, sync them to every worker node:
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml

Hadoop cluster configuration = HDFS configuration + MapReduce configuration + YARN configuration

Configure hadoop-env.sh: set the JDK path explicitly so the HDFS daemons can find it. (The HDFS_*_USER and YARN_*_USER exports below are read only by Hadoop 3.x start scripts; under 2.x they are harmless.)

hadoop-env.sh

vi etc/hadoop/hadoop-env.sh

# export JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

core-site.xml

Specify the NameNode address and the base directory for Hadoop's runtime files:

vi etc/hadoop/core-site.xml

<configuration>
<!-- Optional -->
    <!-- Static user for browsing the HDFS web UI -->
<property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
</property>

<!-- NameNode (HDFS) address -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>

<!-- Base directory for files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/var/tmp/hadoop/tmp</value>
</property>
</configuration>

hdfs-site.xml

Configure the HDFS storage directories, the replication factor, and the NameNode web UI address:

vi etc/hadoop/hdfs-site.xml

<configuration>
<property>
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:50070</value>
</property>

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/tmp/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/tmp/hadoop/dfs/data</value>
</property>
<!-- Replication factor -->
<property>
      <name>dfs.replication</name>
      <value>3</value>
</property>
</configuration>
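Formatting the NameNode and starting the DataNodes normally creates these directories automatically; if you prefer to create them up front (the paths come from the values above), run on every node:

mkdir -p /var/tmp/hadoop/dfs/name /var/tmp/hadoop/dfs/data /var/tmp/hadoop/tmp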

mapred-site.xml

Configure MapReduce to run on YARN and set up the history server. Logs of jobs that ran on YARN cannot be viewed directly; to review finished jobs, configure the JobHistory server.
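Note: Hadoop 2.x normally ships only a template for this file; if etc/hadoop/mapred-site.xml does not exist yet, copy it from the template first (run inside /opt/bigdata/hadoop-2.10.2):

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml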

vi etc/hadoop/mapred-site.xml

<configuration>
<!-- Run MapReduce jobs on YARN -->
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>

<!-- jobhistory properties -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>

<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>
</configuration>

Configure yarn-site.xml:

vi etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave2</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- Log server URL (the JobHistory server) -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://192.168.5.131:19888/jobhistory/logs</value>
    </property>

    <!-- Keep aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

</configuration>


6 Configure Passwordless SSH Between Nodes

Do this on the master node first, then repeat on each worker node in turn. On master: run ssh-keygen -t rsa and accept the defaults to generate the key pair, then copy the key to the other nodes.

[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:gVLLOMv+9bI5Nxw/qTtB9yT9gugNNC8UYtiWx7BkcDw root@master
The key's randomart image is:
+---[RSA 2048]----+
|      o==+       |
|     +.*E.+      |
|    + +oo+ . .   |
|   . +   .= o o  |
|    o   S+ = = . |
|   .      * o o .|
|    .   .o B . . |
|     . .oo* =    |
|      . o=+= .   |
+----[SHA256]-----+
[root@master ~]#

Then copy the public key to each node; here master's key is copied to slave1 and slave2. The first time you log in to a new node you are prompted for that node's password (the root password, since root is used throughout):

ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2

[root@master temp]# ssh-copy-id slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (192.168.5.248)' can't be established.
ECDSA key fingerprint is SHA256:SD7NPOpvABQCFbgp7e0R7gJH1no+Z3IolZsHp+Yh66g.
ECDSA key fingerprint is MD5:fd:fd:a2:28:ae:9e:5c:09:bf:1a:5f:a7:e0:be:a6:65.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave2'"
and check to make sure that only the key(s) you wanted were added.

[root@master temp]#

Test that passwordless login works:

ssh slave2
exit   # return to the master node
  • Note: repeat the steps above on slave1 and slave2 so that every pair of nodes can log in to each other without a password.

The worker nodes themselves are configured by distributing files (next section).

Configure slaves/workers: in /opt/bigdata/hadoop-2.10.2/etc/hadoop/, edit slaves (Hadoop 2.x) or workers (Hadoop 3.x); delete localhost and add:

master
slave1
slave2

The slaves/workers file must not contain any extra spaces or blank lines.
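A quick way to spot stray whitespace: cat -A prints $ at each line end and ^I for tabs, so anything other than the three bare hostnames stands out:

cat -A /opt/bigdata/hadoop-2.10.2/etc/hadoop/slaves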


7 Distribute Software and Configuration to All Nodes

After finishing the setup on master, copy the files to the other nodes as follows; this ensures every node runs the same software versions with the same configuration. Either scp or rsync can be used.
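As a sketch of the rsync alternative (install it with yum install rsync -y on both ends if missing), the whole configuration directory can be pushed in one command, and repeat runs only transfer changed files; the scp commands actually used in this guide follow below:

rsync -av /opt/bigdata/hadoop-2.10.2/etc/hadoop/ root@slave1:/opt/bigdata/hadoop-2.10.2/etc/hadoop/
rsync -av /opt/bigdata/hadoop-2.10.2/etc/hadoop/ root@slave2:/opt/bigdata/hadoop-2.10.2/etc/hadoop/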

Distribute the JDK ($PWD expands to the absolute path of the current directory):

scp -r java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.x86_64/ root@slave1:$PWD
scp -r java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.x86_64/ root@slave2:$PWD
Distribute Hadoop:
scp -r hadoop-2.10.2 root@slave1:$PWD
scp -r hadoop-2.10.2 root@slave2:$PWD
Distribute /etc/hosts:
scp /etc/hosts root@slave1:/etc/
scp /etc/hosts root@slave2:/etc/
Distribute /etc/profile:
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/

After distribution, run source /etc/profile on each worker node to refresh its local environment.

Distribute /opt/bigdata/hadoop-2.10.2/etc/hadoop/slaves:
scp /opt/bigdata/hadoop-2.10.2/etc/hadoop/slaves root@slave1:/opt/bigdata/hadoop-2.10.2/etc/hadoop/
scp /opt/bigdata/hadoop-2.10.2/etc/hadoop/slaves root@slave2:/opt/bigdata/hadoop-2.10.2/etc/hadoop/

Note: Hadoop 2.x uses the slaves file; Hadoop 3.x uses workers.


8 Start the Cluster

The first time the cluster is started, format the NameNode on master. This initialization is done only once: formatting again generates a new ClusterID that no longer matches the ClusterID recorded by the DataNodes, and the DataNodes will then fail to start.

hdfs namenode -format
  • Start HDFS:
start-dfs.sh
  • Start YARN:
start-yarn.sh
  • Check the running processes:
jps
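Note: yarn.resourcemanager.hostname is set to slave2 above, and in Hadoop 2.x start-yarn.sh launches the ResourceManager on the node it is run from. To match the plan in section 1, run start-yarn.sh on slave2, or start the ResourceManager there separately, for example:

ssh slave2 /opt/bigdata/hadoop-2.10.2/sbin/yarn-daemon.sh start resourcemanager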
[root@master hadoop-2.10.2]# ./sbin/start-dfs.sh
Starting namenodes on [master]
master: namenode running as process 1014. Stop it first.
localhost: starting datanode, logging to /opt/bigdata/hadoop-2.10.2/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:dZyz2xZPzpfSayK73IsRVyLdcyPhe0E7oPdoAVgRUFk.
ECDSA key fingerprint is MD5:89:f5:6c:70:a5:d6:52:6a:bb:a4:7b:dc:41:4d:51:c1.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/bigdata/hadoop-2.10.2/logs/hadoop-root-secondarynamenode-master.out

Use jps on each node to check whether the daemons started successfully:

[root@master hadoop-2.10.2]# jps
1586 DataNode
1746 SecondaryNameNode
1014 NameNode
1868 Jps
[root@master hadoop-2.10.2]#
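The workers can also be checked from master over SSH (per the plan in section 1, each should show DataNode and NodeManager, plus SecondaryNameNode and ResourceManager on slave2); if jps is not found in a non-interactive session, log in and run it directly:

ssh slave1 jps
ssh slave2 jps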

Check the cluster status in the web management UI:

Check Hadoop Status: http://192.168.5.131:50070/dfshealth.html#tab-overview

Start the history server (its web UI listens on master:19888, as configured in mapred-site.xml):

[root@master hadoop-2.10.2]# sbin/mr-jobhistory-daemon.sh start historyserver


9 Test the Cluster

Example 1:

Create a words.txt file with some arbitrary content and run the bundled wordcount job to test the cluster.
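For instance (hypothetical content; any words will do):

cat > words.txt <<'EOF'
hadoop hdfs yarn hadoop
hadoop mapreduce hdfs
EOF

Note that the job fails if the /output directory already exists; remove it with hdfs dfs -rm -r /output before re-running.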

hdfs dfs -mkdir /input
hdfs dfs -put words.txt /input/
hadoop jar /opt/bigdata/hadoop-2.10.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /input/ /output/

hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000

[root@master bg]# hdfs dfs -cat /output/part-r-00000
hadoop  4
hdfs    3
mapreduce       2
yarn    2
[root@master bg]#


10 Common Problems

Error:

Running any hadoop command reports that which cannot be found; install the which utility:

[root@master ~]# hadoop version
/opt/bigdata/hadoop-2.10.2/bin/hadoop: line 20: which: command not found
dirname: missing operand
Try 'dirname --help' for more information.
/opt/bigdata/hadoop-2.10.2/bin/hadoop: line 27: /root/../libexec/hadoop-config.sh: No such file or directory
/opt/bigdata/hadoop-2.10.2/bin/hadoop: line 169: exec: : not found
[root@master ~]# yum install which -y

Error:

The Hadoop management UI at http://192.168.5.131:50070/dfshealth.html#tab-overview cannot be reached in a browser.

After ruling out the firewall, check hdfs-site.xml first and change 127.0.0.1 to 0.0.0.0 so the NameNode web UI listens on all interfaces:

<property>
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:50070</value>
</property>
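To rule the firewall in or out on CentOS 7, check firewalld and either open the port or, on a disposable lab cluster, stop the service entirely:

firewall-cmd --state
firewall-cmd --permanent --add-port=50070/tcp && firewall-cmd --reload
# lab shortcut only, not for production:
systemctl stop firewalld && systemctl disable firewalld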

Error:

DataNode processes start normally on the nodes, but the web UI shows 0 or 1 under Live Nodes. Edit /etc/hosts as below, removing any unnecessary IP mappings:

[root@master hadoop]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.5.131 master
192.168.5.233 slave1
192.168.5.248 slave2

Restart the cluster and refresh the page.

Error:

Hadoop 2.x and Hadoop 3.x differ in their default settings; for example, the NameNode web UI port changed from 50070 to 9870. Keep the version in mind when following instructions.
