[Hadoop] Pseudo-Distributed Installation

Published: 2025-05-16


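What is a Hadoop pseudo-distributed installation?

A Hadoop pseudo-distributed installation (Pseudo-Distributed Mode) is a deployment that simulates a distributed cluster on a single machine. It sits between local mode (Local Mode) and fully distributed mode (Fully Distributed Mode), and is mainly used to learn, develop against, and test Hadoop's core features without needing multiple physical machines.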

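  1. Multiple nodes simulated on one machine
    • All Hadoop components (NameNode, DataNode, ResourceManager, NodeManager, and so on) run on the same machine.
    • Each component runs as a separate Java process, which simulates multi-node behavior.
  2. Difference from local mode
    • Local mode (Local Mode): uses the local file system directly and has no distributed features (no HDFS, no YARN).
    • Pseudo-distributed mode: uses HDFS and YARN, but all services run on one machine.
  3. Difference from fully distributed mode
    • Fully distributed mode: services are spread across multiple machines; suited to production.
    • Pseudo-distributed mode: for single-machine testing only; its performance is not representative of a production environment.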
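
One practical way to tell the modes apart is the fs.defaultFS setting: with no override it defaults to file:/// (local file system), while this guide later sets it to hdfs://localhost:9000. The sketch below classifies a URI on that basis. classify_fs_uri is a hypothetical helper written for illustration, not a Hadoop command, and the "remote-hdfs" label is only a heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the deployment mode from an fs.defaultFS value.
classify_fs_uri() {
  case "$1" in
    ""|file:*)
      echo "local" ;;                 # local file system, no HDFS
    hdfs://localhost*|hdfs://127.0.0.1*)
      echo "pseudo-distributed" ;;    # HDFS served from this machine
    hdfs://*)
      echo "remote-hdfs" ;;           # HDFS on another host (heuristic)
    *)
      echo "unknown" ;;
  esac
}

# On an installed system the value can be read with:
#   classify_fs_uri "$(hdfs getconf -confKey fs.defaultFS)"
```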

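Core steps of a pseudo-distributed installation (using hadoop-3.1.1 as the example)

Preparing the user and permissions

It is best to create a dedicated user (such as hadoop) for the deployment rather than running everything as root:

useradd hadoop	# run as root: create the hadoop user
passwd hadoop	# set the hadoop user's password
chown -R hadoop:hadoop hadoop-3.1.1/  # set ownership (run from /opt/module once Hadoop has been extracted)
su - hadoop	# switch to the hadoop user; do this after the environment variables are configured

Installing the JDK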

Environment variables

[root@localhost module]# vim /etc/profile.d/my_env.sh 

#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_221
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar


:wq

[root@localhost module]# source /etc/profile
[root@localhost module]# java -version
java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)
[root@localhost module]# 
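
Before moving on, it can help to confirm that JAVA_HOME really points at a JDK directory rather than relying on java being on the PATH. The following is a minimal sketch; check_java_home is a hypothetical helper, not something shipped with the JDK or Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a JAVA_HOME candidate contains bin/java.
# Usage: check_java_home [dir]   (defaults to $JAVA_HOME)
check_java_home() {
  local home="${1:-$JAVA_HOME}"
  if [ -z "$home" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$home/bin/java" ]; then
    echo "no executable java found under $home/bin"
    return 1
  fi
  echo "JAVA_HOME looks usable: $home"
}
```

If start-dfs.sh later complains that JAVA_HOME is not set even though this check passes in your shell, a common remedy is to export JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh as well, because the daemons are launched over SSH and may not read /etc/profile.d.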

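Configuring passwordless SSH login to localhost (the Hadoop start scripts depend on it)

# Generate an SSH key pair with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Append the public key to the local authorized_keys list
mkdir -p ~/.ssh
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Set file permissions (important: SSH may refuse the login otherwise)
chmod 600 ~/.ssh/authorized_keys
chmod 700 ~/.ssh

# Test the passwordless login (it should log in without asking for a password)
ssh localhost
exit  # leave the SSH session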
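
Because wrong permissions are the usual reason the passwordless login fails, the check below compares the modes against the 700/600 values set above. It is a sketch that assumes GNU stat (as found on CentOS and most Linux distributions); check_ssh_perms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the SSH directory is 700 and
# authorized_keys is 600. Usage: check_ssh_perms [dir]  (defaults to ~/.ssh)
check_ssh_perms() {
  local dir="${1:-$HOME/.ssh}"
  local dir_mode key_mode
  dir_mode=$(stat -c '%a' "$dir") || return 1
  key_mode=$(stat -c '%a' "$dir/authorized_keys") || return 1
  [ "$dir_mode" = "700" ] && [ "$key_mode" = "600" ]
}

# A non-interactive login test that fails instead of prompting for a password:
#   ssh -o BatchMode=yes localhost true && echo "passwordless login works"
```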

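Downloading and installing Hadoop

Downloading Hadoop

Download a stable release (for example hadoop-3.1.1.tar.gz) from the official site and extract it into the target directory: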

[root@localhost module]# pwd
/opt/module
[root@localhost module]# ll
total 742452
drwxr-xr-x. 11 hadoop hadoop       178 May 15 17:43 hadoop-3.1.1
-rw-r--r--.  1 root   root   334559382 May 15 16:50 hadoop-3.1.1.tar.gz
drwxr-xr-x.  7     10    143       245 Jul  4  2019 jdk1.8.0_221
-rw-r--r--.  1 root   root   195094741 May 15 16:50 jdk-8u221-linux-x64.tar.gz
drwxr-xr-x. 16 root   root        4096 Jun  2  2017 pig-0.17.0
-rw-r--r--.  1 root   root   230606579 May 15 16:50 pig-0.17.0.tar.gz
[root@localhost module]# 

tar -zxvf hadoop-3.1.1.tar.gz

Configuring environment variables

[root@localhost module]# cat /etc/profile.d/my_env.sh 

#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_221
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar


export HADOOP_HOME=/opt/module/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

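Configuring the core Hadoop files

Hadoop is configured through XML files. The following 4 files need to be edited (path: $HADOOP_HOME/etc/hadoop/).

core-site.xml

Set the default HDFS address and the temporary directory: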

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->


<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>  <!-- default HDFS address (pseudo-distributed) -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.1/hadooptmp</value>  <!-- temporary file storage path (must be created manually) -->
  </property>
</configuration>
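
Since the hadoop.tmp.dir directory has to exist, one option is to read the path back out of core-site.xml and create it, so the two cannot drift apart. The sketch below uses a small awk parser that assumes each <name> and <value> element sits on its own line, as in the file above; get_prop is a hypothetical helper, not a Hadoop tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one property from a *-site.xml file.
# Usage: get_prop <file> <property-name>
# Assumes <name>...</name> and <value>...</value> are each on a single line.
get_prop() {
  awk -v want="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == want {
      val = $0
      sub(/.*<value>/, "", val)
      sub(/<\/value>.*/, "", val)
      print val
      exit
    }
  ' "$1"
}

# Example (run as the hadoop user once HADOOP_HOME is set):
#   mkdir -p "$(get_prop "$HADOOP_HOME/etc/hadoop/core-site.xml" hadoop.tmp.dir)"
```

On a working installation, hdfs getconf -confKey hadoop.tmp.dir reports the value Hadoop itself resolved, which is the more reliable check once the hdfs command is available.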

hdfs-site.xml

Set the HDFS replication factor (1 in pseudo-distributed mode, because there is only one DataNode):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->


<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml

Configure the MapReduce framework to run on YARN:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->


<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml

Set the YARN ResourceManager hostname and the auxiliary shuffle service that the NodeManager runs for MapReduce:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Initializing the HDFS file system

Format the NameNode (required before the first start, and only needs to be done once):

# Format the NameNode (first start only; run once)
hdfs namenode -format
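
Formatting a second time assigns the NameNode a new cluster ID, and a DataNode that still holds data from the earlier format will typically refuse to start. A guard such as the following avoids an accidental re-format. It is a sketch that assumes dfs.namenode.name.dir is left at its default of dfs/name under hadoop.tmp.dir; namenode_is_formatted is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a NameNode metadata directory already exists.
# $1 = the hadoop.tmp.dir value (metadata defaults to $1/dfs/name).
namenode_is_formatted() {
  [ -f "$1/dfs/name/current/VERSION" ]
}

# Example: format only when no metadata exists yet.
#   namenode_is_formatted /opt/module/hadoop-3.1.1/hadooptmp || hdfs namenode -format
```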

Starting the services

# Start HDFS (NameNode, DataNode)
start-dfs.sh
# Start YARN (ResourceManager, NodeManager)
start-yarn.sh


[hadoop@localhost hadooptmp]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [localhost.localdomain]
[hadoop@localhost hadooptmp]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Verifying with jps

[hadoop@localhost hadooptmp]$ jps  
17361 SecondaryNameNode
17697 NodeManager
17575 ResourceManager
18008 Jps
17182 DataNode
17039 NameNode
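
All five daemons in the listing above should be present. To avoid checking by eye, the jps output can be piped through a small filter like the one below, which prints the names of any expected daemons that are missing. missing_daemons is a hypothetical helper written for this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and print the expected
# Hadoop daemons that do not appear in it (empty output means all are up).
missing_daemons() {
  local listing name missing=""
  listing=$(cat)
  for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare against the second column only; -x requires a whole-line match,
    # so "NameNode" is not satisfied by "SecondaryNameNode".
    if ! printf '%s\n' "$listing" | awk '{print $2}' | grep -qx "$name"; then
      missing="$missing $name"
    fi
  done
  echo "${missing# }"
}

# Example:
#   jps | missing_daemons
```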

Accessing the web UI

systemctl stop firewalld	# stop the firewall
systemctl disable firewalld	# do not start it at boot
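
With the firewall out of the way, the web interfaces are reachable from a browser. In Hadoop 3.x the NameNode UI listens on port 9870 by default and the ResourceManager UI on 8088; the sketch below simply prints the two addresses for a given host, assuming those defaults have not been changed. web_ui_urls is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the default Hadoop 3.x web UI addresses.
# Usage: web_ui_urls [host]   (defaults to localhost)
web_ui_urls() {
  local host="${1:-localhost}"
  echo "NameNode        http://$host:9870"
  echo "ResourceManager http://$host:8088"
}

# A less drastic alternative to disabling firewalld is to open only these ports:
#   firewall-cmd --permanent --add-port=9870/tcp
#   firewall-cmd --permanent --add-port=8088/tcp
#   firewall-cmd --reload
```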

Simple operation test

[hadoop@localhost hadooptmp]$ hdfs dfs -mkdir -p /user/hadoop
[hadoop@localhost hadooptmp]$ hdfs dfs -put /etc/profile /user/hadoop
[hadoop@localhost hadooptmp]$ hdfs dfs -ls /user/hadoop

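Pseudo-distributed vs fully distributed vs standalone mode

Mode | Nodes | Component deployment | Use
Standalone mode | 1 | Runs only a non-distributed process (no daemons) | Quickly verifying program logic (no distributed features)
Pseudo-distributed | 1 | All distributed components run on the same node | Learning, debugging, local testing
Fully distributed | ≥2 | Components are spread across different nodes (master/worker architecture) | Large-scale data processing in production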

Common problems and notes

Port conflicts: if ports such as 9000 or 8088 are already in use, change the port number in the configuration files (for example, set fs.defaultFS to hdfs://localhost:9001).
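
To find out whether a port is already taken before starting the services, the listening sockets reported by ss -ltn can be scanned. The filter below is a sketch that reads that output on stdin and looks at the port part of the local address column; port_is_listening is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given TCP port appears as a local
# listening port in `ss -ltn` style output read from stdin.
# Usage: ss -ltn | port_is_listening 9000
port_is_listening() {
  awk -v port="$1" '
    NR > 1 {
      # Column 4 is "address:port"; the port is the text after the last colon.
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example:
#   ss -ltn | port_is_listening 9000 && echo "port 9000 is already in use"
```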

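Permission problems: avoid starting the services with sudo (it can leave files with mixed ownership). Run everything as a regular user, or set the directory ownership beforehand:

chown -R $USER:$USER /opt/module/hadoop-3.1.1/ # give the current user ownership of the directory; for the hadoop user: chown -R hadoop:hadoop /opt/module/hadoop-3.1.1/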

Log inspection: when startup fails, check the log files under $HADOOP_HOME/logs/ (for example hadoop-${USER}-namenode-${HOSTNAME}.log).
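
The log files can be long, so it is usually quicker to pull out only the error lines. The sketch below builds the NameNode log path from the naming pattern mentioned above and prints the most recent ERROR or FATAL entries. Both function names are hypothetical, and the pattern assumes the usual layout in which the log level follows the timestamp, separated by spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the NameNode log path from the naming pattern
# hadoop-${USER}-namenode-${HOSTNAME}.log under $HADOOP_HOME/logs/.
namenode_log_path() {
  echo "${HADOOP_HOME}/logs/hadoop-${USER}-namenode-${HOSTNAME}.log"
}

# Hypothetical helper: print the last 20 lines logged at ERROR or FATAL level.
# Usage: recent_errors <log-file>
recent_errors() {
  grep -E ' (ERROR|FATAL) ' "$1" | tail -n 20
}

# Example:
#   recent_errors "$(namenode_log_path)"
```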

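Stopping services: shut down cleanly with stop-dfs.sh and stop-yarn.sh; avoid killing the processes directly, which can leave data in an inconsistent state.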

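If problems persist, stop the services, delete the archive (and the directory extracted from it), and repeat the steps from the beginning.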

