Task 3.2: Upload Files to an HDFS Directory
Objective: master basic HDFS operations.
HDFS basic operations exercise
File upload, download, and viewing
Requirements (all via the command line):
1) Download email_log.txt (e.g. via wget http://10.255.10.50/file/email_log.txt) into the local /user/hadoop directory;
2) Upload email_log.txt to your own private directory on HDFS;
3) Upload email_log.txt to your private HDFS directory, renaming it 1.txt;
4) Download 1.txt from your private HDFS directory to the local /user/hadoop directory;
5) Move 1.txt from the local directory into your private HDFS directory;
6) View the 2.txt file in your private directory with the tail command;
7) View the 2.txt file in your private directory with the cat command; press Ctrl+C to stop.
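The steps above can be sketched as shell commands (assuming your private HDFS directory is /user/alice — substitute your own; steps 6 and 7 assume a file named 2.txt already exists in that directory):

```shell
# 1) Download the sample file into the local working directory
cd /user/hadoop
wget http://10.255.10.50/file/email_log.txt

# 2) Upload it to the private HDFS directory
hdfs dfs -put email_log.txt /user/alice/

# 3) Upload it again under the name 1.txt
hdfs dfs -put email_log.txt /user/alice/1.txt

# 4) Download 1.txt from HDFS back to the local directory
hdfs dfs -get /user/alice/1.txt /user/hadoop/

# 5) Move (rather than copy) the local 1.txt into the private HDFS directory
hdfs dfs -moveFromLocal /user/hadoop/1.txt /user/alice/

# 6) Show the last kilobyte of 2.txt
hdfs dfs -tail /user/alice/2.txt

# 7) Stream the whole of 2.txt to the terminal; press Ctrl+C to stop
hdfs dfs -cat /user/alice/2.txt
```

These commands require a running HDFS cluster and cannot be exercised offline.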
Task 3.3: Run Your First MapReduce Job
MapReduce job configuration
vi /usr/local/hadoop-3.3.1/etc/hadoop/mapred-site.xml
Change three places to:
HADOOP_MAPRED_HOME=$HADOOP_CLASS
Add one property:
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CLASS</value>
</property>
The complete mapred-site.xml after these changes:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_CLASS</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_CLASS</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_CLASS</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>256</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>256</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CLASS</value>
</property>
</configuration>
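One thing worth noting: `$HADOOP_CLASS` is not a variable Hadoop defines by itself; the configuration above assumes you have set it to (or substitute directly) the output of `hadoop classpath`. A minimal sketch of applying the change:

```shell
# Print the classpath YARN applications need; paste this literal value in
# place of $HADOOP_CLASS in yarn.application.classpath if the variable is
# not defined in your environment
hadoop classpath

# Restart YARN so the edited mapred-site.xml takes effect
/usr/local/hadoop-3.3.1/sbin/stop-yarn.sh
/usr/local/hadoop-3.3.1/sbin/start-yarn.sh
```

The troubleshooting section later in this document sets HADOOP_MAPRED_HOME to $HADOOP_HOME instead, which matches the official Hadoop 3 setup guide.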
Other references
Operating a Hadoop 3.x cluster from a laptop
HDFS basic operations exercise
File upload, download, and viewing
Requirements (all via the command line):
1) Download and unzip email_log.txt:
2) Upload email_log.txt to your own private directory on HDFS;
3) Upload email_log.txt to your private HDFS directory, renaming it 1.txt;
4) Download 1.txt from your private HDFS directory to the local /user/hadoop directory;
5) Move 1.txt from the local directory into your private HDFS directory;
6) View the 2.txt file in your private directory with the tail command;
7) View the 2.txt file in your private directory with the cat command; press Ctrl+C to stop.
Common problems
WARN hdfs.DataStreamer: Exception in createBlockOutputStream...
Full error:
2022-02-09 08:06:22,360 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741958_1134
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1757)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
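`No route to host` on a block write usually means a firewall on one of the DataNodes is rejecting the data-transfer connection. A common remedy (a sketch; assumes CentOS with firewalld, and that relaxing the firewall is acceptable in your lab environment):

```shell
# Run on every DataNode: check whether firewalld is active
systemctl status firewalld

# Either stop and disable it (only acceptable on an isolated lab network)...
systemctl stop firewalld
systemctl disable firewalld

# ...or open only the HDFS data-transfer port (9866 by default in Hadoop 3)
firewall-cmd --permanent --add-port=9866/tcp
firewall-cmd --reload
```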
WARN hdfs.DataStreamer: Exception in createBlockOutputStream
Full error:
2022-02-09 08:06:22,540 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741960_1136
java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 192.168.137.136:9866
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1810)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
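`firstBadLink` names the DataNode the write pipeline could not reach — here 192.168.137.136 on the data-transfer port 9866. Before changing configuration, it may help to confirm reachability from the client (a sketch; `nc` may need installing):

```shell
# Test whether the DataNode's data-transfer port accepts connections
nc -zv 192.168.137.136 9866

# If the connection is refused, check the firewall on that DataNode (see the
# previous issue) and confirm with jps that a DataNode process is running there
```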
Cannot allocate containers as requested resource is greater than maximum allowed allocation
Adjust the resource settings in yarn-site.xml (and the MapReduce memory settings in mapred-site.xml) so that requested resources fit within the allowed maximums:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
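After adjusting these values and restarting YARN, you can check what resources each NodeManager actually registered with (a sketch):

```shell
# List the registered NodeManagers together with their memory/vcore capacity
yarn node -list -showDetails
```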
Errors caused by the Hadoop path configuration: set HADOOP_MAPRED_HOME to $HADOOP_HOME in mapred-site.xml:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
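For this to work, `$HADOOP_HOME` must actually be defined in the environment that reads the file; if jobs still fail with classpath errors, writing the literal install path is safer (a sketch, using the install path from earlier in this document):

```shell
# Check that the variable is actually set on each node
echo $HADOOP_HOME

# If it is empty, either export it in etc/hadoop/hadoop-env.sh or put the
# literal path into mapred-site.xml, e.g.
#   <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.3.1</value>
```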
HDFS is stuck in safe mode
Leave Hadoop safe mode:
bin/hadoop dfsadmin -safemode leave
[root@c100 hadoop-3.3.1]# bin/hadoop dfsadmin -safemode leave
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
Safe mode is OFF
[root@c100 hadoop-3.3.1]# cd sbin/
[root@c100 sbin]# ./stop-all.sh
Stopping namenodes on [c100]
Last login: Thu Apr 28 02:17:38 EDT 2022 from 10.255.10.31 on pts/0
Stopping datanodes
Last login: Thu Apr 28 02:21:02 EDT 2022 on pts/0
Stopping secondary namenodes [c100]
Last login: Thu Apr 28 02:21:04 EDT 2022 on pts/0
Stopping nodemanagers
Last login: Thu Apr 28 02:21:07 EDT 2022 on pts/0
Stopping resourcemanager
Last login: Thu Apr 28 02:21:11 EDT 2022 on pts/0
[root@c100 sbin]# ./start-all.sh
Starting namenodes on [c100]
Last login: Thu Apr 28 02:21:13 EDT 2022 on pts/0
Starting datanodes
Last login: Thu Apr 28 02:21:23 EDT 2022 on pts/0
Starting secondary namenodes [c100]
Last login: Thu Apr 28 02:21:26 EDT 2022 on pts/0
Starting resourcemanager
Last login: Thu Apr 28 02:21:30 EDT 2022 on pts/0
Starting nodemanagers
Last login: Thu Apr 28 02:21:36 EDT 2022 on pts/0
[root@c100 sbin]#
Step 1. Leave safe mode: hdfs dfsadmin -safemode leave
Step 2. Run a filesystem health check and delete the corrupted blocks: hdfs fsck / -delete
[root@c100 logs]# hdfs dfsadmin -safemode leave
Safe mode is OFF
[root@c100 logs]# hdfs fsck / -delete
Connecting to namenode via http://c100:9870/fsck?ugi=root&delete=1&path=%2F
FSCK started by root (auth:SIMPLE) from /10.255.10.100 for path / at Thu Apr 28 02:54:52 EDT 2022
/user/chenyanfang/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/chenyanfang/2.txt: MISSING 2 blocks of total size 218379675 B.
/user/chenyanfang/email_log.txt: MISSING 2 blocks of total size 218379675 B.
/user/liuchenling/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/liuchenling/email_log.txt: MISSING 2 blocks of total size 218379675 B.
/user/root/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/yeying/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/yeying/email_log.txt: MISSING 2 blocks of total size 218379675 B.
Status: CORRUPT
Number of data-nodes: 0
Number of racks: 0
Total dirs: 15
Total symlinks: 0
Replicated Blocks:
Total size: 1747037400 B
Total files: 8
Total blocks (validated): 16 (avg. block size 109189837 B)
********************************
UNDER MIN REPL'D BLOCKS: 16 (100.0 %)
MINIMAL BLOCK REPLICATION: 1
CORRUPT FILES: 8
MISSING BLOCKS: 16
MISSING SIZE: 1747037400 B
********************************
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 0.0
Missing blocks: 16
Corrupt blocks: 0
Missing replicas: 0
Blocks queued for replication: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
Blocks queued for replication: 0
FSCK ended at Thu Apr 28 02:54:53 EDT 2022 in 396 milliseconds
The filesystem under path '/' is CORRUPT
[root@c100 logs]#
A sudden power failure caused the loss of stored data files
Directory /data/hadoop/hdfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
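This error means the NameNode metadata directory (dfs.namenode.name.dir) has disappeared, e.g. because it lived on volatile storage. A recovery sketch — note that reformatting the NameNode destroys all existing HDFS metadata and is strictly a last resort:

```shell
# Confirm which directory the NameNode expects
hdfs getconf -confKey dfs.namenode.name.dir

# Recreate the directory if it is merely missing; if no metadata backup
# exists, reformat as a last resort (this ERASES the entire filesystem tree)
mkdir -p /data/hadoop/hdfs/name
hdfs namenode -format
```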
Invalid resource request! Cannot allocate containers as requested resource
Job error message:
java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[memory-mb], Requested resource=<memory:1536, vCores:1>, maximum allowed allocation=<memory:1024, vCores:2>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:1024, vCores:4>
Countermeasures: the requested container memory (1536 MB here) must not exceed the scheduler's maximum allocation; either lower the job's memory requests or raise the cluster-side maximums, then restart YARN.
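A sketch of the two possible fixes (the property values are examples keyed to the error above, where 1536 MB was requested against a 1024 MB maximum):

```shell
# Option 1: shrink the request in mapred-site.xml so it fits under 1024 MB:
#   mapreduce.map.memory.mb, mapreduce.reduce.memory.mb,
#   yarn.app.mapreduce.am.resource.mb   -> each <= 1024
# Option 2: enlarge the ceiling in yarn-site.xml:
#   yarn.scheduler.maximum-allocation-mb  -> >= 1536
#   yarn.nodemanager.resource.memory-mb   -> >= 1536
# Then restart YARN:
/usr/local/hadoop-3.3.1/sbin/stop-yarn.sh
/usr/local/hadoop-3.3.1/sbin/start-yarn.sh
```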
Practical training
Exercise 2: Query and interrupt MapReduce jobs
Approach and steps
Use the official /hadoop-mapreduce-examples-3.3.1.jar to submit three MR jobs (wordcount, wordmean, wordmedian), writing the results to the HDFS directories:
/user/myname/output_logs_wordcount,
/user/myname/output_logs_wordmean,
/user/myname/output_logs_wordmedian
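A sketch of the submissions and of querying/killing a running job (assumes the examples jar sits in the standard share/hadoop/mapreduce directory of the install, that email_log.txt was already uploaded to /user/myname, and that `myname` is replaced by your own directory; the output directories must not already exist):

```shell
cd /usr/local/hadoop-3.3.1

# Submit the three example jobs
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar \
  wordcount  /user/myname/email_log.txt /user/myname/output_logs_wordcount
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar \
  wordmean   /user/myname/email_log.txt /user/myname/output_logs_wordmean
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar \
  wordmedian /user/myname/email_log.txt /user/myname/output_logs_wordmedian

# Query running applications, then interrupt one by its application id
yarn application -list
yarn application -kill application_XXXXXXXXXXXXX_XXXX
```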