繁依Fanyi0

这个屌丝很懒，什么也没留下！

热门标签

HDFS分布式文件系统-简介_hdfs分布式文件系统是什么

作者：繁依Fanyi0 | 2024-08-06 12:39:40

踩

hdfs分布式文件系统是什么

HDFS分布式文件系统

HDFS 简介

HDFS （全称：Hadoop Distribute File System，Hadoop 分布式文件系统）是 Hadoop 核心组成，是分布式存储服务。
分布式文件系统横跨多台计算机，在大数据时代有着广泛的应用前景，它们为存储和处理超大规模数据提供所需的扩展能力。
HDFS是分布式文件系统中的一种

HDFS的重要概念

HDFS 通过统一的命名空间目录树来定位文件；另外，它是分布式的，由很多服务器联合起来实现其功能，集群中的服务器有各自的角色（分布式本质是拆分，各司其职）

典型的 Master/Slave 架构
HDFS 的架构是典型的 Master/Slave 结构。
HDFS集群往往是一个NameNode（HA架构会有两个NameNode,联邦机制）+多个DataNode组成；
NameNode是集群的主节点，DataNode是集群的从节点
分块存储（block机制）
HDFS 中的文件在物理上是分块存储（block）的，块的大小可以通过配置参数来规定；
Hadoop2.x版本中默认的block大小是128M；
命名空间（NameSpace）
HDFS 支持传统的层次型文件组织结构。用户或者应用程序可以创建目录，然后将文件保存在这些目录里。文件系统名字空间的层次结构和大多数现有的文件系统类似：用户可以创建、删除、移动或重命名文件。
Namenode 负责维护文件系统的名字空间，任何对文件系统名字空间或属性的修改都将被Namenode 记录下来。
HDFS提供给客户单一个抽象目录树，访问形式：hdfs://namenode的hostname:port/test/input
hdfs://linux121:9000/test/input
NameNode元数据管理
我们把目录结构及文件分块位置信息叫做元数据。
NameNode的元数据记录每一个文件所对应的block信息（block的id,以及所在的DataNode节点的信息）
DataNode数据存储
文件的各个 block 的具体存储管理由 DataNode 节点承担。一个block会有多个DataNode来存储，DataNode会定时向NameNode来汇报自己持有的block信息
副本机制
为了容错，文件的所有 block 都会有副本。每个文件的 block 大小和副本系数都是可配置的。应用程序可以指定某个文件的副本数目。副本系数可以在文件创建的时候指定，也可以在之后改变。副本数量默认是3个
一次写入，多次读出
HDFS 是设计成适应一次写入，多次读出的场景，且不支持文件的随机修改。（支持追加写入，不只支持随机更新）
正因为如此，HDFS 适合用来做大数据分析的底层存储服务，并不适合用来做网盘等应用（修改不方便，延迟大，网络开销大，成本太高）

HDFS 架构

在这里插入图片描述

NameNode(nn):Hdfs集群的管理者，Master
维护管理Hdfs的名称空间（NameSpace）
维护副本策略
记录文件块（Block）的映射信息
负责处理客户端读写请求
DataNode:NameNode下达命令，DataNode执行实际操作，Slave节点。
保存实际的数据块
负责数据块的读写
Client:客户端
上传文件到HDFS的时候，Client负责将文件切分成Block,然后进行上传
请求NameNode交互，获取文件的位置信息
读取或写入文件，与DataNode交互
Client可以使用一些命令来管理HDFS或者访问HDFS

HDFS 客户端操作

Shell 命令行操作HDFS

基本语法
bin/hadoop fs 具体命令 OR bin/hdfs dfs 具体命令
命令大全

hdfs dfs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55

HDFS命令演示

启动Hadoop集群（方便后续的测试）


[root@linux121 hadoop-2.9.2]$ sbin/start-dfs.sh
[root@linux122 hadoop-2.9.2]$ sbin/start-yarn.sh
1
2
3

-help：输出这个命令参数

hadoop fs -help rm
1

 hdfs fs-ls /
Found 5 items
drwxr-xr-x   - root supergroup          0 2020-10-12 15:41 /test
drwx------   - root supergroup          0 2020-10-12 16:28 /tmp
drwxr-xr-x   - root supergroup          0 2020-10-12 15:47 /wcinput
drwxr-xr-x   - root supergroup          0 2020-10-12 15:45 /wcinputls
drwxr-xr-x   - root supergroup          0 2020-10-12 17:51 /wcoutput

1
2
3
4
5
6
7
8

fs -mkdir -p /lagou/bigdata

hadoop fs -ls /
Found 6 items
drwxr-xr-x   - root supergroup          0 2020-10-12 19:31 /lagou
drwxr-xr-x   - root supergroup          0 2020-10-12 15:41 /test
drwx------   - root supergroup          0 2020-10-12 16:28 /tmp
drwxr-xr-x   - root supergroup          0 2020-10-12 15:47 /wcinput
drwxr-xr-x   - root supergroup          0 2020-10-12 15:45 /wcinputls
drwxr-xr-x   - root supergroup          0 2020-10-12 17:51 /wcoutput

1
2
3
4
5
6
7
8
9
10
11

. -moveFromLocal：从本地剪切粘贴到HDFS

touch hadoop.txt
[root@linux135 servers]# hadoop fs -moveFromLocal hadoop.txt /lagou/bigdata
[root@linux135 servers]# ll
总用量 4
drwxr-xr-x. 11 root root 4096 10月 12 15:06 hadoop-2.9.2
[root@linux135 servers]# hadoop fs -ls /lagou/bigdata
Found 1 items
-rw-r--r--   3 root supergroup          0 2020-10-12 19:33 /lagou/bigdata/hadoop.txt
[root@linux135 servers]# 

1
2
3
4
5
6
7
8
9
10

-appendToFile：追加一个文件到已经存在的文件末尾

touch hdfs.txt
vim hdfs.txt 
#输入   namenode datanode block replication
touch hdfs.txt
[root@linux135 servers]# hadoop fs -appendToFile hdfs.txt /lagou/bigdata/hadoop.txt

[root@linux135 servers]# hadoop fs -cat /lagou/bigdata/hadoop.txt
namenode datanode block replication


1
2
3
4
5
6
7
8
9
10

-chgrp 、-chmod、-chown：Linux文件系统中的用法一样，修改文件所属权限

hadoop fs -chmod 666 /lagou/bigdata/hadoop.txt
hadoop fs -chown root:root /lagou/bigdata/hadoop.txt
1
2

-copyToLocal：从HDFS拷贝到本地


ll
总用量 8
drwxr-xr-x. 11 root root 4096 10月 12 15:06 hadoop-2.9.2
-rw-r--r--.  1 root root   36 10月 12 19:35 hdfs.txt
[root@linux135 servers]# hadoop fs -copyToLocal /lagou/bigdata/hadoop.txt .
[root@linux135 servers]# ll
总用量 12
drwxr-xr-x. 11 root root 4096 10月 12 15:06 hadoop-2.9.2
-rw-r--r--.  1 root root   36 10月 12 19:39 hadoop.txt
-rw-r--r--.  1 root root   36 10月 12 19:35 hdfs.txt
[root@linux135 servers]# cat hadoop.txt 
namenode datanode block replication

1
2
3
4
5
6
7
8
9
10
11
12
13
14

-copyFromLocal：从本地文件系统中拷贝文件到HDFS路径去

adoop fs -copyFromLocal README.txt /
1

-cp ：从HDFS的一个路径拷贝到HDFS的另一个路径

fs -cp /lagou/bigdata/hadoop.txt /hdfs.txt
adoop fs -ls /lagou/bigdata/
Found 1 items
-rw-rw-rw-   3 root root         36 2020-10-12 19:36 /lagou/bigdata/hadoop.txt
[root@linux135 servers]# hadoop fs -ls /
Found 7 items
-rw-r--r--   3 root supergroup         36 2020-10-12 19:41 /hdfs.txt
drwxr-xr-x   - root supergroup          0 2020-10-12 19:31 /lagou
drwxr-xr-x   - root supergroup          0 2020-10-12 15:41 /test
drwx------   - root supergroup          0 2020-10-12 16:28 /tmp
drwxr-xr-x   - root supergroup          0 2020-10-12 15:47 /wcinput
drwxr-xr-x   - root supergroup          0 2020-10-12 15:45 /wcinputls
drwxr-xr-x   - root supergroup          0 2020-10-12 17:51 /wcoutput


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

hadoop fs -mv /hdfs.txt /lagou/bigdata/
hadoop fs  -ls /
Found 6 items
drwxr-xr-x   - root supergroup          0 2020-10-12 19:31 /lagou
drwxr-xr-x   - root supergroup          0 2020-10-12 15:41 /test
drwx------   - root supergroup          0 2020-10-12 16:28 /tmp
drwxr-xr-x   - root supergroup          0 2020-10-12 15:47 /wcinput
drwxr-xr-x   - root supergroup          0 2020-10-12 15:45 /wcinputls
drwxr-xr-x   - root supergroup          0 2020-10-12 17:51 /wcoutput
[root@linux135 servers]# hadoop fs  -ls /lagou/bigdata
Found 2 items
-rw-rw-rw-   3 root root               36 2020-10-12 19:36 /lagou/bigdata/hadoop.txt
-rw-r--r--   3 root supergroup         36 2020-10-12 19:41 /lagou/bigdata/hdfs.txt

1
2
3
4
5
6
7
8
9
10
11
12
13
14

-get：等同于copyToLocal，就是从HDFS下载文件到本地

hadoop fs -get /lagou/bigdata/hadoop.txt .
[root@linux135 servers]# ll
总用量 12
drwxr-xr-x. 11 root root 4096 10月 12 15:06 hadoop-2.9.2
-rw-r--r--.  1 root root   36 10月 12 19:44 hadoop.txt
-rw-r--r--.  1 root root   36 10月 12 19:35 hdfs.txt

1
2
3
4
5
6
7

-put：等同于copyFromLocal

hadoop fs -mkdir -p /user/root/test/
#本地文件系统创建yarn.txt
[root@linux121 hadoop-2.9.2]$ vim yarn.txt
resourcemanager nodemanager
hadoop fs -mkdir -p /user/root/test
[root@linux135 servers]# hadoop fs -put yarn.txt /user/root/test/
[root@linux135 servers]# hadoop fs -ls  /user/root/test/
Found 1 items
-rw-r--r--   3 root supergroup         28 2020-10-12 19:46 /user/root/test/yarn.txt

1
2
3
4
5
6
7
8
9
10

-tail：显示一个文件的末尾

hadoop fs -tail  /user/root/test/yarn.txt
resourcemanager nodemanager

1
2
3

-rm：删除文件或文件夹

hadoop fs -rm /user/root/test/yarn.txt
Deleted /user/root/test/yarn.txt
[root@linux135 servers]# hadoop fs -ls /user/root/test/
[root@linux135 servers]# 

1
2
3
4
5

hadoop fs -mkdir /test
hadoop fs -rmdir /test 或者 hadoop fs -rm -R /test
1
2

-du统计文件夹的大小信息


fs -du -s -h /user/root/test/
0  /user/root/test
[root@linux135 servers]# hadoop fs -du -s /user/root/test/
0  /user/root/test
[root@linux135 servers]# hadoop fs -du -h /user/root/test/

1
2
3
4
5
6
7

-setrep：设置HDFS中文件的副本数量

hadoop fs -setrep 10 /lagou/bigdata/hadoop.txt
1

在这里插入图片描述

这里设置的副本数只是记录在NameNode的元数据中，是否真的会有这么多副本，还得看DataNode的
数量。因为目前只有3台设备，最多也就3个副本，只有节点数的增加到10台时，副本数才能达到10。

JAVA api

Window下解压

在这里插入图片描述
配置环境变量

在这里插入图片描述

建立项目


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.liu</groupId>
    <artifactId>stage04-hdfs</artifactId>
    <version>1.0-SNAPSHOT</version>


    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.9.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.9.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.9.2</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
    </dependencies>
    <!--maven打包插件 -->
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin </artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72

测试建立文件夹


public class HdfsClient {

    @Test
    public void testMkdirs() throws IOException, InterruptedException,
            URISyntaxException {


        //1. 创建配置文件
        Configuration configuration = new Configuration();

        //2. 根据Configuration获取FileSystem对象

        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.181.135:9000"), configuration, "root");
        //3. 使用FileSystem对象创建目录

        fs.mkdirs(new Path("/api_test5"));


        //4.释放FileSystem对象

        fs.close();


    }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

报错

java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

下载地址
https://github.com/cdarlint/winutils
下载对应版的一般下载的winutils.exe是这个包放到bin目录下
再次启动IDEA或者Ecplice即可

如果不指定操作HDFS集群的用户信息，默认是获取当前操作系统的用户信息，出现权限被拒绝的问
题，报错如下
在这里插入图片描述

解决方案：

从资料文件夹中找到winutils.exe拷贝放到windows系统Hadoop安装目录的bin目录下即可！！
HDFS文件系统权限问题
hdfs的文件权限机制与linux系统的文件权限机制类似！！
r:read w:write x:execute 权限x对于文件表示忽略，对于文件夹表示是否有权限访问其内容
如果linux系统用户zhangsan使用hadoop命令创建一个文件，那么这个文件在HDFS当中的owner
就是zhangsan
HDFS文件权限的目的，防止好人做错事，而不是阻止坏人做坏事。HDFS相信你告诉我你是谁，
你就是谁！！
解决方案

指定用户信息获取FileSystem对象

关闭HDFS集群权限校验

vim hdfs-site.xml
#添加如下属性
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
1
2
3
4
5
6

修改完成之后要分发到其它节点，同时要重启HDFS集群

基于HDFS权限本身比较鸡肋的特点，我们可以彻底放弃HDFS的权限校验，如果生产环境中

我们可以考虑借助kerberos以及sentry等安全框架来管理大数据集群安全。所以我们直接修
改HDFS的根目录权限为777

hadoop fs -chmod -R 777 /
1

在这里插入图片描述

上传文件

@Test
public void testCopyFromLocalFile() throws IOException,
InterruptedException, URISyntaxException {
// 1 获取文件系统
Configuration configuration = new Configuration();
configuration.set("dfs.replication", "2");
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.181.135:9000"),
configuration, "root");
// 2 上传文件
fs.copyFromLocalFile(new Path("e:/test.txt"), new
Path("/test.txt"));
// 3 关闭资源
fs.close();
System.out.println("end");
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

默认副本数是3

代码指定

 configuration.set("dfs.replication", "2");
1

将hdfs-site.xml拷贝到项目的根目录下

<
?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
1
2
3
4
5
6
7
8
9

. 参数优先级
参数优先级排序：（1）代码中设置的值 >（2）用户自定义配置文件 >（3）服务器的默认配置

下载

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException,
URISyntaxException{
// 1 获取文件系统
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.181.135:9000"),
configuration, "root");
// 2 执行下载操作
// boolean delSrc 指是否将原文件删除
// Path src 指要下载的文件路径
// Path dst 指将文件下载到的路径
// boolean useRawLocalFileSystem 是否开启文件校验
fs.copyToLocalFile(false, new Path("/test.txt"), new
Path("e:/test_copy.txt"), true);
// 3 关闭资源
fs.close();
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

删除文件/文件夹

@Test
public void testDelete() throws IOException, InterruptedException,
URISyntaxException{
// 1 获取文件系统
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.181.135:9000"),
configuration, "root");
// 2 执行删除  true递归
fs.delete(new Path("/api_test/"), true);
// 3 关闭资源
fs.close();
}
1
2
3
4
5
6
7
8
9
10
11
12

查看文件名称、权限、长度、块信息

@Test
public void testListFiles() throws IOException, InterruptedException,
URISyntaxException{
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://linux121:9000"),
configuration, "root");
// 2 获取文件详情
RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"),
true);
while(listFiles.hasNext()){
LocatedFileStatus status = listFiles.next();
// 输出详情
// 文件名称
System.out.println(status.getPath().getName());
// 长度
System.out.println(status.getLen());
// 权限
System.out.println(status.getPermission());
// 分组
System.out.println(status.getGroup());
// 获取存储的块信息
BlockLocation[] blockLocations = status.getBlockLocations();
for (BlockLocation blockLocation : blockLocations) {
// 获取块存储的主机节点
String[] hosts = blockLocation.getHosts();
for (String host : hosts) {
System.out.println(host);
}
} S
ystem.out.println("-----------华丽的分割线----------");
}
// 3 关闭资源
fs.close();

}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

文件夹判断

// 1 获取文件配置信息
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://linux121:9000"),
configuration, "root");
// 2 判断是文件还是文件夹
FileStatus[] listStatus = fs.listStatus(new Path("/"));
for (FileStatus fileStatus : listStatus) {
// 如果是文件
if (fileStatus.isFile()) {
System.out.println("f:"+fileStatus.getPath().getName());
}else {
System.out.println("d:"+fileStatus.getPath().getName());
}
}
// 3 关闭资源
fs.close();
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

I/O流操作HDFS

@Test
public void putFileToHDFS() throws IOException, InterruptedException,
URISyntaxException {
// 1 获取文件系统
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://linux121:9000"),
configuration, "root");
// 2 创建输入流
FileInputStream fis = new FileInputStream(new File("e:/test.txt"));
// 3 获取输出流
FSDataOutputStream fos = fs.create(new Path("/test_io.txt"));
// 4 流对拷
IOUtils.copyBytes(fis, fos, configuration);
// 5 关闭资源
IOUtils.closeStream(fos);
IOUtils.closeStream(fis);
fs.close();
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

IO流下载

// 文件下载
@Test
public void getFileFromHDFS() throws IOException, InterruptedException,
URISyntaxException{
// 1 获取文件系统
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://linux121:9000"),
configuration, "root");
// 2 获取输入流
FSDataInputStream fis = fs.open(new Path("/test_io.txt"));
// 3 获取输出流
FileOutputStream fos = new FileOutputStream(new
File("e:/test_io_copy.txt"));
// 4 流的对拷
IOUtils.copyBytes(fis, fos, configuration);
// 5 关闭资源
IOUtils.closeStream(fos);
IOUtils.closeStream(fis);
fs.close();
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

seek 定位读取


@Test
public void readFileSeek2() throws IOException, InterruptedException,
URISyntaxException{

// 1 获取文件系统
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.181.135:9000"),
configuration, "root");
// 2 打开输入流,读取数据输出到控制台
FSDataInputStream in = null;
try{
in= fs.open(new Path("/test.txt"));
IOUtils.copyBytes(in, System.out, 4096, false);
in.seek(0); //从头再次读取
IOUtils.copyBytes(in, System.out, 4096, false);
}finally {
IOUtils.closeStream(in);
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

java


package com.liu.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;


public class HdfsClient {
    Configuration configuration = null;
    FileSystem fs = null;

    @Before
    public void init() throws URISyntaxException, IOException,
            InterruptedException {
//1 获取Hadoop 集群的configuration对象
        configuration = new Configuration();
// configuration.set("fs.defaultFS", "hdfs://linux135:9000");
        configuration.set("dfs.replication", "2");
//2 根据configuration获取Filesystem对象
        fs = FileSystem.get(new URI("hdfs://192.168.181.135:9000"), configuration,
                "root");
    }

    @After
    public void destory() throws IOException {
        //4 释放FileSystem对象（类似数据库连接）
        fs.close();
    }

    @Test
    public void testMkdirs() throws IOException, InterruptedException,
            URISyntaxException {

        fs.mkdirs(new Path("/api_test"));


    }

    @Test
    public void testMkdirs2() throws IOException {

        fs.mkdirs(new Path("/api_test_2"));


    }

    @Test
    public void testCopyFromLocalFile() throws URISyntaxException, IOException, InterruptedException {
        //e:\test.txt

        //3. 使用FileSystem对象
        // boolean delSrc 指是否将原文件删除
        // Path src 指要下载的文件路径
        // Path dst 指将文件下载到的路径
        // boolean useRawLocalFileSystem 是否开启文件校验
        fs.copyFromLocalFile(new Path("e:/test.txt"), new Path("/test.txt"));
        //上传文件到hdfs默认是3个副本，
        //如何改变上传文件的副本数量？
        //1 configuration对象中指定新的副本数量
    }


    @Test
    public void testCopyToLocalFile() throws IOException, InterruptedException,
            URISyntaxException {

        // 2 执行下载操作
        // boolean delSrc 指是否将原文件删除
        // Path src 指要下载的文件路径
        // Path dst 指将文件下载到的路径
        // boolean useRawLocalFileSystem 是否开启文件校验
        fs.copyToLocalFile(true, new Path("/test.txt"), new
                Path("e:/test_copy.txt"), true);

    }


    @Test
    public void testDelete() throws IOException {
        // 2 执行删除
        fs.delete(new Path("/api_test_2"), true);

    }

    //获取文件详情
    @Test
    public void testListFiles() throws IOException {

        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

        while (listFiles.hasNext()) {
            LocatedFileStatus status = listFiles.next();
            // 输出详情
            // 文件名称
            System.out.println(status.getPath().getName());
            // 长度
            System.out.println(status.getLen());
            // 权限
            System.out.println(status.getPermission());
            // 分组
            System.out.println(status.getGroup());
            // 获取存储的块信息
            BlockLocation[] blockLocations = status.getBlockLocations();
            for (BlockLocation blockLocation : blockLocations) {
                // 获取块存储的主机节点
                String[] hosts = blockLocation.getHosts();
                for (String host : hosts) {
                    System.out.println(host);
                }
            }
            System.out.println("-----------华丽的分割线----------");
        }

    }

    @Test
    public void testListStatus() throws IOException, InterruptedException,
            URISyntaxException {

        FileStatus[] listStatus = fs.listStatus(new Path("/"));


        for (FileStatus fileStatus : listStatus) {
            if (fileStatus.isFile()) {
                System.out.println("f: " + fileStatus.getPath().getName());
            } else {
                System.out.println("d:  " + fileStatus.getPath().getName());
            }
        }
    }


    @Test
    public void putFileToHDFS() throws IOException, InterruptedException,
            URISyntaxException {

        // 2 创建输入流
        FileInputStream fis = new FileInputStream(new File("e:/test.txt"));
// 3 获取输出流
        FSDataOutputStream fos = fs.create(new Path("/test_io.txt"));
// 4 流对拷
        IOUtils.copyBytes(fis, fos, configuration);
// 5 关闭资源
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);

    }

    //
//    文件下载
//1. 需求：从HDFS上下载test.txt文件到本地e盘上
//2. 编写代码
// 文件下载
    @Test
    public void getFileFromHDFS() throws IOException, InterruptedException,
            URISyntaxException {
        FSDataInputStream fis = fs.open(new Path("/test_io.txt"));
        // 3 获取输出流
        FileOutputStream fos = new FileOutputStream(new
                File("e:/test_io_copy.txt"));
        // 4 流的对拷
        IOUtils.copyBytes(fis, fos, configuration);
// 5 关闭资源
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);
    }


    @Test
    public void readFileSeek2() throws IOException, InterruptedException,
            URISyntaxException{

        // 2 打开输入流,读取数据输出到控制台
        FSDataInputStream fis = null;

        try {
            fis = fs.open(new Path("/test.txt"));
            IOUtils.copyBytes(fis, System.out, 4096, false);
            System.out.println("\n======从头再来========");
            fis.seek(0); //从头再次读取
            IOUtils.copyBytes(fis, System.out, 4096, false);
        }finally {
            IOUtils.closeStream(fis);
        }
    }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/繁依Fanyi0/article/detail/937474