Spring has dedicated support for Hadoop. Here is the introduction from the official site: https://spring.io/projects/spring-hadoop
Spring for Apache Hadoop simplifies developing Apache Hadoop by providing a unified configuration model and easy to use APIs for using HDFS, MapReduce, Pig, and Hive. It also provides integration with other Spring ecosystem project such as Spring Integration and Spring Batch enabling you to develop solutions for big data ingest/export and Hadoop workflow orchestration.
Features:
Support to create Hadoop applications that are configured using Dependency Injection and run as standard Java applications vs. using Hadoop command line utilities.
Integration with Spring Boot to simply create Spring apps that connect to HDFS to read and write data.
Create and configure applications that use Java MapReduce, Streaming, Hive, Pig, or HBase
Extensions to Spring Batch to support creating Hadoop based workflows for any type of Hadoop Job or HDFS operation.
Script HDFS operations using any JVM based scripting language.
Easily create custom Spring Boot based applications that can be deployed to execute on YARN.
DAO support (Template & Callbacks) for HBase.
Support for Hadoop Security.
Now let's create the project.
First, add the dependency to the pom file:
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>2.5.0.RELEASE</version>
</dependency>
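For the basic project setup, see my earlier blog post: https://blog.csdn.net/u011521382/article/details/81346266
The first build downloads quite a few jars, so it can be slow. Just be patient.
Once the pom file is set up: since this is a Spring project, we also need a beans.xml configuration file.
Put it in the resources directory: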
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
           http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <hdp:configuration id="hadoopConfiguration">
        fs.defaultFS=hdfs://hadoopMaster:8020
    </hdp:configuration>

    <hdp:file-system id="fileSystem" configuration-ref="hadoopConfiguration" user="root"></hdp:file-system>
</beans>
The configuration file sets the project's HDFS address (fs.defaultFS) and defines the fileSystem bean, which the code below depends on.
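For comparison, the two beans roughly correspond to the plain Hadoop client code below. This is a sketch, not a claim about what Spring's factory bean does internally: it assumes only the Hadoop client classes that the test code already imports, the class `PlainHdfsClient` and its methods are hypothetical names, and the NameNode address and user are copied from beans.xml. `FileSystem.get(URI, Configuration, String)` is the Hadoop overload that opens a file system as a given remote user.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

/** Hypothetical helper: builds a client similar to the one beans.xml declares, without Spring. */
public class PlainHdfsClient {

    // Counterpart of <hdp:configuration>: a Hadoop Configuration with fs.defaultFS set
    static Configuration hadoopConfiguration(String defaultFs) {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", defaultFs);
        return conf;
    }

    // Counterpart of <hdp:file-system user="root">: open the default file system as the given user
    static FileSystem fileSystem(Configuration conf, String user)
            throws IOException, InterruptedException {
        return FileSystem.get(URI.create(conf.get("fs.defaultFS")), conf, user);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = hadoopConfiguration("hdfs://hadoopMaster:8020");
        // FileSystem is Closeable, so try-with-resources releases the client
        try (FileSystem fs = fileSystem(conf, "root")) {
            System.out.println(fs.getUri());
        }
    }
}
```

The XML version keeps these two settings out of the code, which is what lets the test class below simply look up the fileSystem bean.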
Now let's look at the code:
package com.yoyocheknow.hadoop.spring;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

/**
 * Accesses the HDFS file system through Spring for Apache Hadoop.
 * @author yoyocheknow
 * @date 2018/8/9 19:42
 */
public class SpringHadoopHDFSApp {

    private ApplicationContext ctx;
    private FileSystem fileSystem;

    // Create a directory on HDFS
    @Test
    public void testMkdir() throws Exception {
        fileSystem.mkdirs(new Path("/springhdfs"));
    }

    // Print the contents of an HDFS file to standard output
    @Test
    public void testText() throws Exception {
        // try-with-resources closes the stream even if copying fails
        try (FSDataInputStream in = fileSystem.open(new Path("/hello.txt"))) {
            IOUtils.copyBytes(in, System.out, 1024);
        }
    }

    // Load beans.xml from the classpath and fetch the fileSystem bean it defines
    @Before
    public void setUp() {
        ctx = new ClassPathXmlApplicationContext("beans.xml");
        fileSystem = (FileSystem) ctx.getBean("fileSystem");
    }

    @After
    public void tearDown() throws Exception {
        ctx = null;
        fileSystem.close();
    }
}
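The test class only creates a directory and reads an existing file. As a further sketch of the same `FileSystem` API, the hypothetical helper below writes a small text file with `FileSystem.create` and reads it back with `FileSystem.open`. It accepts any `FileSystem`, so it can be tried against the local file system first and then pointed at the fileSystem bean; the class name and file name are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsRoundTrip {

    // Write text to path (overwriting any existing file), then read it back as UTF-8
    static String writeThenRead(FileSystem fs, Path path, String text) throws IOException {
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (FSDataInputStream in = fs.open(path)) {
            // 4096-byte buffer; this overload does not close either stream
            IOUtils.copyBytes(in, buffer, 4096);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Local file system for a quick try; pass the Spring "fileSystem" bean to target HDFS
        FileSystem local = FileSystem.getLocal(new Configuration());
        Path path = new Path(System.getProperty("java.io.tmpdir"), "hello-spring.txt");
        System.out.println(writeThenRead(local, path, "hello spring hadoop"));
    }
}
```

Inside the test class, the same call would be something like `writeThenRead(fileSystem, new Path("/springhdfs/hello-spring.txt"), "...")`, where the HDFS path is again just an example.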
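Now let's run the tests.
You will see that they fail with an error. The cause is that my local user is "ZZH", while the user on my cloud server is "root", and by default the HDFS client acts as the local operating-system user. Setting the user in the configuration file fixes it (see user="root" on hdp:file-system in beans.xml).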
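Besides the user attribute in beans.xml, Hadoop's simple (non-Kerberos) authentication also reads a `HADOOP_USER_NAME` environment variable or Java system property when deciding which user the client acts as. A minimal sketch of that alternative, assuming security is off on the cluster, that the environment variable is not already set (it takes precedence over the property), and that the property is set before Hadoop first creates its login user in the JVM; `HadoopUserDemo` and `actAs` are hypothetical names.

```java
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

public class HadoopUserDemo {

    // Hypothetical helper: ask Hadoop to act as `user`, then report the user it resolved.
    // Only takes effect if called before Hadoop's login user is first created in this JVM.
    static String actAs(String user) throws IOException {
        System.setProperty("HADOOP_USER_NAME", user);
        return UserGroupInformation.getCurrentUser().getShortUserName();
    }

    public static void main(String[] args) throws IOException {
        // With simple authentication this should print "root" rather than the local OS user
        System.out.println(actAs("root"));
    }
}
```

With the Spring setup, the property would have to be set before `new ClassPathXmlApplicationContext("beans.xml")` builds the fileSystem bean, for example at the top of `setUp()`. Keeping the user in beans.xml avoids that ordering concern.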
Here is the result after a successful run:
Update (2018-08-10): the project is on GitHub at https://github.com/yoyocheknow/Hadoop