Hortonworks Hadoop installation notes

namenode new generation size





I. Preparation:

  • Configure NTP clients. Execute the following command on all the nodes in your cluster:                        

    yum install ntp
  • Enable the service. Execute the following command on all the nodes in your cluster:                        

    chkconfig ntpd on
  • Start the NTP service. Execute the following command on all the nodes in your cluster:

    /etc/init.d/ntpd start
  • You can use the existing NTP server in your environment. Configure the firewall on the local NTP server to enable UDP input traffic on port 123, replacing <cluster_ip_range> with the IP addresses in the cluster. See the following sample rule:

    # iptables -A RH-Firewall-1-INPUT -s <cluster_ip_range> -m state --state NEW -p udp --dport 123 -j ACCEPT

    Restart iptables. Execute the following command on all the nodes in your cluster:

    # service iptables restart

    Configure clients to use the local NTP server. Edit /etc/ntp.conf and add the following line:
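The ntp.conf line itself is not given above. Here is a minimal sketch of the client-side edit, assuming the standard ntp.conf `server <address>` directive. The helper name add_ntp_server and the NTP_SERVER variable are hypothetical, and the iburst option (faster initial synchronization) is an optional extra rather than something this guide specifies.

```shell
#!/usr/bin/env bash
# Sketch: add a "server <address> iburst" line to an ntp.conf file unless a
# "server <address>" directive for the same address is already present.

add_ntp_server() {
  local conf="$1" server="$2"
  # awk exits 0 only if some line has "server" as field 1 and the address as
  # field 2; commented-out lines ("#server ...", "# server ...") do not match.
  if ! awk -v s="$server" '$1 == "server" && $2 == s { found = 1 } END { exit !found }' "$conf"; then
    printf 'server %s iburst\n' "$server" >> "$conf"
  fi
}

# Usage on every node (as root), then restart the service:
#   add_ntp_server /etc/ntp.conf "$NTP_SERVER"   # NTP_SERVER is hypothetical
#   service ntpd restart
```

Running it twice leaves a single entry, so it is safe to include in a provisioning script.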



groupadd hadoop

useradd -g hadoop hdfs

useradd -g hadoop yarn

useradd -g hadoop hive

useradd -g hadoop mapred

passwd hdfs

passwd yarn

passwd hive

passwd mapred
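The same group and users can be created in a loop that skips accounts that already exist. This is a sketch with hypothetical helpers (create_hadoop_users, run); with DRY_RUN=1 it only prints the commands. Passwords are still set interactively with passwd. The group check reads /etc/group directly, so a group provided only through LDAP or another NSS source would not be detected.

```shell
#!/usr/bin/env bash
# Sketch: create the hadoop group and the four service users, skipping any
# that already exist. With DRY_RUN set, commands are printed, not executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_hadoop_users() {
  local group=hadoop user
  # Look the group up in /etc/group so the check needs no extra tools.
  grep -q "^${group}:" /etc/group || run groupadd "$group"
  for user in hdfs yarn hive mapred; do
    # -g makes the shared hadoop group each user's primary group.
    id "$user" >/dev/null 2>&1 || run useradd -g "$group" "$user"
  done
}

# Usage: DRY_RUN=1 create_hadoop_users   # review the commands first
#        create_hadoop_users             # as root; then run passwd for each user
```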


wget http://public-repo-1.hortonworks.com/HDP/tools/

Edit the TODO directories in directories.sh and set them to your own paths (follow the hints in the file).

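Copy the two files (usersAndGroups.sh and directories.sh) to /opt, give the other users permission to read them, and add the following two lines to each user's ~/.bash_profile: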


. /opt/usersAndGroups.sh 

. /opt/directories.sh


source /etc/profile
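Appending the two lines by hand to every user's profile is easy to repeat by accident. A minimal sketch (the helper add_profile_lines is hypothetical) that adds them only when they are missing; the /home/<user> paths in the usage comment are an assumption about where the home directories live.

```shell
#!/usr/bin/env bash
# Sketch: append the two "source" lines to a user's ~/.bash_profile only if
# they are not there yet, so re-running the setup does not duplicate them.

add_profile_lines() {
  local profile="$1" line
  for line in '. /opt/usersAndGroups.sh' '. /opt/directories.sh'; do
    # -x: whole-line match; -F: fixed string, so "." is not a regex wildcard.
    grep -qxF -- "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
  done
}

# Usage (as root; assumes home directories under /home):
#   for u in hdfs yarn hive mapred; do add_profile_lines "/home/$u/.bash_profile"; done
```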


# Run as root after sourcing usersAndGroups.sh and directories.sh. A *_DIR
# variable may hold a space-separated list of paths, so it is left unquoted.
# chmod alone does not make root-owned directories writable by the service
# users; chown each directory to the user that writes to it (e.g. hdfs:hadoop).

echo "Create the NameNode Directories"
mkdir -p $DFS_NAME_DIR;
chmod -R 755 $DFS_NAME_DIR;

echo "Create the SecondaryNameNode Directories"
mkdir -p $FS_CHECKPOINT_DIR;
chmod -R 755 $FS_CHECKPOINT_DIR;

echo "Create datanode local dir"
mkdir -p $DFS_DATA_DIR;
chmod -R 750 $DFS_DATA_DIR;

echo "Create yarn local dir"
mkdir -p $YARN_LOCAL_DIR;
chmod -R 755 $YARN_LOCAL_DIR;

echo "Create yarn local log dir"
mkdir -p $YARN_LOCAL_LOG_DIR;
chmod -R 755 $YARN_LOCAL_LOG_DIR;

echo "Create the Log and PID Directories"
mkdir -p $HDFS_LOG_DIR;
chmod -R 755 $HDFS_LOG_DIR;
mkdir -p $YARN_LOG_DIR;
chmod -R 755 $YARN_LOG_DIR;
mkdir -p $HDFS_PID_DIR;
chmod -R 755 $HDFS_PID_DIR;
mkdir -p $YARN_PID_DIR;
chmod -R 755 $YARN_PID_DIR;
mkdir -p $MAPRED_LOG_DIR;
chmod -R 755 $MAPRED_LOG_DIR;
mkdir -p $MAPRED_PID_DIR;
chmod -R 755 $MAPRED_PID_DIR;
mkdir -p $HADOOP_LOG_DIR;
chmod -R 755 $HADOOP_LOG_DIR;
mkdir -p $HIVE_LOG_DIR;
chmod -R 755 $HIVE_LOG_DIR;
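The mkdir/chmod pairs above all follow one pattern, and ownership has to be set as well. A hypothetical make_dirs helper can create, chown and chmod in one call. The owner/group pairs in the usage lines are assumptions based on the user variables from usersAndGroups.sh; adjust them to whichever account writes to each directory.

```shell
#!/usr/bin/env bash
# Sketch: one helper for the repeated mkdir/chown/chmod steps. A *_DIR
# variable may contain several space-separated paths, so callers pass it
# unquoted and the helper loops over the resulting arguments.

make_dirs() {
  local owner="$1" mode="$2" dir
  shift 2
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
    chown -R "$owner" "$dir" || return 1   # owner may be "user" or "user:group"
    chmod -R "$mode" "$dir" || return 1
  done
}

# Usage (as root, after sourcing usersAndGroups.sh and directories.sh):
#   make_dirs "$HDFS_USER:$HADOOP_GROUP" 755 $DFS_NAME_DIR $FS_CHECKPOINT_DIR
#   make_dirs "$HDFS_USER:$HADOOP_GROUP" 750 $DFS_DATA_DIR
#   make_dirs "$YARN_USER:$HADOOP_GROUP" 755 $YARN_LOCAL_DIR $YARN_LOCAL_LOG_DIR
```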

II. Install the software as root:

yum install hadoop hadoop-hdfs hadoop-libhdfs hadoop-yarn hadoop-mapreduce hadoop-client openssl

Install Snappy

Complete the following instructions on all the nodes in your cluster:

  1. Install Snappy.

    yum install snappy snappy-devel
  2. Make the Snappy libraries available to Hadoop:

    ln -sf /usr/lib64/libsnappy.so /usr/lib/hadoop/lib/native/.


Install LZO

Execute the following command on all the nodes in your cluster. From a terminal window, type:

    yum install lzo lzo-devel hadoop-lzo hadoop-lzo-native

III. Modify the Hadoop configuration



Also configure /etc/hosts and add the entry: $ip namenode
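A sketch for the /etc/hosts step using a hypothetical helper, set_hosts_entry. It replaces any existing line for the hostname instead of appending a second one, which also helps if the NameNode's IP changes later. It drops the whole old line, including any aliases on it, and the address in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: map a hostname to an IP in a hosts file, replacing any older line
# for the same name so the file never holds two addresses for "namenode".

set_hosts_entry() {
  local file="$1" ip="$2" name="$3" tmp
  tmp=$(mktemp) || return 1
  # Copy every line except those that list $name among their hostname fields.
  awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) next; print }' "$file" > "$tmp" &&
    printf '%s\t%s\n' "$ip" "$name" >> "$tmp" &&
    cat "$tmp" > "$file"   # rewrite in place, keeping the file's owner and mode
  local rc=$?
  rm -f "$tmp"
  return $rc
}

# Usage (as root on every node; the address is a placeholder):
#   set_hosts_entry /etc/hosts 192.168.1.10 namenode
```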

 Setting Up the Hadoop Configuration

This section describes how to set up and edit the deployment configuration files for HDFS and MapReduce.

Use the following instructions to set up Hadoop configuration files:

  1. We strongly suggest that you edit and source the files you downloaded in Download Companion Files.

    Alternatively, you can copy the contents to your ~/.bash_profile to set up these environment variables in your environment.

  2. From the downloaded scripts.zip file, extract the files from the configuration_files/core_hadoop directory to a temporary directory.

  3. Modify the configuration files.

    In the temporary directory, locate the following files and modify the properties based on your environment.

    Search for TODO in the files for the properties to replace. See Define Environment Parameters for more information.

    1. Edit the /etc/hadoop/conf/hadoop-env.sh file.

    2. Change the value of the -XX:MaxNewSize parameter to 1/8 of the maximum heap size (-Xmx).

    3. Edit the core-site.xml and modify the following properties:

       <description>Enter your NameNode hostname</description>
    4. Edit the hdfs-site.xml and modify the following properties:

       <description>Comma separated list of paths. Use the list of directories from $DFS_NAME_DIR.  
                      For example, /grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn.</description>
       <value>file:///grid/hadoop/hdfs/dn, file:///grid1/hadoop/hdfs/dn</value>  
       <description>Comma separated list of paths. Use the list of directories from $DFS_DATA_DIR.  
                      For example, file:///grid/hadoop/hdfs/dn, file:///grid1/hadoop/hdfs/dn.</description>

       <description>Enter your NameNode hostname for http access.</description>
       <description>Enter your Secondary NameNode hostname.</description>

           <description>A comma separated list of paths. Use the list of directories from $FS_CHECKPOINT_DIR.
           For example, /grid/hadoop/hdfs/snn,/grid1/hadoop/hdfs/snn,/grid2/hadoop/hdfs/snn</description>


      The value of NameNode new generation size should be 1/8 of maximum heap size (-Xmx). Ensure that you check the default setting for your environment.

      To change the default value:

    5. Edit the yarn-site.xml and modify the following properties:

       <description>Enter your ResourceManager hostname.</description>

       <description>Enter your ResourceManager hostname.</description>

       <description>Enter your ResourceManager hostname.</description>

       <description>Enter your ResourceManager hostname.</description>

       <description>Comma separated list of paths. Use the list of directories from $YARN_LOCAL_DIR.  
                      For example, /grid/hadoop/hdfs/yarn/local,/grid1/hadoop/hdfs/yarn/local.</description>

       <description>Use the list of directories from $YARN_LOCAL_LOG_DIR.  
                      For example, /grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs</description>

       <description>URL for job history server</description>

       <description>URL for job history server</description>

    6. Edit the mapred-site.xml and modify the following properties:

       <description>Enter your JobHistoryServer hostname.</description>

       <description>Enter your JobHistoryServer hostname.</description>

  4. Optional: Configure MapReduce to use Snappy Compression

    In order to enable Snappy compression for MapReduce jobs, edit core-site.xml and mapred-site.xml.

    1. Add the following properties to mapred-site.xml:  

      <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>
      <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>


    2. Add the SnappyCodec to the codecs list in core-site.xml:    



  5. Copy the configuration files.

    • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

    • $HADOOP_GROUP is a common group shared by services. For example, hadoop.

    1. On all hosts in your cluster, create the Hadoop configuration directory:

      rm -r $HADOOP_CONF_DIR
      mkdir -p $HADOOP_CONF_DIR

      where $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files.

      For example, /etc/hadoop/conf.

    2. Copy all the configuration files to $HADOOP_CONF_DIR.

    3. Set appropriate ownership and permissions:

      chown -R $HDFS_USER:$HADOOP_GROUP $HADOOP_CONF_DIR/../
      chmod -R 755 $HADOOP_CONF_DIR/../
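Step 3.2 and the hdfs-site.xml note both size the NameNode's new generation at 1/8 of -Xmx. The arithmetic is simple, but units are easy to mix up, so here is a small sketch (the helper maxnewsize_for is hypothetical) that accepts only m/g suffixes and rounds down to whole megabytes.

```shell
#!/usr/bin/env bash
# Sketch: derive -XX:MaxNewSize as 1/8 of the -Xmx heap size.
# Accepts sizes written like 1024m or 4g and answers in megabytes.

maxnewsize_for() {
  local xmx="$1" num unit mb
  num=${xmx%[mMgG]}        # strip one trailing unit letter, if present
  unit=${xmx#"$num"}       # whatever was stripped is the unit
  case "$num" in
    ''|*[!0-9]*) echo "not a size: $xmx" >&2; return 1 ;;
  esac
  case "$unit" in
    m|M) mb=$num ;;
    g|G) mb=$((num * 1024)) ;;
    *) echo "expected an m or g suffix: $xmx" >&2; return 1 ;;
  esac
  echo "$((mb / 8))m"      # integer division rounds down
}

# Usage: maxnewsize_for 1024m   prints 128m, i.e. -Xmx1024m -> -XX:MaxNewSize=128m
```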

Edit hadoop-env.sh:

export JAVA_HOME=/opt/jdk1.6.0_33




export JAVA_HOME=/opt/jdk1.6.0_33

export HADOOP_YARN_HOME=/www/logs/hadoop-yarn

Remove every occurrence of $USER from the file; otherwise no live nodes are reported.
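A sketch of the $USER removal with sed (the helper name strip_user_var is hypothetical). It edits the file in place, keeps a .bak copy, and removes both the $USER and ${USER} spellings. A longer variable name such as $USERNAME would also lose its "$USER" prefix, so review the diff afterwards.

```shell
#!/usr/bin/env bash
# Sketch: remove every literal $USER and ${USER} from a file, keeping a .bak backup.

strip_user_var() {
  local file="$1"
  # In sed's basic regex, \$ is a literal dollar sign and { } are literal braces.
  sed -i.bak -e 's/\${USER}//g' -e 's/\$USER//g' "$file"
}

# Usage: strip_user_var /etc/hadoop/conf/hadoop-env.sh
#        diff /etc/hadoop/conf/hadoop-env.sh.bak /etc/hadoop/conf/hadoop-env.sh
```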

IV. Run the tests as the hdfs user:


 Format and Start HDFS

  1. Execute these commands on the NameNode host machine:

    su $HDFS_USER
    /usr/lib/hadoop/bin/hadoop namenode -format
    /usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
  2. Execute these commands on the SecondaryNameNode:

    su $HDFS_USER
    /usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start secondarynamenode
  3. Execute these commands on all DataNodes:

    su $HDFS_USER
    /usr/lib/hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode

Run jps to check whether the daemons started; if one did not, check its log under $HADOOP_LOG_DIR.
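jps prints one "<pid> <MainClass>" line per JVM the current user can see. The sketch below (the helper missing_daemons is hypothetical) compares that output with the daemons you expect on a host, so a missing name tells you which log under $HADOOP_LOG_DIR to read. It takes the jps output on stdin, which also makes it easy to test.

```shell
#!/usr/bin/env bash
# Sketch: read `jps` output ("<pid> <MainClass>" per line) on stdin and print
# each expected daemon class name that is absent. Returns 1 if any is missing.

missing_daemons() {
  local jps_out name rc=0
  jps_out=$(cat)
  for name in "$@"; do
    if ! printf '%s\n' "$jps_out" |
         awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, as $HDFS_USER:
#   jps | missing_daemons NameNode          # on the NameNode host
#   jps | missing_daemons DataNode          # on each DataNode
```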

Smoke Test HDFS


  1. See if you can reach the NameNode server with your browser:

  2. Create hdfs user directory in HDFS:

    su $HDFS_USER
    hadoop fs -mkdir -p /user/hdfs
  3. Try copying a file into HDFS and listing that file:

    su $HDFS_USER
    hadoop fs -copyFromLocal /etc/passwd passwd
    hadoop fs -ls 

  4. Test browsing HDFS:



V. Start YARN as the yarn user



export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec


 Start YARN

  1. Execute these commands from the ResourceManager server:

    <login as $YARN_USER>
    /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
  2. Execute these commands from all NodeManager nodes:

    <login as $YARN_USER>
    /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager


  • $YARN_USER is the user owning the YARN services. For example, yarn.

  • $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.




Start MapReduce JobHistory Server

  1. Change permissions on the container-executor file.

    chown -R root:hadoop /usr/lib/hadoop-yarn/bin/container-executor
    chmod -R 6050 /usr/lib/hadoop-yarn/bin/container-executor


    If these permissions are not set, the health check script returns an error stating that the node is UNHEALTHY.

  2. Execute these commands from the JobHistory server to set up directories on HDFS:

    su $HDFS_USER
    hadoop fs -mkdir -p /mr-history/tmp
    hadoop fs -chmod -R 1777 /mr-history/tmp
    hadoop fs -mkdir -p /mr-history/done
    hadoop fs -chmod -R 1777 /mr-history/done
    hadoop fs -chown -R $MAPRED_USER:$HDFS_USER /mr-history
    hadoop fs -mkdir -p /app-logs
    hadoop fs -chmod -R 1777 /app-logs 
    hadoop fs -chown yarn /app-logs 

  3. Execute these commands from the JobHistory server:

    <login as $MAPRED_USER>
    export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec/
    /usr/lib/hadoop-mapreduce/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver


  • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

  • $MAPRED_USER is the user owning the MapRed services. For example, mapred.

  • $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
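Mode 6050 on container-executor sets the setuid and setgid bits (6), gives the owner no access (0), lets the hadoop group read and execute (5), and gives others nothing (0). A small sketch (the helper check_file_mode is hypothetical) verifies mode and owner in one step, assuming GNU stat as shipped with RHEL/CentOS.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's octal mode (including setuid/setgid bits) and
# owner:group with the expected values. Assumes GNU stat (-c format strings).

check_file_mode() {
  local file="$1" want_mode="$2" want_owner="$3" mode owner
  mode=$(stat -c %a "$file") || return 1
  owner=$(stat -c %U:%G "$file") || return 1
  if [ "$mode" != "$want_mode" ] || [ "$owner" != "$want_owner" ]; then
    echo "$file: have $mode $owner, want $want_mode $want_owner" >&2
    return 1
  fi
}

# Usage: check_file_mode /usr/lib/hadoop-yarn/bin/container-executor 6050 root:hadoop
```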

VI. Install Hive

Install the Hive and HCatalog RPMs

  1. On all client/gateway nodes (on which Hive programs will be executed), Hive Metastore Server, and HiveServer2 machine, install the Hive RPMs.

    yum install hive hcatalog

  2. Optional - Download and add the database connector JAR.

    By default, Hive uses embedded Derby database for its metastore. However, you can optionally choose to enable remote database (MySQL) for Hive metastore.

    1. Execute the following command on the Hive metastore machine.

      yum install mysql-connector-java*

    2. After the yum install, the MySQL connector JAR is placed in /usr/share/java/. Copy that JAR file to the /usr/lib/hive/lib/ directory on your Hive host machine.

    3. Ensure that the JAR file has appropriate permissions. 

Create the MySQL user:

CREATE USER 'hive'@'localhost' IDENTIFIED BY '123456';

GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost';

#CREATE USER 'hive'@'%' IDENTIFIED BY '123456';



 Set Up the Hive/HCatalog Configuration Files

Use the following instructions to set up the Hive/HCatalog configuration files:

  1. Extract the Hive/HCatalog configuration files.

    From the downloaded scripts.zip file, extract the files in the configuration_files/hive directory to a temporary directory.

  2. Modify the configuration files.

    In the temporary directory, locate the following file and modify the properties based on your environment. Search for TODO in the files for the properties to replace.

    1. Edit hive-site.xml and modify the following properties:

       <description>Enter your JDBC connection string. </description>
       <description>Enter your MySQL credentials. </description>
       <description>Enter your MySQL credentials. </description>

      Enter your MySQL credentials from Install MySQL (Optional).

       <description>URI for client to contact metastore server. To enable HiveServer2, leave the property value empty. </description>

  3. Copy the configuration files.

    1. On all Hive hosts create the Hive configuration directory.

      rm -r $HIVE_CONF_DIR ;
      mkdir -p $HIVE_CONF_DIR ;
    2. Copy all the configuration files to the $HIVE_CONF_DIR directory.

    3. Set appropriate permissions:

      chmod -R 755 $HIVE_CONF_DIR/../ ;


chmod -R 755 $HIVE_LIB/../ ;

Create Directories on HDFS

  1. Create Hive user home directory on HDFS.

    Login as $HDFS_USER
    hadoop fs -mkdir -p /user/$HIVE_USER
    hadoop fs -chown $HIVE_USER:$HDFS_USER /user/$HIVE_USER
  2. Create warehouse directory on HDFS.  

    Login as $HDFS_USER
    hadoop fs -mkdir -p /apps/hive/external
    hadoop fs -mkdir -p /apps/hive/warehouse
    hadoop fs -chown -R $HIVE_USER:$HDFS_USER /apps/hive
    hadoop fs -chmod -R 775 /apps/hive 


    • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

    • $HIVE_USER is the user owning the Hive services. For example, hive.

  3. Create hive scratch directory on HDFS.  

    Login as $HDFS_USER
    hadoop fs -mkdir -p /tmp/scratch
    hadoop fs -chown -R $HIVE_USER:$HDFS_USER /tmp/scratch
    hadoop fs -chmod -R 777 /tmp/scratch 

Validate the Installation

Use the following steps to validate your installation:

  1. Start Hive Metastore service.

     Login as $HIVE_USER
    nohup hive --service metastore>$HIVE_LOG_DIR/hive.out 2>$HIVE_LOG_DIR/hive.log & 
  2. Smoke Test Hive.

    1. Open Hive command line shell.

    2. Run sample commands.

      show databases;
      create table test(col1 int, col2 string);
      show tables;
  3. Start HiveServer2.

     /usr/lib/hive/bin/hiveserver2 >$HIVE_LOG_DIR/hiveserver2.out 2> $HIVE_LOG_DIR/hiveserver2.log & 

  4. Smoke Test HiveServer2.

    1. Open Beeline command line shell to interact with HiveServer2.

    2. Establish connection to server.

      !connect jdbc:hive2://$hive.server.full.hostname:10000 $HIVE_USER password org.apache.hive.jdbc.HiveDriver
    3. Run sample commands.

      show databases;
      create table test2(a int, b string);
      show tables;
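The metastore and HiveServer2 are started in the background, so a smoke test run immediately afterwards can fail only because the service is still starting. A minimal polling sketch (the helper wait_for_port is hypothetical), assuming bash's /dev/tcp support; 10000 is the HiveServer2 port used in the Beeline connect string above.

```shell
#!/usr/bin/env bash
# Sketch: poll host:port until a TCP connection succeeds or the timeout (in
# seconds) runs out. Relies on bash's /dev/tcp redirection, so run it with bash.

wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # The subshell opens (and on exit closes) fd 3; it fails if nothing listens.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage after starting HiveServer2:
#   wait_for_port "$(hostname -f)" 10000 120 && beeline
```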

VII. Sqoop installation



   To start a daemon use start; to stop it use stop.

   --config $HADOOP_CONF_DIR is optional; the default path is /etc/hadoop/conf.

   su $HDFS_USER

   hadoop-daemon.sh start namenode

   hadoop-daemon.sh start secondarynamenode

   su $YARN_USER

   yarn-daemon.sh start resourcemanager

   yarn-daemon.sh start nodemanager



 export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec/
 /usr/lib/hadoop-mapreduce/sbin/mr-jobhistory-daemon.sh  start historyserver



Start Hive Metastore service:

nohup hive --service metastore>$HIVE_LOG_DIR/hive.out 2>$HIVE_LOG_DIR/hive.log & 

 Start HiveServer2:

/usr/lib/hive/bin/hiveserver2 >$HIVE_LOG_DIR/hiveserver2.out 2> $HIVE_LOG_DIR/hiveserver2.log & 
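Because stopping uses the same commands with start replaced by stop, the list can be generated for either action. This sketch (the helper hdp_daemon_commands is hypothetical) only prints each command with the account that should run it; it executes nothing. The user names fall back to hdfs, yarn and mapred, the examples used in this guide, and the history server still needs HADOOP_LIBEXEC_DIR exported as shown above.

```shell
#!/usr/bin/env bash
# Sketch: print "<user> <command>" lines for the daemons summarised above,
# for either "start" or "stop". Nothing is executed.

hdp_daemon_commands() {
  local action="$1"
  case "$action" in
    start|stop) ;;
    *) echo "usage: hdp_daemon_commands start|stop" >&2; return 1 ;;
  esac
  local hdfs="${HDFS_USER:-hdfs}" yarn="${YARN_USER:-yarn}" mapred="${MAPRED_USER:-mapred}"
  # printf reuses its format for each pair of arguments.
  printf '%s %s\n' \
    "$hdfs" "hadoop-daemon.sh $action namenode" \
    "$hdfs" "hadoop-daemon.sh $action secondarynamenode" \
    "$yarn" "yarn-daemon.sh $action resourcemanager" \
    "$yarn" "yarn-daemon.sh $action nodemanager" \
    "$mapred" "mr-jobhistory-daemon.sh $action historyserver"
}

# Usage: hdp_daemon_commands stop
```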

If the NameNode IP address changes, the NameNode must be reformatted and the Hadoop directories recreated.

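If Hive is deployed on its own host, after installing Hive configure Hive, its log directory, and the permissions of the related directories. If errors occur, change the log level to DEBUG to see the detailed error log.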




Create the Oozie directories and set ownership and permissions:

mkdir -p $OOZIE_DATA_DIR;
chmod -R 755 $OOZIE_DATA_DIR;
chown -R $OOZIE_USER:$HADOOP_GROUP /etc/oozie;
chmod -R 755 /etc/oozie;
chown -R $OOZIE_USER:$HADOOP_GROUP /usr/lib/oozie;
chmod -R 755 /usr/lib/oozie;


[Optional] Install MySQL

If you are installing Hive and HCatalog services, you need a MySQL database instance to store metadata information. You can either use an existing MySQL instance or install a new instance of MySQL manually. To install a new instance:

  1. Connect to the host machine you plan to use for Hive and HCatalog.

  2. Install MySQL server. From a terminal window, type:

    For RHEL/CentOS:                    

    yum install mysql-server
  3. Start the instance.


    /etc/init.d/mysqld start


  4. Set the root user password.

    mysqladmin -u root -p'{password}' password $mysqlpassword
  5. Remove unnecessary information from log and STDOUT.

    mysqladmin -u root 2>&1 >/dev/null
  6. As root, use mysql (or another client tool) to create the database user $dbusername and grant it adequate privileges. This user provides access to the Hive metastore.

    CREATE USER '$dbusername'@'localhost' IDENTIFIED BY '$dbuserpassword';
    GRANT ALL PRIVILEGES ON *.* TO '$dbusername'@'localhost';
    CREATE USER '$dbusername'@'%' IDENTIFIED BY '$dbuserpassword';
    GRANT ALL PRIVILEGES ON *.* TO '$dbusername'@'%';
  7. Check that you can connect to the database as that user. When prompted, enter $dbuserpassword.

    mysql -u $dbusername -p
  8. Install the MySQL connector JAR file:                    

    yum install mysql-connector-java*
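The CREATE USER/GRANT pairs in step 6 differ only in the host part ('localhost' versus '%'). A hypothetical helper, metastore_user_sql, can print both pairs from one user name and password so the two cannot drift apart. It refuses values containing a single quote rather than attempting SQL escaping.

```shell
#!/usr/bin/env bash
# Sketch: print the CREATE USER / GRANT statements for a metastore user,
# once for 'localhost' and once for '%'.

metastore_user_sql() {
  local user="$1" pass="$2" host
  case "$user$pass" in
    *\'*) echo "single quotes are not supported" >&2; return 1 ;;
  esac
  for host in localhost '%'; do
    printf "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
    printf "GRANT ALL PRIVILEGES ON *.* TO '%s'@'%s';\n" "$user" "$host"
  done
}

# Usage (review the output first):
#   metastore_user_sql "$dbusername" "$dbuserpassword" | mysql -u root -p
```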
