
Analysis of Hadoop Startup Scripts

1. Script call graph
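Based on the scripts analyzed below, the call chain is roughly the following (a reconstructed sketch, not the original figure):

start-all.sh
  +-- start-dfs.sh
  |     +-- hadoop-daemon.sh start namenode                 --> bin/hadoop
  |     +-- hadoop-daemons.sh --> slaves.sh --ssh--> hadoop-daemon.sh start datanode / secondarynamenode --> bin/hadoop
  +-- start-mapred.sh
        +-- hadoop-daemon.sh start jobtracker               --> bin/hadoop
        +-- hadoop-daemons.sh --> slaves.sh --ssh--> hadoop-daemon.sh start tasktracker --> bin/hadoop
(each script first sources hadoop-config.sh, which in turn sources conf/hadoop-env.sh)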

2. start-all.sh

This script is very simple: it resolves the directory it was run from to find the bin directory of the Hadoop installation, then runs the startup scripts for HDFS and MapReduce. Hadoop prefers libexec/hadoop-config.sh if it exists; libexec/hadoop-config.sh and bin/hadoop-config.sh initially have the same content.

# Start all hadoop daemons.  Run this on master node.

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# start dfs daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR

# start mapred daemons
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR

3. start-dfs.sh

# Start hadoop dfs daemons.
# Optionally upgrade or rollback dfs state.
# Run this on master node.

usage="Usage: start-dfs.sh [-upgrade|-rollback]"

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# get arguments
if [ $# -ge 1 ]; then
	nameStartOpt=$1
	shift
	case $nameStartOpt in
	  (-upgrade)
	  	;;
	  (-rollback) 
	  	dataStartOpt=$nameStartOpt
	  	;;
	  (*)
		  echo $usage
		  exit 1
	    ;;
	esac
fi

# start dfs daemons
# start namenode after datanodes, to minimize time namenode is up w/o data
# note: datanodes will log connection errors until namenode starts
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode

This script first checks whether any argument was passed. As the code shows, it accepts only the -upgrade and -rollback options: one upgrades the file system after a version change, the other rolls it back. It then starts the namenode, datanode, and secondarynamenode daemons.
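For example (hypothetical invocations, run from the Hadoop installation directory), the options are passed straight through:

bin/start-dfs.sh            # normal start
bin/start-dfs.sh -upgrade   # start the namenode with -upgrade after changing Hadoop versions
bin/start-dfs.sh -rollback  # start the namenode and datanodes with -rollback to restore the previous state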

The $HADOOP_CONF_DIR used here is set by another script, hadoop-config.sh, which is described in detail later. It runs at the beginning of every startup script to check and set environment variables such as JAVA_HOME and HADOOP_HOME, and it in turn sources hadoop-env.sh to pick up the user-configured environment variables. Both scripts are covered in detail later.

From the code above, the namenode is started via hadoop-daemon.sh, which the next section analyzes. The datanode and secondarynamenode are started through hadoop-daemons.sh, which ultimately calls hadoop-daemon.sh as well; its behavior is analyzed later.

4. hadoop-daemon.sh

hadoop-daemon.sh is used to start the namenode; the script is as follows:

# Runs a Hadoop command as a daemon.
#
# Environment Variables
#
#   HADOOP_CONF_DIR  Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.
#   HADOOP_LOG_DIR   Where log files are stored.  PWD by default.
#   HADOOP_MASTER    host:path where hadoop code should be rsync'd from
#   HADOOP_PID_DIR   The pid files are stored. /tmp by default.
#   HADOOP_IDENT_STRING   A string representing this instance of hadoop. $USER by default
#   HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.
##

usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] (start|stop) <hadoop-command> <args...>"

# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# get arguments
startStop=$1
shift
command=$1
shift

## Back up the .out log files: out4->out5, out3->out4, out2->out3, out1->out2, out->out1
hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
	num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
	while [ $num -gt 1 ]; do
	    prev=`expr $num - 1`
	    [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
	    num=$prev
	done
	mv "$log" "$log.$num";
    fi
}

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER   
fi

if [ "$HADOOP_IDENT_STRING" = "" ]; then
  export HADOOP_IDENT_STRING="$USER"
fi

# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_HOME/logs"
fi
mkdir -p "$HADOOP_LOG_DIR"
touch $HADOOP_LOG_DIR/.hadoop_test > /dev/null 2>&1
TEST_LOG_DIR=$?
if [ "${TEST_LOG_DIR}" = "0" ]; then
  rm -f $HADOOP_LOG_DIR/.hadoop_test
else
  chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR 
fi

if [ "$HADOOP_PID_DIR" = "" ]; then
  HADOOP_PID_DIR=/tmp
fi

# some variables
export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
export HADOOP_ROOT_LOGGER="INFO,DRFA"
##log=/_nosql/hadoop/logs/hadoop-root-namenode-ubuntu.out
##pid=/tmp/hadoop-root-namenode.pid
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid

# Set default scheduling priority
if [ "$HADOOP_NICENESS" = "" ]; then
    export HADOOP_NICENESS=0
fi

case $startStop in

  (start)
    ## create the directory that holds the pid file
    mkdir -p "$HADOOP_PID_DIR"

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`.  Stop it first.
        exit 1
      fi
    fi

    if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
    fi
    
    ## rotate the .out log files
    hadoop_rotate_log $log

    echo starting $command, logging to $log
    cd "$HADOOP_PREFIX"
    nohup nice -n $HADOOP_NICENESS "$HADOOP_PREFIX"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
    ## write the PID of the background process started above into the pid file
    echo $! > $pid
    sleep 1; head "$log"
    ;;
          
  (stop)

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo stopping $command
        kill `cat $pid`
      else
        echo no $command to stop
      fi
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;

esac

Before going through the script in detail, here is what the environment variables it uses mean (they are also described in the comment block at the top of the script):

HADOOP_CONF_DIR: the configuration directory to use; defaults to ${HADOOP_HOME}/conf.
HADOOP_LOG_DIR: where log files are stored; the code falls back to $HADOOP_HOME/logs (the header comment says PWD).
HADOOP_MASTER: host:path from which the Hadoop code should be rsync'd.
HADOOP_PID_DIR: where the pid files are stored; /tmp by default.
HADOOP_IDENT_STRING: a string identifying this instance of Hadoop; $USER by default.
HADOOP_NICENESS: the scheduling priority for the daemons; defaults to 0.
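These can be overridden in conf/hadoop-env.sh; a minimal sketch (the values are purely illustrative):

# conf/hadoop-env.sh -- illustrative overrides
export HADOOP_LOG_DIR=/var/log/hadoop     # keep the logs out of the installation directory
export HADOOP_PID_DIR=/var/hadoop/pids    # avoid pid files in /tmp
export HADOOP_NICENESS=10                 # run the daemons at a lower scheduling priority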

The script first checks whether it received one argument or fewer ($# -le 1); if so, it prints the usage text and exits:

usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] (start|stop) <hadoop-command> <args...>"
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

Then, like the other scripts, it sources hadoop-config.sh to check and set the relevant environment variables. For this script, the role of hadoop-config.sh is to process the configuration directory and the host-list file and save them into the corresponding environment variables.

Next it records whether this is a start or a stop, and which daemon command to run (note: shift moves the script's positional parameters forward by one position):

startStop=$1
shift
command=$1
shift
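For example, for a hypothetical invocation hadoop-daemon.sh --config /etc/hadoop start namenode -upgrade, after hadoop-config.sh has consumed --config /etc/hadoop the positional parameters are start namenode -upgrade, so:

# $1=start  $2=namenode  $3=-upgrade
# after the first shift:  startStop=start,   $1=namenode, $2=-upgrade
# after the second shift: command=namenode,  "$@"=-upgrade (forwarded to bin/hadoop later on)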

Then comes the definition of a function used to rotate the logs:

## Back up the .out log files: out4->out5, out3->out4, out2->out3, out1->out2, out->out1
hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
        num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
        while [ $num -gt 1 ]; do
            prev=`expr $num - 1`
            [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
            num=$prev
        done
        mv "$log" "$log.$num";
    fi
}
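A quick way to see what the rotation does is to paste the function into a shell and run it against a throwaway file (the file names are just for the demo):

touch demo.out demo.out.1 demo.out.2
hadoop_rotate_log demo.out
ls demo.out*    # now prints: demo.out.1  demo.out.2  demo.out.3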

The following blocks set the environment variables introduced above according to the user's configuration (by sourcing hadoop-env.sh); these variables, such as the scheduling priority, are used when the namenode is actually launched:

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
fi

## echo $USER prints the current login user name, e.g. root
if [ "$HADOOP_IDENT_STRING" = "" ]; then
  export HADOOP_IDENT_STRING="$USER"
fi

# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_HOME/logs"
fi
mkdir -p "$HADOOP_LOG_DIR"
touch $HADOOP_LOG_DIR/.hadoop_test > /dev/null 2>&1
## $? is the exit status of the last command
TEST_LOG_DIR=$?
if [ "${TEST_LOG_DIR}" = "0" ]; then
  rm -f $HADOOP_LOG_DIR/.hadoop_test
else
  chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
fi

## directory where the pid file is stored
if [ "$HADOOP_PID_DIR" = "" ]; then
  HADOOP_PID_DIR=/tmp
fi

# some variables
export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
## used to set the client log level
export HADOOP_ROOT_LOGGER="INFO,DRFA"
## e.g. log=/_nosql/hadoop/logs/hadoop-root-namenode-ubuntu.out
## e.g. pid=/tmp/hadoop-root-namenode.pid
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid

# Set default scheduling priority
if [ "$HADOOP_NICENESS" = "" ]; then
  export HADOOP_NICENESS=0
fi

Finally, the start/stop argument decides whether the namenode is started or stopped:

case $startStop in

  (start)
    mkdir -p "$HADOOP_PID_DIR"

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`.  Stop it first.
        exit 1
      fi
    fi

    if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      ## rsync is a data-mirroring backup tool on Unix-like systems; the name says it: remote sync
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
    fi

    ## rotate the log files
    hadoop_rotate_log $log

    echo starting $command, logging to $log
    cd "$HADOOP_PREFIX"
    nohup nice -n $HADOOP_NICENESS "$HADOOP_PREFIX"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
    ## $! is the PID of the most recent background process
    echo $! > $pid
    sleep 1; head "$log"
    ;;

  (stop)

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo stopping $command
        kill `cat $pid`
      else
        echo no $command to stop
      fi
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;

esac

For start, the script first creates the directory that holds the pid file. If the pid file already exists and its process is still alive, a namenode is already running, so the script tells you to stop it first and exits. It then rotates the .out log files and launches the namenode with nice at the configured scheduling priority. The real launch, however, happens in yet another script, bin/hadoop, the final script through which every daemon is started: it picks a class with a main method and runs it with java, which is what actually brings up the Java daemon. That script is the key to the whole startup process and the entry point for reading the Hadoop source code, so it is analyzed in detail below.
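The pid file also gives a quick way to check by hand whether a daemon is alive, using the same kill -0 test as the script (the path below follows the default naming shown earlier):

pid=/tmp/hadoop-root-namenode.pid
if kill -0 `cat $pid` > /dev/null 2>&1; then
  echo "namenode running as process `cat $pid`"
else
  echo "no namenode running (missing or stale pid file)"
fi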

For stop, it simply kills the process recorded in the pid file (kill `cat $pid`); any other argument is an error and the usage text is printed.

5. hadoop-daemons.sh and slaves.sh

hadoop-daemons.sh is used to start the datanodes and the secondarynamenode; the script is as follows:

# Run a Hadoop command on all slave hosts.

usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."

# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" ; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

This script is simple, because in the end it also starts the daemons through the hadoop-daemon.sh script described in the previous section; the only extra step is that it goes through another script, slaves.sh, first:

exec" b i n / s l a v e s . s h " − − c o n f i g bin/slaves.sh" --config bin/slaves.sh"configHADOOP_CONF_DIR cd" H A D O O P _ H O M E " ; " HADOOP\_HOME" ;" HADOOP_HOME";"bin/hadoop-daemon.sh"–config H A D O O P _ C O N F _ D I R " HADOOP\_CONF\_DIR" HADOOP_CONF_DIR"@"

The main job of slaves.sh is to run the given command on every slave host over ssh - here, the last two commands on the line above: cd into the Hadoop directory and run bin/hadoop-daemon.sh. The script is as follows:

# Run a shell command on all slave hosts.
#
# Environment Variables
#
#   HADOOP_SLAVES    File naming remote hosts.
#     Default is ${HADOOP_CONF_DIR}/slaves.
#   HADOOP_CONF_DIR  Alternate conf dir. Default is ${HADOOP_HOME}/conf.
#   HADOOP_SLAVE_SLEEP Seconds to sleep between spawning remote commands.
#   HADOOP_SSH_OPTS Options passed to ssh when running remote commands.
##

usage="Usage: slaves.sh [--config confdir] command..."

# if no args specified, show usage
if [ $# -le 0 ]; then
  echo $usage
  exit 1
fi

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# If the slaves file is specified in the command line,
# then it takes precedence over the definition in 
# hadoop-env.sh. Save it here.
HOSTLIST=$HADOOP_SLAVES

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

if [ "$HOSTLIST" = "" ]; then
  if [ "$HADOOP_SLAVES" = "" ]; then
    export HOSTLIST="${HADOOP_CONF_DIR}/slaves"
  else
    export HOSTLIST="${HADOOP_SLAVES}"
  fi
fi

for slave in `cat "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
 ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
   2>&1 | sed "s/^/$slave: /" &
 if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
   sleep $HADOOP_SLAVE_SLEEP
 fi
done

wait

The code above first obtains the host names of all slave nodes (from the slaves file, or from the file configured in hadoop-env.sh), then loops over them and runs the command on each host in the background over ssh, and finally waits for all of them to finish before exiting.

ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
  2>&1 | sed "s/^/$slave: /" &

For example, this line expands to something like:

ssh localhost cd /_nosql/hadoop/libexec/.. ; /_nosql/hadoop/bin/hadoop-daemon.sh --config /_nosql/hadoop/libexec/../conf start datanode

So this script's job is to run the appropriate start script on every target host. For datanodes the hosts are taken from the slaves file; for the secondarynamenode they are taken from the masters file (selected with --hosts masters in start-dfs.sh; without that option the default is the slaves file, which is how the datanodes are found).
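The sed expression in the loop, s/#.*$//;/^$/d, strips comments and blank lines from the host-list file, so a slaves file such as the following (host names are illustrative) yields exactly the hosts to ssh to:

# conf/slaves -- illustrative
# data nodes
slave1
slave2

slave3

Running cat conf/slaves | sed "s/#.*$//;/^$/d" then prints slave1, slave2 and slave3, one per line.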

6. start-mapred.sh

# Start hadoop map reduce daemons.  Run this on master node.

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# start mapred daemons
# start jobtracker first to minimize connection errors at startup
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker

This script has only two important lines, which start the jobtracker and the tasktrackers respectively; the environment variables are set up by the same helper scripts as before:

" b i n " / h a d o o p − d a e m o n . s h − − c o n f i g bin"/hadoop-daemon.sh --config bin"/hadoopdaemon.shconfigHADOOP_CONF_DIR start jobtracker" b i n " / h a d o o p − d a e m o n s . s h − − c o n f i g bin"/hadoop-daemons.sh --config bin"/hadoopdaemons.shconfigHADOOP_CONF_DIR start tasktracker

As the code shows, the daemons are started in the same way as in the previous sections, so the details are not repeated here.

7. bin/hadoop

The script is too long to paste in full here.

This script is the real centerpiece; everything the previous scripts do is preparation for running it. It is also quite powerful: besides starting each daemon, it can run many other commands and tools, deciding what to do from the arguments it receives. The rest of this section walks through its execution flow and functionality.

1. Resolve the bin directory and source hadoop-config.sh:

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin"/../libexec/hadoop-config.sh ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin"/hadoop-config.sh
fi

2. Detect whether Hadoop is running under Cygwin (the Linux emulation environment on Windows):

cygwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
esac

3. If no arguments were given, print the usage text and exit; otherwise save the command:

# if no args specified, show usage
if [ $# = 0 ]; then
  echo "Usage: hadoop [--config confdir] COMMAND"
  echo "where COMMAND is one of:"
  echo "  namenode -format     format the DFS filesystem"
  echo "  secondarynamenode    run the DFS secondary namenode"
  echo "  namenode             run the DFS namenode"
  echo "  datanode             run a DFS datanode"
  echo "  dfsadmin             run a DFS admin client"
  echo "  mradmin              run a Map-Reduce admin client"
  echo "  fsck                 run a DFS filesystem checking utility"
  echo "  fs                   run a generic filesystem user client"
  echo "  balancer             run a cluster balancing utility"
  echo "  fetchdt              fetch a delegation token from the NameNode"
  echo "  jobtracker           run the MapReduce job Tracker node"
  echo "  pipes                run a Pipes job"
  echo "  tasktracker          run a MapReduce task Tracker node"
  echo "  historyserver        run job history servers as a standalone daemon"
  echo "  job                  manipulate MapReduce jobs"
  echo "  queue                get information regarding JobQueues"
  echo "  version              print the version"
  echo "  jar <jar>            run a jar file"
  echo "  distcp <srcurl> <desturl> copy file or directories recursively"
  echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
  echo "  classpath            prints the class path needed to get the"
  echo "                       Hadoop jar and the required libraries"
  echo "  daemonlog            get/set the log level for each daemon"
  echo " or"
  echo "  CLASSNAME            run the class named CLASSNAME"
  echo "Most commands print help when invoked w/o parameters."
  exit 1
fi

# get arguments
COMMAND=$1
shift

4. Determine if we're starting a secure datanode, and if so, redefine the appropriate variables:

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi

5. Set the Java-related parameters, such as JAVA_HOME and the maximum JVM heap size:

# some Java parameters
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi

if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$HADOOP_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $HADOOP_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi

6. Build the CLASSPATH. This step is important, because without it many classes could not be found. The code below shows exactly which paths are added:

# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${HADOOP_CONF_DIR}"

if [ "$HADOOP_USER_CLASSPATH_FIRST" != "" ] && [ "$HADOOP_CLASSPATH" != "" ] ; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi

CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar

# for developers, add Hadoop classes to CLASSPATH
if [ -d "$HADOOP_HOME/build/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/classes
fi
if [ -d "$HADOOP_HOME/build/webapps" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build
fi
if [ -d "$HADOOP_HOME/build/test/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/test/classes
fi
if [ -d "$HADOOP_HOME/build/tools" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/tools
fi

# so that filenames w/ spaces are handled correctly in loops below
IFS=

# for releases, add core hadoop jar & webapps to CLASSPATH
if [ -e $HADOOP_PREFIX/share/hadoop/hadoop-core-* ]; then
  # binary layout
  if [ -d "$HADOOP_PREFIX/share/hadoop/webapps" ]; then
    CLASSPATH=${CLASSPATH}:$HADOOP_PREFIX/share/hadoop
  fi
  for f in $HADOOP_PREFIX/share/hadoop/hadoop-core-*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  # add libs to CLASSPATH
  for f in $HADOOP_PREFIX/share/hadoop/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  for f in $HADOOP_PREFIX/share/hadoop/lib/jsp-2.1/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  for f in $HADOOP_PREFIX/share/hadoop/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
else
  # tarball layout
  if [ -d "$HADOOP_HOME/webapps" ]; then
    CLASSPATH=${CLASSPATH}:$HADOOP_HOME
  fi
  for f in $HADOOP_HOME/hadoop-core-*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  # add libs to CLASSPATH
  for f in $HADOOP_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  if [ -d "$HADOOP_HOME/build/ivy/lib/Hadoop/common" ]; then
    for f in $HADOOP_HOME/build/ivy/lib/Hadoop/common/*.jar; do
      CLASSPATH=${CLASSPATH}:$f;
    done
  fi
  for f in $HADOOP_HOME/lib/jsp-2.1/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  for f in $HADOOP_HOME/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
  for f in $HADOOP_HOME/build/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
fi

# add user-specified CLASSPATH last
if [ "$HADOOP_USER_CLASSPATH_FIRST" = "" ] && [ "$HADOOP_CLASSPATH" != "" ]; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi

# default log directory & file
if [ "$HADOOP_LOG_DIR" = "" ]; then
  HADOOP_LOG_DIR="$HADOOP_HOME/logs"
fi
if [ "$HADOOP_LOGFILE" = "" ]; then
  HADOOP_LOGFILE='hadoop.log'
fi

# default policy file for service-level authorization
if [ "$HADOOP_POLICYFILE" = "" ]; then
  HADOOP_POLICYFILE="hadoop-policy.xml"
fi

# restore ordinary behaviour
unset IFS

Part of the code is omitted above; see the actual hadoop script for the rest.

7. Pick the Java class to run according to the command saved in step 3:

# figure out which class to run
if [ "$COMMAND" = "classpath" ] ; then
  if $cygwin; then
    CLASSPATH=`cygpath -p -w "$CLASSPATH"`
  fi
  echo $CLASSPATH
  exit
elif [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi
elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "mradmin" ] ; then
  CLASS=org.apache.hadoop.mapred.tools.MRAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "fsck" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSck
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "balancer" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"
elif [ "$COMMAND" = "fetchdt" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher
elif [ "$COMMAND" = "jobtracker" ] ; then
  CLASS=org.apache.hadoop.mapred.JobTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOBTRACKER_OPTS"
elif [ "$COMMAND" = "historyserver" ] ; then
  CLASS=org.apache.hadoop.mapred.JobHistoryServer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOB_HISTORYSERVER_OPTS"
elif [ "$COMMAND" = "tasktracker" ] ; then
  CLASS=org.apache.hadoop.mapred.TaskTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_TASKTRACKER_OPTS"
elif [ "$COMMAND" = "job" ] ; then
  CLASS=org.apache.hadoop.mapred.JobClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "queue" ] ; then
  CLASS=org.apache.hadoop.mapred.JobQueueClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "pipes" ] ; then
  CLASS=org.apache.hadoop.mapred.pipes.Submitter
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "version" ] ; then
  CLASS=org.apache.hadoop.util.VersionInfo
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "distcp" ] ; then
  CLASS=org.apache.hadoop.tools.DistCp
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "daemonlog" ] ; then
  CLASS=org.apache.hadoop.log.LogLevel
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "archive" ] ; then
  CLASS=org.apache.hadoop.tools.HadoopArchives
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "sampler" ] ; then
  CLASS=org.apache.hadoop.mapred.lib.InputSampler
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
else
  CLASS=$COMMAND
fi

The code above makes it clear exactly which commands can be executed and which class each command maps to, which makes it easy to find the entry point when analyzing a particular piece of functionality. The final else branch also shows that the hadoop script can run an arbitrary jar or class directly, which is convenient for developers testing their own Hadoop-based programs; there is a surprising amount to learn from this small script.
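For instance (the jar and class names below are placeholders, not taken from the article):

# run the main class of a user jar through org.apache.hadoop.util.RunJar
bin/hadoop jar my-job.jar org.example.MyJob /input /output
# run an arbitrary class directly: this falls into the final else branch, CLASS=$COMMAND
bin/hadoop org.example.MyTool arg1 arg2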

8. If running under Cygwin, translate the paths:

# cygwin path translation
if $cygwin; then
  CLASSPATH=`cygpath -p -w "$CLASSPATH"`
  HADOOP_HOME=`cygpath -w "$HADOOP_HOME"`
  HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
  TOOL_PATH=`cygpath -p -w "$TOOL_PATH"`
fi

9. Set JAVA_LIBRARY_PATH, the native-library path needed by Java:

# Determine the JAVA_PLATFORM
JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
if [ "$JAVA_PLATFORM" = "Linux-amd64-64" ]; then
  JSVC_ARCH="amd64"
else
  JSVC_ARCH="i386"
fi

# setup 'java.library.path' for native-hadoop code if necessary
JAVA_LIBRARY_PATH=''
if [ -d "${HADOOP_HOME}/build/native" -o -d "${HADOOP_HOME}/lib/native" -o -e "${HADOOP_PREFIX}/lib/libhadoop.a" ]; then

  if [ -d "$HADOOP_HOME/build/native" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_HOME}/build/native/${JAVA_PLATFORM}/lib
  fi

  if [ -d "${HADOOP_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    else
      JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    fi
  fi

  if [ -e "${HADOOP_PREFIX}/lib/libhadoop.a" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_PREFIX}/lib
  fi
fi

# cygwin path translation
if $cygwin; then
  JAVA_LIBRARY_PATH=`cygpath -p "$JAVA_LIBRARY_PATH"`
fi

10. Assemble HADOOP_OPTS, the Hadoop JVM options:

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.file=$HADOOP_LOGFILE"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.home.dir=$HADOOP_HOME"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"

11. First decide whether a daemon or an ordinary client command is being run, then choose the run mode (the script distinguishes jsvc, su and normal modes; only a secure datanode is started through jsvc), and finally execute the command in that mode:

# turn security logger on the namenode and jobtracker only
if [ $COMMAND = "namenode" ] || [ $COMMAND = "jobtracker" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,DRFAS}"
else
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"
fi

if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.policy.file=$HADOOP_POLICYFILE"

# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
  else
    HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
  fi

  exec "$HADOOP_HOME/libexec/jsvc.${JSVC_ARCH}" -Dproc_$COMMAND -outfile "$HADOOP_LOG_DIR/jsvc.out" \
    -errfile "$HADOOP_LOG_DIR/jsvc.err" \
    -pidfile "$HADOOP_SECURE_DN_PID" \
    -nodetach \
    -user "$HADOOP_SECURE_DN_USER" \
    -cp "$CLASSPATH" \
    $JAVA_HEAP_MAX $HADOOP_OPTS \
    org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
else
  # run it
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
fi

At this point the script is finished; what remains is error handling and messages for unrecognized cases. Running a command may also involve checking the user name: with su a specific user can be given, otherwise the command runs as the current Linux user.

8. hadoop-config.sh and hadoop-env.sh

Both of these scripts are involved in practically every script analyzed above. Their job is to set the configuration-file paths and environment variable values based on the command-line arguments; since these are common settings, they are applied each time a script runs. The two scripts are listed below without a line-by-line analysis.

hadoop-config.sh

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# resolve links - $0 may be a softlink

this="${BASH_SOURCE-$0}"
common_bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
script="$(basename -- "$this")"
this="$common_bin/$script"

# convert relative path to absolute path
config_bin=`dirname "$this"`
script=`basename "$this"`
config_bin=`cd "$config_bin"; pwd`
this="$config_bin/$script"

# the root of the Hadoop installation
export HADOOP_PREFIX=`dirname "$this"`/..

#check to see if the conf dir is given as an optional argument
if [ $# -gt 1 ]
then
    if [ "--config" = "$1" ]
	  then
	      shift
	      confdir=$1
	      shift
	      HADOOP_CONF_DIR=$confdir
    fi
fi
 
# Allow alternate conf dir location.
if [ -e "${HADOOP_PREFIX}/conf/hadoop-env.sh" ]; then
  DEFAULT_CONF_DIR="conf"
else
  DEFAULT_CONF_DIR="etc/hadoop"
fi
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_PREFIX/$DEFAULT_CONF_DIR}"

#check to see it is specified whether to use the slaves or the
# masters file
if [ $# -gt 1 ]
then
    if [ "--hosts" = "$1" ]
    then
        shift
        slavesfile=$1
        shift
        export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$slavesfile"
    fi
fi

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then
  echo "Warning: $HADOOP_HOME is deprecated." 1>&2
  echo 1>&2
fi

export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_HOME_WARN_SUPPRESS=1
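As the --config handling above shows, an alternate configuration directory can be supplied per invocation; a small sketch (the directory is illustrative):

bin/hadoop --config /etc/hadoop-alt fs -ls /                   # use /etc/hadoop-alt instead of conf/ for this command
bin/hadoop-daemon.sh --config /etc/hadoop-alt start namenode   # the daemon scripts take the same option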

hadoop-env.sh

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options.  Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10 
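In practice the commented defaults above are copied and uncommented as needed; a minimal customization might look like this (the path and sizes are illustrative):

# The java implementation to use.  Required.
export JAVA_HOME=/soft/java/jdk1.6.0_13
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000
# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/var/hadoop/pids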

9. Summary

These startup scripts are fairly involved, and there is a lot to learn from them. First, they are full of shell-programming techniques worth studying and borrowing. Second, following the whole startup flow shows what has to be set up to run Hadoop, including how the options in the configuration files take effect and which paths end up on the classpath. Third, they list in detail every command that can be executed through hadoop. And there is more besides.

Note:

The output below was obtained by adding an echo of the command line just before the final exec in bin/hadoop:

echo "$JAVA -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath $CLASSPATH $CLASS $@"

exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"

Example of the command the shell scripts ultimately execute:

Starting HDFS:

root@ubuntu:/_nosql/hadoop# bin/start-dfs.sh

starting namenode, logging to /_nosql/hadoop/libexec/../logs/hadoop-root-namenode-ubuntu.out

/soft/java/jdk1.6.0_13/bin/java -Dproc_namenode -Xmx1000m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/_nosql/hadoop/libexec/../logs -Dhadoop.log.file=hadoop-root-namenode-ubuntu.log -Dhadoop.home.dir=/_nosql/hadoop/libexec/.. -Dhadoop.id.str=root -Dhadoop.root.logger=DEBUG,DRFA -Dhadoop.security.logger=INFO,DRFAS -Djava.library.path=/_nosql/hadoop/libexec/../lib/native/Linux-i386-32 -Dhadoop.policy.file=hadoop-policy.xml -classpath /_nosql/hadoop/libexec/../conf:/soft/java/jdk1.6.0_13/lib/tools.jar:/_nosql/hadoop/libexec/...
