This script is simple: starting from the directory it is run from, it locates the bin directory of the Hadoop installation and then runs the scripts that start HDFS and MapReduce. By default Hadoop prefers to load libexec/hadoop-config.sh; libexec/hadoop-config.sh and bin/hadoop-config.sh start out with identical content.
# Start all hadoop daemons. Run this on master node.
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
. "$bin"/../libexec/hadoop-config.sh
else
. "$bin/hadoop-config.sh"
fi
# start dfs daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
# start mapred daemons
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR
# Start hadoop dfs daemons.
# Optionally upgrade or rollback dfs state.
# Run this on master node.
usage="Usage: start-dfs.sh [-upgrade|-rollback]"
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
. "$bin"/../libexec/hadoop-config.sh
else
. "$bin/hadoop-config.sh"
fi
# get arguments
if [ $# -ge 1 ]; then
nameStartOpt=$1
shift
case $nameStartOpt in
(-upgrade)
;;
(-rollback)
dataStartOpt=$nameStartOpt
;;
(*)
echo $usage
exit 1
;;
esac
fi
# start dfs daemons
# start namenode after datanodes, to minimize time namenode is up w/o data
# note: datanodes will log connection errors until namenode starts
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
The script first checks whether an argument was supplied. As the code shows, it accepts only the -upgrade and -rollback options: one upgrades the filesystem state, the other rolls it back. It then starts the namenode, datanode and secondarynamenode daemons.
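For example, assuming the commands are run from the Hadoop installation directory, the script can be invoked as follows (an illustrative sketch, not taken from the original text):
bin/start-dfs.sh              # normal start
bin/start-dfs.sh -upgrade     # the namenode is started with -upgrade
bin/start-dfs.sh -rollback    # both namenode and datanodes are started with -rollback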
The $HADOOP_CONF_DIR used in the code is set in another script, hadoop-config.sh, which is described in detail later. Every startup script sources it first in order to check and set environment variables such as JAVA_HOME and HADOOP_HOME, and it in turn sources hadoop-env.sh to pick up the environment variables configured by the user. Both scripts are covered in detail later.
The code above also shows that the namenode is started through hadoop-daemon.sh, which the next section analyzes. The datanode and secondarynamenode are likewise started via hadoop-daemon.sh (through hadoop-daemons.sh), and their startup path is analyzed afterwards as well.
hadoop-daemon.sh is used to start the namenode; the script is as follows:
# Runs a Hadoop command as a daemon.
#
# Environment Variables
#
# HADOOP_CONF_DIR Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.
# HADOOP_LOG_DIR Where log files are stored. PWD by default.
# HADOOP_MASTER host:path where hadoop code should be rsync'd from
# HADOOP_PID_DIR The pid files are stored. /tmp by default.
# HADOOP_IDENT_STRING A string representing this instance of hadoop. $USER by default
# HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.
##
usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] (start|stop) <hadoop-command> <args...>"
# if no args specified, show usage
if [ $# -le 1 ]; then
echo $usage
exit 1
fi
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
. "$bin"/../libexec/hadoop-config.sh
else
. "$bin/hadoop-config.sh"
fi
# get arguments
startStop=$1
shift
command=$1
shift
##Back up the .out log files by rotation: out4->out5, out3->out4, out2->out3, out1->out2, out->out1
hadoop_rotate_log ()
{
log=$1;
num=5;
if [ -n "$2" ]; then
num=$2
fi
if [ -f "$log" ]; then # rotate logs
while [ $num -gt 1 ]; do
prev=`expr $num - 1`
[ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
num=$prev
done
mv "$log" "$log.$num";
fi
}
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
fi
if [ "$HADOOP_IDENT_STRING" = "" ]; then
export HADOOP_IDENT_STRING="$USER"
fi
# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
export HADOOP_LOG_DIR="$HADOOP_HOME/logs"
fi
mkdir -p "$HADOOP_LOG_DIR"
touch $HADOOP_LOG_DIR/.hadoop_test > /dev/null 2>&1
TEST_LOG_DIR=$?
if [ "${TEST_LOG_DIR}" = "0" ]; then
rm -f $HADOOP_LOG_DIR/.hadoop_test
else
chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
fi
if [ "$HADOOP_PID_DIR" = "" ]; then
HADOOP_PID_DIR=/tmp
fi
# some variables
export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
export HADOOP_ROOT_LOGGER="INFO,DRFA"
##log=/_nosql/hadoop/logs/hadoop-root-namenode-ubuntu.out
##pid=/tmp/hadoop-root-namenode.pid
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
# Set default scheduling priority
if [ "$HADOOP_NICENESS" = "" ]; then
export HADOOP_NICENESS=0
fi
case $startStop in
(start)
##create the directory that holds the pid file
mkdir -p "$HADOOP_PID_DIR"
if [ -f $pid ]; then
if kill -0 `cat $pid` > /dev/null 2>&1; then
echo $command running as process `cat $pid`. Stop it first.
exit 1
fi
fi
if [ "$HADOOP_MASTER" != "" ]; then
echo rsync from $HADOOP_MASTER
rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
fi
##rotate the .out log
hadoop_rotate_log $log
echo starting $command, logging to $log
cd "$HADOOP_PREFIX"
nohup nice -n $HADOOP_NICENESS "$HADOOP_PREFIX"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
##write the PID of the process started by the previous command into the pid file
echo $! > $pid
sleep 1; head "$log"
;;
(stop)
if [ -f $pid ]; then
if kill -0 `cat $pid` > /dev/null 2>&1; then
echo stopping $command
kill `cat $pid`
else
echo no $command to stop
fi
else
echo no $command to stop
fi
;;
(*)
echo $usage
exit 1
;;
esac
Before going through this script in detail, here is the meaning of a few environment variables (they are also described in the comment block at the top of the script):
HADOOP_CONF_DIR     Alternate conf dir. Default is ${HADOOP_HOME}/conf.
HADOOP_LOG_DIR      Where log files are stored. PWD by default (i.e. under the Hadoop root).
HADOOP_MASTER       host:path where hadoop code should be rsync'd from.
HADOOP_PID_DIR      Where the pid files are stored. /tmp by default.
HADOOP_IDENT_STRING A string representing this instance of hadoop. $USER by default.
HADOOP_NICENESS     The scheduling priority for daemons. Defaults to 0.
The script first checks whether fewer than two arguments were supplied; if so, it prints the usage message and exits:
usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] (start|stop) <hadoop-command> <args...>"
if [ $# -le 1 ]; then
echo $usage
exit 1
fi
Then, like the other scripts, it sources hadoop-config.sh to check and set the relevant environment variables. For this script, hadoop-config.sh processes the configuration directory and the host-list file and saves them in the corresponding environment variables.
Next it saves the action (start or stop) and the command with its arguments (note: shift moves the script's positional parameters forward by one):
startStop=$1
shift
command=$1
shift
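As a minimal illustration of how shift walks through the positional parameters (not part of the Hadoop scripts), suppose a script is invoked as ./demo.sh start namenode -upgrade:
startStop=$1      # "start"
shift             # positional parameters are now: namenode -upgrade
command=$1        # "namenode"
shift             # "$@" now holds just: -upgrade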
Next it defines a function for rotating the log files:
##Back up the .out log files by rotation: out4->out5, out3->out4, out2->out3, out1->out2, out->out1
hadoop_rotate_log ()
{
log=$1;
num=5;
if [ -n "$2" ]; then
num=$2
fi
if [ -f "$log" ]; then # rotate logs
while [ $num -gt 1 ]; do
prev=`expr $num - 1`
[ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
num=$prev
done
mv "$log" "$log.$num";
fi
}
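To see the rotation in action, assuming the function above has been sourced and using made-up file names:
log=/tmp/demo.out
touch "$log" "$log.1" "$log.2"
hadoop_rotate_log "$log" 3
ls /tmp/demo.out*       # demo.out.1 demo.out.2 demo.out.3 -- every file has moved up one slot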
After that come the settings that, based on the options in the configuration files, assign the environment variables mentioned above; these variables are used when the namenode is actually started, for example the scheduling-priority variable:
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
fi
## echo $USER prints the current login user, e.g. root
if [ "$HADOOP_IDENT_STRING" = "" ]; then
export HADOOP_IDENT_STRING="$USER"
fi
# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
export HADOOP_LOG_DIR="$HADOOP_HOME/logs"
fi
mkdir -p "$HADOOP_LOG_DIR"
touch $HADOOP_LOG_DIR/.hadoop_test > /dev/null 2>&1
## $? is the exit status of the most recent command
TEST_LOG_DIR=$?
if [ "${TEST_LOG_DIR}" = "0" ]; then
rm -f $HADOOP_LOG_DIR/.hadoop_test
else
chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
fi
## directory where the pid file is kept
if [ "$HADOOP_PID_DIR" = "" ]; then
HADOOP_PID_DIR=/tmp
fi
# some variables
export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
## sets the client-side log level
export HADOOP_ROOT_LOGGER="INFO,DRFA"
## log=/_nosql/hadoop/logs/hadoop-root-namenode-ubuntu.out
## pid=/tmp/hadoop-root-namenode.pid
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
# Set default scheduling priority
if [ "$HADOOP_NICENESS" = "" ]; then
export HADOOP_NICENESS=0
fi
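The touch/$? pair above is simply a probe to see whether the log directory is writable. The same pattern in isolation, with a made-up directory name:
dir=/tmp/demo-logs
mkdir -p "$dir"
touch "$dir/.probe" > /dev/null 2>&1
if [ "$?" = "0" ]; then
  rm -f "$dir/.probe"     # writable: remove the probe file
else
  chown "$USER" "$dir"    # not writable: try to hand ownership to the current user
fi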
Finally, depending on the action (start or stop), the namenode is started or stopped; the code is:
case $startStop in
(start)
mkdir -p "$HADOOP_PID_DIR"
if [ -f $pid ]; then
if kill -0 `cat $pid` > /dev/null 2>&1; then
echo $command running as process `cat $pid`. Stop it first.
exit 1
fi
fi
if [ "$HADOOP_MASTER" != "" ]; then
echo rsync from $HADOOP_MASTER
## rsync is a data mirroring/backup tool on Unix-like systems; as the name says, it does a remote sync
rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
fi
## rotate the .out log
hadoop_rotate_log $log
echo starting $command, logging to $log
cd "$HADOOP_PREFIX"
nohup nice -n $HADOOP_NICENESS "$HADOOP_PREFIX"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
## $! is the PID of the most recently started background process
echo $! > $pid
sleep 1; head "$log"
;;
(stop)
if [ -f $pid ]; then
if kill -0 `cat $pid` > /dev/null 2>&1; then
echo stopping $command
kill `cat $pid`
else
echo no $command to stop
fi
else
echo no $command to stop
fi
;;
(*)
echo $usage
exit 1
;;
esac
If the action is start, the script first creates the directory that holds the pid file. If the pid file already exists, a namenode is presumably already running, so the script prints a message asking you to stop it first and exits. It then rotates the log files with the function above and launches the namenode via nice at the configured scheduling priority. The real launching, however, happens in yet another script, bin/hadoop, the final script through which every daemon is started: it selects a class with a main method and runs it with java, which is what actually brings up the Java daemon. That script is the key to the startup process and the entry point for analyzing the Hadoop source code, so a later section examines it in detail.
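Stripped of the Hadoop specifics, the start branch is the classic pid-file daemon pattern. A minimal sketch with made-up names (sleep stands in for the real daemon process):
pid=/tmp/mydaemon.pid
log=/tmp/mydaemon.out
if [ -f "$pid" ] && kill -0 `cat "$pid"` > /dev/null 2>&1; then
  echo "already running as process `cat $pid`"
else
  nohup nice -n 0 sleep 300 > "$log" 2>&1 < /dev/null &   # launch detached at the chosen priority
  echo $! > "$pid"                                        # $! is the PID of the background job
fi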
If the action is stop, the script simply kills the process recorded in the pid file (kill `cat $pid`); any other action is an error and the usage message is printed.
hadoop-daemons.sh is used to start the datanodes and the secondarynamenode; the script is as follows:
# Run a Hadoop command on all slave hosts.
usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."
# if no args specified, show usage
if [ $# -le 1 ]; then
echo $usage
exit 1
fi
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
. "$bin"/../libexec/hadoop-config.sh
else
. "$bin/hadoop-config.sh"
fi
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
This script is simple, because in the end it also starts the daemons through the hadoop-daemon.sh script described in the previous section. The only extra step is that it first goes through another script, slaves.sh; the relevant line is:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
The main job of slaves.sh is to run the startup script on all slave hosts over ssh, i.e. the last two commands in the line above: change into the Hadoop directory and run bin/hadoop-daemon.sh there. The code that does this is:
# Run a shell command on all slave hosts.
#
# Environment Variables
#
# HADOOP_SLAVES File naming remote hosts.
# Default is ${HADOOP_CONF_DIR}/slaves.
# HADOOP_CONF_DIR Alternate conf dir. Default is ${HADOOP_HOME}/conf.
# HADOOP_SLAVE_SLEEP Seconds to sleep between spawning remote commands.
# HADOOP_SSH_OPTS Options passed to ssh when running remote commands.
##
usage="Usage: slaves.sh [--config confdir] command..."
# if no args specified, show usage
if [ $# -le 0 ]; then
echo $usage
exit 1
fi
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
. "$bin"/../libexec/hadoop-config.sh
else
. "$bin/hadoop-config.sh"
fi
# If the slaves file is specified in the command line,
# then it takes precedence over the definition in
# hadoop-env.sh. Save it here.
HOSTLIST=$HADOOP_SLAVES
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
if [ "$HOSTLIST" = "" ]; then
if [ "$HADOOP_SLAVES" = "" ]; then
export HOSTLIST="${HADOOP_CONF_DIR}/slaves"
else
export HOSTLIST="${HADOOP_SLAVES}"
fi
fi
for slave in `cat "$HOSTLIST"|sed "s/#.*$//;/^$/d"`; do
ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
   2>&1 | sed "s/^/$slave: /" &
 if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
sleep $HADOOP_SLAVE_SLEEP
fi
done
wait
The code above first obtains the names of all the slave hosts (from the slaves file, or from whatever the configuration points to), then loops over them and runs the startup command on each host in the background over ssh, and finally waits for all of them to finish before the script exits.
ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
   2>&1 | sed "s/^/$slave: /" &
An example of what this statement expands to:
ssh localhost cd /_nosql/hadoop/libexec/.. ; /_nosql/hadoop/bin/hadoop-daemon.sh --config /_nosql/hadoop/libexec/../conf start datanode
So the main job of this script is to run the appropriate startup script on every slave host. When starting datanodes it finds the hosts in the slaves file; when starting the secondarynamenode it finds the host in the masters file (start-dfs.sh passes --hosts masters for that; without the option the default is the slaves file, which is how the datanodes are found).
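To make the host-list handling concrete, here is a made-up conf/slaves file and the effect of the sed filter used in the loop above:
# conf/slaves (example content):
#   # data nodes
#   slave1
#   slave2
#
#   slave3
sed "s/#.*$//;/^$/d" conf/slaves    # prints slave1, slave2 and slave3 one per line;
                                    # the comment line and the blank line are removed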
# Start hadoop map reduce daemons. Run this on master node.
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
. "$bin"/../libexec/hadoop-config.sh
else
. "$bin/hadoop-config.sh"
fi
# start mapred daemons
# start jobtracker first to minimize connection errors at startup
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
This script has just two important lines, which start the jobtracker and the tasktrackers respectively; the environment variables are again set up by the same helper scripts:
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
As the code shows, the daemons are started in exactly the same way as in the previous section, so the details are not repeated here; see the previous section.
The bin/hadoop script is too long to reproduce in full here. It is the real heart of the startup process: everything before it only prepares for it. It is also very versatile; besides starting the individual daemons it can run many commands and tools, and it decides what to do based on the arguments passed to it (including which daemon to start). The rest of this section walks through its flow and features.
1. Determine the bin directory and source hadoop-config.sh:
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin"/../libexec/hadoop-config.sh ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin"/hadoop-config.sh
fi
2. Detect whether Hadoop is running under cygwin, the Linux emulation environment on Windows:
cygwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
esac
3. If no arguments were given, print the usage message and exit; otherwise save the command:
# if no args specified, show usage
if [ $# = 0 ]; then
  echo "Usage: hadoop [--config confdir] COMMAND"
  echo "where COMMAND is one of:"
  echo "  namenode -format     format the DFS filesystem"
  echo "  secondarynamenode    run the DFS secondary namenode"
  echo "  namenode             run the DFS namenode"
  echo "  datanode             run a DFS datanode"
  echo "  dfsadmin             run a DFS admin client"
  echo "  mradmin              run a Map-Reduce admin client"
  echo "  fsck                 run a DFS filesystem checking utility"
  echo "  fs                   run a generic filesystem user client"
  echo "  balancer             run a cluster balancing utility"
  echo "  fetchdt              fetch a delegation token from the NameNode"
  echo "  jobtracker           run the MapReduce job Tracker node"
  echo "  pipes                run a Pipes job"
  echo "  tasktracker          run a MapReduce task Tracker node"
  echo "  historyserver        run job history servers as a standalone daemon"
  echo "  job                  manipulate MapReduce jobs"
  echo "  queue                get information regarding JobQueues"
  echo "  version              print the version"
  echo "  jar <jar>            run a jar file"
  echo "  distcp <srcurl> <desturl> copy file or directories recursively"
  echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
  echo "  classpath            prints the class path needed to get the"
  echo "                       Hadoop jar and the required libraries"
  echo "  daemonlog            get/set the log level for each daemon"
  echo " or"
  echo "  CLASSNAME            run the class named CLASSNAME"
  echo "Most commands print help when invoked w/o parameters."
  exit 1
fi
# get arguments
COMMAND=$1
shift
4. Determine whether we're starting a secure datanode and, if so, redefine the appropriate variables:
# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi
5. Set the parameters related to running java, such as the JAVA_HOME variable and the maximum JVM heap size:
# some Java parameters
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi
if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi
JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m
# check envvars which might override default args
if [ "$HADOOP_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $HADOOP_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi
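For example, putting the following line (the value here is arbitrary) in conf/hadoop-env.sh makes the script build JAVA_HEAP_MAX=-Xmx2048m instead of the default -Xmx1000m:
export HADOOP_HEAPSIZE=2048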
6. Set CLASSPATH. This step is important, because without it many classes cannot be found; the code below shows exactly which paths are added to CLASSPATH:
# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${HADOOP_CONF_DIR}"
if [ "$HADOOP_USER_CLASSPATH_FIRST" != "" ] && [ "$HADOOP_CLASSPATH" != "" ] ; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
# for developers, add Hadoop classes to CLASSPATH
if [ -d "$HADOOP_HOME/build/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/classes
fi
if [ -d "$HADOOP_HOME/build/webapps" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build
fi
if [ -d "$HADOOP_HOME/build/test/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/test/classes
fi
if [ -d "$HADOOP_HOME/build/tools" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/tools
fi
# so that filenames w/ spaces are handled correctly in loops below
IFS=
# for releases, add core hadoop jar & webapps to CLASSPATH
if [ -e $HADOOP_PREFIX/share/hadoop/hadoop-core-* ]; then
  # binary layout
  if [ -d "$HADOOP_PREFIX/share/hadoop/webapps" ]; then
    CLASSPATH=${CLASSPATH}:$HADOOP_PREFIX/share/hadoop
  fi
  for f in $HADOOP_PREFIX/share/hadoop/hadoop-core-*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  # add libs to CLASSPATH
  for f in $HADOOP_PREFIX/share/hadoop/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  for f in $HADOOP_PREFIX/share/hadoop/lib/jsp-2.1/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  for f in $HADOOP_PREFIX/share/hadoop/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
else
  # tarball layout
  if [ -d "$HADOOP_HOME/webapps" ]; then
    CLASSPATH=${CLASSPATH}:$HADOOP_HOME
  fi
  for f in $HADOOP_HOME/hadoop-core-*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  # add libs to CLASSPATH
  for f in $HADOOP_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  if [ -d "$HADOOP_HOME/build/ivy/lib/Hadoop/common" ]; then
    for f in $HADOOP_HOME/build/ivy/lib/Hadoop/common/*.jar; do
      CLASSPATH=${CLASSPATH}:$f;
    done
  fi
  for f in $HADOOP_HOME/lib/jsp-2.1/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
  for f in $HADOOP_HOME/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
  for f in $HADOOP_HOME/build/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
fi
# add user-specified CLASSPATH last
if [ "$HADOOP_USER_CLASSPATH_FIRST" = "" ] && [ "$HADOOP_CLASSPATH" != "" ]; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi
# default log directory & file
if [ "$HADOOP_LOG_DIR" = "" ]; then
  HADOOP_LOG_DIR="$HADOOP_HOME/logs"
fi
if [ "$HADOOP_LOGFILE" = "" ]; then
  HADOOP_LOGFILE='hadoop.log'
fi
# default policy file for service-level authorization
if [ "$HADOOP_POLICYFILE" = "" ]; then
  HADOOP_POLICYFILE="hadoop-policy.xml"
fi
# restore ordinary behaviour
unset IFS
A fair amount of code is omitted above; see the actual hadoop script for the rest.
7. Based on the command saved in step 3, select the Java class to run:
# figure out which class to run
if [ "$COMMAND" = "classpath" ] ; then
  if $cygwin; then
    CLASSPATH=`cygpath -p -w "$CLASSPATH"`
  fi
  echo $CLASSPATH
  exit
elif [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi
elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "mradmin" ] ; then
  CLASS=org.apache.hadoop.mapred.tools.MRAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "fsck" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSck
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "balancer" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"
elif [ "$COMMAND" = "fetchdt" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher
elif [ "$COMMAND" = "jobtracker" ] ; then
  CLASS=org.apache.hadoop.mapred.JobTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOBTRACKER_OPTS"
elif [ "$COMMAND" = "historyserver" ] ; then
  CLASS=org.apache.hadoop.mapred.JobHistoryServer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOB_HISTORYSERVER_OPTS"
elif [ "$COMMAND" = "tasktracker" ] ; then
  CLASS=org.apache.hadoop.mapred.TaskTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_TASKTRACKER_OPTS"
elif [ "$COMMAND" = "job" ] ; then
  CLASS=org.apache.hadoop.mapred.JobClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "queue" ] ; then
  CLASS=org.apache.hadoop.mapred.JobQueueClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "pipes" ] ; then
  CLASS=org.apache.hadoop.mapred.pipes.Submitter
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "version" ] ; then
  CLASS=org.apache.hadoop.util.VersionInfo
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "distcp" ] ; then
  CLASS=org.apache.hadoop.tools.DistCp
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "daemonlog" ] ; then
  CLASS=org.apache.hadoop.log.LogLevel
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "archive" ] ; then
  CLASS=org.apache.hadoop.tools.HadoopArchives
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "sampler" ] ; then
  CLASS=org.apache.hadoop.mapred.lib.InputSampler
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
else
  CLASS=$COMMAND
fi
The code above shows exactly which commands can be executed and which class each command corresponds to, so when analyzing the code behind a particular feature it is easy to find the entry point. The second-to-last branch also shows that the hadoop script can directly run a jar file or an arbitrary Java class, which is convenient for developers testing programs built on Hadoop; even a small script like this teaches a lot.
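For instance, with purely illustrative jar and class names, the two developer-facing branches look like this (RunJar for a jar file, and the fall-through CLASS=$COMMAND for a class that is already on the classpath):
bin/hadoop jar my-app.jar com.example.WordCount /input /output
bin/hadoop com.example.MyTool -verbose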
8. If running under cygwin, translate the paths:
# cygwin path translation
if $cygwin; then
  CLASSPATH=`cygpath -p -w "$CLASSPATH"`
  HADOOP_HOME=`cygpath -w "$HADOOP_HOME"`
  HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
  TOOL_PATH=`cygpath -p -w "$TOOL_PATH"`
fi
9. Set JAVA_LIBRARY_PATH, the native library path that java needs:
# Determine the JAVA_PLATFORM
JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
if [ "$JAVA_PLATFORM" = "Linux-amd64-64" ]; then
  JSVC_ARCH="amd64"
else
  JSVC_ARCH="i386"
fi
# setup 'java.library.path' for native-hadoop code if necessary
JAVA_LIBRARY_PATH=''
if [ -d "${HADOOP_HOME}/build/native" -o -d "${HADOOP_HOME}/lib/native" -o -e "${HADOOP_PREFIX}/lib/libhadoop.a" ]; then
  if [ -d "$HADOOP_HOME/build/native" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_HOME}/build/native/${JAVA_PLATFORM}/lib
  fi
  if [ -d "${HADOOP_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    else
      JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    fi
  fi
  if [ -e "${HADOOP_PREFIX}/lib/libhadoop.a" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_PREFIX}/lib
  fi
fi
# cygwin path translation
if $cygwin; then
  JAVA_LIBRARY_PATH=`cygpath -p "$JAVA_LIBRARY_PATH"`
fi
10. Set the Hadoop options variable HADOOP_OPTS:
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.file=$HADOOP_LOGFILE"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.home.dir=$HADOOP_HOME"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
11. First decide whether this is a daemon-start command or an ordinary client command, then choose the run mode according to the relevant conditions (there are three: jsvc, su and normal), and finally execute the command in the chosen mode. Only the datanode can be run through jsvc, as the code below shows:
# turn security logger on the namenode and jobtracker only
if [ $COMMAND = "namenode" ] || [ $COMMAND = "jobtracker" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,DRFAS}"
else
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"
fi
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.policy.file=$HADOOP_POLICYFILE"
# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
  else
    HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
  fi
  exec "$HADOOP_HOME/libexec/jsvc.${JSVC_ARCH}" -Dproc_$COMMAND -outfile "$HADOOP_LOG_DIR/jsvc.out" \
    -errfile "$HADOOP_LOG_DIR/jsvc.err" \
    -pidfile "$HADOOP_SECURE_DN_PID" \
    -nodetach \
    -user "$HADOOP_SECURE_DN_USER" \
    -cp "$CLASSPATH" \
    $JAVA_HEAP_MAX $HADOOP_OPTS \
    org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
else
  # run it
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
fi
At this point the whole script has been covered; what remains is the error handling and usage hints for unrecognized modes. Running a specific command may involve checking the user name, for example su can be told to run as a particular user; if none is specified, the current Linux user is used.
These two scripts are sourced by practically every script analyzed above. Their main job is to set configuration file paths and environment variable values based on the command-line arguments; since these are common settings, they are applied each time a script runs. The code itself is not analyzed in detail here.
hadoop-config.sh
# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*
# resolve links - $0 may be a softlink
this="${BASH_SOURCE-$0}"
common_bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
script="$(basename -- "$this")"
this="$common_bin/$script"
# convert relative path to absolute path
config_bin=`dirname "$this"`
script=`basename "$this"`
config_bin=`cd "$config_bin"; pwd`
this="$config_bin/$script"
# the root of the Hadoop installation
export HADOOP_PREFIX=`dirname "$this"`/..
#check to see if the conf dir is given as an optional argument
if [ $# -gt 1 ]
then
if [ "--config" = "$1" ]
then
shift
confdir=$1
shift
HADOOP_CONF_DIR=$confdir
fi
fi
# Allow alternate conf dir location.
if [ -e "${HADOOP_PREFIX}/conf/hadoop-env.sh" ]; then
DEFAULT_CONF_DIR="conf"
else
DEFAULT_CONF_DIR="etc/hadoop"
fi
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_PREFIX/$DEFAULT_CONF_DIR}"
#check to see it is specified whether to use the slaves or the
# masters file
if [ $# -gt 1 ]
then
if [ "--hosts" = "$1" ]
then
shift
slavesfile=$1
shift
export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$slavesfile"
fi
fi
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then
echo "Warning: $HADOOP_HOME is deprecated." 1>&2
echo 1>&2
fi
export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_HOME_WARN_SUPPRESS=1
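Putting the argument handling together: a call such as the one below (the paths are made up) leaves HADOOP_CONF_DIR=/etc/hadoop-alt and HADOOP_SLAVES=/etc/hadoop-alt/masters:
bin/hadoop-daemons.sh --config /etc/hadoop-alt --hosts masters start secondarynamenode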
hadoop-env.sh
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS
# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1
# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids
# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10
This startup machinery is fairly involved, and working through it teaches a lot. First, there is plenty of shell-programming knowledge in it, with many techniques worth learning and borrowing. Second, following the whole startup flow shows everything that has to be set up to run Hadoop, including how the options in the configuration files take effect and which classpath entries get added. Third, it gives a detailed view of every command that can be executed through the hadoop script. And there are other gains that go without saying.
Note:
To inspect the final command, add the following echo just before the exec at the end of bin/hadoop:
echo "$JAVA -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath $CLASSPATH $CLASS $@"
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
An example of the statement the shell scripts ultimately execute:
Starting and stopping HDFS:
root@ubuntu:/_nosql/hadoop# bin/start-dfs.sh
starting namenode, logging to /_nosql/hadoop/libexec/../logs/hadoop-root-namenode-ubuntu.out
/soft/java/jdk1.6.0_13/bin/java -Dproc_namenode -Xmx1000m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/_nosql/hadoop/libexec/../logs -Dhadoop.log.file=hadoop-root-namenode-ubuntu.log -Dhadoop.home.dir=/_nosql/hadoop/libexec/.. -Dhadoop.id.str=root -Dhadoop.root.logger=DEBUG,DRFA -Dhadoop.security.logger=INFO,DRFAS -Djava.library.path=/_nosql/hadoop/libexec/../lib/native/Linux-i386-32 -Dhadoop.policy.file=hadoop-policy.xml -classpath /_nosql/hadoop/libexec/../conf:/soft/java/jdk1.6.0_13/lib/tools.jar:/_nosql/hadoop/libexec/...