Got a new laptop, so I'm taking notes as I set it up.
Hive-2.3.4
(Even a local single-node setup needs SSH; otherwise formatting Hadoop's storage system fails with no permission:
localhost: Permission denied (publickey,password).)
Passwordless SSH in two steps (run on the client in order; just press Enter for every prompt)
(1) Generate a key pair: $ ssh-keygen -t rsa -P ''   # note: '' is two single quotes
After the command runs, a .ssh directory is created under the current user's home with two files: the private key id_rsa and the public key id_rsa.pub
Ps: a command that can leave passwordless SSH not working: $ ssh-keygen -t dsa -f ~/.ssh/id_dsa
(2) Append the public key for login authentication: $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   (id_rsa.pub, matching the RSA key generated in step 1)
PS: for passwordless remote login, just append id_rsa.pub to the remote machine's ~/.ssh/authorized_keys; create the file if it doesn't exist
Try connecting (the first time may ask for a password):
~$ ssh localhost
Ps: if passwordless SSH is configured but login still asks for a password, fix the .ssh directory's permissions and assign ownership to the login user:
- chmod 700 /home/raini/.ssh
- chmod 600 /home/raini/.ssh/*
- chown raini: /home/raini/.ssh
- chown raini: /home/raini/.ssh/*
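The chmod steps above can be illustrated locally; a minimal sketch using a throwaway scratch directory (the path comes from mktemp, everything else mirrors the modes above):

```shell
demo=$(mktemp -d)                      # scratch stand-in for the home directory
mkdir "$demo/.ssh"
touch "$demo/.ssh/id_rsa" "$demo/.ssh/authorized_keys"
chmod 700 "$demo/.ssh"                 # directory: owner-only access
chmod 600 "$demo/.ssh"/*               # key files: owner read/write only
modes=$(stat -c '%a' "$demo/.ssh" "$demo/.ssh/id_rsa" | xargs)
echo "$modes"                          # prints "700 600"
rm -rf "$demo"
```

sshd refuses key authentication when these files are more permissive than this, which is why a group-writable .ssh breaks passwordless login.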
1. Unpack Java and Scala into directories of your choice
2. Configure environment variables
raini@biyuzhe:~$ gedit .bashrc   (append at the end)
- ## java
- export JAVA_HOME=/home/raini/app/jdk
- export JRE_HOME=${JAVA_HOME}/jre
- export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
- export PATH=${JAVA_HOME}/bin:$JRE_HOME/bin:$PATH
-
- ## scala
- export SCALA_HOME=/home/raini/app/scala
- export PATH=${SCALA_HOME}/bin:$PATH
3. Run $ source .bashrc (to apply the changes)
4. Verify
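For the verification step, a quick sanity check that the exports landed on PATH; the install paths are the ones assumed in these notes, and `java -version` / `scala -version` are the real test once the binaries are in place:

```shell
# Re-create the .bashrc exports and count how many of the two bin dirs
# ended up on PATH (expects 2).
export JAVA_HOME=/home/raini/app/jdk
export SCALA_HOME=/home/raini/app/scala
export PATH=${JAVA_HOME}/bin:${SCALA_HOME}/bin:$PATH
matches=$(echo "$PATH" | tr ':' '\n' | grep -cE 'app/(jdk|scala)/bin')
echo "$matches"                        # prints 2
```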
- ## hadoop-3.x
- export HADOOP_HOME=/home/raini/app/hadoop
- export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
- #
- export HADOOP_COMMON_HOME=$HADOOP_HOME
- export HADOOP_HDFS_HOME=$HADOOP_HOME
- export HADOOP_MAPRED_HOME=$HADOOP_HOME
- export HADOOP_YARN_HOME=$HADOOP_HOME
- #
- export HADOOP_INSTALL=$HADOOP_HOME
- export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export HADOOP_PREFIX=$HADOOP_HOME
- export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
- export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
- export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
- #
- export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- #
- #export HDFS_DATANODE_USER=root
- #export HDFS_DATANODE_SECURE_USER=root
- #export HDFS_SECONDARYNAMENODE_USER=root
- #export HDFS_NAMENODE_USER=root
core-site.xml:
- <configuration>
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://biyuzhe:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/home/raini/app/hadoop/data/tmp</value>
- </property>
- </configuration>
Cluster mode (hdfs-site.xml):
- <!-- replication factor and data storage paths -->
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>/abutionai/app/hadoop/data/hdfs/namenode</value>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>/abutionai/app/hadoop/data/hdfs/datanode</value>
- </property>
-
- <!-- settings for abutionDB -->
- <property>
- <name>dfs.support.append</name>
- <value>true</value>
- </property>
- <property>
- <name>dfs.datanode.synconclose</name>
- <value>true</value>
- </property>
-
- </configuration>
Cluster mode (optional on a single node):
- export JAVA_HOME=/home/raini/app/jdk
- export HADOOP_HOME=/home/raini/app/hadoop
-
- # By default Hadoop produces a lot of debug logging. To suppress it, find the line starting with export HADOOP_OPTS and change it to:
- export HADOOP_OPTS="$HADOOP_OPTS -XX:-PrintWarnings -Djava.net.preferIPv4Stack=true"
-
- # pid files
- export HADOOP_PID_DIR=/home/raini/app/tmp/pids
master.thutmose.cn
In cluster mode, list one worker node's IP per line (the workers file in Hadoop 3.x; slaves in 2.x)
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
http://localhost:50070/ to check Hadoop's status (the NameNode web UI moved to port 9870 in Hadoop 3.x)
Troubleshooting:
(error for the root user) Attempting to operate on hdfs namenode as root
1. On both master and slaves, modify four files: start-dfs.sh, stop-dfs.sh, start-yarn.sh, stop-yarn.sh
2. If your Hadoop is started by some other user, replace root with that user
Under /hadoop/sbin:
add the following parameters at the top of start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
While we're at it, configure pyspark too
- ## spark
- export SPARK_HOME=/home/raini/app/spark
- export PATH=${SPARK_HOME}/bin:$PATH
- export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
- # PYSPARK
- export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython notebook
- export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/py35/bin/python
- export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
- 1. vim slaves (append your own hostname)
- ##localhost
- biyuzhe
-
- 2. vim spark-env.sh
- append:
- export JAVA_HOME=/home/raini/app/jdk
- export SCALA_HOME=/home/raini/app/scala
- export SPARK_WORKER_MEMORY=1G
- export HADOOP_HOME=/home/raini/app/hadoop
- export HADOOP_CONF_DIR=/home/raini/app/hadoop/etc/hadoop
- export SPARK_MASTER_HOST=biyuzhe
- export SPARK_PID_DIR=/home/raini/app/spark/data/pid
- export SPARK_LOCAL_DIRS=/home/raini/app/spark/data/spark_shuffle
-
-
- 3. vim spark-defaults.conf
- append:
- # Example:
- # spark.master spark://master:7077
- # spark.eventLog.enabled true
- # spark.eventLog.dir hdfs://namenode:8021/directory ## but our hadoop is configured with 9000
- # spark.serializer org.apache.spark.serializer.KryoSerializer
- # spark.driver.memory 5g
- # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
-
- spark.master spark://biyuzhe:7077
- spark.eventLog.enabled true
- spark.eventLog.dir hdfs://biyuzhe:9000/eventLog
- spark.serializer org.apache.spark.serializer.KryoSerializer
- spark.driver.memory 1g
- # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
-
- ## install mmlspark
- #spark.jars.packages Azure:mmlspark:0.12
$SPARK_HOME/sbin/start-all.sh
5. Web monitoring
http://biyuzhe:8080/
- # added by Anaconda3
- export ANACONDA_ROOT=/home/raini/app/anoconda3
- export PATH=${ANACONDA_ROOT}/bin:$PATH
-
- # pyspark
- export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython notebook
- export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/py35/bin/python
- export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
raini@biyuzhe:~$ source .bashrc
raini@biyuzhe:~$ pyspark --packages Azure:mmlspark:0.14
It jumps straight into IPython, and you can edit and run code right away:
Without IPython:
edit spark-env.sh and append at the end
export PYSPARK_PYTHON=/home/raini/app/anoconda3/envs/py35/bin/python
(Done)
ERROR SparkContext:91 - Error initializing SparkContext.
java.net.ConnectException: Call From biyuzhe/127.0.1.1 to biyuzhe:8021 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)...
Fix 1: change the configured port from 8021 (Spark's default here) to 9000 (HDFS's default)
Fix 2 (to be verified): /etc/hosts must not contain a ::1 entry; comment it out:
sudo apt install mysql-server
Part 1 (only needed for a fresh install; skip if you already have MySQL)
Reset the root user's password:
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('root');
or:
update mysql.user set authentication_string=PASSWORD('root'), plugin='mysql_native_password' where user='root';
Grant privileges to the user:
grant all privileges on *.* to 'root'@'%' identified by 'root';
grant all privileges on *.* to 'root'@'localhost' identified by 'root';   (same as above)
Apply:
flush privileges;
Part 2
Log in to MySQL:
# mysql -u root -p
Create the hive database:
mysql> create database hive;
mysql> show databases;
Change the hive database's character set to latin1:
mysql> alter database hive character set latin1;
Create the hive user and grant privileges:
mysql> create user 'hive'@'localhost' identified by 'hive';
mysql> grant select,insert,update,delete,alter,create,index,references on hive.* to 'hive'@'localhost';   -- the metastore database created above is named hive
or: mysql> grant all privileges on *.* to 'hive'@'node1' identified by 'hive' with grant option;
Note: change the part after @ to your hostname
Apply:
mysql>flush privileges;
Part 3: log in as the new user and set its password
$ mysql [-h master] -uhive -p   (press Enter, then Enter again)
mysql>
SET PASSWORD FOR hive@localhost = PASSWORD('hive');
Check the MySQL version:
mysql>select version();   //5.7.24-0ubuntu0.18.04.1
Download the MySQL JDBC driver:
http://dev.mysql.com/downloads/connector/j/
Choose Platform Independent, download mysql-connector-java-8.0.13.zip, and copy the MySQL JDBC driver jar into Hive's lib directory.
Add the following to .bashrc:
#Hive
export HIVE_HOME=/home/raini/app/hive
export PATH=$PATH:${HIVE_HOME}/bin
export CLASSPATH=.:$CLASSPATH:${HIVE_HOME}/lib
Configure hive-env.sh:
HADOOP_HOME=/home/raini/app/hadoop
export HIVE_CONF_DIR=/home/raini/app/hive/conf
#export HADOOP_HEAPSIZE=512
# importing third-party lib jars, see (https://blog.csdn.net/qianshangding0708/article/details/50381966)
#export HIVE_AUX_JARS_PATH=/home/raini/app/hive/../.jar   (absolute paths, comma-separated)
PS: (instead of setting this variable, you can simply drop the needed jars into a new ${HIVE_HOME}/auxlib directory)
(Optional configuration):
HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.1/libexec
export HIVE_CONF_DIR=/Users/zhengsiming/app/hive/conf
#export HADOOP_HEAPSIZE = 512
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.1/libexec
export HIVE_HOME=/Users/zhengsiming/app/hive
- Configure Hive's hive-site.xml
(create the directories hive-site.xml needs):
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/tmp
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/log
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/warehouse
(grant group write permission):
raini@biyuzhe:~$ hadoop fs -chmod g+w /user/hive/tmp
raini@biyuzhe:~$ hadoop fs -chmod g+w /user/hive/log
raini@biyuzhe:~$ hadoop fs -chmod g+w /user/hive/warehouse
(all in one go):
- hdfs dfs -mkdir -p /user/hive/warehouse
- hdfs dfs -mkdir -p /user/hive/tmp
- hdfs dfs -mkdir -p /user/hive/log
- hdfs dfs -chmod -R 777 /user/hive/warehouse
- hdfs dfs -chmod -R 777 /user/hive/tmp
- hdfs dfs -chmod -R 777 /user/hive/log
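As a local illustration (plain filesystem rather than HDFS, with a made-up scratch path), `g+w` turns a default 755 directory into 775, which is what the group-write step above does:

```shell
d=$(mktemp -d)/hivedemo
mkdir -p "$d"
chmod 755 "$d"                         # typical default mode for a new dir
before=$(stat -c '%a' "$d")
chmod g+w "$d"                         # same flag as `hadoop fs -chmod g+w ...`
after=$(stat -c '%a' "$d")
echo "$before -> $after"               # prints "755 -> 775"
rm -rf "$(dirname "$d")"
```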
(create a new hive-site.xml):
- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
-
- <!--
- <property>
- <name>hive.metastore.local</name>
- <value>true</value>
- <description>Use a MySQL server on this machine to store the metastore data. This storage mode requires a MySQL server running locally.</description>
- </property>
- -->
-
- <property>
- <name>javax.jdo.option.ConnectionURL</name>
- <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value><!-- must be localhost, not biyuzhe -->
- <description></description>
- </property>
-
- <property>
- <name>javax.jdo.option.ConnectionDriverName</name>
- <value>com.mysql.cj.jdbc.Driver</value>
- <description>older Connector/J versions use com.mysql.jdbc.Driver</description>
- </property>
-
- <property>
- <name>javax.jdo.option.ConnectionUserName</name>
- <value>hive</value>
- <description></description>
- </property>
-
- <property>
- <name>javax.jdo.option.ConnectionPassword</name>
- <value>hive</value>
- </property>
- <!--
- <property>
- <name>hive.metastore.uris</name>
- <value>thrift://localhost:9083</value>
- <description>uri1,uri2,... with this parameter set, Hive's metastore runs in remote mode instead of local mode; required for jdbc/odbc connections to hive when mysql backs the metastore</description>
- </property>
- -->
- <property>
- <name>hive.metastore.warehouse.dir</name>
- <value>/user/hive/warehouse</value>
- <description>Hive's data warehouse directory; defaults to /user/hive/warehouse on HDFS</description>
- </property>
-
- <property>
- <name>hive.exec.scratchdir</name>
- <value>/user/hive/tmp</value>
- <description>Hive's scratch (temp) file directory; defaults to /tmp/hive on HDFS</description>
- </property>
-
- <!-- The following 4 properties default to ${system:java.io.tmpdir}/${system:user.name}; point them at an iotmp directory created under the hive install instead -->
-
- <property>
- <name>hive.querylog.location</name>
- <value>/home/raini/app/hive/logs</value>
- <description>Directory for hive's logs: Location of Hive run time structured log file</description>
- </property>
-
- <property>
- <name>hive.server2.logging.operation.log.location</name>
- <value>/home/raini/app/hive/iotmp/operation_logs</value>
- <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
- </property>
- <property>
- <name>hive.downloaded.resources.dir</name>
- <value>/home/raini/app/hive/iotmp/resource_dir</value>
- <description>Temporary local directory for added resources in the remote file system.</description>
- </property>
- <property>
- <name>hive.exec.local.scratchdir</name>
- <value>/home/raini/app/hive/iotmp/scratchdir</value>
- <description>Local scratch space for Hive jobs</description>
- </property>
-
- <property>
- <name>hive.cli.print.current.db</name>
- <value>true</value>
- </property>
-
- </configuration>
- (vim hive-log4j2.properties) and (vim hive-exec-log4j2.properties):
property.hive.log.dir = (/home/raini/app/app/hive/log)
# where hive writes its logs at run time (already configured above, so this step can be skipped)
- (first run only, initialize the metastore):
raini@biyuzhe:~$ schematool -dbType mysql -initSchema
--(start the hive metastore service):
raini@biyuzhe:~$ hive --service metastore &
raini@biyuzhe:~$ hive --service metastore > /tmp/hive_metastore.log 2>&1 &
--(start hive):
raini@biyuzhe:~$ hive
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Ps: (the latest mysql driver changed the driver class name,
so change jdbc.driverClassName=com.mysql.cj.jdbc.Driver)
--(test):
hive (default)> create table test(id int, name string) row format delimited FIELDS TERMINATED BY ',';
OK
Time taken: 3.162 seconds
Extra: in a distributed setup, after scp -r hive-2.3/ raini@node2:/home/app/hive, only one more step is needed to make node2 a client node (so hive can be opened in a terminal on node2)
<!--
<property>
<name>hive.metastore.uris</name>
<value>thrift://node2:9083</value>
<description>just uncomment this block</description>
</property>
-->
Before connecting, set up a proxy user first, so you can press Enter through the username and password prompts. In hadoop's core-site.xml, set the following properties (the name after proxyuser is the superuser that runs hive; raini is my username):
- <property>
- <name>hadoop.proxyuser.raini.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.raini.groups</name>
- <value>*</value>
- </property>
Once this is set, no matter which user logs in, requests are proxied through the hive superuser (raini, which starts hiveserver2), so the current user operates with raini's permissions, but tables created still belong to the current user.
--(the port settings can be changed in hive-site.xml; these are the defaults, so this can be skipped):
- <property>
- <name>hive.server2.thrift.port</name>
- <value>10000</value>
- </property>
- <property>
- <name>hive.server2.thrift.bind.host</name>
- <value>localhost</value> <!-- default is localhost -->
- </property>
--(start hiveserver2 in a window):
or put it in the background: raini@biyuzhe:~/app$ hive --service hiveserver2 &
An extra RunJar process now appears in the background
--(start beeline):
beeline> !connect jdbc:hive2://localhost:10000
You can get in without entering a username or password (because the /tmp and /user/hive/warehouse directories on HDFS were granted permissions earlier; 777 or 755 both work)
(You can also use the zookeeper bundled with HBase)
Only conf/zoo.cfg needs to be configured
- # the 5 most important settings # the init timeout is tickTime*initLimit, i.e. 2000*5 = 10000 ms here
- tickTime=2000
- initLimit=5
- syncLimit=2
- dataDir=/home/raini/app/zookeeper/dataDir
- dataLogDir=/home/raini/app/zookeeper/dataLogDir
- clientPort=2181
-
- # the maximum number of client connections.
- # increase this if you need to handle more clients
- #maxClientCnxns=60
- #
- # Be sure to read the maintenance section of the
- # administrator guide before turning on autopurge.
- #
- # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
- #
- # The number of snapshots to retain in dataDir
- #autopurge.snapRetainCount=3
- # Purge task interval in hours
- # Set to "0" to disable auto purge feature
- #autopurge.purgeInterval=1
- # the 5 most important settings # the init timeout is tickTime*initLimit, i.e. 2000*5 = 10000 ms here
- tickTime=2000
- initLimit=5
- syncLimit=2
- dataDir=/home/raini/app/zookeeper/dataDir
- dataLogDir=/home/raini/app/zookeeper/dataLogDir
- # first check whether the port is already in use, or change it to 12181
- clientPort=2181
-
- # the 3 machines
- server.1=192.168.110.1:2888:3888
- server.2=192.168.110.2:2888:3888
- server.3=192.168.110.3:2888:3888
-
- #maxClientCnxns=60
- #autopurge.snapRetainCount=3
- #autopurge.purgeInterval=1
Done. Send it to the other two machines.
Then:
- node1's data/myid is configured as:
- echo '1' > data/myid
-
- node2's data/myid is configured as:
- echo '2' > data/myid
-
- node3's data/myid is configured as:
- echo '3' > data/myid
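The three per-node myid steps can be sketched as one loop; this runs locally against scratch directories standing in for each node's dataDir (paths invented for the demo):

```shell
base=$(mktemp -d)                      # stands in for each node's dataDir parent
for i in 1 2 3; do
  mkdir -p "$base/node$i/data"
  echo "$i" > "$base/node$i/data/myid" # myid must match server.N in zoo.cfg
done
ids=$(cat "$base"/node*/data/myid | xargs)
echo "$ids"                            # prints "1 2 3"
rm -rf "$base"
```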
Start each of them:
zkServer.sh start > /home/app/zookeeper/zookeeper.out
Check the status:
zkServer.sh status
Finally:
copy zoo.cfg into hbase/conf/.
Ps: if no process shows up after the first zoo node is started, you can ignore it; it comes up on its own once the remaining machines are started.
## Append to hbase-env.sh:
export JAVA_HOME=/home/raini/app/jdk
export HBASE_CLASSPATH=/home/raini/app/hbase/conf/
export HBASE_PID_DIR=/home/raini/app/tmp/pids
# use HBase's bundled zookeeper (single-node setup)
export HBASE_MANAGES_ZK=true
# do not use HBase's bundled zookeeper (cluster setup)
export HBASE_MANAGES_ZK=false
(Single-node setup):
- <configuration>
- <property>
- <name>hbase.rootdir</name>
- <value>hdfs://biyuzhe:9000/hbase</value>
- </property>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
- <property>
- <name>hbase.zookeeper.quorum</name>
- <value>127.0.0.1</value>
- </property>
- <property>
- <name>hbase.zookeeper.property.clientPort</name>
- <value>2181</value>
- </property>
- <property>
- <name>zookeeper.znode.parent</name>
- <value>/hbase</value>
- </property>
- <property>
- <name>hbase.zookeeper.property.dataDir</name>
- <value>/home/raini/app/tmp/hbase_zoo_dataDir</value>
- </property>
- <!-- <property>
- <name>hbase.tmp.dir</name>
- <value>/home/raini/app/tmp/hbase/</value>
- </property>
- <property>
- <name>hbase.master</name>
- <value>biyuzhe:60000</value>
- </property>
-
- <property>
- <name>hbase.wal.provider</name>
- <value>file://home/raini/tmp/hbase-wal</value>
- </property>-->
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>hbase.master.maxclockskew</name>
- <value>150000</value>
- </property>
- <property>
- <name>zookeeper.session.timeout.ms</name>
- <value>150000</value>
- </property>
- </configuration>
(Cluster setup, 3 machines):
Note:
hbase.zookeeper.property.clientPort and zookeeper.znode.parent must be configured properly (it makes setting up janusGraph later easier)
Also, be consistent in the configs: use either all IPs or all hostnames, otherwise janusGraph (the hbase-client) may fail to connect to hbase-server
# change these to your hosts <---- worker IPs are recommended; the master's IP does not need to be listed here, since the master serves as HMaster
node2-ip
node3-ip
On node1: $ start-hbase.sh
$ hbase shell
- status – check HBase status
- hbase(main):001:0> status
-
- version – show HBase version info
- hbase(main):002:0> version
-
- create tablename,columnname1,…,columnnameN – create a table
- hbase(main):013:0* create 'testtable','colfam1','colfam2','colfam3'
-
- describe tablename – describe a table's definition
- hbase(main):014:0> describe 'testtable'
-
- list – list all tables
- hbase(main):015:0> list
-
- put tablename,rowname,columnname,value – insert data
- hbase(main):019:0* put 'testtable','row1','colfam1','123'
- hbase(main):020:0> put 'testtable','row1','colfam1:col1','456'
-
- scan tablename – scan the whole table
- hbase(main):021:0> scan 'testtable'
-
- get tablename,rowname – fetch a row's data
- hbase(main):022:0> get 'testtable','row1'
-
- count tablename – count the records in a table
- hbase(main):023:0> count 'testtable'
-
- delete tablename,rowname,columnname – delete one cell
- hbase(main):041:0> delete 'testtable','row1','colfam1:col1'
-
- disable & drop tablename – drop a table
- hbase(main):043:0> disable 'testtable'
- hbase(main):042:0> drop 'testtable'
-
- exists tablename – check whether a table exists
- hbase(main):045:0> exists 'testtable'
-
- disable & alter tablename – delete a column family from a table
- hbase(main):016:0* disable 'testtable'
- hbase(main):011:0> alter 'testtable',NAME=>'colfam1',METHOD=>'delete'
- hbase(main):012:0> describe 'testtable'
- hbase(main):018:0> enable 'testtable'
-
- truncate tablename – empty a table
- hbase(main):005:0> truncate 'testtable'
-
- deleteall tablename rowname – delete a whole row
- hbase(main):010:0> scan 'testtable'
- hbase(main):012:0> deleteall 'testtable','row1'
-
-
Download: http://mirror.bit.edu.cn/apache/phoenix/
Official site: http://phoenix.apache.org/#
Prerequisites: Hadoop, zookeeper, and HBase-1.4 are all installed and working
Unpack: $ tar -zxvf ./apache-phoenix-4.14.1-HBase-1.4-bin.tar.gz
Install: Phoenix is installed on the Master node only
Configuration:
1. Copy phoenix-4.14.1-HBase-1.4-client.jar, phoenix-core-4.14.1-HBase-1.4.jar, and phoenix-4.14.1-HBase-1.4-server.jar from the Phoenix directory into the lib directory on every node of the hbase cluster.
2. Copy hbase's config file hbase-site.xml into Phoenix's bin directory, overwriting the file there.
3. Copy the hdfs config files core-site.xml and hdfs-site.xml into Phoenix's bin directory.
Environment variables:
- #phoenix
- export PHOENIX_HOME=/home/raini/app/phoenix
- export PHOENIX_CLASSPATH=$PHOENIX_HOME
- export PATH=$PATH:$PHOENIX_HOME/bin
Make the launch scripts executable:
under Phoenix/bin: chmod 777 psql.py and sqlline.py
Start:
restart the hbase cluster, then raini@biyuzhe:~/app/phoenix$ python2 ./bin/sqlline.py biyuzhe:2181
- # cassandra
- export CASSANDRA_HOME=/home/raini/app/cassandra
- export PATH=$CASSANDRA_HOME/bin:$PATH
These folders can be placed according to your disk situation, but they must match the paths in cassandra.yaml
- mkdir /home/raini/app/tmp/cassandra/data
- mkdir /home/raini/app/tmp/cassandra/commitlog
- mkdir /home/raini/app/tmp/cassandra/saved_caches
- mkdir /home/raini/app/tmp/cassandra/hints
tmp/cassandra/data – where SSTable files live on disk; multiple paths are allowed
tmp/cassandra/commitlog – where commit log files live on disk
tmp/cassandra/saved_caches – where data cache files live on disk; holds the table and row caches
tmp/cassandra/hints – the hints storage directory
If possible, consider putting tmp/cassandra/data and tmp/cassandra/commitlog on different disks, which helps spread the system's overall disk I/O load.
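The four mkdir calls above can be collapsed into one loop; a sketch against a scratch base path (substitute your real cassandra data root for the mktemp dir):

```shell
CASS_BASE=$(mktemp -d)                 # stands in for /home/raini/app/tmp/cassandra
for d in data commitlog saved_caches hints; do
  mkdir -p "$CASS_BASE/$d"
done
created=$(ls "$CASS_BASE" | xargs)
echo "$created"                        # prints "commitlog data hints saved_caches"
rm -rf "$CASS_BASE"
```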
-
- cluster_name: 'JanusGraphCassandraCluster'
-
- hints_directory: /cassandra/hints   # hints storage directory
-
- - seeds: "127.0.0.1"   # seed nodes for the Cassandra cluster; several can be listed, comma-separated; must be IPs
-
- listen_address: localhost   # IP or hostname to listen on; change to this machine's IP
-
- start_rpc: true   # whether to start the thrift rpc server; default false
-
- rpc_address: localhost   # address on which the Cassandra server serves clients; this machine's IP
-
- rpc_port: 9160   # the Cassandra server's client-facing port number (9161 in this setup)
Start:
$ cassandra -f -R   # -f keeps it in the foreground, -R allows running as root
$ cassandra >> /home/raini/app/cassandra/cassandra.out &   # start in the background
In some localized environments you may see errors like:
- expr: syntax error
- expr: syntax error
- bin/cassandra: 59: [: Illegal number:
- bin/cassandra: 63: [: Illegal number:
- bin/cassandra: 67: [: Illegal number:
- expr: syntax error
- bin/cassandra: 81: [: Illegal number:
- Invalid initial heap size: -XmsM
- Error: Could not create the Java Virtual Machine.
- Error: A fatal exception has occurred. Program will exit.
In that case, just uncomment the following lines in /etc/cassandra/cassandra-env.sh:
- #MAX_HEAP_SIZE="4G"
- #HEAP_NEWSIZE="800M"
Set MAX_HEAP_SIZE to no more than half of the machine's RAM; more than that is wasted, since Cassandra also uses off-heap storage.
Entering the database
$ bin/cqlsh
# similar to mysql; authentication isn't configured yet, so no username or password is needed for now
$ ./bin/cqlsh node1 9042   # connect to a specific server (9042 is the listen port)
the thriftServer port is 9161
When typing commands, press Tab often; auto-completion is available.
Help:
cqlsh> help;
cqlsh> help CREATE_TABLE;
Show the current cluster:
cqlsh> DESCRIBE CLUSTER;
Show the existing keyspaces:
cqlsh> DESCRIBE KEYSPACES ;
Cluster: JanusGraphCassandraCluster
These bundled system keyspaces (system_traces, system_schema, system_auth, system, system_distributed) are used for internal management, a bit like the master and temp databases elsewhere. Cassandra uses them to store schema, tracing, and security information.
Working with keyspaces and tables
- A Cassandra keyspace is similar in concept to a relational database. It can contain one or more tables (column families).
Create a keyspace:
cqlsh> CREATE KEYSPACE janusgraph WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
cqlsh> DESCRIBE janusgraph;
Note: class is the replication strategy to use, and replication_factor is how many nodes this keyspace's data is written to. In a production environment, never use a replication_factor of only 1.
Switch to the newly created keyspace:
cqlsh> USE janusgraph ;
Create a table inside the new keyspace:
cqlsh:janusgraph> CREATE TABLE user ( first_name text, last_name text, PRIMARY KEY (first_name));
cqlsh:janusgraph> DESCRIBE user ;
Note: you can also create the table without switching keyspaces, using the CREATE TABLE janusgraph.user ( ... syntax.
Insert data into the table:
cqlsh:janusgraph> INSERT INTO user (first_name, last_name ) VALUES ( 'zhe', 'xiao');
cqlsh:janusgraph> SELECT * FROM user ;
cqlsh:my_keyspace> DELETE last_name FROM user WHERE first_name = 'zhe';
cqlsh:my_keyspace> select * from user ;
cqlsh:my_keyspace> DELETE FROM user WHERE first_name = 'zhe';
Truncate or drop a table:
cqlsh:my_keyspace> TRUNCATE user ;
cqlsh:my_keyspace> DROP TABLE user ;
cassandra fails to start with: java.lang.OutOfMemoryError: unable to create new native thread
Effective for the current session only
ulimit -u         # show nproc
ulimit -u 65535   # set nproc, effective for the current session only
Globally effective
cat /etc/security/limits.d/90-nproc.conf
* soft nproc 1024
vi /etc/security/limits.d/90-nproc.conf
* soft nproc 655350
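To see which limit the current shell actually runs with (the number varies per system, so none is shown here):

```shell
nproc_limit=$(ulimit -u)               # soft nproc limit of this session
echo "$nproc_limit"
```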
Difference 1 among the 3 machines' configs:
(node1)node.name: es-node-1
(node2)node.name: es-node-2
(node3)node.name: es-node-3
Difference 2 among the 3 machines' configs:
(node1)network.host: 192.168.110.21
(node2)network.host: 192.168.110.22
(node3)network.host: 192.168.110.23
The 3 machines' configs are as follows:
(across the 3 config files, as few as two spots differ: the marked places, i.e. the parameters that must change per machine); node1 example:
Single-node mode: just comment the marked lines out.
Change to:
-Xms2g
-Xmx2g
Ps: the two values must be equal; many people set them to half the machine's RAM
ERROR: bootstrap checks failed
Fix:
vim /etc/security/limits.conf   // append; [takes effect after logging out and back in]
- * soft nofile 300000
-
- * hard nofile 300000
-
- * soft nproc 102400
-
- * hard nproc 102400
Check whether it took effect
[seven@localhost ~]$ ulimit -Hn
65536
max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]
Fix:
vim /etc/sysctl.conf   // append
- fs.file-max = 1645037
-
- vm.max_map_count=655360
Run: sysctl -p
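Reading the value currently in effect is unprivileged (only raising it needs root plus `sysctl -p`), so you can check before and after editing:

```shell
current_mmc=$(cat /proc/sys/vm/max_map_count)   # value currently in effect
echo "$current_mmc"
```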
system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
Fix: in elasticsearch.yml set bootstrap.system_call_filter to false; note it must go under the Memory section:
- bootstrap.memory_lock: false
- bootstrap.system_call_filter: false
Edit /etc/security/limits.d/90-nproc.conf
- before:
- * soft nproc 1024
- after:
- * soft nproc 5120
Ps: clusters accessed through Xshell need the session disconnected and reconnected for this to take effect
Run on every machine: elasticsearch > ./elasticsearch.out &