View disk/partition info: fdisk -l
View disk space usage: df -h
View memory usage: free -h
Create directories recursively: mkdir -p
Date command help: date --help
Example: date -d '2020-9-3' +%s
takes a date string and returns a Unix timestamp
date -d '2020-9-3' +%F
returns the full date (YYYY-MM-DD)
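The conversion also works the other way: a timestamp prefixed with @ can be formatted back into a date (a round-trip sketch based on the example above):
date -d "@$(date -d '2020-9-3' +%s)" +%F
returns 2020-09-03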
View the IP address:
ip addr
ifconfig -a
(the numbers after "inet addr:" are the Linux system's IP address)
View the IP on Windows: ipconfig
Shut down: shutdown now
poweroff
Reboot: reboot
Find a file: find / -name <filename>
Comment syntax:
Linux shell: #
SQL: --
Spark / Java / Scala: //
Show line numbers in vi: :set nu
Shell script example (freemem.sh - logs memory usage and drops the page cache):
#!/bin/bash
# current used/free memory in MB (second line of `free -m`)
used=`free -m | awk 'NR==2' | awk '{print $3}'`
free=`free -m | awk 'NR==2' | awk '{print $4}'`
echo "===========================" >> /var/log/mem.log
date >> /var/log/mem.log
echo "Memory usage | [Use:${used}MB][Free:${free}MB]" >> /var/log/mem.log
if [ $free -ge 100 ] ; then
    # flush dirty pages, then drop page cache, dentries and inodes
    sync && echo 1 > /proc/sys/vm/drop_caches
    sync && echo 2 > /proc/sys/vm/drop_caches
    sync && echo 3 > /proc/sys/vm/drop_caches
    echo "OK" >> /var/log/mem.log
else
    echo "Not required" >> /var/log/mem.log
fi
chmod u+x freemem.sh
./freemem.sh
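To run it periodically, one could schedule it with cron (a sketch; the path /root/freemem.sh and the 30-minute interval are assumptions):
crontab -e
# then add a line such as:
*/30 * * * * /root/freemem.sh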
Start Spark:
from the Spark install root: sbin/start-all.sh
or from the sbin directory: ./start-all.sh
Start spark-shell; three launch modes:
first start the Hive metastore: hive --service metastore
(1) local mode: spark-shell
(2) Spark standalone cluster: spark-shell --master spark://hadoop-single:7077
(3) Spark on YARN (start YARN first): spark-shell --master yarn
Exit spark-shell: :quit
Fix stack overflow (StackOverflowError) in Spark:
edit the config: vi /opt/install/spark/conf/spark-defaults.conf
set: spark.driver.extraJavaOptions="-Xss30M"
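The same JVM option can also be passed when launching the shell instead of editing the config file (a sketch):
spark-shell --driver-java-options "-Xss30M"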
Start Hadoop: start-all.sh
Count the number of lines of a file on HDFS:
hdfs dfs -cat /app/data/exam/meituan_waimai_meishi.csv | wc -l
hdfs:
Delete a directory: hdfs dfs -rm -r /input/data
(data is the directory to delete; it is removed together with all files inside it)
Delete a file: hdfs dfs -rm -f /input/filename
(filename is the name of the file to delete)
Leave Hadoop safe mode: hadoop dfsadmin -safemode leave
Start ZooKeeper: zkServer.sh start
Start PySpark: pyspark
jupyter notebook --allow-root
Exit the Python shell: Ctrl + d
quit()
exit()
Import modules:
import numpy as np
import pandas as pd
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.ml.linalg import Vector  # vector types
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml import Pipeline
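A minimal sketch of how these classes fit together (the column names f1, f2, f3, label and the DataFrame train_df are hypothetical):
# assemble raw columns into a single feature vector, then train a random forest
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
rf = RandomForestClassifier(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, rf])
model = pipeline.fit(train_df)           # train_df: DataFrame with columns f1, f2, f3, label
predictions = model.transform(train_df)  # adds prediction/probability columns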
Kafka:
Start the broker: kafka-server-start.sh ./config/server.properties
Create a topic: kafka-topics.sh --create --zookeeper 192.168.206.129:2181 --topic testDemo --partitions 3 --replication-factor 1
Console producer: kafka-console-producer.sh --topic testDemo --broker-list 192.168.206.129:9092
Console consumer: kafka-console-consumer.sh --bootstrap-server 192.168.206.129:9092 --topic testDemo --from-beginning
Delete a topic: kafka-topics.sh --zookeeper 127.0.0.1:2181 --delete --topic testDemo
List topics: kafka-topics.sh --zookeeper 192.168.206.129:2181 --list
Describe a topic: kafka-topics.sh --zookeeper 192.168.206.129:2181 --describe --topic testDemo
Show the latest offsets of a topic: kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 192.168.206.129:9092 --topic testDemo --time -1 --offsets 1
Set a topic's retention time: kafka-topics.sh --zookeeper 192.168.206.129:2181 --alter --topic test --config retention.ms=10000
Drop a table (structure and data): DROP TABLE table_name;
Empty a table but keep its structure: truncate table student;
Run a SQL script from Linux:
① enter MySQL: mysql -uroot -pok
② run the script: source /home/a.sql
Check which hosts MySQL users are allowed to log in from (in the mysql system database):
select user,host,password from user;
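To allow a user to log in from any host, the host value can be set to '%' (a sketch using root as an example; adjust the user and your security policy as needed):
update user set host='%' where user='root';
flush privileges;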
Functions:
ceil(1.2) = 2
floor(1.2) = 1
Start HBase: start-hbase.sh
Enter the HBase shell: hbase shell
Show the version: version
Show the cluster status: status
Count the total rows of a table:
①: count 'table_name'
count 'table_name',{INTERVAL => 1000000}
(by default the running count is printed every 1000 rows; INTERVAL sets a custom interval)
②: run a MapReduce job with the RowCounter tool
on the Linux command line: hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'table_name'
the result appears near the end of the output (about 5 lines from the bottom), e.g. ROWS=727
Namespaces:
list_namespace
create_namespace 'events_db'
list_namespace_tables 'events_db'
drop_namespace 'events_db'
alter_namespace 'events_db', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
Create-table statements:
user_friends is the table name, uf is the column family:
create 'events_db:user_friends','uf'
three column families: profile, region, registration:
create 'events_db:users','profile','region','registration'
disable 'events_db:user_friends'
drop 'events_db:user_friends'
Delete a column family: alter 'exam:analysis', 'delete' => 'question'
Add a column family: alter 'user', NAME => 'f2'
Example 1: table customer, column families addr and order, rowkey jsmith
create 'customer', {NAME=>'addr'}, {NAME=>'order'}
list
desc 'customer'
put 'customer', 'jsmith', 'addr:city', 'montreal'
get 'customer', 'jsmith'
Example 2: table user, column families baseinfo and privateinfo
create 'user',{NAME=>'baseinfo'},{NAME=>'privateinfo'}
desc 'user'
scan 'user'
put 'user','002','baseinfo:name','kitty'
put 'user','002','baseinfo:age','18'
put 'user','002','privateinfo:tall','1.8'
put 'user','002','privateinfo:health','good'
put 'user','001','baseinfo:name','hello'
put 'user','001','baseinfo:age','20'
put 'user','001','privateinfo:tall','1.75'
put 'user','001','privateinfo:health','good'
put 'user','003','baseinfo:name','dava'
put 'user','003','baseinfo:age','25'
put 'user','003','privateinfo:tall','1.8'
put 'user','003','privateinfo:health','bad'
get 'user','001'
get 'user','001','baseinfo'
get 'user','001',{COLUMNS=>['baseinfo']}
get 'user','001',{COLUMNS=>['baseinfo:name']}
put 'user','001','baseinfo:name','jack'
Bulk load a CSV file into the customer table with ImportTsv:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator=, \
-Dimporttsv.columns="HBASE_ROW_KEY,order:numb,order:date" \
customer file:///home/vagrant/hbase_import_data.csv
Start the Hive metastore: hive --service metastore
or run it in the background: nohup hive --service metastore &
Hive CLI: hive
Start the Beeline client:
first start the hiveserver2 service: nohup hive --service hiveserver2 &
then, in a new window, connect with Beeline: beeline -u jdbc:hive2://localhost:10000
Hive local mode:
set hive.exec.mode.local.auto=true;   (default is false)
set hive.exec.mode.local.auto.inputbytes.max=50000000;
set hive.exec.mode.local.auto.tasks.max=10;
A job only really runs in local mode when all of the following hold:
1. the job's input size is smaller than hive.exec.mode.local.auto.inputbytes.max (default 128MB)
2. the job's number of map tasks is smaller than hive.exec.mode.local.auto.tasks.max (default 4)
3. the job's number of reduce tasks is 0 or 1
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.optimize.sort.dynamic.partition=true;
set hive.auto.convert.join=false;
Run an HQL file: hive -f map_hbase.hql
Inside a SQL file, use the following commands:
set a variable: set hivevar:dt='2020-09-04';
use the variable: insert into table xxx partition(dt=${hivevar:dt})
To pass a variable in from a script, use:
hive -hivevar dt='2020-09-04' -f xxx.sql
create table ctas_employee as select * from employee;
① Insert data: select everything from ctas_employee and insert it into employee:
insert into employee select * from ctas_employee;
② Insert into two tables at the same time (employee and employee_external):
from ctas_employee
insert into table employee select *
insert into table employee_external select *;
③ Load data from a file:
with LOCAL: copies the data from the local Linux filesystem into the Hive table directory
without LOCAL: moves the data from HDFS into the Hive table directory
syntax:
load data [local] inpath '...' [overwrite] into table xxx
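For example (the file path here is hypothetical):
load data local inpath '/root/employee.txt' overwrite into table employee;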
④ Write query results out to a file/directory (only OVERWRITE is supported for directories):
from employee
insert overwrite [local] directory '/root/employee2.txt' select *;
⑤ Load data into a dynamic-partition table (year and month are partition columns of employee_partitioned):
from ctas_partitioned
insert into table employee_partitioned partition(year,month)
select name,work_place,sex_age,skills_score,depart_title,'2020','07'
EXPORT TABLE employee TO '/tmp/output3';
IMPORT TABLE employee FROM '/tmp/output3';
alter table employee add
partition (year=2019,month=7)
partition (year=2020,month=7)
partition (year=2020,month=8);
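To check the result, the partitions of a table can be listed, e.g. for the partitioned table from ⑤:
show partitions employee_partitioned;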
Drop a database together with all its tables: drop database dataName cascade;
Lateral view (explode):
select name,wps,skills_score from employee
lateral view explode(work_place) work_place_single as wps;
Multiple lateral views:
select name,wps,skill,score from employee
lateral view explode(work_place) work_place_single as wps
lateral view explode(skills_score) sks as skill,score;
SELECT * FROM (SELECT * FROM employee) a;
is equivalent to
with a
as (SELECT * FROM employee)
SELECT * FROM a
union all (does not deduplicate):
select * from emp_basic eb where eb.emp_name='Erhart Scambler'
union all
select * from emp_basic eb where eb.emp_name='Erhart Scambler';
union (deduplicates):
select * from emp_basic eb where eb.emp_name='Erhart Scambler'
union
select * from emp_basic eb where eb.emp_name='Erhart Scambler' ;
select col1, col2, aggregate_function(colX)
from ...
group by col1, col2   (every column in the select list except the aggregated colX must appear in group by)
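For example, using the emp_basic table from the union example above:
select eb.emp_name, count(*) as cnt
from emp_basic eb
group by eb.emp_name;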
① Create the database and table in MySQL:
CREATE DATABASE IF NOT EXISTS sales_source DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
USE sales_source;
DROP TABLE IF EXISTS customer;
CREATE TABLE customer
(
customer_number INT(11) NOT NULL AUTO_INCREMENT,
customer_name VARCHAR(128) NOT NULL,
customer_street_address VARCHAR(256) NOT NULL,
customer_zip_code INT(11) NOT NULL,
customer_city VARCHAR(32) NOT NULL,
customer_state VARCHAR(32) NOT NULL,
PRIMARY KEY (customer_number)
);
Insert data:
INSERT INTO customer (customer_name, customer_street_address, customer_zip_code, customer_city, customer_state)
VALUES
('Big Customers', '7500 Louise Dr.', '17050', 'Mechanicsburg', 'PA'),
('Small Stores', '2500 Woodland St.', '17055', 'Pittsburgh', 'PA'),
('Medium Retailers', '1111 Ritter Rd.', '17055', 'Pittsburgh', 'PA'),
('Good Companies', '9500 Scott St.', '17050', 'Mechanicsburg', 'PA'),
('Wonderful Shops', '3333 Rossmoyne Rd.', '17050', 'Mechanicsburg', 'PA'),
('Loyal Clients', '7070 Ritter Rd.', '17055', 'Pittsburgh', 'PA');
② Create the database and table in Hive:
create database sales_rds;
USE sales_rds;
DROP TABLE IF EXISTS sales_rds.customer;
CREATE TABLE sales_rds.customer
(
customer_number INT ,
customer_name VARCHAR(128) ,
customer_street_address VARCHAR(256) ,
customer_zip_code INT ,
customer_city VARCHAR(32) ,
customer_state VARCHAR(32)
);
③ Full extraction of the customer table with Sqoop (the Hive table sales_rds.customer created above must already exist):
sqoop import \
--connect jdbc:mysql://localhost:3306/sales_source \
--username root \
--password ok \
--table customer \
--hive-import \
--hive-table sales_rds.customer \
--hive-overwrite \
--target-dir temp
Where:
--target-dir temp specifies the temporary staging directory
--fields-terminated-by specifies the field delimiter
④ Query:
select * from customer;
① Create the table in MySQL:
CREATE TABLE sales_order
(
order_number INT(11) NOT NULL AUTO_INCREMENT,
customer_number INT(11) NOT NULL,
product_code INT(11) NOT NULL,
order_date DATETIME NOT NULL,
entry_date DATETIME NOT NULL,
order_amount DECIMAL(18,2) NOT NULL,
PRIMARY KEY (order_number)
);
② Create the table in Hive:
CREATE TABLE sales_rds.sales_order
(
order_number INT ,
customer_number INT,
product_code INT ,
order_date timestamp ,
entry_date timestamp ,
order_amount DECIMAL(18,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
;
③ Incremental import with Sqoop:
sqoop import \
--connect jdbc:mysql://localhost:3306/sales_source \
--username root \
--password root \
--table sales_order \
--hive-import \
--hive-table sales_rds.sales_order \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
--check-column entry_date \
--incremental append \
--last-value '1900-1-1'
Where:
--fields-terminated-by   the field delimiter
--lines-terminated-by    the line delimiter
--check-column           the column whose values are checked to find new rows
--incremental            the incremental mode, either append or lastmodified
--last-value             the largest value of the check column seen in the previous import
The same import can also be set up as a sqoop job.
Create the job:
sqoop job \
--create myjob \
-- import \
--connect jdbc:mysql://localhost:3306/sales_source \
--username root \
--password ok \
--table sales_order \
--hive-import \
--hive-table sales_rds.sales_order \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
--check-column entry_date \
--incremental append \
--last-value '1900-1-1'
Run the job:
sqoop job --exec myjob
Delete the job:
sqoop job --delete myjob
Note: to run the job without being prompted for the password, store it in a password file on HDFS:
echo -n "ok" > sqoopPWD.pwd
hdfs dfs -mkdir -p /sqoop/pwd
hdfs dfs -put sqoopPWD.pwd /sqoop/pwd/
hdfs dfs -chmod 400 /sqoop/pwd/sqoopPWD.pwd
and enable (uncomment) the following property in sqoop/conf/sqoop-site.xml:
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
<description>If true, allow saved passwords in the metastore.
</description>
</property>
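With that in place, a job can reference the password file instead of a plaintext --password (a sketch; the job name myjob2 is just an example, the HDFS path is the one created above):
sqoop job \
--create myjob2 \
-- import \
--connect jdbc:mysql://localhost:3306/sales_source \
--username root \
--password-file /sqoop/pwd/sqoopPWD.pwd \
--table sales_order \
--hive-import \
--hive-table sales_rds.sales_order \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
--check-column entry_date \
--incremental append \
--last-value '1900-1-1'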
Start Zeppelin:
cd /opt/install/zeppelin081/bin/
./zeppelin-daemon.sh start