Hadoop + Spark 大数据巨量分析与机器学习整合开发实战 - 学习笔记

 

  1. spark_Document: http://spark.apache.org/docs/latest/index.html
  2. spark_API: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package
  3. 测试数据集movielens https://grouplens.org/datasets/movielens/
  4. hadoop集群搭建 http://hadoop.apache.org/docs/r1.0.4/cn/cluster_setup.html
  5. JARs http://www.java2s.com
  6. ubuntu更新源 https://www.linuxidc.com/Linux/2017-11/148627.html
  7. unknown host: ---> sudo gedit /etc/resolv.conf ---> nameserver 8.8.8.8
8. --------------------------------------------------D1 linux_environment-----------------------------------------
  9. Linux查看进程 ps -ef / ps -aux -> ps -ef|grep java -> kill -9 1827
10. 查看第3-10行 tail -n +3 filename | head -n 8  (tail -n +3 从第3行开始输出,head -n 8 再取8行,正好是第3~10行)
  11. #其他操作
  12. sudo apt-get update 更新源
  13. sudo apt-get install package 安装包
  14. sudo apt-get remove package 删除包
  15. sudo apt-cache search package 搜索软件包
  16. sudo apt-cache show package 获取包的相关信息,如说明、大小、版本等
  17. sudo apt-get install package --reinstall 重新安装包
  18. sudo apt-get -f install 修复安装
  19. sudo apt-get remove package --purge 删除包,包括配置文件等
  20. sudo apt-get build-dep package 安装相关的编译环境
  21. sudo apt-get upgrade 更新已安装的包
  22. sudo apt-get dist-upgrade 升级系统
  23. sudo apt-cache depends package 了解使用该包依赖那些包
  24. sudo apt-cache rdepends package 查看该包被哪些包依赖
  25. sudo apt-get source package 下载该包的源代码
  26. sudo apt-get clean && sudo apt-get autoclean 清理无用的包
  27. sudo apt-get check 检查是否有损坏的依赖
  28. --------------------------------------------------------------------------------------------------------------
  29. -----------------------------------------THE BOOK OF <HADOOP WITH SPARK>-----------------------------------------
  30. 书中范例下载:
  31. http://pan.baidu.com.cn/s/1qYMtjNQ
  32. http://pan.baidu.com.cn/hadoopsparkbook
  33. ----------------------------------------------------------CP1 Info of big data&ML p1-8
  34. ----------------------------------------------------------CP2 VirtualBox p11-18
  35. 1.virtualbox5.2.34下载 https://www.virtualbox.org/wiki/Download_Old_Builds_5_2
  36. ---> Next ~> ~> ~> Yes ~> Install ~> Finish
  37. ---> Set languages (File ~> preferences ~> language
  38. ---> Set Restore_File ( 管理 ~> 全局设定 ~> VRDP认证库 <其他> ~> 默认虚拟电脑位置:)
  39. ---> Build a Vm
  40. ~> next ~> 4096M ~> ~> VDI ~> D ~> 80G ~>
  41. 2.ubuntu18.04 https://ubuntu.com/download/desktop
  42. ubuntu18.04很卡解决方案 sudo apt install gnome-session-flashback
  43. https://jingyan.baidu.com/article/37bce2bea3c07f1002f3a22a.html
  44. #更新linux sudo apt-get update
  45. 3.安装:mysql http://www.cnblogs.com/jpfss/p/7944622.html
  46. --安装mysql: sudo apt-get install mysql-server
  47. ->获取mysql用户名密码文件: sudo gedit /etc/mysql/debian.cnf
  48. ->登录mysql: mysql -u用户名 -p密码
  49. ->修改mysql密码:
  50. ->配置快捷命令
  51. sudo gedit ~/.bashrc
  52. alias mysql='mysql -u debian-sys-maint -pAQeZFkTb0y5EECNU'
  53. source ~/.bashrc
54. ->修改mysql不支持中文的问题 https://www.cnblogs.com/guodao/p/9702465.html
  55. #启动、关闭服务和查看运行状态
  56. sudo service mysql start
  57. sudo service mysql stop
  58. sudo service mysql status
  59. 删除 mysql https://blog.csdn.net/iehadoop/article/details/82961264
  60. 查看MySQL的依赖项: dpkg --list|grep mysql
  61. sudo apt-get remove mysql-common
  62. sudo apt-get autoremove --purge mysql-server-5.7
  63. 清除残留数据: dpkg -l|grep ^rc|awk '{print$2}'|sudo xargs dpkg -P
64. dpkg --list|grep mysql
  65. 继续删除剩余依赖项,如:sudo apt-get autoremove --purge mysql-apt-config
  66. 4.挂载文件夹
  67. vbox设置 -> 共享文件夹+ -> OK
  68. sudo mount -t vboxsf BZ /home/zieox/Desktop/BZ
  69. 5.安装anaconda https://blog.csdn.net/ksws0292756/article/details/79143460
  70. 启动anaconda.SH: sudo bash Anaconda3-2018.12-Linux-x86_64.sh
  71. 启动anaconda.navigator: anaconda-navigator
  72. 6.安装java8/step12INSTEAD https://blog.csdn.net/claire017/article/details/80953632
  73. 7.安装IDEA https://www.cnblogs.com/gzu-link-pyu/p/8263312.html
  74. 解压: sudo tar -zxvf ideaIC-2018.3.5-no-jdk.tar.gz -C /opt
  75. 8.安装PyCharm
  76. 9.安装scala-plugs https://www.cnblogs.com/starwater/p/6766831.html
  77. scala_SDK:
  78. 1. 右键项目名称找到Open Module Settings
  79. 2. 左侧Project Settings目录中点击Libraries
  80. 3. 点击+new Project Library选择Scala SDK
  81. 4. 添加下载好的jar文件夹
  82. 10.安装scala_shell https://blog.csdn.net/wangkai_123456/article/details/53669094
  83. 升级高版本scala: sudo apt-get remove scala
  84. 解压: tar zxvf scala-2.12.8.tgz
85. 移动: sudo mv scala-2.12.8 /usr/local/scala
86. source /etc/profile https://blog.csdn.net/qq_35571554/article/details/82850563
  87. 卸载: sudo apt-get --purge remove polipo
  88. 11.maven https://baijiahao.baidu.com/s?id=1612907927393262341&wfr=spider&for=pc
  89. tar -xvf apache-maven-3.6.0-bin.tar.gz
  90. sudo mv apache-maven-3.6.0 /usr/local
  91. 下载地址: http://maven.apache.org/download.cgi
  92. 12.关于maven.pom.xml库 https://mvnrepository.com/
  93. 13.linux下安装gradle https://blog.csdn.net/yzpbright/article/details/53359855
  94. 14.卸载open-jdk sudo apt-get remove openjdk*
  95. 15.安装sunJDK12 https://blog.csdn.net/smile_from_2015/article/details/80056297
  96. tar -zxvf jdk-12_linux-x64_bin.tar.gz
  97. cd /usr/lib
  98. sudo mkdir jdk
  99. sudo mv jdk-12 /usr/lib/jdk
  100. sudo gedit /etc/profile
  101. ----------------------------------------------------
  102. export JAVA_HOME=/usr/lib/jdk/jdk-12
  103. export JRE_HOME=${JAVA_HOME}/jre
  104. export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
  105. export PATH=${JAVA_HOME}/bin:$PATH
  106. ----------------------------------------------------
  107. source /etc/profile
108. sudo update-alternatives --install /usr/bin/java java /usr/lib/jdk/jdk-12/bin/java 300
109. sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jdk/jdk-12/bin/javac 300
  110. 16.下载maven wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
  111. tar -zxvf apache-maven-3.6.3-bin.tar.gz
  112. rm apache-maven-3.6.3-bin.tar.gz
  113. sudo gedit /etc/profile
  114. ------------------------
115. export M2_HOME=/usr/local/apache-maven-3.6.3
  116. export PATH=${M2_HOME}/bin:$PATH
  117. ------------------------
  118. source /etc/profile
  119. 17.创建scala 项目
  120. new project -> maven (create from more archetype) ->scala-archetype-simple
  121. 配置maven镜像: https://blog.csdn.net/qq_39929929/article/details/103753905
  122. ----------------------------------------------------------CP3 Ubuntu Linux P23-43
  123. //Set Virtual_File setting ~> restore ~> Controler:IDE (no cd) ~> choose (open) ~>
  124. //Install Ubuntu -> launch ~~~> (clear CD & install Ubuntu) Install --->Launch Ubuntu
  125. //Install Plug-in P34
  126. //Set Default Typewriting system setting -> "text imput" -> "+" ->
  127. //Set terminal : (win)->ter-> drag->
  128. //Set terminal color : terminal (Configuration file preferences)->
  129. //share clipboard (equipment) ~> share_clipboard
  130. ----------------------------------------------------------CP4 The Installing Of Hadoop Single Node Cluster
  131. //4.1 install JDK
  132. 1.ctrl+alt+t
  133. 2.java-version
  134. 3.sudo apt-get update
  135. 4.sudo apt-get install default-jdk
  136. 5.update-alternatives --display java
  137. //4.2 Set SSH logging without code
  138. 1. install SSH : sudo apt-get install ssh
  139. 2. install rsync sudo apt-get install rsync
140. 3.//产生密钥 produce SSH Key: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
141. 4.//查看密钥 see the file of SSH Key: ll ~/.ssh
142. 5.//将产生的key放入许可证文件 : cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  143. //4.3 Download Hadoop 下载hadoop
  144. ---> https://archive.apache.org/dist/hadoop/common/
  145. ---> Index of /dist/hadoop/common/hadoop-2.6.0
  146. --->Copy Link Location
147. ===> wget "~~~copied~~~"
148. //使用wget下载hadoop2.6 wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
  149. //解压 sudo tar -zxvf hadoop-2.6.0.tar.gz
  150. //移动文件 sudo mv hadoop-2.6.0 /usr/local/hadoop
151. --->see & confirm ll /usr/local/hadoop
  152. ============================================================================================================
  153. bin/各项运行文件
  154. sbin/各项shell运行文件
155. etc/etc/hadoop 子目录包含Hadoop配置文件,例如 hadoop-env.sh,core-site.xml,yarn-site.xml,mapred-site.xml,hdfs-site.xml
156. lib/Hadoop函数库
157. logs/系统日志,可以查看运行状况,运行有问题可以查看日志找出错误的原因
  158. ============================================================================================================
159. //4.4 Set Hadoop environment variables 设置hadoop环境变量
  160. --->Edit bash sudo gedit ~/.bashrc
  161. ---path of jdk which java -> ls -l /usr/bin/java -> ls -l /etc/alternatives/java ->
  162. --->/usr/lib/jvm/java-8-oracle
  163. --->Set JDK
  164. --------------------------------------------------------------------
  165. export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  166. export HADOOP_HOME=/usr/local/hadoop
  167. export PATH=$PATH:$HADOOP_HOME/bin
  168. export PATH=$PATH:$HADOOP_HOME/sbin
  169. export HADOOP_MAPRED_HOME=$HADOOP_HOME
  170. export HADOOP_COMMON_HOME=$HADOOP_HOME
  171. export HADOOP_HDFS_HOME=$HADOOP_HOME
  172. export YARN_HOME=$HADOOP_HOME
  173. export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
174. export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
  175. export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
  176. --------------------------------------------------------------------
  177. --->Set HADOOP_HOME hadoop usr/local/hadoop export HADOOP_HOME=/usr/local/hadoop
  178. --->Set PATH
  179. export PATH=$PATH:$HADOOP_HOME/bin
  180. export PATH=$PATH:$HADOOP_HOME/sbin
  181. --->set Hadoop-env
  182. export HADOOP_MAPRED_HOME=$HADOOP_HOME
  183. export HADOOP_COMMON_HOME=$HADOOP_HOME
  184. export HADOOP_HDFS_HOME=$HADOOP_HOME
  185. export YARN_HOME=$HADOOP_HOME
  186. --->lib_link
  187. export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
188. export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
  189. export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
  190. --->Let bash work. source ~/.bashrc
  191. //4.5 modify Hadoop Setting_files
  192. --->修改hadoop-env.sh配置文件: sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  193. //change 'export JAVA_HOME=${JAVA_HOME}' to export JAVA_HOME=/usr/lib/jvm/java-8-oracle
  194. --->修改core-site.xml配置文件 sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
  195. #set Default-name of HDFS
  196. -------------------------------------
  197. <property>
  198. <name>fs.default.name</name>
  199. <value>hdfs://localhost:9000</value>
200. </property>
  201. -------------------------------------
  202. --->修改 yarn-site.xml配置文件 sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
  203. -------------------------------------------------------------------
  204. <property>
  205. <name>yarn.nodemanager.aux-services</name>
  206. <value>mapreduce_shuffle</value>
  207. </property>
  208. <property>
  209. <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  210. <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  211. </property>
  212. -------------------------------------------------------------------
  213. --->修改 mapred-site.xml配置文件
  214. #copy template-file (复制模板文件)
  215. sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
  216. --->编辑 mapred-site.xml配置文件 sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
  217. --------------------------------------
  218. <property>
  219. <name>mapreduce.framework.name</name>
  220. <value>yarn</value>
  221. </property>
  222. --------------------------------------
  223. --->编辑 hdfs-site.xml配置文件 sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
  224. 10.0.2.15
  225. ----------------------------------------------------------------------------
  226. <property>
  227. <name>dfs.replication</name>
  228. <value>3</value>
  229. </property>
  230. <property>
  231. <name>dfs.namenode.name.dir</name>
  232. <value> file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  233. </property>
  234. <property>
235. <name>dfs.datanode.data.dir</name>
  236. <value> file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
  237. </property>
  238. ----------------------------------------------------------------------------
  239. //launch without namenode https://www.cnblogs.com/lishpei/p/6136043.html
  240. //4.6 Creating & Formatting HDFS_dir 创建并格式化hdfs目录
  241. ---> 创建NameNode数据存储目录 sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
  242. ---> 创建DataNode数据存储目录 sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
243. ---> 将Hadoop目录所有者更改为zieox sudo chown zieox:zieox -R /usr/local/hadoop
  244. --->格式化hdfs hadoop namenode -format
245. ---# 启动 YARN sudo ./start-yarn.sh
246. ---# 启动 HDFS sudo ./start-dfs.sh
  247. --->显示错误日志 cd /usr/local/hadoop/logs/
248. --->启动 namenode ./hadoop-daemon.sh start namenode
  249. //查看本机ip ifconfig cd /usr/local/hadoop/ lsof -i :50070
250. //打开hadoop resource manager web 界面: localhost:8088
  251. //打开NameNode HDFS Web界面 http://172.28.30.12:50070
  252. -------------------------------r-c-------------------------------
  253. 1469 gedit hadoop-lufax-datanode-lufax.log
  254. 1470 mkdir -r /home/user/hadoop_tmp/dfs/data
  255. 1471 mkdir -p /home/user/hadoop_tmp/dfs/data
  256. 1472 sudo mkdir -p /home/user/hadoop_tmp/dfs/data
  257. 1473 sudo chown lufax:lufax /home/user/hadoop_tmp/dfs/data
  258. 1474 ./hadoop-daemon.sh start datanode
  259. 1475 cd ../
  260. 1476 ./sbin/hadoop-daemon.sh start datanode
  261. 1477 jps
  262. 1478 hadoop fs -ls /
  263. 1479 hadoop fs -put README.txt /
  264. 1480 history 20
  265. -----------------------------------------------------------------
  266. ----------------------------------------------------------CP5 Installing:Hadoop Multi Node Cluster
  267. //1. 复制虚拟机
  268. //设置data1虚拟机
  269. //编辑interfaces网络配置文件: sudo gedit /etc/network/interfaces
  270. ------------------------------------------
  271. #NAT interface //网卡1
272. auto eth0
273. iface eth0 inet dhcp
274. #host only interface //网卡2
275. auto eth1
276. iface eth1 inet static
  277. address 192.168.56.101
  278. netmask 255.255.255.0
  279. network 192.168.56.0
  280. broadcast 192.168.56.255
  281. ------------------------------------------
  282. //设置hostname sudo gedit /etc/hostname ---> data1
  283. //设置hosts文件 sudo gedit /etc/hosts
  284. //加入主机ip地址
  285. ------------------------------------------
  286. 192.168.56.100 master
  287. 192.168.56.101 data1
  288. 192.168.56.102 data2
  289. 192.168.56.103 data3
  290. ------------------------------------------
  291. //修改localhost为master sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
  292. sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
  293. //设置resourcemanager主机与nodemanager的连接地址
  294. <property>
295. <name>yarn.resourcemanager.resource-tracker.address</name>
  296. <value>master:8025</value>
  297. </property>
  298. //设置resourcemanager与applicationmaster的连接地址
  299. <property>
300. <name>yarn.resourcemanager.scheduler.address</name>
  301. <value>master:8030</value>
  302. </property>
  303. //设置resourcemanager与客户端的连接地址
  304. <property>
  305. <name>yarn.resourcemanager.address</name>
  306. <value>master:8050</value>
  307. </property>
  308. //查看YARN架构图 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
  309. //设置mapred-site.xml sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
  310. ------------------------------------------
  311. <property>
  312. <name>mapred.job.tracker</name>
  313. <value>master:54311</value>
  314. </property>
  315. ------------------------------------------
  316. //设置hdfs-site.xml sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
  317. ------------------------------------------
  318. <property>
  319. <name>dfs.datanode.data.dir</name>
  320. <value> file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
  321. </property>
  322. ------------------------------------------
  323. restart
  324. //确认网络设置 ifconfig
  325. //设置master节点
  326. sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
  327. -----------------------------------------------------------------
  328. <property>
  329. <name>dfs.namenode.name.dir</name>
  330. <value> file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  331. </property>
  332. -----------------------------------------------------------------
  333. //编辑masters文件 sudo gedit /usr/local/hadoop/etc/hadoop/masters
  334. //编辑slaves文件 sudo gedit /usr/local/hadoop/etc/hadoop/slaves
  335. //master 连接到data1虚拟机 ssh data1
  336. //连接到data1创建HDFS相关目录
  337. --->删除hdfs所有目录 sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
  338. --->创建DataNode存储目录 mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
  339. --->将目录所有者改成zieox sudo chown -R zieox:zieox /usr/local/hadoop
  340. //中断data1连接回到master exit
  341. //master 连接到data2虚拟机 ssh data2
  342. --->删除hdfs所有目录 sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
  343. --->创建DataNode存储目录 mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
  344. --->将目录所有者改成zieox sudo chown -R zieox:zieox /usr/local/hadoop
  345. --->中断data2连接回到master exit
346. //master 连接到data3虚拟机 ssh data3
  347. --->删除hdfs所有目录 sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
  348. --->创建DataNode存储目录 mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
  349. --->将目录所有者改成zieox sudo chown -R zieox:zieox /usr/local/hadoop
  350. --->中断data3连接回到master exit
  351. //创建并格式化NameNode HDFS目录
  352. --->删除之前的hdfs目录 sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
353. --->创建NameNode存储目录 mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
  354. --->将目录所有者改成zieox sudo chown -R zieox:zieox /usr/local/hadoop
  355. --->格式化NameNode HDFS目录 hadoop namenode -format
  356. //启动Hadoop Multi Node Cluster
  357. --->分别启动HDFS & YARN start-dfs.sh start-yarn.sh start-all.sh
  358. ssh免密操作
  359. https://www.cnblogs.com/robert-blue/p/4133467.html
  360. //编辑固定IP地址
  361. scp id_rsa.pub zieox@master:/home/zieox
362. cat id_rsa.pub >> ~/.ssh/authorized_keys
  363. service sshd restart
  364. cd .ssh
  365. scp authorized_keys zieox@data1:/home/zieox
  366. scp authorized_keys zieox@data2:/home/zieox
  367. scp authorized_keys zieox@data3:/home/zieox
  368. http://master:8088/cluster
  369. sudo chown zieox:zieox *
  370. //Datanode启动不了 datanode的clusterID 和 namenode的clusterID 不匹配(master中把namenode的cluterID改的和datanodeID一致)
  371. https://www.cnblogs.com/artistdata/p/8410429.html
  372. sudo gedit /usr/local/hadoop/hadoop_data/hdfs/datanode/current/VERSION
  373. clusterID=CID-834686bf-02bf-4933-b2fb-0a2288e97cc9
  374. sudo gedit /usr/local/hadoop/hadoop_data/hdfs/namenode/current/VERSION
  375. ----------------------------------------------------------CP6 Hadoop HDFS Order
  376. //hdfs 常用命令
  377. http://hadoop.apache.org/docs/r1.0.4/cn/hdfs_shell.html
  378. -----------------------------------------HDFS基本命令-----------------------------------------
379. hadoop fs -mkdir 创建hdfs目录
380. hadoop fs -ls 列出hdfs目录
381. hadoop fs -copyFromLocal 使用(copyFromLocal)复制本地文件到hdfs
382. hadoop fs -put 使用(put)复制本地文件到hdfs
383. hadoop fs -cat 列出hdfs目录下的文件内容
384. hadoop fs -copyToLocal 使用(-copyToLocal)将hdfs上的文件复制到本地
385. hadoop fs -get 使用(-get)将hdfs上的文件复制到本地
386. hadoop fs -cp 复制hdfs文件
387. hadoop fs -rm 删除hdfs文件
  388. ----------------------------------------------------------------------------------------------
  389. //创建hadoop目录
  390. hadoop fs -mkdir /user
  391. hadoop fs -mkdir /user/hduser
  392. hadoop fs -mkdir /user/hduser/test
  393. //hadoop: ls (list)
  394. hadoop fs -ls
  395. hadoop fs -ls /
  396. hadoop fs -ls /user
  397. hadoop fs -ls /user/hduser
  398. hadoop fs -ls -R /
  399. //创建多级目录 hadoop fs -mkdir -p /dir1/dir2/dir3
  400. //查看全部文件 hadoop fs -ls -R /
  401. ---------------------------------------------Output---------------------------------------------
  402. -rw-r--r-- 3 lufax supergroup 1366 2018-09-12 23:57 /README.txt
  403. drwxr-xr-x - lufax supergroup 0 2018-09-20 03:31 /dir1
  404. drwxr-xr-x - lufax supergroup 0 2018-09-20 03:31 /dir1/dir2
  405. drwxr-xr-x - lufax supergroup 0 2018-09-20 03:31 /dir1/dir2/dir3
  406. drwxr-xr-x - lufax supergroup 0 2018-09-20 03:23 /user
  407. drwxr-xr-x - lufax supergroup 0 2018-09-20 03:23 /user/hduser
  408. drwxr-xr-x - lufax supergroup 0 2018-09-20 03:23 /user/hduser/test
  409. -----------------------------------------------------------------------------------------------
  410. //上传文件至HDFS (copyFromLocal)
  411. hadoop fs -copyFromLocal /home/zieox/桌面/test1.txt /user/hduser/test
  412. hadoop fs -copyFromLocal /usr/local/hadoop/README.txt /user/hduser/test
413. hadoop fs -copyFromLocal /usr/local/hadoop/README.txt /user/hduser/test/test1.txt
  414. //打开查看文件 hadoop fs -cat /user/hduser/test/test1.txt |more
  415. //强制复制 hadoop fs -copyFromLocal -f /usr/local/hadoop/README.txt /user/hduser/test
  416. //查看目录 hadoop fs -ls /user/hduser/test
  417. // put:使用-put会直接覆盖文件 hadoop fs -put /usr/local/hadoop/README.txt /user/hduser/test/test1.txt
  418. //将屏幕上内容存储到HDFS文件 echo abc| hadoop fs -put - /user/hduser/test/echoin.txt
  419. //查看文件 hadoop fs -cat /user/hduser/test/echoin.txt
  420. //将本地目录的列表,存储到HDFS文件 ls /usr/local/hadoop | hadoop fs -put - /user/hduser/test/hadooplist.txt
  421. //将hdfs文件拷贝至本地 (copyToLocal)
  422. mkdir /home/zieox/桌面/test ~> cd /home/zieox/桌面/test ~> hadoop fs -copyToLocal /user/hduser/test/hadooplist.txt
  423. //将整个HDFS上的目录复制到本地计算机: hadoop fs -copyToLocal /user/hduser/test/ect
  424. //将HDFS上的文件复制到本地计算机 (get): hadoop fs -get /user/hduser/test/README.txt localREADME.txt
  425. ///复制与删除HDFS文件
  426. //在HDFS上创建测试目录 hadoop fs -mkdir /user/hduser/test/tmp
  427. //复制HDFS文件到HDFS测试目录 hadoop fs -cp /user/hduser/test/README.txt /user/hduser/test/tmp
  428. //查看测试目录 hadoop fs -ls /user/hduser/test/tmp
  429. //删除文件 hadoop fs -rm /user/hduser/test/test2.txt
  430. //删除文件目录 hadoop fs -rm -R /user/hduser/test
  431. ###///在Hadoop HDFS Web - Browse Directory http://master:50070
  432. ----------------------------------------------------------CP7 Hadoop MapReduce
  433. http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
  434. //创建wordcount目录 mkdir -p ~/wordcount/input
  435. //进入wordcount文件夹 cd ~/wordcount
436. //编辑java脚本 sudo gedit WordCount.java
  437. //设置路径 sudo gedit ~/.bashrc
  438. --------------------------------------------------
439. export PATH=${JAVA_HOME}/bin:${PATH}
  440. export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
  441. --------------------------------------------------
  442. //生效bashrc source ~/.bashrc
  443. ***
444. hadoop com.sun.tools.javac.Main WordCount.java
445. jar cf wc.jar WordCount*.class
  446. //创建测试文本文件 cp /usr/local/hadoop/LICENSE.txt ~/wordcount/input
  447. ll ~/wordcount/input
  448. //hdfs操作
  449. hadoop fs -mkdir -p /user/hduser/wordcount/input
  450. hadoop fs -ls /user/hduser/wordcount/input
  451. cd ~/wordcount
  452. //运行WordCount程序
  453. hadoop jar wc.jar WordCount /user/hduser/wordcount/input/LICENSE.txt /user/hduser/wordcount/output
  454. //查看运行结果
  455. hadoop fs -ls /user/hduser/wordcount/output
  456. hadoop fs -cat /user/hduser/wordcount/output/part-r-00000|more
  457. ----------------------------------------------------------CP8 Installing Spark
  458. //启动hadoop服务
  459. 启动服务的脚本全部在 ./sbin 目录下
460. start-all.sh 可以启动全部的服务(不建议这样做)
461. start-dfs.sh 会把DFS中的namenode和datanode全部启动(不建议这样做)
  462. //启动namenode:/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
463. //启动datanode:/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
  464. //查看是否有datanode 的进程jps 然后就可以通过Web浏览器查看dfs了!http://localhost:50070
  465. //启动Yarn /usr/local/hadoop/sbin/start-yarn.sh
  466. -------------------------------------------------------------------------------------------------------------------
  467. 测试yarn
  468. /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 100
  469. //启动hdfs/usr/local/hadoop/sbin/start-dfs.sh
  470. bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar pi 2 100
  471. ssh-copy-id -i /root/.ssh/id_rsa.pub root@<lufax>
  472. -------------------------------------------------------------------------------------------------------------------
  473. //Install spark wget https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.6.tgz
  474. cd /home/lufax/下载
  475. tar -zxvf spark-2.3.1-bin-hadoop2.6.tgz -C /opt/spark/
  476. tar xzvf spark-2.3.1-bin-hadoop2.6.tgz
  477. sudo mv spark-2.3.1-bin-hadoop2.6 /usr/local/spark/
  478. gedit ~/.bashrc
  479. -----------------------------------
  480. export SPARK_HOME=/usr/local/spark
  481. export PATH=$PATH:$SPARK_HOME/bin
  482. -----------------------------------
  483. source ~/.bashrc
  484. spark-shell
485. http://master:50070
  486. //设置spark-shell
  487. cd /usr/local/spark/conf
  488. cp log4j.properties.template log4j.properties
  489. sudo gedit log4j.properties
  490. -----------------set-----------------
491. log4j.rootCategory=WARN, console
  492. -------------------------------------
  493. //本地运行spark
  494. //本地启动(3线程) spark-shell --master local[3]
495. //读取HDFS文件 val tf = sc.textFile("hdfs://master:9000/user/hduser/test/tes2.txt")
  496. //Spark读取与写入文件 https://blog.csdn.net/a294233897/article/details/80904305
  497. //在yarn上运行spark-shell
  498. SPARK_JAR=` /usr/local/spark/lib/jars/*.jar `
  499. HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
  500. MASTER=yarn-client
  501. /usr/local/spark/bin/spark-shell
  502. //用spark读取HDFS文件
  503. val tf = sc.textFile("hdfs://master:9000/user/hduser/test/README.txt")
  504. scala> tf.count --->res0: Long = 31
  505. http://172.22.230.18/
  506. //8.9 构建spark standalone cluster环境
  507. //复制模板文件来创建spark-env.sh
  508. cp /usr/local/spark/conf/spark-env.sh.template spark-env.sh
  509. sudo gedit spark-env.sh
  510. ---------------------------------------------------------------------------------------------
  511. export SPARK_MASTER_IP=master 设置master的IP或服务器名称
  512. export SPARK_WORKER_CORES=1 设置每个Worker使用的CPU核心
  513. export SPARK_WORKER_MEMORY=800m 设置每个Worker使用内存
514. export SPARK_WORKER_INSTANCES=2 设置每个节点运行的Worker实例数
  515. ---------------------------------------------------------------------------------------------
  516. ssh data1
  517. sudo mkdir /usr/local/spark
  518. sudo chown -R zieox:zieox /usr/local/spark
519. sudo scp -r /usr/local/spark zieox@data1:/usr/local
  520. ssh data3
  521. ...
  522. sudo gedit /usr/local/spark/conf/slaves
  523. ------------
524. data1
data2
525. data3
  526. ------------
  527. //启动:spark standalone cluster(启动works) /usr/local/spark/sbin/start-all.sh
  528. datanode=3
  529. SPARK_WORKER_INSTANCES=2
  530. worker=6
  531. //分别启动master&slaves
  532. //启动spark_master /usr/local/spark/sbin/start-master.sh
  533. //启动spark_slaves /usr/local/spark/sbin/start-slaves.sh
  534. //在Spark Standalone运行spark-shell程序 spark-shell --master spark://master:7077
  535. //查看Spark Standalone 的 Web_UI界面 http://master:8080/
  536. //读取本地文件
  537. val tf = spark.read.textFile("file:/home/zieox/桌面/tes2.txt")
  538. val tf = sc.textFile("file:/user/hduser/test/README.txt")
  539. //读取hdfs文件(确保works完全启动) val tf = spark.read.textFile("hdfs://master:9000/user/hduser/test/README.txt")
  540. --->scala> tf.count res0: Long = 31
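// The two read APIs above return different types: spark.read.textFile gives a Dataset[String] (Spark 2.x), while sc.textFile gives an RDD[String]. A minimal spark-shell sketch, assuming the same README.txt path on HDFS:
-----------------------------------------------------------------
val ds  = spark.read.textFile("hdfs://master:9000/user/hduser/test/README.txt")   // Dataset[String]
val rdd = sc.textFile("hdfs://master:9000/user/hduser/test/README.txt")           // RDD[String]
ds.count()              // Dataset API
rdd.count()             // RDD API
val dsAsRdd = ds.rdd    // convert to an RDD when RDD operations are needed
-----------------------------------------------------------------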
541. //停止Spark Standalone cluster /usr/local/spark/sbin/stop-all.sh
  542. ----------------------------------------------------------CP9 Spark RDD
  543. spark-shell
  544. scala> val intRDD = sc.parallelize(List(1,2,3,1,3,2,3,4))
  545. --->intRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
  546. scala> intRDD.collect() res0: Array[Int] = Array(1, 2, 3, 1, 3, 2, 3, 4)
  547. scala> val stringRDD=sc.parallelize(List("zieox","luccy","fucker"))
  548. --->stringRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at parallelize at <console>:24
  549. scala> stringRDD.collect() res1: Array[String] = Array(zieox, luccy, fucker)
  550. //map
  551. def addone(x:Int):Int = {return (x+1)}
  552. intRDD.map(addone).collect()
  553. intRDD.map(x=>x+1).collect()
  554. intRDD.map(_+1).collect()
  555. scala> stringRDD.map(x=>"name: "+x).collect --->res9: Array[String] = Array(name: zieox, name: luccy, name: fucker)
  556. //filter
  557. intRDD.filter(_<3).collect()
  558. scala> stringRDD.filter(x=>x.contains("eo")).collect() --->res10: Array[String] = Array(zieox)
  559. //distinct
  560. scala> intRDD.distinct().collect --->res11: Array[Int] = Array(4, 1, 2, 3)
  561. //randomSplit
  562. //以随机数的方式按照4:6的比例分割为两个RDD val sRDD=intRDD.randomSplit(Array(0.4,0.6))
  563. //groupBy
  564. val gDD = intRDD.groupBy(x=>{if (x%2==0) "even" else "odd"}).collect
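// A quick sanity check of the two transformations above (a sketch; randomSplit is only approximately 4:6, and the element order inside each group may vary):
-----------------------------------------------------------------
val sRDD = intRDD.randomSplit(Array(0.4, 0.6))
sRDD(0).collect()      // roughly 40% of the elements
sRDD(1).collect()      // roughly 60% of the elements
val gRDD = intRDD.groupBy(x => if (x % 2 == 0) "even" else "odd")
gRDD.collect()         // one (key, values) pair per group, e.g. (even,[2, 2, 4]) and (odd,[1, 3, 1, 3, 3])
-----------------------------------------------------------------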
  565. //9.3多个RDD“转换”运算
  566. val r1=sc.parallelize(List(1,2))
  567. val r2=sc.parallelize(List(3,4,5))
  568. val r3=sc.parallelize(List(1,6,7))
  569. //union (并集运算) r1.union(r2).union(r3).collect() & (r1++r2++r3).collect()
  570. //intersection (交集运算) r1.intersection(r3).collect
  571. //subtract (差集运算) r1.subtract(r2).collect() //在r1,不再r2
  572. //cartesian (笛卡尔积运算) r1.cartesian(r2).collect()
  573. //9.4 基本动作运算
  574. val r=sc.parallelize(List(1,2,3,4,5))
  575. scala> r.first --->res30: Int = 1
  576. scala> r.take(2) --->res32: Array[Int] = Array(1, 2)
  577. scala> r.takeOrdered(3) --->res33: Array[Int] = Array(1, 2, 3)
  578. scala> r.takeOrdered(3)(Ordering[Int].reverse) --->res34: Array[Int] = Array(5, 4, 3)
  579. --------基本统计--------
  580. r.stats 统计
  581. r.min 最小值
  582. r.max 最大值
  583. r.stdev 标准差
  584. r.count 计数
  585. r.sum 求和
  586. r.mean 平均
  587. ------------------------
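// r.stats returns a StatCounter, so the figures in the table above can be read from a single object; a minimal sketch for r = List(1,2,3,4,5):
-----------------------------------------------------------------
val st = r.stats()                 // StatCounter(count, mean, stdev, max, min)
println(st.count)                  // 5
println(st.sum)                    // 15.0
println(st.mean)                   // 3.0
println(st.stdev)                  // ~1.414 (population standard deviation)
println(st.max + " " + st.min)     // 5.0 1.0
-----------------------------------------------------------------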
  588. //9.5 RDD Key-Value 基本“转换”运算
  589. val kv=sc.parallelize(List((1,2),(1,3),(3,2),(3,4),(4,5)))
  590. kv.keys.collect()
  591. kv.filter{case (key,value)=>key<5}.collect
  592. //mapValues运算 kv.mapValues(x=>x*x).collect()
  593. //sortByKey (按key,从小到大排列) kv.sortByKey(true).collect()
  594. //reduceByKey kv.reduceByKey((x,y)=>x+y).collect //根据key进行reduce并对value执行func
  595. kv.reduceByKey(_+_).collect
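// mapValues and reduceByKey combine naturally, e.g. to get the average value per key; a small sketch using only the operations shown above (kv as defined earlier):
-----------------------------------------------------------------
// pair each value with a count of 1, sum values and counts per key, then divide
val avgByKey = kv.mapValues(v => (v, 1)).reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)).mapValues{ case (sum, cnt) => sum.toDouble / cnt }
avgByKey.collect()    // e.g. Array((1,2.5), (3,3.0), (4,5.0)) -- output order may differ
-----------------------------------------------------------------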
  596. //9.6 多个RDD Key-Value “转换”运算
  597. val r1 = sc.parallelize(List((1,2),(1,3),(3,2),(3,4),(4,5)))
  598. val r2 = sc.parallelize(List((3,8)))
  599. scala> r1.join(r2).foreach(println) //join(by key) return(key with key's value)
  600. --->(3,(2,8))
  601. (3,(4,8))
  602. scala> r1.leftOuterJoin(r2).foreach(println) //join(by key-left_all) return (key's left_value-all)
  603. --->(1,(2,None))
  604. (1,(3,None))
  605. (3,(2,Some(8)))
  606. (4,(5,None))
  607. (3,(4,Some(8)))
  608. scala> r1.rightOuterJoin(r2).foreach(println) //join (by key-right_all) return (key's right_value-all)
  609. --->(3,(Some(2),8))
  610. (3,(Some(4),8))
  611. scala> r1.subtractByKey(r2).collect //join (exclude common_keys) return (keys) (join的逆反)
  612. --->res56: Array[(Int, Int)] = Array((4,5), (1,2), (1,3))
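// For completeness, fullOuterJoin keeps keys from both sides (Option on both positions); a minimal sketch with the same r1/r2, output order may vary:
-----------------------------------------------------------------
scala> r1.fullOuterJoin(r2).foreach(println)
(1,(Some(2),None))
(1,(Some(3),None))
(3,(Some(2),Some(8)))
(3,(Some(4),Some(8)))
(4,(Some(5),None))
-----------------------------------------------------------------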
  613. //9.7 Key-Value “动作”运算
  614. val r1 = sc.parallelize(List((1,2),(1,3),(3,2),(3,4),(4,5)))
  615. scala> r1.first --->res0: (Int, Int) = (1,2)
  616. scala> r1.take(2) --->res2: Array[(Int, Int)] = Array((1,2), (1,3))
  617. scala> r1.first._1 --->res5: Int = 1
  618. //countByKey 根据key进行count
  619. scala> r1.countByKey --->res8: scala.collection.Map[Int,Long] = Map(4 -> 1, 1 -> 2, 3 -> 2)
  620. //collectAsMap 创建Key-Value对照表
  621. scala> r1.collectAsMap --->res11: scala.collection.Map[Int,Int] = Map(4 -> 5, 1 -> 3, 3 -> 4)
  622. //lookup运算(by_key)
  623. scala> r1.collect --->res16: Array[(Int, Int)] = Array((1,2), (1,3), (3,2), (3,4), (4,5))
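// The lookup call itself is not shown above; a minimal sketch (lookup returns every value stored under the given key):
-----------------------------------------------------------------
scala> r1.lookup(3) --->res: Seq[Int] = WrappedArray(2, 4)
scala> r1.lookup(1) --->res: Seq[Int] = WrappedArray(2, 3)
scala> r1.lookup(9) --->res: Seq[Int] = WrappedArray()   // key not present
-----------------------------------------------------------------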
  624. //9.8 Broadcast 广播变量
  625. //不使用广播变量的映射对照
  626. val kf = sc.parallelize(List((1,"apple"),(2,"banana"),(3,"orange")))
  627. scala> val kfm=kf.collectAsMap --->kfm: scala.collection.Map[Int,String] = Map(2 -> banana, 1 -> apple, 3 -> orange)
  628. scala> val id = sc.parallelize(List(2,1,3,1))
  629. scala> val fn = id.map(x=>kfm(x)).collect --->fn: Array[String] = Array(banana, apple, orange, apple)
  630. //使用广播变量
  631. val kf = sc.parallelize(List((1,"apple"),(2,"banana"),(3,"orange")))
  632. scala> val kfm=kf.collectAsMap --->kfm: scala.collection.Map[Int,String] = Map(2 -> banana, 1 -> apple, 3 -> orange)
  633. val bfm = sc.broadcast(kfm) //使用广播以节省内存与传送时间
  634. scala> val id = sc.parallelize(List(2,1,3,1))
  635. val fn = id.map(x=>bfm.value(x)).collect
  636. //9.9 accumulator 累加器
  637. val acc = sc.parallelize(List(2,1,3,1,5,1))
  638. val total = sc.accumulator(0.0)
  639. val num = sc.accumulator(0)
640. acc.foreach(i=>{total+=i ; num+=1})
  641. println("total=" + total.value + ",num=" + num.value)
  642. val avg = total.value/num.value
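// sc.accumulator is deprecated in Spark 2.x (the shell here is spark 2.3.1); a minimal sketch of the same computation with the newer accumulator API, with total summing the element values so that avg is a real average:
-----------------------------------------------------------------
val data  = sc.parallelize(List(2, 1, 3, 1, 5, 1))
val total = sc.doubleAccumulator("total")      // sums the element values
val num   = sc.longAccumulator("num")          // counts the elements
data.foreach{ i => total.add(i); num.add(1) }
println("total=" + total.value + ", num=" + num.value)   // total=13.0, num=6
val avg = total.sum / num.sum                            // about 2.17 (total.avg gives the same figure directly)
-----------------------------------------------------------------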
  643. //9.10 RDD Persistence持久化
  644. Spark RDD 持久化机制:可以用于将需要重复运算的RDD存储在内存中,以便大幅提升运算效率。
  645. ------------------------------------------------------------------------------------------------------------------------------------
646. MEMORY_ONLY spark会将RDD以未序列化(反串行化)的Java对象形式存储在JVM的堆空间中。
647. 如果RDD太大无法完全存储在内存中,多余的RDD partitions不会缓存在内存中,而是需要重新计算
648. MEMORY_AND_DISK 尽量将RDD以未序列化的Java对象形式存储在JVM的内存中,如果内存放不下,就将剩余分区缓存到磁盘中
  649. MEMORY_ONLY_SER 将RDD进行序列化处理(每个分区序列化为一个字节数组)然后缓存在内存中。
  650. 因为需要再进行反序列化,会多使用CPU计算资源,但是比较省内存的存储空间
  651. 多余的RDD partitions不会缓存在内存中,而是需要重新计算
  652. MEMORY_AND_DISK_SER 和MEMORY_ONLY_SER类似,多余的RDD partitions缓存在磁盘中
  653. DISK_ONLY 仅仅使用磁盘存储RDD的数据(未经序列化)
  654. MEMORY_ONLY_2,MEMORY_AND_DISK_2
  655. ------------------------------------------------------------------------------------------------------------------------------------
  656. RDD.persist
  657. RDD.unpersist
  658. val a = sc.parallelize(List(2,1,3,1,5,1))
659. a.persist()
  660. //设置RDD.persist存储等级范例
  661. import org.apache.spark.storage.StorageLevel
  662. val intRDDMemoryAndDisk = sc.parallelize(List(2,1,3,1,5,1))
  663. intRDDMemoryAndDisk.persist(StorageLevel.MEMORY_AND_DISK)
  664. intRDDMemoryAndDisk.unpersist()
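// For reference: cache() is shorthand for persist(StorageLevel.MEMORY_ONLY), and getStorageLevel shows the level currently assigned; a small sketch:
-----------------------------------------------------------------
import org.apache.spark.storage.StorageLevel
val cachedRDD = sc.parallelize(List(2, 1, 3, 1, 5, 1))
cachedRDD.cache()                          // same as persist(StorageLevel.MEMORY_ONLY)
println(cachedRDD.getStorageLevel)         // e.g. StorageLevel(memory, deserialized, 1 replicas)
cachedRDD.unpersist()                      // an RDD must be unpersisted before assigning a different level
cachedRDD.persist(StorageLevel.MEMORY_AND_DISK_SER)
println(cachedRDD.getStorageLevel)
-----------------------------------------------------------------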
  665. //9.11 使用spark创建WordCount
  666. sudo gedit /home/zieox/桌面/tes.txt
  667. val tf = sc.textFile("file:/home/zieox/桌面/tes.txt") //读取
  668. val string = tf.flatMap(l=>l.split(" ")) //处理 (这里使用flatMap读取并创建stringRDD)
  669. val wordscount = string.map(w => (w,1)).reduceByKey(_+_) //计算每个单词出现的次数
  670. val wordscount = tf.flatMap(l=>l.split(" ")).map(w => (w,1)).reduceByKey(_+_)
  671. wordscount.saveAsTextFile("file:/home/zieox/桌面/output") //保存计算结果
  672. //9.12 Spark WordCount
  673. scala> tf.flatMap(i=>i.split(" ")).map((_,1)).collect
  674. --->res59: Array[(String, Int)] = Array((28,1), (march,1), (,,1), (its,1), (a,1), (raining,1), (day,1), (!,1))
  675. //WordsCount
  676. val wordscount=sc.textFile("file:/home/zieox/桌面/tes.txt").flatMap(l=>l.split(" ")).map((_,1)).reduceByKey(_+_).foreach(println)
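// A small follow-up sketch: sort the counts to list the most frequent words (sortBy is a standard RDD transformation; same file path as above):
-----------------------------------------------------------------
val wc = sc.textFile("file:/home/zieox/桌面/tes.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
wc.sortBy(_._2, ascending = false).take(10).foreach(println)    // top 10 words by count
-----------------------------------------------------------------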
  677. ----------------------------------------------------------CP10 Spark的集成开发环境
  678. //10.1 下载与安装eclipse Scala IDE
  679. 下载: http://scala-ide.org/
  680. 解压: tar -xzvf scala-SDK-4.7.0-vfinal-2.12-linux.gtk.x86_64.tar.gz
  681. 移动: mv eclipse /home
  682. 创建link:
  683. //10.2 下载项目所需的Library
  684. spark-assembly
  685. joda-Time
  686. jfreeChart
  687. jcommon
  688. //创建lib目录存放连接库 mkdir -p ~/workspace/Lib
689. //复制spark的jar包 sudo cp /usr/local/spark/jars/* ~/workspace/Lib
  690. //进入jar下载网站 www.Java2s.com
  691. //下载joda-time
692. cd ~/workspace/Lib ---> wget http://www.java2s.com/Code/JarDownload/joda/joda-time-2.1.jar.zip ---> unzip -j joda-time-2.1.jar.zip
  693. //下载jfreechart
  694. wget http://www.java2s.com/Code/JarDownload/jfreechart/jfreechart.jar.zip ---> unzip -j jfreechart.jar.zip
695. //下载spark-core
  696. wget http://www.java2s.com/Code/JarDownload/spark/spark-core_2.9.2-0.6.1.jar.zip unzip -j spark-core_2.9.2-0.6.1.jar.zip
  697. wget http://www.java2s.com/Code/JarDownload/spark/spark-core_2.9.2-0.6.1-sources.jar.zip unzip -j spark-core_2.9.2-0.6.1-sources.jar.zip
  698. //下载jcommon
  699. wget http://www.java2s.com/Code/JarDownload/jcommon/jcommon-1.0.14.jar.zip
  700. wget http://www.java2s.com/Code/JarDownload/jcommon/jcommon-1.0.14-sources.jar.zip
  701. unzip -j jcommon-1.0.14-sources.jar.zip
  702. //删除zip文件以节省空间 rm *.zip
  703. //launch Eclipse
  704. //10.4 创建Spark项目 file -> new -> scala project -> addexternalJARs -> changeScalaVERSION
  705. //10.5 设置项目链接库(referenced Libraries --> build path --> configure build path...)
  706. //10.6 新建scala程序 new scala_object
  707. //10.7 创建WordCount测试文件
  708. mkdir -p ~/workspace/wordcount2/data
  709. cd ~/workspace/wordcount2/data
  710. cp /usr/local/hadoop/LICENSE.txt LICENSE.txt
  711. //10.8 创建WordCount.scala
  712. import org.apache.log4j.Logger
  713. import org.apache.log4j.Level
  714. import org.apache.spark.{SparkConf,SparkContext}
  715. import org.apache.spark.rdd.RDD
  716. import org.apache.hadoop.io.IOUtils;
  717. object wordcount {
  718. def main(args: Array[String]): Unit = {
  719. // 以這兩行設定不顯示 spark 內部的訊息
  720. Logger.getLogger("org").setLevel(Level.OFF)
  721. System.setProperty("spark.ui.showConsoleProgress", "false")
  722. // 清除 output folder
  723. println("執行RunWordCount")
  724. // 設定 application 提交到 MASTER 指向的 cluster 或是 local 執行的模式
  725. // local[4] 代表是在本地以 四核心的 CPU 執行
  726. val sc = new SparkContext(new SparkConf().setAppName("wordCount").setMaster("local[4]"))
  727. println("讀取文字檔...")
  728. val textFile = sc.textFile("hdfs://master:9000/user/hduser/test/README.txt")
  729. println("開始建立RDD...") // flapMap 是取出文字檔的每一行資料,並以 " " 進行 split,分成一個一個的 word
  730. // map 是將每一個 word 轉換成 (word, 1) 的 tuple
  731. // reduceByKey 會根據 word 這個 key,將後面的 1 加總起來,就會得到 (word, 數量) 的結果
  732. val countsRDD = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  733. println("儲存結果至文字檔...")
  734. try {countsRDD.saveAsTextFile("data/output") ; println("存檔成功")}
  735. catch {case e: Exception => println("輸出目錄已經存在,請先刪除原有目錄");}
  736. println("hello")
  737. }
  738. }
  739. 使用intellIDEA创建scalaProject https://www.cnblogs.com/luguoyuanf/p/19c1e4d88a094c07331e912f40ed46c7.html
  740. scalaProject maven配置
  741. ------------------------------------------------------------------------------------------------------------------------------
  742. <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  743. <modelVersion>4.0.0</modelVersion>
  744. <groupId>zieox</groupId>
  745. <artifactId>zieox</artifactId>
  746. <version>1.0-SNAPSHOT</version>
  747. <inceptionYear>2008</inceptionYear>
  748. <properties> <scala.version>2.11.4</scala.version> </properties>
  749. <repositories>
  750. <repository>
  751. <id>scala-tools.org</id>
  752. <name>Scala-Tools Maven2 Repository</name>
  753. <url>http://central.maven.org/maven2/</url>
  754. </repository>
  755. </repositories>
  756. <pluginRepositories>
  757. <pluginRepository>
  758. <id>scala-tools.org</id>
  759. <name>Scala-Tools Maven2 Repository</name>
  760. <url>https://mvnrepository.com/artifact</url>
  761. </pluginRepository>
  762. </pluginRepositories>
  763. <dependencies>
  764. <dependency>
  765. <groupId>org.scala-lang</groupId>
  766. <artifactId>scala-library</artifactId>
  767. <version>${scala.version}</version>
  768. </dependency>
  769. <dependency>
  770. <groupId>junit</groupId>
  771. <artifactId>junit</artifactId>
  772. <version>4.11</version>
  773. <scope>test</scope>
  774. </dependency>
  775. <dependency>
  776. <groupId>org.scala-lang</groupId>
  777. <artifactId>scala-reflect</artifactId>
  778. <version>${scala.version}</version>
  779. </dependency>
  780. <dependency>
  781. <groupId>org.scala-lang</groupId>
  782. <artifactId>scala-compiler</artifactId>
  783. <version>${scala.version}</version>
  784. </dependency>
  785. <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
  786. <dependency>
  787. <groupId>org.apache.spark</groupId>
  788. <artifactId>spark-core_2.11</artifactId>
  789. <version>2.4.0</version>
  790. </dependency>
  791. <!-- https://mvnrepository.com/artifact/joda-time/joda-time -->
  792. <dependency>
  793. <groupId>joda-time</groupId>
  794. <artifactId>joda-time</artifactId>
  795. <version>2.10.1</version>
  796. </dependency>
  797. <!-- https://mvnrepository.com/artifact/org.jfree/jcommon -->
  798. <dependency>
  799. <groupId>org.jfree</groupId>
  800. <artifactId>jcommon</artifactId>
  801. <version>1.0.23</version>
  802. </dependency>
  803. <!-- https://mvnrepository.com/artifact/org.jfree/jfreechart -->
  804. <dependency>
  805. <groupId>org.jfree</groupId>
  806. <artifactId>jfreechart</artifactId>
  807. <version>1.0.19</version>
  808. </dependency>
  809. </dependencies>
  810. <build>
  811. <sourceDirectory>src/main/scala</sourceDirectory>
  812. <testSourceDirectory>src/test/scala</testSourceDirectory>
  813. <plugins>
  814. <plugin>
  815. <groupId>org.scala-tools</groupId>
  816. <artifactId>maven-scala-plugin</artifactId>
  817. <executions>
  818. <execution>
  819. <goals>
  820. <goal>compile</goal>
  821. <goal>testCompile</goal>
  822. </goals>
  823. </execution>
  824. </executions>
  825. <configuration>
  826. <scalaVersion>${scala.version}</scalaVersion>
  827. <args>
  828. <arg>-target:jvm-1.5</arg>
  829. </args>
  830. </configuration>
  831. </plugin>
  832. <plugin>
  833. <groupId>org.apache.maven.plugins</groupId>
  834. <artifactId>maven-eclipse-plugin</artifactId>
  835. <configuration>
  836. <downloadSources>true</downloadSources>
  837. <buildcommands>
  838. <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
  839. </buildcommands>
  840. <additionalProjectnatures>
  841. <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
  842. </additionalProjectnatures>
  843. <classpathContainers>
  844. <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
  845. <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
  846. </classpathContainers>
  847. </configuration>
  848. </plugin>
  849. </plugins>
  850. </build>
  851. <reporting>
  852. <plugins>
  853. <plugin>
  854. <groupId>org.scala-tools</groupId>
  855. <artifactId>maven-scala-plugin</artifactId>
  856. <configuration>
  857. <scalaVersion>${scala.version}</scalaVersion>
  858. </configuration>
  859. </plugin>
  860. </plugins>
  861. </reporting>
  862. </project>
  863. ------------------------------------------------------------------------------------------------------------------------------
  864. 用IDEA打jar包 https://www.cnblogs.com/blog5277/p/5920560.html
  865. //10.12 spark-submit 的详细介绍
  866. ------------------------------------------------------常用选项--------------------------------------------------------
  867. --master MASTER_URL 设置spark运行环境
  868. --driver-memory MEM Driver程序所使用的内存
  869. --executor-memory MEM executor程序所使用的内存
  870. --jars JARS 要运行的application会引用到的外部链接库
  871. --name NAME 要运行的application名称
  872. --class CLASS_NAME 要运行的application主要类名称
  873. ----------------------------------------------------------------------------------------------------------------------
  874. --master MASTER_URL设置选项
  875. local 在本地运行:只是用一个线程
  876. local[K] 在本地运行:使用K个线程
  877. local[*] 在本地运行:spark会尽量利用计算机上的多核CPU
  878. spark://HOST:PORT 在Spark Standalone Cluster 上运行,spark://master:7077
  879. mesos://HOST:PORT 在mesos cluster上运行(default_port:5050)
880. yarn-client 在yarn-client上运行,必须设置HADOOP_CONF_DIR or YARN_CONF_DIR环境变量
881. yarn-cluster 在yarn-cluster上运行,必须设置HADOOP_CONF_DIR or YARN_CONF_DIR环境变量
  882. ----------------------------------------------------------------------------------------------------------------------
  883. //10.13 在本地local模式运行WordCount程序
  884. 1.打jar包 https://blog.csdn.net/zrc199021/article/details/53999293
  885. 2.将jar放到usr/local/hadoop/bin 下
  886. 3.spark_submit
  887. spark-submit --driver-memory 2g --master local[4] --class WordCount /usr/local/hadoop/bin/zieoxscala.jar
  888. spark-submit
  889. --driver-memory 2g 设置dirver程序使用2G内存
  890. --master local[4] 本地运行使用4个线程
  891. --class WordCount 设置main类
  892. /usr/local/hadoop/bin/zieoxscala.jar jar路径
  893. //10.14 在hadoop yarn-client 运行wordcount程序
  894. //1.上传LICENSE.txt至HDFS
  895. hadoop fs -mkdir /data
896. hadoop fs -copyFromLocal /home/zieox/桌面/LICENSE.txt /data
  897. //2.修改错误标签
  898. sudo gedit ~/.bashrc
  899. 添加: export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
  900. 生效bashrc: source ~/.bashrc
  901. zieox@master:~/IdeaProjects/zieoxscala/out/artifacts/zieoxscala_jar$ cp zieoxscala.jar /usr/local/hadoop/bin
902. spark-submit --driver-memory 2g --class wordcount --master yarn-client ~/IdeaProjects/zieoxscala/out/artifacts/zieoxscala_jar/zieoxscala.jar
  903. //查看执行结果 hadoop fs -cat /user/hduser/test/output/part-00000
  904. //IDEA项目远程调试hadoop入门(maven项目) https://www.cnblogs.com/zh-yp/p/7884084.html
  905. //10.15 在Spark Standalone Cluster 上运行wordcount程序
906. spark-submit --driver-memory 2g --class wordcount --master spark://master:7077 ~/IdeaProjects/zieoxscala/out/artifacts/zieoxscala_jar/zieoxscala.jar
  907. ----------------------------------------------------------CP11 创建推荐引擎
  908. https://www.jianshu.com/p/b909d78f7d72
  909. mvn dependency:resolve -Dclassifier=sources
  910. https://www.cnblogs.com/flymercurial/p/7859595.html
  911. //mvlens数据集下载 https://grouplens.org/datasets/movielens/
  912. mkdir -p ~/workspace/recommend/data
  913. cd ~/workspace/recommend/data
914. unzip -j ml-100k.zip
  915. u.data 评分数据
  916. u.item 电影数据
  917. //11.5 使用spark-shell导入ml-100k
  918. //读取数据
  919. val rawData = sc.textFile("file:/home/zieox/workspace/recommend/data/u.data")
  920. rawData.first()
  921. rawData.take(5).foreach(println) //打印前5行
  922. rawData.map(_.split('\t')(1).toDouble).stats() //查看第二列统计信息
  923. //使用ALS.train进行训练
  924. import org.apache.spark.mllib.recommendation.ALS
  925. import org.apache.spark.mllib.recommendation.Rating
  926. val ratings = rawData.map(_.split('\t').take(3))
927. val ratingsRDD = ratings.map{case Array(user,movie,rating)=>Rating(user.toInt,movie.toInt,rating.toDouble)}
  928. //显示评分训练 p249
  929. ALS.train(ratings:RDD[Rating],rank:Int,iterations:Int,lambda:Double):MatrixFactorizationModel
  930. //Rating 数据源RDD,rank 原矩阵m*n 分解为 m*rank和rank*n矩阵,iterations 计算次数,lambda 建议值0.01,返回数据MatrixFactorizationModel
  931. //隐式训练
932. ALS.trainImplicit(ratings:RDD[Rating],rank:Int,iterations:Int,lambda:Double):MatrixFactorizationModel
  933. //进行显式训练
  934. val model = ALS.train(ratingsRDD,10,10,0.01)
  935. //11.8 使用模型进行推荐
  936. MatrixFactorizationModel.model.recommendProducts(user:Int,num:Int):Array[Rating] //使用模型推荐
  937. model.recommendProducts(196,5).mkString("\n") //针对用户推荐
938. model.predict(196,464) //查看针对用户推荐评分 (查看向user:196 推荐464号产品的推荐评分)
  939. MatrixFactorizationModel.model.recommendUsers(product,num) //针对产品推荐给用户
  940. model.recommendUsers(464,5).mkString("\n") //实际执行(针对电影推荐的用户)
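// predict also has a batch form that scores many (user, product) pairs in one pass; a minimal sketch with the same model (the extra user/movie ids below are arbitrary examples from ml-100k):
-----------------------------------------------------------------
val userProducts = sc.parallelize(Seq((196, 464), (196, 286), (186, 302)))
val predictions  = model.predict(userProducts)                   // RDD[Rating]
predictions.map(r => ((r.user, r.product), r.rating)).collect().foreach(println)
-----------------------------------------------------------------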
  941. //11.9 显示推荐电影的名称
  942. val itemRDD = sc.textFile("file:/home/zieox/workspace/recommend/data/u.item") //导入文件item
  943. //创建ID_Movie对照表
  944. val movieTitle =
  945. itemRDD.map(line=>line.split("\\|") //处理 按照 ‘\\|’分列
  946. .take(2)) //取出前2个字段
  947. .map(array=>(array(0).toInt,array(1))) //对取出的array做处理,转化id为数字
  948. .collectAsMap() //创建ID_Movie对照表
  949. movieTitle.take(5).foreach(println) //查看前5条
  950. movieTitle(146) //查看某ID电影名称
  951. //显示推荐电影
952. model.recommendProducts(196,5).map(rating => (rating.product,movieTitle(rating.product),rating.rating)).foreach(println)
  953. //11.10 创建Recommend项目P261
  954. https://www.cnblogs.com/flymercurial/p/7868606.html
  955. https://www.jianshu.com/p/61fac2245f1d
  956. import java.io.File
  957. import scala.io.Source
  958. import org.apache.log4j.Logger
  959. import org.apache.log4j.Level
  960. import org.apache.spark.SparkConf
  961. import org.apache.spark.SparkContext
  962. import org.apache.spark.SparkContext._
963. import org.apache.spark.rdd.RDD
  964. import org.apache.spark.mllib.recommendation.{ALS,Rating,MatrixFactorizationModel}
  965. import scala.collection.immutable.Map
  966. object recommend {
  967. def PrepareData(): (RDD[Rating], Map[Int, String]) = {
  968. val sc = new SparkContext(new SparkConf().setAppName("Recommend").setMaster("local[4]"))
  969. print("开始读取用户评分数据中...")
  970. val rawUserData = sc.textFile("file:/home/zieox/workspace/recommend/data/u.data")
  971. val rawRatings = rawUserData.map(_.split("\t").take(3))
  972. val ratingsRDD = rawRatings.map{ case Array(user,movie,rating) => Rating(user.toInt,movie.toInt,rating.toDouble) }
  973. println("共计: "+ratingsRDD.count.toString()+"条 ratings")
  974. print("开始读取电影数据中...")
  975. val itemRDD = sc.textFile("file:/home/zieox/workspace/recommend/data/u.item")
  976. val movieTitle = itemRDD.map(line =>line.split("\\|").take(2)).map(array => (array(0).toInt,array(1))).collect().toMap
  977. val numRatings = ratingsRDD.count()
  978. val numUsers = ratingsRDD.map(_.user).distinct().count()
  979. val numMovies = ratingsRDD.map(_.product).distinct().count()
  980. println("共计: ratings:"+numRatings+" User "+numUsers+" Movie "+numMovies)
  981. return (ratingsRDD,movieTitle)
  982. }
  983. def RecommendMovies(model:MatrixFactorizationModel,movieTitle:Map[Int,String],inputUserID:Int) = {
  984. val RecommendMovie = model.recommendProducts(inputUserID,10)
  985. var i = 1
  986. println("针对用户id"+inputUserID+"推荐下列电影:")
  987. RecommendMovie.foreach{
  988. r => println(i.toString() + "." + movieTitle(r.product) + "评分: " + r.rating.toString())
  989. i += 1
  990. }
  991. }
  992. def RecommendUsers(model:MatrixFactorizationModel,movieTitle:Map[Int,String],inputMovieID:Int) = {
  993. val RecommendUser = model.recommendUsers(inputMovieID, 10)
  994. var i = 1
  995. println("针对电影 id" + inputMovieID + "电影名: " + movieTitle(inputMovieID.toInt) + "推荐下列用户id:" )
  996. RecommendUser.foreach{
  997. r => println(i.toString + "用户id:" + r.user + "评分:" + r.rating)
  998. i = i + 1
  999. }
  1000. }
  1001. def recommend(model:MatrixFactorizationModel,movieTitle:Map[Int,String]) = {
  1002. var choose = ""
  1003. while (choose != "3") {
  1004. print("请选择要推荐的类型 1.针对用户推荐电影 2.针对电影推荐感兴趣的用户 3.离开?")
  1005. choose = readLine()
  1006. if (choose == "1") {
  1007. print("请输入用户id?")
  1008. val inputUserID = readLine()
  1009. RecommendMovies(model,movieTitle,inputUserID.toInt)
  1010. }
  1011. else if (choose == "2") {
  1012. print("请输入电影的id?")
  1013. val inputMovieID = readLine()
  1014. RecommendUsers(model,movieTitle,inputMovieID.toInt)
  1015. }
  1016. }
  1017. }
  1018. def main(args:Array[String]) {
  1019. val (ratings,movieTitle) = PrepareData()
  1020. val model = ALS.train(ratings,5,20,0.1)
  1021. recommend(model,movieTitle)
  1022. }
  1023. }
  1024. //11.15 创建AlsEvaluation.scala 调校推荐引擎参数 http://blog.sina.com.cn/s/blog_1823e4e0f0102x0ov.html
  1025. import java.io.File
  1026. import scala.io.Source
  1027. import org.apache.log4j.{Level, Logger}
  1028. import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
  1029. import org.apache.spark.rdd.RDD
  1030. import org.apache.spark.{SparkConf, SparkContext}
  1031. import org.joda.time.{DateTime, Duration}
  1032. import org.joda.time._
  1033. import org.joda.time.format._
  1034. import org.jfree.data.category.DefaultCategoryDataset
  1035. import org.apache.spark.mllib.regression.LabeledPoint
  1036. /* Created by Weipengfei on 2017/5/3 0003. ALS过滤算法调校参数 */
  1037. object AlsEvaluation {
  1038. /*设置日志及乱七八糟的配置*/
  1039. def SetLogger: Unit ={
  1040. System.setProperty("hadoop.home.dir", "/usr/local/hadoop")
  1041. Logger.getLogger("org").setLevel(Level.OFF)
  1042. Logger.getLogger("com").setLevel(Level.OFF)
  1043. System.setProperty("spark.ui.showConsoleProgress","false")
  1044. Logger.getRootLogger.setLevel(Level.OFF)
  1045. }
  1046. /*数据准备 @return 返回(训练数据,评估数据,测试数据)*/
  1047. def PrepareData():(RDD[Rating],RDD[Rating],RDD[Rating])={
  1048. val sc=new SparkContext(new SparkConf().setAppName("Recommend").setMaster("local[2]").set("spark.testing.memory","21474800000"))
  1049. //创建用户评分数据
  1050. print("开始读取用户评分数据中...")
  1051. val rawUserData=sc.textFile("file:/home/zieox/workspace/recommend/data/u.data")
  1052. val rawRatings=rawUserData.map(_.split("\t").take(3))
  1053. val ratingsRDD=rawRatings.map{case Array(user,movie,rating) => Rating( user.toInt ,movie.toInt,rating.toFloat)}
  1054. println("共计:"+ratingsRDD.count().toString+"条评分")
  1055. //创建电影ID和名称对应表
  1056. print("开始读取电影数据中...")
  1057. val itemRDD=sc.textFile("file:/home/zieox/workspace/recommend/data/u.item")
  1058. val moiveTitle=itemRDD.map(_.split("\\|").take(2)).map(array=>(array(0).toInt,array(1))).collect().toMap
  1059. //显示数据记录数
  1060. val numRatings=ratingsRDD.count()
  1061. val numUser=ratingsRDD.map(_.user).distinct().count()
  1062. val numMoive=ratingsRDD.map(_.product).distinct().count()
  1063. println("共计:评分"+numRatings+"条 用户"+numUser+"个 电影"+numMoive+"个")
  1064. //将数据分为三个部分并且返回
  1065. print("将数据分为:")
  1066. val Array(trainData,validationData,testData)=ratingsRDD.randomSplit(Array(0.8,0.1,0.1))
  1067. println("训练数据:"+trainData.count()+"条 评估数据:"+validationData.count()+"条 测试数据:"+testData.count()+"条")
  1068. (trainData,validationData,testData)
  1069. }
  1070. /*计算RMSE值 *@param model 训练模型 *@param validationData 评估数据 *@return RMSE值 */
  1071. def computeRmse(model: MatrixFactorizationModel, validationData: RDD[Rating]):(Double) ={
  1072. val num=validationData.count();
  1073. val predictedRDD=model.predict(validationData.map(r=>(r.user,r.product)))
  1074. val predictedAndVali=predictedRDD.map(p=>((p.user,p.product),p.rating)).join(validationData.map(r=>((r.user,r.product),r.rating))).values
  1075. math.sqrt(predictedAndVali.map(x=>(x._1-x._2)*(x._1-x._2)).reduce(_+_)/num)
  1076. }
  1077. /** 训练模型
  1078. * @param trainData 训练数据
  1079. * @param validationData 评估数据
  1080. * @param rank 训练模型参数
  1081. * @param numIterations 训练模型参数
  1082. * @param lambda 训练模型参数
  1083. * @return 模型返回的RMSE(该值越小,误差越小)值,训练模型所需要的时间
  1084. */
  1085. def trainModel(trainData: RDD[Rating], validationData: RDD[Rating], rank: Int, numIterations: Int, lambda: Double):(Double,Double)={
  1086. val startTime=new DateTime()
  1087. val model=ALS.train(trainData,rank,numIterations,lambda)
  1088. val endTime=new DateTime()
  1089. val Rmse=computeRmse(model,validationData)
  1090. val duration=new Duration(startTime,endTime)
  1091. println(f"训练参数:rank:$rank= 迭代次数:$numIterations%.2f lambda:$lambda%.2f 结果 Rmse $Rmse%.2f"+" 训练需要时间:"+duration.getMillis+"毫秒")
  1092. (Rmse,duration.getStandardSeconds)
  1093. }
  1094. /** 使用jfree.char评估单个参数,这里没有实现
  1095. * @param trainData 训练数据
  1096. * @param validationData 评估数据
  1097. * @param evaluateParameter 评估参数名称
  1098. * @param rankArray rank参数数组
  1099. * @param numIterationsArray 迭代次数参数数组
  1100. * @param lambdaArray lambda参数数组
  1101. */
  1102. def evaluateParameter(trainData:RDD[Rating],validationData:RDD[Rating],evaluateParameter:String,rankArray:Array[Int],numIterationsArray:Array[Int],lambdaArray:Array[Double])={
  1103. val dataBarChart = new DefaultCategoryDataset()
  1104. val dataLineChart = new DefaultCategoryDataset()
  1105. for(rank <- rankArray;numIterations <- numIterationsArray;lambda <- lambdaArray){
  1106. val (rmse,time) = trainModel(trainData,validationData,rank,numIterations,lambda)
  1107. val parameterData = evaluateParameter match{
  1108. case "rank" => rank;
  1109. case "numIterations" => numIterations;
  1110. case "lambda" => lambda
  1111. }
  1112. dataBarChart.addValue(rmse,evaluateParameter,parameterData.toString())
  1113. dataLineChart.addValue(time,"Time",parameterData.toString())
  1114. }
  1115. Chart.plotBarLineChart("ALS evaluations " + evaluateParameter,evaluateParameter,"RMSE",0.58,5,"Time",dataBarChart,dataLineChart)
  1116. }
  1117. /*
  1118. * 三个参数交叉评估,找出最好的参数组合
  1119. * @param trainData 训练数据
  1120. * @param validationData 评估数据
  1121. * @param rankArray rank参数数组
  1122. * @param numIterationsArray 迭代次数参数数组
  1123. * @param lambdaArray lambda参数数组
  1124. * @return 返回由最好参数组合训练出的模型
  1125. */
  1126. def evaluateAllParameter(trainData:RDD[Rating],validationData:RDD[Rating],rankArray:Array[Int],numIterationsArray:Array[Int],lambdaArray:Array[Double]): MatrixFactorizationModel = {
  1127. val evaluations=for(rank <- rankArray;numIterations <- numIterationsArray;lambda <- lambdaArray) yield {
  1128. val (rmse,time)=trainModel(trainData,validationData,rank,numIterations,lambda)
  1129. (rank,numIterations,lambda,rmse)
  1130. }
  1131. val Eval=(evaluations.sortBy(_._4))
  1132. val bestEval=Eval(0)
  1133. println("最佳模型参数:rank:"+bestEval._1+" 迭代次数:"+bestEval._2+" lambda:"+bestEval._3+" 结果rmse:"+bestEval._4)
  1134. val bestModel=ALS.train(trainData,bestEval._1,bestEval._2,bestEval._3)
  1135. (bestModel)
  1136. }
  1137. /*训练评估 *@param trainData 训练数据 *@param validationData 评估数据 *@return 返回一个最理想的模型 */
  1138. def trainValidation(trainData:RDD[Rating], validationData:RDD[Rating]):MatrixFactorizationModel={
  1139. println("------评估rank参数使用------")
  1140. evaluateParameter(trainData,validationData,"rank",Array(5,10,15,20,50,100),Array(10),Array(0.1))
  1141. println("------评估numIterations------")
  1142. evaluateParameter(trainData,validationData,"numIterations",Array(10),Array(5,10,15,20,25),Array(0.1))
  1143. println("------评估lambda------")
  1144. evaluateParameter(trainData,validationData,"lambda",Array(10),Array(10),Array(0.05,0.1,1,5,10.0))
  1145. println("------所有参数交叉评估找出最好的参数组合------")
  1146. val bestmodel=evaluateAllParameter(trainData,validationData,Array(5,10,15,20,50,100),Array(5,10,15,20,25),Array(0.01,0.05,0.1,1,5,10.0))
  1147. bestmodel
  1148. }
  1149. def main(args: Array[String]) {
  1150. SetLogger
  1151. println("========================数据准备阶段========================")
  1152. val (trainData,validationData,testData)=PrepareData()
  1153. trainData.persist();validationData.persist();testData.persist()
  1154. println("========================训练验证阶段========================")
  1155. val bestModel=trainValidation(trainData,validationData)
  1156. println("======================测试阶段===========================")
  1157. val testRmse=computeRmse(bestModel,testData)
  1158. println("使用bestModel测试testData,结果rmse="+testRmse)
  1159. }
  1160. }
  1161. import org.jfree.chart._
  1162. import org.jfree.data.xy._
  1163. import org.jfree.data.category.DefaultCategoryDataset
  1164. import org.jfree.chart.axis.NumberAxis
  1165. import org.jfree.chart.axis._
  1166. import java.awt.Color
  1167. import org.jfree.chart.renderer.category.LineAndShapeRenderer;
  1168. import org.jfree.chart.plot.DatasetRenderingOrder;
  1169. import org.jfree.chart.labels.StandardCategoryToolTipGenerator;
  1170. import java.awt.BasicStroke
  1171. object Chart {
  1172. def plotBarLineChart(Title: String, xLabel: String, yBarLabel: String, yBarMin: Double, yBarMax: Double, yLineLabel: String, dataBarChart : DefaultCategoryDataset, dataLineChart: DefaultCategoryDataset): Unit = {
  1173. //画出Bar Chart
  1174. val chart = ChartFactory
  1175. .createBarChart(
  1176. "", // Bar Chart 标题
  1177. xLabel, // X轴标题
  1178. yBarLabel, // Bar Chart y轴标题
  1179. dataBarChart , // Bar Chart数据
  1180. org.jfree.chart.plot.PlotOrientation.VERTICAL,//画图方向垂直
  1181. true, // 包含 legend
  1182. true, // 显示tooltips
  1183. false // 不要URL generator
  1184. );
  1185. //取得plot
  1186. val plot = chart.getCategoryPlot();
  1187. plot.setBackgroundPaint(new Color(0xEE, 0xEE, 0xFF));
  1188. plot.setDomainAxisLocation(AxisLocation.BOTTOM_OR_RIGHT);
  1189. plot.setDataset(1, dataLineChart); plot.mapDatasetToRangeAxis(1, 1)
  1190. //画直方图y轴
  1191. val vn = plot.getRangeAxis(); vn.setRange(yBarMin, yBarMax); vn.setAutoTickUnitSelection(true)
  1192. //画折线图y轴
  1193. val axis2 = new NumberAxis(yLineLabel); plot.setRangeAxis(1, axis2);
  1194. val renderer2 = new LineAndShapeRenderer()
  1195. renderer2.setToolTipGenerator(new StandardCategoryToolTipGenerator());
  1196. //设置先画直方图,再画折线图以免折线图被盖掉
  1197. plot.setRenderer(1, renderer2);plot.setDatasetRenderingOrder(DatasetRenderingOrder.FORWARD);
  1198. //创建画框
  1199. val frame = new ChartFrame(Title,chart); frame.setSize(500, 500);
  1200. frame.pack(); frame.setVisible(true)
  1201. }
  1202. }
  1203. --------------------------------------------------------------------------
  1204. 最佳模型参数:rank:5 迭代次数:15 lambda:0.1 结果rmse:0.9178637704570859
  1205. ======================测试阶段===========================
  1206. 使用bestModel测试testData,结果rmse=0.9177717267204295
  1207. --------------------------------------------------------------------------
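// As a follow-up to the tuned ALS model above, a minimal sketch (not the book's code) of turning the best model into top-N recommendations; the user id 100 is just an illustrative value.
// minimal sketch: top-N recommendations from the tuned MatrixFactorizationModel
// (assumption: bestModel is the model returned by trainValidation above)
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
def showRecommendations(model: MatrixFactorizationModel, userId: Int, num: Int): Unit = {
  val recs: Array[Rating] = model.recommendProducts(userId, num)   // highest predicted ratings first
  recs.foreach(r => println(s"user ${r.user} -> movie ${r.product}, predicted rating ${r.rating}"))
}
// e.g. showRecommendations(bestModel, 100, 5)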
  1208. ----------------------------------------------------------CP12 StumbleUpon数据集
  1209. https://www.kaggle.com/c/stumbleupon/data
  1210. ----------------------------------------------------------CP13 决策树二元分类
  1211. object RunDecisionTreeBinary{
  1212. def main(args:Array[String]):Unit={
  1213. SetLogger()
  1214. val sc = new SparkContext(new SparkConf().setAppName("DecisionTreeBinary").setMaster("local[4]"))
  1215. println("==============preparing===============")
  1216. val (trainData,validationData,testData,categoriesMap)=PrepareData(sc)
  1217. trainData.persist();validationData.persist();testData.persist();
  1218. println("==============evaluating==============")
  1219. val model=trainEvaluate(trainData,validationData)
  1220. println("===============testing================")
  1221. val auc = evaluateModel(model,testData)
  1222. println("使用testdata测试的最佳模型,结果 AUC:"+auc)
  1223. println("============predicted data=============")
  1224. PredictData(sc,model,categoriesMap)
  1225. trainData.unpersist();validationData.unpersist();testData.unpersist();
  1226. }
  1227. }
  1228. def PrepareData(sc:SparkContext):(RDD[LabeledPoint],RDD[LabeledPoint],RDD[LabeledPoint],Map[String,Int])={
  1229. //1.导入并转换数据
  1230. print("开始导入数据...")
  1231. val rawDataWithHeader=sc.textFile("data/train.tsv")
  1232. val rawData=rawDataWithHeader.mapPartitionsWithIndex{(idx,iter)=> if (idx==0) iter.drop(1) else iter} //删除第一行表头
  1233. val lines=rawData.map(_.split("\t")) //读取每一行的数据字段
  1234. println("共计:"+lines.count.toString()+"条")
  1235. //2.创建训练评估所需数据的RDD[LabeledPoint]
  1236. val categoriesMap = lines.map(fields => fields(3)).distinct.collect.zipWithIndex.toMap //创建网页分类对照表
  1237. val LabeledPointRDD=lines.map{fields =>
  1238. val trFields=fields.map(_.replaceAll("\"",""))
  1239. val categoryFeaturesArray=Array.ofDim[Double](categoriesMap.size)
  1240. val categoryIdx=categoriesMap(fields(3))
  1241. categoryFeaturesArray(categoryIdx)=1
  1242. val numericalFeatures=trFields.slice(4,fields.size-1).map(d=>if (d=="?") 0.0 else d.toDouble)
  1243. val label = trFields(fields.size-1).toInt
  1244. LabeledPoint(label,Vectors.dense(categoryFeaturesArray++numericalFeatures))
  1245. }
  1246. //3.用随机方式将数据分为3个部分并返回
  1247. val Array(trainData,validationData,testData) = LabeledPointRDD.randomSplit(Array(0.8,0.1,0.1))
  1248. println("将数据分trainData: "+trainData.count() + " validationData:"+validationData.count()+" testData:"+testData.count())
  1249. return (trainData,validationData,testData,categoriesMap)
} //PrepareData 结束
  1250. //训练评估阶段
  1251. def trainEvaluate(trainData:RDD[LabeledPoint],validationData:RDD[LabeledPoint]):DecisionTreeModel = {
  1252. print("开始训练...")
  1253. val (model,time) = trainModel(trainData,"entropy",10,10)
  1254. println("训练完成,所需时间:"+time+" 毫秒")
  1255. val AUC = evaluateModel(model,validationData)
  1256. println("评估结果 AUC = "+AUC)
  1257. return (model)
  1258. }
  1259. //训练DecisionTree模型Model
  1260. def trainModel(trainData:RDD[LabeledPoint],impurity:String,maxDepth:Int,maxBins:Int):(DecisionTreeModel,Double)={
  1261. val startTime = new DateTime()
  1262. val model = DecisionTree.trainClassifier(trainData,2,Map[Int,Int](),impurity,maxDepth,maxBins)
  1263. val endTime=new DateTime()
  1264. val duration=new Duration(startTime,endTime)
  1265. (model,duration.getMillis())
  1266. }
  1267. //评估模型
  1268. def evaluateModel(model:DecisionTreeModel,validationData:RDD[LabeledPoint]):(Double)={
  1269. val scoreAndLabels=validationData.map{data=>
  1270. var predict=model.predict(data.features)
  1271. (predict,data.label)
  1272. }
  1273. val Metrics = new BinaryClassificationMetrics(scoreAndLabels)
  1274. val AUC = Metrics.areaUnderROC
  1275. (AUC)
  1276. }
  1277. //预测阶段
  1278. def PredictData(sc:SparkContext,model:DecisionTreeModel,categoriesMap:Map[String,Int]):Unit = {
  1279. //1.导入并转换数据
  1280. val rawDataWithHeader=sc.textFile("data/test.tsv")
  1281. val rawData = rawDataWithHeader.mapPartitionsWithIndex{ (idx,iter)=>if (idx==0) iter.drop(1) else iter}
  1282. val lines = rawData.map(_.split("\t"))
  1283. println("共计: "+lines.count.toString()+" 条")
  1284. //2.创建训练评估所需数据RDD[LabeledPoint]
  1285. val dataRDD=lines.take(20).map{fields =>
  1286. val trFields = fields.map(_.replaceAll("\"",""))
  1287. val categoryFeaturesArray=Array.ofDim[Double](categoriesMap.size)
  1288. val categoryIdx=categoriesMap(fields(3))
  1289. categoryFeaturesArray(categoryIdx)=1
  1290. val numericalFeatures=trFields.slice(4,fields.size).map(d=>if (d=="?") 0.0 else d.toDouble) //test.tsv没有label字段
  1291. val label = 0
  1292. //3.进行预测
  1293. val url = trFields(0)
  1294. val Features = Vectors.dense(categoryFeaturesArray++numericalFeatures)
  1295. val predict = model.predict(Features).toInt; val predictDesc = predict match{ case 0 => "暂时性网页" case 1 => "常青网页"}
  1296. println("网址: "+url+"==>预测:"+predictDesc)
  1297. }
  1298. }
  1332. //parametersTunning 参数调校函数
  1333. def parametersTunning(trainData:RDD[LabeledPoint],validationData:RDD[LabeledPoint]):DecisionTreeModel={
  1334. println("-------评估Impurity参数使用gini,entropy-------")
  1335. evaluateParameter(trainData,validationData,"impurity",Array("gini","entropy"),Array(10),Array(10))
  1336. println("-------评估MaxDepth参数使用(3,5,10,15,20)-------")
  1337. evaluateParameter(trainData,validationData,"maxDepth",Array("gini"),Array(3,5,10,15,20),Array(10))
  1338. println("-------所有参数交叉评估找出最好的参数组合-------")
  1339. val bestModel = evaluateAllParameter(trainData,validationData,Array("gini","entropy"),Array(3,5,10,15,20),Array(10)) //maxBins沿用上面的Array(10)
  1340. return (bestModel)
  1341. }
  1342. //evaluateParameter 函数
  1343. def evaluateParameter(trainData:RDD[LabeledPoint],validationData:RDD[LabeledPoint],evaluateParameter:String,impurityArray:Array[String],maxdepthArray:Array[Int],maxBinsArray:Array[Int])={
  1344. var dataBarChart = new DefaultCategoryDataset()
  1345. var dataLineChart = new DefaultCategoryDataset()
  1346. for (impurity <- impurityArray; maxDepth <- maxdepthArray; maxBins <- maxBinsArray){
  1347. val (model,time)=trainModel(trainData,impurity,maxDepth,maxBins)
  1348. val auc = evaluateModel(model,validationData)
  1349. val parameterData = evaluateParameter match{
  1350. case "impurity"=>impurity;
  1351. case "maxDepth"=>maxDepth;
  1352. case "maxBins"=>maxBins
  1353. }
  1354. dataBarChart.addValue(auc,evaluateParameter,parameterData.toString())
  1355. dataLineChart.addValue(time,"Time",parameterData.toString())
  1356. }
  1357. Chart.plotBarLineChart("DecisionTree evaluations "+evaluateParameter,evaluateParameter,"AUC",0.58,0.7,"Time",dataBarChart,dataLineChart)
  1358. }
  1359. def evaluateAllParameter(trainData:RDD[LabeledPoint],validationData:RDD[LabeledPoint],impurityArray:Array[String],maxdepthArray:Array[Int],maxBinsArray:Array[Int]): DecisionTreeModel = {
  1360. val evaluationsArray =
  1361. for (impurity<-impurityArray;maxDepth<-maxdepthArray;maxBins<-maxBinsArray) yield {
  1362. val (model,time)=trainModel(trainData,impurity,maxDepth,maxBins)
  1363. val auc = evaluateModel(model,validationData)
  1364. (impurity,maxDepth,maxBins,auc)
  1365. }
  1366. val BestEval = (evaluationsArray.sortBy(_._4).reverse)(0)
  1367. println("调校后最佳参数:impurity:"+BestEval._1+" ,maxDepth:"+BestEval._2+" ,maxBins:"+BestEval._3)
  1368. val (bestModel,_) = trainModel(trainData,BestEval._1,BestEval._2,BestEval._3); return bestModel
  1369. }
  1370. ----------------------------------------------------------CP18 决策树回归分析
  1371. mkdir -p ~/workspace/Classfication/data
  1372. cd ~/workspace/
  1373. //Bike_Sharing数据
  1374. wget https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip
  1375. unzip -j Bike-Sharing-Dataset.zip
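// The book's CP18 trains a regression tree on this Bike-Sharing data; below is only a minimal, hedged sketch (the hour.csv column indices, the 80/20 split and the maxDepth/maxBins values are my assumptions, not the book's exact code).
// minimal sketch: regression tree on hour.csv, label = cnt (last column)
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.DecisionTree

val rawWithHeader = sc.textFile("data/hour.csv")
val rows = rawWithHeader.mapPartitionsWithIndex{ (idx, iter) => if (idx == 0) iter.drop(1) else iter }
                        .map(_.split(","))
val data = rows.map { f =>
  val features = f.slice(2, f.size - 3).map(_.toDouble)   // season..windspeed; drops instant, dteday, casual, registered, cnt
  LabeledPoint(f(f.size - 1).toDouble, Vectors.dense(features))
}
val Array(trainData, testData) = data.randomSplit(Array(0.8, 0.2))
val model = DecisionTree.trainRegressor(trainData, Map[Int, Int](), "variance", 10, 100)   // regression trees use "variance" impurity
val labelsAndPreds = testData.map(p => (p.label, model.predict(p.features)))
val rmse = math.sqrt(labelsAndPreds.map { case (l, p) => (l - p) * (l - p) }.mean())
println("test RMSE = " + rmse)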
  1376. ----------------------------------------------------------CP19 使用 Apache Zeppelin 数据可视化
  1377. http://zeppelin.incubator.apache.org/
  1378. http://zeppelin.apache.org/
  1379. https://www.cnblogs.com/purstar/p/6294412.html
  1380. cd ~/下载
  1381. tar zxvf zeppelin-0.8.1-bin-all.tgz
  1382. sudo mv zeppelin-0.8.1-bin-all /usr/local/zeppelin
  1383. sudo chown -R zieox:zieox /usr/local/zeppelin
  1384. cd /usr/local/zeppelin/conf/
  1385. cp zeppelin-env.sh.template zeppelin-env.sh
  1386. cp zeppelin-site.xml.template zeppelin-site.xml
  1387. sudo gedit /usr/local/zeppelin/conf/zeppelin-env.sh
  1388. //配置zeppelin
  1389. ----------------------------------------------------------------------------
  1390. export JAVA_HOME=/usr/lib/jdk/jdk-8
  1391. export SPARK_HOME=/usr/local/spark
  1392. export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
  1393. export ZEPPELIN_INTP_JAVA_OPTS="-XX:PermSize=512M -XX:MaxPermSize=1024M"
  1394. ----------------------------------------------------------------------------
  1395. export SPARK_MASTER_IP=192.168.56.100
  1396. export SPARK_LOCAL_IP=192.168.56.100
  1397. ----------------------------------------------------------------------------
  1398. 在maven找到
  1399. cp jackson-annotations-2.4.4.jar /usr/local/zeppelin
  1400. cp jackson-core-2.4.4.jar /usr/local/zeppelin
  1401. cp jackson-databind-2.4.4.jar /usr/local/zeppelin
  1402. 替换
  1403. ./lib/jackson-annotations-2.5.0.jar
  1404. ./lib/jackson-core-2.5.3.jar
  1405. ./lib/jackson-databind-2.5.3.jar
  1406. //将这3个文件移至/usr/local/zeppelin/lib
  1407. mv jackson-annotations-2.4.4.jar jackson-core-2.4.4.jar jackson-databind-2.4.4.jar -t /usr/local/zeppelin/lib
  1408. //启动zeppelin /usr/local/zeppelin/bin/zeppelin-daemon.sh start
  1409. //登录zeppelin Web_UI界面 http://master:8181/#/
  1410. sudo mkdir -p /workspace/zeppelin/data
  1411. sudo mv ~/workspace/recommend/data/ml-100k.zip ml-100k.zip
  1412. sudo unzip -j ml-100k
  1413. //zeppelin 命令集
  1414. -------------------------------
  1415. %sh
  1416. ls -l /workspace/zeppelin/data
  1417. %sh
  1418. sudo head /workspace/zeppelin/data/u.user
  1419. -------------------------------
  1420. val usertext = sc.textFile("file:/home/zieox/workspace/recommend/data/u.user")
  1421. case class usertable(id:String,age:String,gender:String,occupation:String,zipcode:String)
  1422. val userRDD = usertext.map(s=>s.split("\\|")).map(s=>usertable(s(0),s(1),s(2),s(3),s(4)))
  1423. userRDD.toDF().registerTempTable("usertable")
  1424. println("input: "+userRDD.count+" s")
  1425. //在子目录中查找文件名带jackson的文件 find .|grep jackson
  1426. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++CP10构建Hadoop集群 P280
  1427. //安装hadoop
  1428. 1. install java 安装java
  1429. 2. linux account可通linux帐号
  1430. 3. install Hadoop安装hadoop
  1431. 4. Set SSH设置无密匙SSH
  1432. 5. set Hadoop配置hadoop
  1433. 6. 格式化HDFS:hdfs namenode -format
  1434. 7. 启动和停止守护进程:
  1435. 启动守护进程:start-dfs.sh
  1436. 找到namenode主机名:hdfs getconf -namenodes / hdfs getconf -secondarynamenodes
  1437. 启动YARN守护进程:start-yarn.sh/stop-yarn.sh
  1438. 启动MapReduce守护进程:mr-jobhistory-daemon.sh start historyserver
  1439. 8. 创建用户目录:
  1440. hadoop fs -mkdir /user/username
  1441. hadoop fs -chown username:username /user/username
  1442. 设置容量空间:hdfs dfsadmin -setSpaceQuota 1T /user/us
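// A quick sanity check after the steps above, run from spark-shell; a hedged sketch only — hdfs://master:9000 is an assumed fs.defaultFS (use the value in your core-site.xml) and the file name is illustrative.
// minimal sketch: verify HDFS and the new user directory from spark-shell
// (assumption: a small file was already uploaded with hadoop fs -put)
val check = sc.textFile("hdfs://master:9000/user/username/test.txt")
println("line count = " + check.count())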
  1443. hadoop配置:https://www.cnblogs.com/yinghun/p/6230436.html
  1444. //10.3.3 Hadoop守护进程的关键属性
  1445. 1.HDFS
  1446. dfs.datanode.data.dir 设定datanode存储数据块的目录列表
  1447. dfs.namenode.name.dir 存放编辑日志和文件系统映像 (支持namenode进行冗余备份)
  1448. dfs.namenode.checkpoint.dir 指定辅助namenode存放检查点的目录
  1449. p295 HDFS守护进程关键属性
  1450. p297 Yarn守护进程关键属性
  1451. 2. Yarn
  1452. yarn.nodemanager.resource.memory-mb 设置内存分配量
  1453. hadoop fs -expunge //删除已在回收站中超过最小时限的所有文件
  1454. //10.5 hadoop 基准测评程序 测试hadoop程序
  1455. p312 //使用TeraSort测评HDFS (基准测评)
  1456. //关于其他基准测评程序
  1457. TestDFSIO主要用于测试HDFS的I/O性能
  1458. MRBench检验小型作业能否快速响应
  1459. NNBench测试namenode硬件的加载过程
  1460. Gridmix基准评测程序套装
  1461. SWIM用来为被测系统生成代表性的测试负载
  1462. TPCx-HS基于TeraSort的标准测评程序
  1463. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++CP11 管理hadoop
  1464. //进入和离开安全模式
  1465. hdfs dfsadmin [-safemode enter | leave | get | wait]
  1466. get:查看是否处于安全模式
  1467. wait:执行某条命令之前先退出安全模式
  1468. enter:进入安全模式
  1469. leave:离开安全模式
  1470. //启动审计日志
  1471. gedit hadoop-env.sh
  1472. export HDFS_AUDIT_LOGGER="INFO,RFAAUDIT"
  1473. //tools
  1474. //1.dfsadmin 命令
  1475. //Hadoop常用命令:
  1476. https://blog.csdn.net/suixinlun/article/details/81630902
  1477. https://blog.csdn.net/m0_38003171/article/details/79086780
  1478. 2.文件系统检查fsck工具hadoop fsck /
  1479. 3.datanode块扫描器:各个datanode运行一个块扫描器,定期检测节点上的所有块,扫描周期由 dfs.datanode.scan.period.hours 设置
  1480. 4.均衡器 balancer
  1481. 启动均衡器:start-balancer.sh
  1482. //设置日志级别
  1483. hadoop daemonlog -setlevel resource-manager-host:8088 org.apache.hadoop.yarn.server.resourcemanager DEBUG
  1484. //通过修改hadoop-env.sh配置 HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote.port=8804" 来使远程监控得以访问
  1485. //维护hadoop
  1486. 1.元数据备份:hdfs dfsadmin -fetchImage fsimage.backup
  1487. 2.数据备份:distcp
  1488. 3.文件系统检查:fsck
  1489. 4.文件系统均衡器
  1490. p332
  1491. //关于委任和解除节点
  1492. //1.委任新节点
  1493. 配置hdfs-site.xml指向namemode -> 配置 yarn-site.xml 文件,指向资源管理器 -> 启动datanode和资源管理器守护进程
  1494. //添加新节点步骤
  1495. 1. 将新节点IP地址添加到include文件中
  1496. 2. 运行 hdfs dfsadmin -refreshNodes,将审核过的一系列datanode集合更新至namenode信息
  1497. 3. 运行 yarn rmadmin -refreshNodes,将审核过的一系列节点管理器信息更新至资源管理器
  1498. 4. 以新节点更新slaves文件
  1499. 5. 启动新的datanode和节点管理器
  1500. 6. 检查新的datanode和节点管理器是否都出现在网页界面中
  1501. //2. 解除旧节点
  1502. 解除节点由exclude文件控制
  1503. //从集群中移除节点的步骤如下
  1504. 1. 将待解除节点的网络地址添加到exclude文件中,不更新include文件
  1505. 2. 执行 hdfs dfsadmin -refreshNodes,使用一组新的审核过的datanode来更新namenode设置
  1506. 3. 执行 yarn rmadmin -refreshNodes,使用一组新的审核过的节点管理器来更新资源管理器设置
  1507. 4. 转到网页界面,查看待解除datanode的管理状态是否已经变成“正在解除(Decommission In Progress)”
  1508. 5. 当所有datanode的状态变为“解除完毕”(Decommissioned)时,表明所有块都已经复制完毕
  1509. 6. 从include文件中移除这些节点
  1510. hdfs dfsadmin -refreshNodes
  1511. yarn rmadmin -refreshNodes
  1512. 7. 从slaves文件中移除节点
  1513. //hadoop的升级
  1514. steps P334
  1515. start-dfs.sh -upgrade ...
  1516. //为python安装avro
  1517. easy_install avro
  1518. python3 ch-12-avro/src/main/py/write_pairs.py pairs.avro
  1519. //pig 宏
  1520. DEFINE max_by_group(X,groupby_key,max_field) RETURNS Y{
  1521. A = GROUP $X BY $groupby_key;
  1522. $Y = FOREACH A GENERATE group, MAX($X.$max_field);
  1523. };
  1524. records = LOAD 'lr/text.txt' AS (year:chararray,temperature:int,quality:int);
  1525. filtered_records = FILTER records BY temperature != 9999 AND quality IN (1,2,3);
  1526. macro_max_by_group_A = GROUP filtered_records BY (year);
  1527. max_tmp = FOREACH macro_max_by_group_A GENERATE group,
  1528. MAX(filtered_records.temperature);
  1529. DUMP max_tmp;
  1530. IMPORT './max_tmp.macro';
  1531. ----------------------------------------------------------------------
  1532. linux系统进程 :gnome-system-monitor
  1533. //Parquet 是什么:
  1534. http://www.zhimengzhe.com/shujuku/other/21763.html
  1535. ---------------------------------------------------------------------- with hive
  1536. 查看linux版本:lsb_release -a
  1537. ----------------------------------------------------------------------hadoop with spark
  1538. sql中提取表
  1539. https://gitlab.com/matteo.redaelli/sqlu/-/blob/master/sqlu/__init__.py
  1540. gsp
  1541. http://www.sqlparser.com/download.php
  1542. //20200924
  1543. //IDEA git 添加项目
  1544. --ubuntu-
  1545. --ssh
  1546. ssh-keygen -t rsa => cd ~/.ssh => cat ~/.ssh/id_rsa.pub => <copytoGitLab>
  1547. idea - vcs - clone - >
  1548. //git发布流程
  1549. --dev
  1550. git拉取(pull)
  1551. =>文件放至git上的指定文件夹
  1552. =>git再次拉取
  1553. =>git提交(commit)
  1554. =>git同步(merge into current)
  1555. master文件夹同步:git拉取=>git合并=>git同步
  1556. git pull
  1557. new file
  1558. pull
  1559. commit
  1560. git-push
