赞
踩
需求:用脚本每天快速自动下载央视CCTV新闻联播,存为MP4文件,以便未能按时收看时学习跟进。
思路:
首先央视在这里每天会自动更新当天的新闻联播完整版:
https://tv.cctv.com/lm/xwlb/index.shtmlhttps://tv.cctv.com/lm/xwlb/index.shtmlshell 下 fetch 这个index.html 分析后,grep awk 一通 可得:
-
- fetch https://tv.cctv.com/lm/xwlb/index.shtml && cat index.shtml | grep alt= | grep href | grep shtml | awk -F href= '{print $2}' | awk '{print $1}' | head -1
- fetch: http://tv.cctv.com/lm/xwlb/index.shtml: size of remote file is not known
- index.shtml 31 kB 112 MBps 00s
- "https://tv.cctv.com/2022/10/09/VIDErRXFLeCOeS5Q03mANHiQ221009.shtml"
-
-
"https://tv.cctv.com/2022/10/09/VIDErRXFLeCOeS5Q03mANHiQ221009.shtml"
使用Windows 下的《央视视频下载器(China Red Commemorative Edition)》 软件,
https://www.jb51.net/softs/809665.html
可以直接下载得到1200码率的视频。
《新闻联播》_20221009_2100.mp4.
浏览器F12分析
https://tv.cctv.com/2022/10/09/VIDErRXFLeCOeS5Q03mANHiQ221009.shtml
解析得到 m3u8 位置在这里:
https://dh5.cntv.kcdnvip.com/asp/h5e/hls/main/0303000a/3/default/a0de2bcfaef048078be234f03bf020cc/main.m3u8?maxbr=2048&contentid=15120519184043
其中第一个 main.m3u8 是main.m3u8顶流适配(多码率适配)
第二个和第3\4 都是二级适配(单码率适配)
m3u8顶流适配的作用是跳转其他m3u8的地址,用来请求不同分辨率的ts文件
可以分析得到四个不同码率的流,分别是 2000、1200、850、450,分别对应超清、高清、标清和流畅画质。
已知
https://dh5.cntv.kcdnvip.com/asp/h5e/hls/main/0303000a/3/default/a0de2bcfaef048078be234f03bf020cc/main.m3u8?maxbr=2048&contentid=15120519184043
是加密的,如果直接下载会得到一个花屏画面的mp4,但也搜索找到了解决方案,就是:两个改动
1. 把域名中的dh5改为hls,
2. 链接中的h5e/ 去除
就可以得到不加密的的m3u8地址为:
https://dh5.cntv.myhwcdn.cn/asp/h5e/hls/main/0303000a/3/default/1995d2c63d574f41970d059945ee573a/main.m3u8?maxbr=2048&contentid=15120519184043
盘它:
ffmpeg -i "https://dh5.cntv.myhwcdn.cn/asp/h5e/hls/main/0303000a/3/default/1995d2c63d574f41970d059945ee573a/main.m3u8?maxbr=2048&contentid=15120519184043" -c copy -bsf:a aac_adtstoasc xwlbc.mp4
但是盘 main.m3u8 只能获得默认的 1200 分辨率版本。
- frame=45033 fps=3259 q=-1.0 Lsize= 257820kB time=00:30:01.31 bitrate=1172.5kbits/s speed= 130x
- video:249929kB audio:6460kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.558038%
这个好办,直接改盘2000.m3u8就可以下载2000超清版本:
- ffmpeg -i "https://hls.cntv.myhwcdn.cn/asp/hls/2000/0303000a/3/default/a0de2bcfaef048078be234f03bf020cc/2000.m3u8" -c copy -bsf:a aac_adtstoasc xwlb2k.mp4
-
- frame=44997 fps=365 q=-1.0 Lsize= 436935kB time=00:29:59.89 bitrate=1988.7kbits/s speed=14.6x
- video:407958kB audio:27547kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.328248%
试了一些不同月份的,有点规律:
https://hls.cntv.myhwcdn.cn/asp/hls/2000/0303000a/3/default/94dfb8525c114ee3966d6f2364d7e78c/2000.m3u8https://hls.cntv.kcdnvip.com/asp/hls/2000/0303000a/3/default/f7b2addd35094c648c70303f320ad161/2000.m3u8
https://hls.cntv.myhwcdn.cn/asp/hls/2000/0303000a/3/default/12e91d9fa3dc4baabb3099c7709c74e8/2000.m3u8
https://hls.cntv.myalicdn.com/asp/hls/2000/0303000a/3/default/f355b524fa464c7a829948ef090b3ed0/2000.m3u8
https://tv.cctv.com/2022/10/09/VIDErRXFLeCOeS5Q03mANHiQ221009.shtml
找到 对应的guid, guid=a0de2bcfaef048078be234f03bf020cc
其它固定直接拼出来就行了。
下载最新一期的新闻联播:
- cctvxwlb# cat /data/script/get_cctv_xwlb.sh
- #!/bin/sh
- mp4dir="/ftp/temp/cctvxwlb"
- #自动下载当天新闻联播
- #如果时间是新闻联播之前,则下载的是昨天的版本
- #所以脚本会安排在当天的凌晨或者下午20点以后比较合适
- hours=`date +%H`
- echo hours %hours
- if [ $hours -lt 20 ]
- then date=`date -v -1d +%Y%m%d`
- else
- date=`date -v -0d +%Y%m%d`
- fi
-
- echo date $date
- #下载https://tv.cctv.com/lm/xwlb/index.shtml
-
- fetch https://tv.cctv.com/lm/xwlb/index.shtml -o /tmp
-
- #分析index.shtml 找到VIDEVHzsa1SIW28YuoxFPFjP220809
-
- vid=`cat /tmp/index.shtml | grep alt= | grep href | grep shtml | awk -F href= '{print $2}' | awk '{print $1}' | head -1 | tr -d \"`
- echo vid $vid;
- vidfile=`echo $vid | awk -F "/" '{print $NF}'`
- echo vid file is: $vidfile ;
-
- #下载vid file
- fetch $vid -o /tmp
- #分析 vidfile $vid 找到 guid 和 title
-
- guids=`cat /tmp/$vidfile | grep guid | awk -F "=" '{print $2}' | head -1 | tr -d ";| |\"|'^M'" `
- `uid=`echo $guids | tr -d
- echo guid: $guid
- title=`cat /tmp/$vidfile | grep title | head -1 | awk -F "<|>" '{print $3'} | awk '{print $1$2}'`
- echo output title: $date.$title.mp4
-
- #拼接 https://hls.cntv.myhwcdn.cn/asp/hls/2000/0303000a/3/default/94dfb8525c114ee3966d6f2364d7e78c/2000.m3u8
- #域名有多个可以用, 比如myhwcdn.cn 和 kcdnvip.com
- myhwcdn="myhwcdn.cn"
- kcdnvip="kcdnvip.com"
- #alicdn 只能提供1200 码率的版本,删除 myalicdn="myalicdn.com"
-
- m3u8myhwcdn="https://hls.cntv.${myhwcdn}/asp/hls/2000/0303000a/3/default/${guid}/2000.m3u8"
- echo myhwcdn m3u8 file: $m3u8myhwcdn
-
- m3u8kcdnvip="https://hls.cntv.${kcdnvip}/asp/hls/2000/0303000a/3/default/${guid}/2000.m3u8"
- echo kcdnvip m3u8 file: $m3u8kcdnvip
-
- #m3u8myalicdn="https://hls.cntv.${myalicdn}/asp/hls/2000/0303000a/3/default/${guid}/2000.m3u8"
- #echo myalicdn m3u8 file: $m3u8myalicdn
-
- if [ `expr $date % 2` -eq 0 ];then
- echo oushu
- /usr/local/bin/ffmpeg -i "$m3u8myhwcdn" -c copy -bsf:a aac_adtstoasc $mp4dir/$date.$title.myhwcdn.1280x720p.mp4
- else
- echo jishu
- /usr/local/bin/ffmpeg -i "$m3u8kcdnvip" -c copy -bsf:a aac_adtstoasc $mp4dir/$date.$title.kcdnvip.1280x720p.mp4
- fi
-
- chmod 664 $mp4dir/*.mp4
- rm /tmp/index.shtml /tmp/VID*.shtml
- cctvxwlb# cat /data/script/get_cctv_xwlb_by_date.sh
- #!/bin/sh
- #判断第一个参数,如果没输入,则提示参数格式
- if [ -z $1 ];
- then echo "Missing Date";
- echo "Usage: $0 Date";
- echo " Example: $0 20221009";
- else
- #如果有参数,就开始查找指定日期的新闻联播
- #https://tv.cctv.com/lm/xwlb/day/20221003.shtml
-
- echo "...OK, Let's go!"
- echo "Inputed DateCode $1"
- date=`date -v -1d +%Y%m%d`
- date=$1
- echo date $date
-
- mp4dir="/ftp/temp/cctvxwlb"
-
- #https://tv.cctv.com/lm/xwlb/day/20221003.shtml
- dayurl=https://tv.cctv.com/lm/xwlb/day/$date.shtml
- echo $dayurl
- fetch $dayurl -o /tmp
-
- #分析$dayurl 找到VIDEVHzsa1SIW28YuoxFPFjP220809
-
- vid=`cat /tmp/$date.shtml | grep alt= | grep href | grep shtml | awk -F href= '{print $2}' | awk '{print $1}' | head -1 | tr -d \"`
- echo vid $vid;
- vidfile=`echo $vid | awk -F "/" '{print $NF}'`
- echo vid file is: $vidfile ;
-
- #下载vid file
- fetch $vid -o /tmp
- #分析 vidfile $vid 找到 guid 和 title
-
- guids=`cat /tmp/$vidfile | grep guid | awk -F "=" '{print $2}' | head -1 | tr -d ";| |\"|'^M'" `
- `uid=`echo $guids | tr -d
- echo guid: $guid
- title=`cat /tmp/$vidfile | grep title | head -1 | awk -F "<|>" '{print $3'} | awk '{print $1$2}'`
- echo output title: $date.$title.mp4
-
- #拼接 https://hls.cntv.myhwcdn.cn/asp/hls/2000/0303000a/3/default/94dfb8525c114ee3966d6f2364d7e78c/2000.m3u8
- #域名有多个可以用, 比如myhwcdn.cn 和 kcdnvip.com
- myhwcdn="myhwcdn.cn"
- kcdnvip="kcdnvip.com"
- myalicdn="myalicdn.com"
-
- m3u8myhwcdn="https://hls.cntv.${myhwcdn}/asp/hls/2000/0303000a/3/default/${guid}/2000.m3u8"
- echo myhwcdn m3u8 file: $m3u8myhwcdn
-
- m3u8kcdnvip="https://hls.cntv.${kcdnvip}/asp/hls/2000/0303000a/3/default/${guid}/2000.m3u8"
- echo kcdnvip m3u8 file: $m3u8kcdnvip
-
- m3u8myalicdn="https://hls.cntv.${myalicdn}/asp/hls/2000/0303000a/3/default/${guid}/2000.m3u8"
- echo myalicdn m3u8 file: $m3u8myalicdn
-
- if [ `expr $date % 3` -eq 0 ];then
- echo 0/3
- /usr/local/bin/ffmpeg -i "$m3u8myhwcdn" -c copy -bsf:a aac_adtstoasc $mp4dir/$date.$title.myhwcdn.1920x1080p.mp4
- elif [ `expr $date % 3` -eq 1 ];then
- echo 1/3
- /usr/local/bin/ffmpeg -i "$m3u8kcdnvip" -c copy -bsf:a aac_adtstoasc $mp4dir/$date.$title.kcdnvip.1920x1080p.mp4
- else
- echo 2/3
- /usr/local/bin/ffmpeg -i "$m3u8myalicdn" -c copy -bsf:a aac_adtstoasc $mp4dir/$date.$title.myalicdn.1920x1080p.mp4
- fi
-
- chmod 664 $mp4dir/*.mp4
- rm /tmp/$date.shtml /tmp/VID*.shtml
- fi;
某几天内的新闻联播:
- cctvxwlb# cat /data/script/get_cctv_xwlb_ndays.sh
- #!/bin/sh
- for i in $(seq 0803 0810)
- do
- date=`echo 20220$(expr $i \* 1)`;
- /data/script/get_cctv_xwlb_by_date.sh $date
- echo $date Done;
- done
效果图:
展示一下成果如下图所示:
鸣谢:
感谢lucida@CCF 在代码分析阶段提供的解题思路。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。