赞
踩
1.最好使用标准SQL函数而不是特定供应商Mysql等的函数以提高可移植性
2.大小写的运用,系统关键字小写,字段表名小写
3.灵活使用空格和缩进来增强可读性——两大法宝(空白隔道与垂直间距)
/* Spark Sql(offline) SQL ************************************************************************** ** 所属主题: 规范测试 ** 功能描述: 规范测试格式 ** Azkaban项目:CreateBaseData:xxx ** 创建者 : Mio ** 创建日期: 20190829 ************************************************************************** */ /*查找xxxtable 没有匹配的信息*/ with no_match as ( select a.sn from info_extend.view_dws_sn_sale_active a left join info_extend.dws_imei_sn b on b.date = from_unixtime(unix_timestamp() - 86400 * 1, 'yyyyMMdd') and a.sn = b.sn left join info.dim_sku s on a.sku = s.sku where a.first_active_time >= unix_timestamp("20190803000000", "yyyyMMddHHmmss") and a.first_active_time < unix_timestamp("20190804000000", "yyyyMMddHHmmss") /*只需求中国的*/ and a.sale_country_code = 'cn' and s.l2 in(195) and b.sn is null ) /*查询结果进行汇总*/ select sum(if(b.sale_country_code is not null, 1, 0)) c, sum(if(b.sale_country_code = 'cn', 1, 0)) cn_c, sum(if(b.sale_country_code is null, 1, 0)) no_c from no_match a left join info_extend.dws_imei_sn b on b.date = from_unixtime(unix_timestamp() - 86400 * 1, 'yyyyMMdd') and a.sn = b.sn;
1.SQL可读性比较高,每个with子句可以代表一段逻辑,可以在别的复杂sql中复用。
2. with子句结构清晰,每段with可以取个有意义的名称。
3.当你真正掌握with子句的时候,你会对它爱不释手。
4.同样的查询语句写了超过2次就可以考虑使用with来减少代码量。(同样的思维在其他编码中适用)
1.不使用with
/* Spark Sql(offline) SQL ************************************************************************** ** 所属主题: 规范测试 ** 功能描述: 规范测试格式 ** Azkaban项目:CreateBaseData:xxx ** 创建者 : Mio ** 创建日期: 20190829 ************************************************************************** */ /*查找xxxtable 没有匹配的信息*/ /*查询结果进行汇总*/ select sum(if(b.sale_country_code is not null, 1, 0)) c, sum(if(b.sale_country_code = 'cn', 1, 0)) cn_c, sum(if(b.sale_country_code is null, 1, 0)) no_c from ( select a.sn from info_extend.view_dws_sn_sale_active a left join info_extend.dws_imei_sn b on b.date = from_unixtime(unix_timestamp() - 86400 * 1, 'yyyyMMdd') and a.sn = b.sn left join info.dim_sku s on a.sku = s.sku where a.first_active_time >= unix_timestamp("20190803000000", "yyyyMMddHHmmss") and a.first_active_time < unix_timestamp("20190804000000", "yyyyMMddHHmmss") /*只需求中国的*/ and a.sale_country_code = 'cn' and s.l2 in(195) and b.sn is null )
2.使用with语句
/* Spark Sql(offline) SQL ************************************************************************** ** 所属主题: 规范测试 ** 功能描述: 规范测试格式 ** Azkaban项目:CreateBaseData:xxx ** 创建者 : Mio ** 创建日期: 20190829 ************************************************************************** */ /*查找xxxtable 没有匹配的信息*/ with no_match as ( select a.sn from info_extend.view_dws_sn_sale_active a left join info_extend.dws_imei_sn b on b.date = from_unixtime(unix_timestamp() - 86400 * 1, 'yyyyMMdd') and a.sn = b.sn left join info.dim_sku s on a.sku = s.sku where a.first_active_time >= unix_timestamp("20190803000000", "yyyyMMddHHmmss") and a.first_active_time < unix_timestamp("20190804000000", "yyyyMMddHHmmss") /*只需求中国的*/ and a.sale_country_code = 'cn' and s.l2 in(195) and b.sn is null ) /*查询结果进行汇总*/ select sum(if(b.sale_country_code is not null, 1, 0)) c, sum(if(b.sale_country_code = 'cn', 1, 0)) cn_c, sum( if(b.sale_country_code is null, 1, 0)) no_c from no_match a left join info_extend.dws_imei_sn b on b.date = from_unixtime(unix_timestamp() - 86400 * 1, 'yyyyMMdd') and a.sn = b.sn;
代码头部添加主题、功能描述、作者和日期等信息,并预留修改日志及标题栏,以便后续添加修改记录。注意每一行不超过80个字符。模板如下:
/* Spark Sql(offline) SQL
**************************************************************************
** 所属主题: 规范测试
** 功能描述: 规范测试格式
** Azkaban项目:CreateBaseData:xxx
** 创建者 : Mio
** 创建日期: 20190829
**************************************************************************
*/
/*计算xxx的id限制xxx*/ with patient_data as ( select patient_id, patient_name, hospital, drug_dosage from hospital_registry where (last_visit > now() - interval '14 days' or last_visit is null) and city = "Los Angeles" ), /*与xxx join 并且xxx 等于xxx*/ average_dosage AS ( select hospital, avg(drug_dosage) as average from patient_data group by hospital ) /*得出结果 或者插入xxx table*/ select count(hospital) from average_dosage;
• 每条SQL语句均应添加注释说明。
• 每条SQL语句的注释单独成行、放在语句前面。
• 字段注释紧跟在字段后面。
• 对不易理解的分支条件表达式加注释。
• 对重要的计算应说明其功能。
• 过长的函数实现,应将其语句按实现的功能分段加以概括性说明。
• 常量及变量注释时,应注释被保存值的含义(必须),合法取值的范围(可选)。
• 注释里面不要有分号(;)会被解释器误解为一个语句的结束,导致报错
1.比如有一个比较大的sql 需要多次join 数据。
2.在每次join的时候又存在不同的逻辑且逻辑相对比较复杂。
3.在where 条件里面有很多子查询得限制。
没有使用with:
select count(distinct w.waybill_id) as `妥投单量`, o.country as country_id, w.mihome as mihome_id, o.order_from as order_from, m.mihome_name as mihome_name, n.name as name, ad.country_name, date_format( from_utc_timestamp( to_utc_timestamp( from_unixtime(w.signed_time, 'yyyy-MM-dd HH:mm:ss'), 'Asia/Singapore' ), t.timezoneCode ), 'yyyy-MM' ) as `月` from ods_xmshop_xm_order o left join ods_xmshop_xm_waybill w on o.order_id = w.order_id left join info_excel.excel_IntlLOG_timezone as t on o.country = t.country_id left join info_excel.excel_IntlLOG_mihome AS m ON w.mihome = m.mihome_id AND is_active = 1 left join info.ods_xmwms3_xm_mihome as n on n.home_id = w.mihome left join dim_area as ad on ad.country_id = o.country where w.stock_status = 1 and w.push_stat & 2 = 2 and w.status IN (6000, 6001, 6002) and o.sales_type != 53 and signed_time >= 1546272000 GROUP BY o.country, w.mihome, date_format( from_utc_timestamp( to_utc_timestamp( from_unixtime(w.signed_time, 'yyyy-MM-dd HH:mm:ss'), 'Asia/Singapore' ), t.timezoneCode ), 'yyyy-MM' ), o.order_from, m.mihome_name, n.name, ad.country_name order by date_format( from_utc_timestamp( to_utc_timestamp( from_unixtime(w.signed_time, 'yyyy-MM-dd HH:mm:ss'), 'Asia/Singapore' ), t.timezoneCode ), 'yyyy-MM' ) desc limit 30;
使用with 拆分
/* Spark Sql(offline) SQL ************************************************************************** ** 所属主题: 规范测试 ** 功能描述: 规范测试格式 ** Azkaban项目:CreateBaseData:xxx ** 创建者 : Mio ** 创建日期: 20190829 ************************************************************************** */ /*订单数据关联运单数据*/ with xm_order_waybill_data as ( select o.order_from as order_from, o.country as country_id, w.mihome as mihome_id, o.country as country, w.signed_time, from ods_xmshop_xm_order o left join ods_xmshop_xm_waybill w on o.order_id = w.order_id /*限制库存状态.XXXX*/ where w.stock_status = 1 and w.push_stat & 2 = 2 and w.status IN (6000, 6001, 6002) and o.sales_type != 53 ), /*关联时区得数据*/ join_IntlLOG_timezone as ( select order_from, country_id, mihome_id, country, date_format( from_utc_timestamp( to_utc_timestamp( from_unixtime(w.signed_time, 'yyyy-MM-dd HH:mm:ss'), 'Asia/Singapore' ), b.timezoneCode ), 'yyyy-MM' ) as dt from xm_order_waybill_data a left join excel_IntlLOG_timezone b on a.country = b.country_id ), /*因为时区得多维需要再次join*/ join_IntlLOG_mihome as ( select order_from, country_id, mihome_id, country, dt m.mihome_name as mihome_name from join_IntlLOG_timezone a left join info_excel.excel_IntlLOG_mihome b on a.mihome = b.mihome /*限制=1*/ where is_active = 1 ), /*关联mihome的数据*/ join_xm_mihome as ( select order_from, country_id, mihome_id, country, dt, mihome_name n.name as name from join_IntlLOG_mihome a left join info.ods_xmwms3_xm_mihome b on a.home_id = b.mihome ) /*得出结论 需要去重统计单量*/ select order_from, country_id, mihome_id, country, dt, mihome_name, name, ad.country_name as country_name count(distinct w.waybill_id) as `妥投单量` from join_xm_mihome a left join dim_area b on country = b.country_id group by order_from, country_id, mihome_id, country, dt, mihome_name, name, ad.country_name
拆分优势
1.更容易控制和检查错误,定位错误更加快速。
2.一次构建一个查询步骤是SQL开发的最佳方案
3.结构清晰,注释清晰。
(多层嵌套尽量用with的语法进行拆分)
关于多层次嵌套子查询的简单例子
select o.ymd `日期`, s.project_name `项目ID`, concat(s.ddr_capacity, '+', s.emmc_capacity) `MEM`, s.color, o.sale_channel_2 `渠道2ID`, v.price, sum(o.out_num) `净出库数`, sum(o.active_num) `激活数`, sum(o.diff_country_active_num) `流出激活数` from ( select o.day ymd, s.sku, o.country_code sale_country_code, cc.country_name sale_country_name, v.sale_channel_1, v.sale_channel_name_1, v.sale_channel_2, v.sale_channel_name_2, sum(o.sale_num) sale_num, sum(o.outbound_all_num) outbound_num, sum(o.outbound_num) out_num, sum(o.signed_num) signed_num, sum(o.sale_money) sale_money, sum(o.outbound_all_money) outbound_money, sum(o.outbound_money) out_money, sum(o.signed_money) signed_money, 0 active_num, 0 diff_country_active_num from info.ods_xmrdata_gss_channel_data o inner join info.ods_xmrbasedata_sku_attribute s on o.goods_id = s.goods_id left join ( select area_code country_code, first(country_name) country_name from info.dim_area group by area_code ) cc on o.country_code = cc.country_code left join info.dim_sales_channel v on o.channel_2_id = v.sale_channel_2 where o.goods_id > 0 and ( (o.country_code <> 'cn') or ( o.country_code = 'cn' and o.channel_1_id not in (0, 67) and o.channel_2_id not in (0, 87) and o.channel_id not in (0, 272) ) ) and o.day >= '20170101' and o.day <= from_unixtime(unix_timestamp(), 'yyyyMMdd') group by o.day, s.sku, o.country_code, cc.country_name, v.sale_channel_1, v.sale_channel_name_1, v.sale_channel_2, v.sale_channel_name_2 union all select a.day ymd, a.sku, a.sale_country_code, a.sale_country_name, a.sale_channel_1, a.sale_channel_name_1, a.sale_channel_2, a.sale_channel_name_2, 0 sale_num, 0 outbound_num, 0 out_num, 0 signed_num, 0 sale_money, 0 outbound_money, 0 out_money, 0 signed_money, sum(a.active_num) active_num, sum( if( a.sale_country_code != a.active_country_code, a.active_num, 0 ) ) diff_country_active_num from info.dws_active_aggregration a where a.date = from_unixtime(unix_timestamp() - 86400 * 1, 'yyyyMMdd') and a.active_num != 0 and a.sku != '' and ( (a.sale_country_code <> 'cn') or ( a.sale_country_code = 'cn' and a.sale_channel_1 not in (0, 67) and a.sale_channel_2 not in (0, 87) and a.sale_channel_3 not in (0, 272) and a.imei_type = 1 ) ) and a.day >= '20170101' and a.day <= from_unixtime(unix_timestamp() - 86400 * 1, 'yyyyMMdd') group by a.day, a.sku, a.sale_country_code, a.sale_country_name, a.sale_channel_1, a.sale_channel_name_1, a.sale_channel_2, a.sale_channel_name_2 ) o inner join info.ods_xmrbasedata_sku_attribute s on o.sku = s.sku left join info_excel.excel_daily_manual_price v on o.sku = v.sku left join ( select area_code country_code, first(country_name) country_name from info.dim_area group by area_code ) c on s.sale_region = c.country_code where s.goods_id > 0 and s.sku != '' and o.sale_country_code = 'cn' group by o.ymd, s.project_name, concat(s.ddr_capacity, '+', s.emmc_capacity), s.color, o.sale_channel_2, v.price limit 20
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。