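Our project has a dimension table that stores holiday and workday information for past years; many projects probably have a similar need. At the start of a new year, the data for that year has to be generated. The following shows how to insert a new year's data into the dimension table.
Look up the holiday dates in the holiday schedule published by the State Council and store them.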
// Key: date as yyyyMMdd; value: holiday name as stored in the dimension table.
val holidays = Map(
  "20230101" -> "元旦", "20230102" -> "元旦",
  "20230121" -> "春节", "20230122" -> "春节", "20230123" -> "春节", "20230124" -> "春节",
  "20230125" -> "春节", "20230126" -> "春节", "20230127" -> "春节",
  "20230405" -> "清明",
  "20230429" -> "五一", "20230430" -> "五一", "20230501" -> "五一", "20230502" -> "五一", "20230503" -> "五一",
  "20230622" -> "端午", "20230623" -> "端午", "20230624" -> "端午",
  "20230929" -> "国庆", "20230930" -> "国庆", "20231001" -> "国庆", "20231002" -> "国庆",
  "20231003" -> "国庆", "20231004" -> "国庆", "20231005" -> "国庆", "20231006" -> "国庆"
)
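In China, holidays are usually extended by shifting rest days, so some days that would normally be weekends become working days. Look up these dates in the same State Council notice and store them as well.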
val workdays = Set("20230128", "20230129", "20230423", "20230506", "20230625", "20231007", "20231008")
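The make-up workdays have to be combined with the holidays and the ordinary weekends to decide whether a given date is a working day. The sketch below shows one possible way to do that. The helper name isWorkDay matches the call that appears in GenData, but the article does not show its body, so the precedence used here (make-up workday first, then holiday, then weekend) is an assumption. The object name WorkdaySketch and the shortened data sets are only for illustration.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.format.DateTimeFormatter

object WorkdaySketch {
  // Small excerpts of the article's data, enough to exercise every branch.
  val holidays: Map[String, String] = Map("20230121" -> "春节", "20230123" -> "春节")
  val workdays: Set[String] = Set("20230128", "20230129")

  // BASIC_ISO_DATE parses and formats dates as yyyyMMdd.
  private val ymd = DateTimeFormatter.BASIC_ISO_DATE

  // Hypothetical helper (assumed logic): 1 for a working day, 0 otherwise.
  def isWorkDay(day: String): Byte = {
    val dow = LocalDate.parse(day, ymd).getDayOfWeek
    val isWeekend = dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY
    if (workdays.contains(day)) 1        // make-up workday overrides the weekend
    else if (holidays.contains(day)) 0   // public holiday
    else if (isWeekend) 0                // ordinary weekend
    else 1                               // ordinary weekday
  }
}
```

With this ordering, a Saturday listed in workdays counts as a working day, while a Monday listed in holidays does not.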
Create a case class that matches the dimension table. The table already exists in the database, so the fields of the case class follow its schema.
case class DimHolidayInfo(date: Int, date_s: String, is_holiday: Byte, holiday_name: String, is_workday: Byte, day_of_week: Byte, day_of_week_c: String)

// Maps the Chinese weekday name to its number (Monday = 1 ... Sunday = 7).
val weekdays: Map[String, Byte] = Map(
  "星期一" -> 1,
  "星期二" -> 2,
  "星期三" -> 3,
  "星期四" -> 4,
  "星期五" -> 5,
  "星期六" -> 6,
  "星期日" -> 7)
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ArrayBuffer

// Builds one DimHolidayInfo row per day of 2023 and writes the rows to `output` as Parquet.
def GenData(spark: SparkSession, output: String): Unit = {
  import spark.implicits._
  val (startdate, enddate) = ("20230101", "20231231")
  val dateset = TimeUtils.genYmdSet(startdate, enddate)
  val result: ArrayBuffer[DimHolidayInfo] = ArrayBuffer()
  for (each <- dateset) {
    val date = each.toInt
    val (year, month, day) = (each.substring(0, 4), each.substring(4, 6), each.substring(6, 8))
    val date_s = Array(year, month, day).mkString("-")  // yyyy-MM-dd
    val isholiday: Byte = if (holidays.contains(each)) 1 else 0
    val holidayname = holidays.getOrElse(each, "")  // empty string for non-holidays
    val isworkday = isWorkDay(each)  // helper defined elsewhere in the project
    val dayofweekc = TimeUtils.getWeekDay(each)
    val dayofweek: Byte = weekdays.getOrElse(dayofweekc, (-1).toByte)  // -1 for an unknown name
    val obj = DimHolidayInfo(date, date_s, isholiday, holidayname, isworkday, dayofweek, dayofweekc)
    result.append(obj)
  }
  // A single partition is enough for one year of rows.
  spark.sparkContext.parallelize(result.toSeq, 1).toDF()
    .write
    .mode("overwrite")
    .parquet(output)
}
Here, TimeUtils.genYmdSet(startdate, enddate) generates the sequence of dates for the whole year, and TimeUtils.getWeekDay(each) returns the day of the week for a date.
Finally, the code saves the result in Parquet format.
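The article does not show the TimeUtils helpers, so the following is only a minimal sketch of what they might look like, built on java.time. The object name TimeUtilsSketch, the return type of genYmdSet (an ordered sequence of yyyyMMdd strings, both ends inclusive), and the Chinese weekday names returned by getWeekDay are assumptions inferred from how the helpers are called in GenData and from the keys of the weekdays map.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object TimeUtilsSketch {
  // BASIC_ISO_DATE parses and formats dates as yyyyMMdd.
  private val ymd = DateTimeFormatter.BASIC_ISO_DATE

  // Index 0 = Monday ... 6 = Sunday, i.e. java.time.DayOfWeek.getValue - 1.
  private val names =
    Array("星期一", "星期二", "星期三", "星期四", "星期五", "星期六", "星期日")

  // Assumed behavior: every date from start to end (inclusive), in order.
  def genYmdSet(start: String, end: String): Seq[String] = {
    val first = LocalDate.parse(start, ymd)
    val last = LocalDate.parse(end, ymd)
    Iterator.iterate(first)(d => d.plusDays(1))
      .takeWhile(d => !d.isAfter(last))
      .map(d => d.format(ymd))
      .toVector
  }

  // Assumed behavior: the Chinese weekday name for a yyyyMMdd date.
  def getWeekDay(day: String): String =
    names(LocalDate.parse(day, ymd).getDayOfWeek.getValue - 1)
}
```

Returning the weekday name as a string keeps the lookup in the weekdays map unchanged; a variant that returns the number directly would make that map unnecessary.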