当前位置:   article > 正文

【随记】Flink 时间窗口的起始时间_flink 获取每个窗口起始时间

flink 获取每个窗口起始时间

话不多说,直接上手今天的主题,探索一个容易让人忽略和困惑的问题:Flink 时间窗口的起始时间

就以最简单的demo为例:

timeWindow(Time.seconds(5))

       上述定义一个步长为5s的滚动窗口,就以这个简单的入口进入Flink的源码开始探索

1)timeWindow的定义

  1. public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
  2. if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
  3. return window(TumblingProcessingTimeWindows.of(size));
  4. } else {
  5. return window(TumblingEventTimeWindows.of(size));
  6. }
  7. }

       这段源码比较贴近大众,就是一个普通的判断,而且environment.getStreamTimeCharacteristic()这个东西我们再熟悉不过了,判断当前是ProcessingTime还是EventTime,当然除了EventTime还有IngestionTime,但是比较常用的还是ProcessingTime和EventTime,所以我们就非ProcessingTime即EventTime这样理解,因为生产环境比较常用的是EventTime,所以我们就进入else的代码继续查看

2)TumblingEventTimeWindows的定义

window(TumblingEventTimeWindows.of(size))这段代码,window利用TumblingEventTimeWindows来分配元素,所以我们要了解的核心是TumblingEventTimeWindows.of(size)的定义

  1. public static TumblingEventTimeWindows of(Time size) {
  2. return new TumblingEventTimeWindows(size.toMilliseconds(), 0);
  3. }
  4. protected TumblingEventTimeWindows(long size, long offset) {
  5. if (Math.abs(offset) >= size) {
  6. throw new IllegalArgumentException
  7. ("TumblingEventTimeWindows parameters must satisfy abs(offset) < size");
  8. }
  9. this.size = size;
  10. this.offset = offset;
  11. }

可以看到通过of方法我们构建了一个offset为0,size为5的TumblingEventTimeWindows对象,然后就是我们需要的核心方法,assignWindows,窗口分配元素的核心方法

  1. @Override
  2. public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
  3. if (timestamp > Long.MIN_VALUE) {
  4. // Long.MIN_VALUE is currently assigned when no timestamp is present
  5. long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
  6. return Collections.singletonList(new TimeWindow(start, start + size));
  7. } else {
  8. throw new RuntimeException(
  9. "Record has Long.MIN_VALUE timestamp (= no timestamp marker). " +
  10. "Is the time characteristic set to 'ProcessingTime',
  11. or did you forget to call " +
  12. "'DataStream.assignTimestampsAndWatermarks(...)'?");
  13. }
  14. }

重点来了

long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
  1. /**
  2. * Method to get the window start for a timestamp.
  3. *
  4. * @param timestamp epoch millisecond to get the window start.
  5. * @param offset The offset which window start would be shifted by.
  6. * @param windowSize The size of the generated windows.
  7. * @return window start
  8. */
  9. public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
  10. return timestamp - (timestamp - offset + windowSize) % windowSize;
  11. }

Method to get the window start for a timestamp.翻译过来就是这个方法用来获取窗口的开始时间戳

核心算法就是

timestamp - (timestamp - offset + windowSize) % windowSize

 上一段代码小测一下

  1. case class TestData(timestamp:Long,word:String)
  2. def main(args: Array[String]): Unit = {
  3. val env = StreamExecutionEnvironment.getExecutionEnvironment
  4. env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  5. // 便于输出,设置并行度为1
  6. env.setParallelism(1)
  7. val socketStream = env.socketTextStream("localhost",9999)
  8. val windowedStream = socketStream
  9. .map(row=>TestData(row.split(" ")(0).toLong,row.split(" ")(0)))
  10. .assignTimestampsAndWatermarks(new
  11. BoundedOutOfOrdernessTimestampExtractor[TestData](Time.seconds(1)) {
  12. override def extractTimestamp(element: TestData): Long
  13. = element.timestamp * 1000
  14. })
  15. .keyBy(_.word)
  16. .timeWindow(Time.seconds(5))
  17. .reduce((r1,r2)=>TestData(r1.timestamp,"hello "+r2.word))
  18. windowedStream.print("window output is")
  19. socketStream.print("input data is")
  20. env.execute("window_test_job")
  21. }
  1. 准备一下测试数据
  2. 1599623712 word(2020-09-09 11:55:12
  3. 1599623715 word(2020-09-09 11:55:15
  1. 根据公式算出开始时间:
  2. 1599623712 - (1599623712 - 0 + 5) % 5 == 1599623710
  3. 也就是开始时间为 1599623710,步长为5s,也就是下次触发窗口计算为1599623715

验证一下:
    nc录入数据:

  1. 1599623712 word
  2. 1599623715 word

    控制台输出结果:

  1. input data is> 1599623712 word
  2. input data is> 1599623715 word
  3. window output is> TestData(1599623712,word)

结果验证了公式结果即为窗口的开始时间,ProcessingTime与之类似就不测试了,其实也可以看到公式的计算结果一般为自然时间的开始,如2020-09-09 11:55:12的开始时间为2020-09-09 11:55:10

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/weixin_40725706/article/detail/1019237
推荐阅读
相关标签
  

闽ICP备14008679号