Introduction to Linux Workqueues

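The workqueue mechanism in Linux exists to simplify the creation of kernel threads. Calling the workqueue API creates the kernel threads for you, and the number of threads can follow the number of CPUs in the system so that the work they handle runs in parallel.

A workqueue is another way of deferring work. It hands the deferred work to a kernel thread, so this bottom half runs in process context. Most importantly, work running on a workqueue can be rescheduled and can even sleep.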

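Why are workqueues needed?

Kernel code often reaches a point where a routine cannot or should not be called right away, and the work is better handed to a kernel thread. There are many reasons for this, for example:

  • An interrupt triggers a routine that takes a long time or calls functions that may sleep; such a routine should not be called directly in interrupt context.
  • As with interrupts, an urgent task should not spend time on slow, non-critical processing, so that processing is submitted to a lower-priority thread. For example, a polling receive thread must detect and receive data quickly, while parsing the data should be left to a lower-priority thread.
  • Sometimes it is useful to gather work together for batch-processing performance, or to merge and reduce the number of threads to save resources.

These needs led to the workqueue mechanism. Workqueues are not limited to operating system kernels; some applications and protocol stacks implement their own.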

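Workqueue concepts

Workqueue: a mechanism for deferring an operation (or callback) and executing it asynchronously. A workqueue hands the work to a kernel thread, and because it runs in thread context, the work can be rescheduled, preempted, and can sleep while executing.

Work item: an element of a workqueue. It bundles a callback function with the callback's arguments, sometimes with extra attribute members. A single structure is enough to record and describe a work item.

Key data structures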

work_struct

    struct work_struct {
        atomic_long_t data;
        struct list_head entry;
        work_func_t func;
    #ifdef CONFIG_LOCKDEP
        struct lockdep_map lockdep_map;
    #endif
        ANDROID_KABI_RESERVE(1);
        ANDROID_KABI_RESERVE(2);
    };

 workqueue_struct

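How are arguments passed?

The argument of func is a pointer to a work_struct, namely the very work_struct in which func is stored.

This raises two questions:

First: how is user data passed to func as an argument?

Second: how is delayed work implemented?

For the first question: the work_struct is embedded in the user's own data structure, and func uses container_of to get from the work_struct pointer back to the user data.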
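The container_of pattern described above can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: `container_of` is re-created with `offsetof`, `struct work_struct` is reduced to its callback, and `struct my_device`, `my_handler()` and `run_once()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct work_struct: only the callback is modeled. */
struct work_struct {
    void (*func)(struct work_struct *work);
};

/* Hypothetical user structure that embeds the work item. */
struct my_device {
    const char *name;
    int handled;              /* incremented by the handler */
    struct work_struct work;  /* embedded, not a pointer */
};

/* The handler only receives the work pointer and recovers the device. */
static void my_handler(struct work_struct *work)
{
    struct my_device *dev = container_of(work, struct my_device, work);

    dev->handled++;
}

/* Invokes the callback the way a worker would: with the work pointer only. */
static int run_once(const char *name)
{
    struct my_device dev = { .name = name, .handled = 0 };

    dev.work.func = my_handler;
    dev.work.func(&dev.work);
    assert(strcmp(dev.name, name) == 0); /* the rest of the struct is intact */
    return dev.handled;
}
```

Calling `run_once("eth0")` runs the handler once and returns the updated counter, which shows that the handler reached the enclosing structure through nothing but the embedded member's address.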

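For the second question: the timer was taken out of work_struct to keep the structure simple. In earlier versions the timer was only used when work had to be delayed; in the common case it served no purpose, so embedding it wasted some resources. The newer implementation therefore removes the timer from work_struct and defines a separate structure, delayed_work, for delayed execution.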

    struct delayed_work {
        struct work_struct work;
        struct timer_list timer;
    };

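API overview

Not every driver needs its own workqueue. A driver can use the default workqueue provided by the kernel. Because that workqueue is shared by many drivers, a work item may wait a fairly long time before it starts running. To limit this, work functions should keep any delay (sleeping or blocking) to a minimum, or avoid it entirely.

Creating a workqueue

In the original implementation, each workqueue is served by its own dedicated thread (one thread per workqueue), and all work taken from the queue runs in process context, so it may sleep. A driver can create and use its own workqueue or use one of the kernel's workqueues.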

    // Create a workqueue
    struct workqueue_struct *create_workqueue(const char *name);

Creating work items

Work items can be created at compile time or at run time.

    // Created at compile time; function has type void (*)(struct work_struct *)
    DECLARE_WORK(name, function);
    // Created at run time
    INIT_WORK(struct work_struct *work, function);

Adding work to a workqueue

    // Add to a specific workqueue
    bool queue_work(struct workqueue_struct *queue, struct work_struct *work);
    bool queue_delayed_work(struct workqueue_struct *queue, struct delayed_work *work, unsigned long delay);
    // Add to the kernel's default workqueue
    bool schedule_work(struct work_struct *work);
    bool schedule_delayed_work(struct delayed_work *work, unsigned long delay);

delay: the minimum delay, in jiffies, that must elapse before the work item is allowed to run.

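Cancelling work and cleaning up queues

    // Cancel a pending delayed work item
    bool cancel_delayed_work(struct delayed_work *work);
    // Wait until all work already queued on the workqueue has finished
    void flush_workqueue(struct workqueue_struct *queue);
    // Destroy the workqueue
    void destroy_workqueue(struct workqueue_struct *queue);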

Example

    struct my_struct_t {
        char *name;
        struct work_struct my_work;
    };

    /* Work handler: recover the enclosing structure from the work pointer. */
    void my_func(struct work_struct *work)
    {
        struct my_struct_t *my_name = container_of(work, struct my_struct_t, my_work);

        printk(KERN_INFO "Hello world, my name is %s!\n", my_name->name);
    }

    /* The statements below belong in a function such as the module init
     * routine; my_name must stay valid until the work has run. */
    struct workqueue_struct *my_wq = create_workqueue("my wq");
    struct my_struct_t my_name;

    my_name.name = "Jack";
    INIT_WORK(&(my_name.my_work), my_func);
    queue_work(my_wq, &my_name.my_work);

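How it works

Workqueues are a very important kernel mechanism, especially for drivers. Small tasks (work) generally do not start a thread of their own; they are handed to a workqueue instead. The main job of a workqueue is to process the kernel's many small tasks in process context.

The design therefore has two main goals. One is parallelism: work items should not block one another. The other is economy: work items should share resources (processes, scheduling, memory) as far as possible rather than waste them.

To meet these goals, the workqueue design and implementation have gone through many revisions. The latest implementation is called CMWQ (Concurrency Managed Workqueue), which uses a smarter algorithm to achieve both parallelism and economy. In the new version the creation function is alloc_workqueue(); the old create_workqueue() is gradually being deprecated.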

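Basic CMWQ concepts

The workqueue code has several work-related data structures that are easy to confuse. Roughly:

1) work: a unit of work.

2) workqueue: a collection of work. The workqueue-to-work relationship is one-to-many.

3) worker: a worker. In the code, each worker corresponds to one worker_thread() kernel thread.

4) worker_pool: a collection of workers. The worker_pool-to-worker relationship is one-to-many.

5) pwq (pool_workqueue): the go-between that links a workqueue to a worker_pool. The workqueue-to-pwq relationship is one-to-many, and the pwq-to-worker_pool relationship is one-to-one.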

worker_pool

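Each thread that executes work is called a worker, and a set of workers is called a worker_pool. The essence of CMWQ lies in how the number of workers in a worker_pool is grown and shrunk dynamically, which is handled by manage_workers().

CMWQ divides worker_pools into two kinds:

normal worker_pool, used by ordinary workqueues;

unbound worker_pool, used by workqueues of type WQ_UNBOUND;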

normal worker_pool

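By default, work is processed in a normal worker_pool. The system creates two normal worker_pools per CPU: one at normal priority (nice=0) and one at high priority (nice=HIGHPRI_NICE_LEVEL). The worker threads they create differ in nice value accordingly.

Each worker corresponds to one worker_thread() kernel thread. A worker_pool contains one or more workers, and the number of workers grows and shrinks dynamically with the load of work in the pool.

The kernel threads behind all workers can be listed with ps aux | grep kworker. Threads (worker_thread()) of a normal worker_pool are named as follows: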

    snprintf(id_buf, sizeof(id_buf), "%d:%d%s", pool->cpu, id,
             pool->attrs->nice < 0 ? "H" : "");
    worker->task = kthread_create_on_node(worker_thread, worker, pool->node,
                                          "kworker/%s", id_buf);

So threads of a normal worker_pool have names like these:

    shell@PRO5:/ $ ps | grep "kworker"
    root 14 2 0 0 worker_thr 0000000000 S kworker/1:0H // worker 0 of cpu1's high-priority worker_pool
    root 17 2 0 0 worker_thr 0000000000 S kworker/2:0 // worker 0 of cpu2's normal-priority worker_pool
    root 18 2 0 0 worker_thr 0000000000 S kworker/2:0H // worker 0 of cpu2's high-priority worker_pool
    root 23699 2 0 0 worker_thr 0000000000 S kworker/0:1 // worker 1 of cpu0's normal-priority worker_pool

unbound worker_pool

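Most work is executed by workers of a normal worker_pool, for example work pushed onto the system workqueue with schedule_work() or schedule_work_on(). These workers are bound to a specific CPU: once a worker starts executing a work item, it stays on that CPU and does not migrate to another one.

An unbound worker_pool is the opposite: its workers can be scheduled across multiple CPUs. It is still bound, however; the unit of binding is a node rather than a CPU. Node here refers to NUMA (Non-Uniform Memory Access) systems, which may have several nodes, each containing one or more CPUs.

Threads (worker_thread()) of an unbound worker_pool are named as follows: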

    snprintf(id_buf, sizeof(id_buf), "u%d:%d", pool->id, id);
    worker->task = kthread_create_on_node(worker_thread, worker, pool->node,
                                          "kworker/%s", id_buf);

So threads of an unbound worker_pool have names like these:

    shell@PRO5:/ $ ps | grep "kworker"
    root 23906 2 0 0 worker_thr 0000000000 S kworker/u20:2 /* worker 2 of unbound pool 20 */
    root 24564 2 0 0 worker_thr 0000000000 S kworker/u20:0 /* worker 0 of unbound pool 20 */
    root 24622 2 0 0 worker_thr 0000000000 S kworker/u21:1 /* worker 1 of unbound pool 21 */
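The two naming rules above (per-CPU pools and unbound pools) can be reproduced in user space to decode names seen in ps output. This is only a sketch: `bound_kworker_name()` and `unbound_kworker_name()` are hypothetical helpers that copy the format strings from the kernel snippets, and they take plain integers instead of the kernel's `struct worker_pool`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Per-CPU pool worker: "kworker/<cpu>:<id>", with "H" appended when the
 * pool's nice value is negative (the high-priority pool). */
static void bound_kworker_name(char *buf, size_t len, int cpu, int id, int nice)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "kworker/%d:%d%s", cpu, id, nice < 0 ? "H" : "");
}

/* Unbound pool worker: "kworker/u<pool id>:<id>". */
static void unbound_kworker_name(char *buf, size_t len, int pool_id, int id)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "kworker/u%d:%d", pool_id, id);
}
```

With a negative nice value, CPU 1 and worker id 0, the first helper produces the same `kworker/1:0H` shape that appears in the ps listing; the second produces names such as `kworker/u20:2`.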

worker

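Each worker corresponds to one worker_thread() kernel thread, and a worker_pool has one or more workers. All workers of a pool take work from the same list, worker_pool->worklist.

Two points matter here:

  • how a worker processes work;
  • how a worker_pool dynamically manages the number of workers.

How a worker processes work

Work is processed mainly in worker_thread() -> process_one_work(). Let's look at the code in detail.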

kernel/workqueue.c: worker_thread() -> process_one_work()

    static int worker_thread(void *__worker)
    {
        struct worker *worker = __worker;
        struct worker_pool *pool = worker->pool;

        /* tell the scheduler that this is a workqueue worker */
        worker->task->flags |= PF_WQ_WORKER;
    woke_up:
        spin_lock_irq(&pool->lock);
        // (1) Check whether this worker has been told to die
        /* am I supposed to die? */
        if (unlikely(worker->flags & WORKER_DIE)) {
            spin_unlock_irq(&pool->lock);
            WARN_ON_ONCE(!list_empty(&worker->entry));
            worker->task->flags &= ~PF_WQ_WORKER;
            set_task_comm(worker->task, "kworker/dying");
            ida_simple_remove(&pool->worker_ida, worker->id);
            worker_detach_from_pool(worker, pool);
            kfree(worker);
            return 0;
        }
        // (2) Leave the idle state
        // A worker is always idle before it is woken up
        worker_leave_idle(worker);
    recheck:
        // (3) Continue only if this worker is needed; otherwise go idle
        // need_more_worker() holds when pool->worklist is non-empty and
        // pool->nr_running == 0: there is work to run and no worker is
        // currently running
        /* no more worker necessary? */
        if (!need_more_worker(pool))
            goto sleep;
        // (4) If pool->nr_idle == 0, start creating more workers:
        // the idle list has no spare worker left, so create some in advance
        /* do we need to manage? */
        if (unlikely(!may_start_working(pool)) && manage_workers(worker))
            goto recheck;
        /*
         * ->scheduled list can only be filled while a worker is
         * preparing to process a work or actually processing it.
         * Make sure nobody diddled with it while I was sleeping.
         */
        WARN_ON_ONCE(!list_empty(&worker->scheduled));
        /*
         * Finish PREP stage. We're guaranteed to have at least one idle
         * worker or that someone else has already assumed the manager
         * role. This is where @worker starts participating in concurrency
         * management if applicable and concurrency management is restored
         * after being rebound. See rebind_workers() for details.
         */
        worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
        do {
            // (5) pool->worklist is not empty: take one work item from it
            struct work_struct *work =
                list_first_entry(&pool->worklist,
                                 struct work_struct, entry);
            if (likely(!(*work_data_bits(work) & WORK_STRUCT_LINKED))) {
                /* optimization path, not strictly necessary */
                // (6) Execute an ordinary work item
                process_one_work(worker, work);
                if (unlikely(!list_empty(&worker->scheduled)))
                    process_scheduled_works(worker);
            } else {
                // (7) Execute work that was deliberately scheduled to a
                // specific worker. Ordinary work sits on the pool's shared
                // list, pool->worklist; only special work is dispatched to
                // a particular worker's worker->scheduled list, namely:
                // 1. the barrier work inserted by flush_work();
                // 2. work pushed from another worker to this one on a collision
                move_linked_works(work, &worker->scheduled, NULL);
                process_scheduled_works(worker);
            }
            // (8) keep_working() holds when
            // pool->worklist is non-empty && pool->nr_running <= 1
        } while (keep_working(pool));
        worker_set_flags(worker, WORKER_PREP);
    sleep:
        // (9) The worker enters the idle state
        /*
         * pool->lock is held and there's no work to process and no need to
         * manage, sleep. Workers are woken up only while holding
         * pool->lock or from local cpu, so setting the current state
         * before releasing pool->lock is enough to prevent losing any
         * event.
         */
        worker_enter_idle(worker);
        __set_current_state(TASK_INTERRUPTIBLE);
        spin_unlock_irq(&pool->lock);
        schedule();
        goto woke_up;
    }

    // worker_thread() -> process_one_work()
    static void process_one_work(struct worker *worker, struct work_struct *work)
    __releases(&pool->lock)
    __acquires(&pool->lock)
    {
        struct pool_workqueue *pwq = get_work_pwq(work);
        struct worker_pool *pool = worker->pool;
        bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
        int work_color;
        struct worker *collision;
    #ifdef CONFIG_LOCKDEP
        /*
         * It is permissible to free the struct work_struct from
         * inside the function that is called from it, this we need to
         * take into account for lockdep too. To avoid bogus "held
         * lock freed" warnings as well as problems when looking into
         * work->lockdep_map, make a copy and use that here.
         */
        struct lockdep_map lockdep_map;

        lockdep_copy_map(&lockdep_map, &work->lockdep_map);
    #endif
        /* ensure we're on the correct CPU */
        WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
                     raw_smp_processor_id() != pool->cpu);
        // (8.1) If the work is already running on another worker of this
        // worker_pool, move it to that worker's scheduled list to run later
        /*
         * A single work shouldn't be executed concurrently by
         * multiple workers on a single cpu. Check whether anyone is
         * already processing the work. If so, defer the work to the
         * currently executing one.
         */
        collision = find_worker_executing_work(pool, work);
        if (unlikely(collision)) {
            move_linked_works(work, &collision->scheduled, NULL);
            return;
        }
        // (8.2) Add the worker to the busy hash, pool->busy_hash
        /* claim and dequeue */
        debug_work_deactivate(work);
        hash_add(pool->busy_hash, &worker->hentry, (unsigned long)work);
        worker->current_work = work;
        worker->current_func = work->func;
        worker->current_pwq = pwq;
        work_color = get_work_color(work);
        list_del_init(&work->entry);
        // (8.3) If the work's wq is CPU intensive (WQ_CPU_INTENSIVE), this
        // work is taken out of the pool's concurrency management and behaves
        // like an independent thread
        /*
         * CPU intensive works don't participate in concurrency management.
         * They're the scheduler's responsibility. This takes @worker out
         * of concurrency management and the next code block will chain
         * execution of the pending work items.
         */
        if (unlikely(cpu_intensive))
            worker_set_flags(worker, WORKER_CPU_INTENSIVE);
        // (8.4) For UNBOUND or CPU_INTENSIVE work, check whether an idle
        // worker must be woken; ordinary work never takes this branch
        /*
         * Wake up another worker if necessary. The condition is always
         * false for normal per-cpu workers since nr_running would always
         * be >= 1 at this point. This is used to chain execution of the
         * pending work items for WORKER_NOT_RUNNING workers such as the
         * UNBOUND and CPU_INTENSIVE ones.
         */
        if (need_more_worker(pool))
            wake_up_worker(pool);
        /*
         * Record the last pool and clear PENDING which should be the last
         * update to @work. Also, do this inside @pool->lock so that
         * PENDING and queued state changes happen together while IRQ is
         * disabled.
         */
        set_work_pool_and_clear_pending(work, pool->id);
        spin_unlock_irq(&pool->lock);
        lock_map_acquire_read(&pwq->wq->lockdep_map);
        lock_map_acquire(&lockdep_map);
        trace_workqueue_execute_start(work);
        // (8.5) Run the work function
        worker->current_func(work);
        /*
         * While we must be careful to not use "work" after this, the trace
         * point will only record its address.
         */
        trace_workqueue_execute_end(work);
        lock_map_release(&lockdep_map);
        lock_map_release(&pwq->wq->lockdep_map);
        if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
            pr_err("BUG: workqueue leaked lock or atomic: %s/0x%08x/%d\n"
                   " last function: %pf\n",
                   current->comm, preempt_count(), task_pid_nr(current),
                   worker->current_func);
            debug_show_held_locks(current);
            dump_stack();
        }
        /*
         * The following prevents a kworker from hogging CPU on !PREEMPT
         * kernels, where a requeueing work item waiting for something to
         * happen could deadlock with stop_machine as such work item could
         * indefinitely requeue itself while all other CPUs are trapped in
         * stop_machine. At the same time, report a quiescent RCU state so
         * the same condition doesn't freeze RCU.
         */
        cond_resched_rcu_qs();
        spin_lock_irq(&pool->lock);
        /* clear cpu intensive status */
        if (unlikely(cpu_intensive))
            worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
        /* we're done with it, release */
        hash_del(&worker->hentry);
        worker->current_work = NULL;
        worker->current_func = NULL;
        worker->current_pwq = NULL;
        worker->desc_valid = false;
        pwq_dec_nr_in_flight(pwq, work_color);
    }

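How a worker_pool manages workers dynamically

How a worker_pool grows and shrinks its set of workers is the core algorithm of CMWQ. The idea is as follows:

  • A worker in a worker_pool is in one of three states: idle, running, or suspended;
  • If the worker_pool has work to process, at least one running worker is kept to process it;
  • If a running worker blocks (enters the suspended state) while processing work, a new idle worker is woken so that other work keeps executing;
  • If there is work to execute and more than one running worker, the surplus running workers go idle;
  • If there is no work to execute, all workers go idle;
  • If too many workers have been created, destroy_worker() destroys idle workers that have not run again within 300 s (IDLE_WORKER_TIMEOUT).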
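The rules in this list can be written down as a few predicates. The sketch below is a user-space toy, not the kernel implementation: `struct toy_pool` and `worker_blocks()` are hypothetical, and the conditions are taken from the annotations in the worker_thread() listing (need_more_worker: pending work and no running worker; keep_working: pending work and at most one running worker; may_start_working: at least one idle worker).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct worker_pool: only the counters the checks use. */
struct toy_pool {
    int nr_pending;  /* work items waiting on the pool's worklist */
    int nr_running;  /* workers currently running (not blocked) */
    int nr_idle;     /* workers parked on the idle list */
};

/* Work is pending and nobody is running it. */
static bool need_more_worker(const struct toy_pool *p)
{
    return p->nr_pending > 0 && p->nr_running == 0;
}

/* At least one spare idle worker exists, so no new one has to be created. */
static bool may_start_working(const struct toy_pool *p)
{
    return p->nr_idle > 0;
}

/* The current worker should keep pulling work rather than go idle. */
static bool keep_working(const struct toy_pool *p)
{
    return p->nr_pending > 0 && p->nr_running <= 1;
}

/* A running worker blocks: it no longer counts as running. Returns true
 * when an idle worker should be woken to keep the pending work moving. */
static bool worker_blocks(struct toy_pool *p)
{
    assert(p->nr_running > 0);
    p->nr_running--;
    return need_more_worker(p) && may_start_working(p);
}
```

For a pool with two pending items, one running worker and one idle worker, `need_more_worker()` is false until the running worker blocks; at that point `worker_blocks()` reports that the idle worker should take over.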

workqueue

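A workqueue is a collection that holds a set of work items. Workqueues fall into two groups: those created by the system and those created by users. In both cases, unless WQ_UNBOUND is specified, the workqueue is bound to normal worker_pools by default.

System workqueues

During initialization the system creates a set of default workqueues: system_wq, system_highpri_wq, system_long_wq, system_unbound_wq, system_freezable_wq, system_power_efficient_wq, and system_freezable_power_efficient_wq.

system_wq, for example, is the one schedule_work() uses by default.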

kernel/workqueue.c:init_workqueues()

    static int __init init_workqueues(void)
    {
        /* excerpt: only the system workqueue allocations are shown */
        system_wq = alloc_workqueue("events", 0, 0);
        system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
        system_long_wq = alloc_workqueue("events_long", 0, 0);
        system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
                                            WQ_UNBOUND_MAX_ACTIVE);
        system_freezable_wq = alloc_workqueue("events_freezable",
                                              WQ_FREEZABLE, 0);
        system_power_efficient_wq = alloc_workqueue("events_power_efficient",
                                                    WQ_POWER_EFFICIENT, 0);
        system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
                                                              WQ_FREEZABLE | WQ_POWER_EFFICIENT,
                                                              0);
    }

Creating a workqueue

The call path is alloc_workqueue() -> __alloc_workqueue_key() -> alloc_and_link_pwqs().

queue_work()

Pushes a work item onto a workqueue.

kernel/workqueue.c: queue_work() -> queue_work_on() -> __queue_work()

    static void __queue_work(int cpu, struct workqueue_struct *wq,
                             struct work_struct *work)
    {
        struct pool_workqueue *pwq;
        struct worker_pool *last_pool;
        struct list_head *worklist;
        unsigned int work_flags;
        unsigned int req_cpu = cpu;

        /*
         * While a work item is PENDING && off queue, a task trying to
         * steal the PENDING will busy-loop waiting for it to either get
         * queued or lose PENDING. Grabbing PENDING and queueing should
         * happen with IRQ disabled.
         */
        WARN_ON_ONCE(!irqs_disabled());
        debug_work_activate(work);
        /* if draining, only works from the same workqueue are allowed */
        if (unlikely(wq->flags & __WQ_DRAINING) &&
            WARN_ON_ONCE(!is_chained_work(wq)))
            return;
    retry:
        // (1) If no CPU was specified, use the current CPU
        if (req_cpu == WORK_CPU_UNBOUND)
            cpu = raw_smp_processor_id();
        /* pwq which will be used unless @work is executing elsewhere */
        if (!(wq->flags & WQ_UNBOUND))
            // (2) Normal wq: use the normal worker_pool of the chosen CPU
            pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
        else
            // (3) Unbound wq: use the worker_pool of the chosen CPU's node
            pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
        // (4) If the work is currently running on another worker, queue it
        // on that worker's pool so the work is never re-entered
        /*
         * If @work was previously on a different pool, it might still be
         * running there, in which case the work needs to be queued on that
         * pool to guarantee non-reentrancy.
         */
        last_pool = get_work_pool(work);
        if (last_pool && last_pool != pwq->pool) {
            struct worker *worker;

            spin_lock(&last_pool->lock);
            worker = find_worker_executing_work(last_pool, work);
            if (worker && worker->current_pwq->wq == wq) {
                pwq = worker->current_pwq;
            } else {
                /* meh... not running there, queue here */
                spin_unlock(&last_pool->lock);
                spin_lock(&pwq->pool->lock);
            }
        } else {
            spin_lock(&pwq->pool->lock);
        }
        /*
         * pwq is determined and locked. For unbound pools, we could have
         * raced with pwq release and it could already be dead. If its
         * refcnt is zero, repeat pwq selection. Note that pwqs never die
         * without another pwq replacing it in the numa_pwq_tbl or while
         * work items are executing on it, so the retrying is guaranteed to
         * make forward-progress.
         */
        if (unlikely(!pwq->refcnt)) {
            if (wq->flags & WQ_UNBOUND) {
                spin_unlock(&pwq->pool->lock);
                cpu_relax();
                goto retry;
            }
            /* oops */
            WARN_ONCE(true, "workqueue: per-cpu pwq for %s on cpu%d has 0 refcnt",
                      wq->name, cpu);
        }
        /* pwq determined, queue */
        trace_workqueue_queue_work(req_cpu, pwq, work);
        if (WARN_ON(!list_empty(&work->entry))) {
            spin_unlock(&pwq->pool->lock);
            return;
        }
        pwq->nr_in_flight[pwq->work_color]++;
        work_flags = work_color_to_flags(pwq->work_color);
        // (5) Below max_active: put the work on pool->worklist
        if (likely(pwq->nr_active < pwq->max_active)) {
            trace_workqueue_activate_work(work);
            pwq->nr_active++;
            worklist = &pwq->pool->worklist;
        // Otherwise park the work on the temporary list pwq->delayed_works
        } else {
            work_flags |= WORK_STRUCT_DELAYED;
            worklist = &pwq->delayed_works;
        }
        // (6) Insert the work into the chosen worklist
        insert_work(pwq, work, worklist, work_flags);
        spin_unlock(&pwq->pool->lock);
    }
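Step (5) of the listing above, the max_active check, can be modeled in a few lines. This is a user-space toy under stated assumptions: `struct toy_pwq`, `toy_queue()` and `toy_complete()` are hypothetical names, and the promotion of a parked item when an active one completes is my reading of what happens after pwq_dec_nr_in_flight(); the excerpt itself does not show that step.

```c
#include <assert.h>

/* Toy stand-in for the counters of struct pool_workqueue. */
struct toy_pwq {
    int nr_active;   /* items on the pool worklist or executing */
    int max_active;  /* concurrency limit of this pwq */
    int nr_delayed;  /* items parked on delayed_works */
};

enum toy_target { TOY_WORKLIST, TOY_DELAYED };

/* Decide where a newly queued item goes, as in step (5). */
static enum toy_target toy_queue(struct toy_pwq *pwq)
{
    assert(pwq->max_active > 0);
    if (pwq->nr_active < pwq->max_active) {
        pwq->nr_active++;
        return TOY_WORKLIST;
    }
    pwq->nr_delayed++;
    return TOY_DELAYED;
}

/* An active item finished; promote one parked item if there is any
 * (assumed behavior, see the note above). */
static void toy_complete(struct toy_pwq *pwq)
{
    assert(pwq->nr_active > 0);
    pwq->nr_active--;
    if (pwq->nr_delayed > 0) {
        pwq->nr_delayed--;
        pwq->nr_active++;
    }
}
```

With max_active set to 2, the first two calls to `toy_queue()` land on the worklist and the third is parked; one call to `toy_complete()` then moves the parked item into the active set.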

flush_work()

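Flushes a given work item, making sure it has finished executing.

How can we tell that an asynchronous work item has finished? The trick is to insert a new work item, wq_barrier, right after the target work. Once wq_barrier has executed, the target work must have finished.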
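The barrier trick can be illustrated with a single-threaded FIFO. The sketch below is a user-space toy, not the kernel's mechanism: `struct toy_work`, `struct toy_queue` and the `toy_*` functions are hypothetical, and a plain array stands in for the kernel's linked lists and completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

struct toy_work {
    void (*func)(struct toy_work *w);
    bool done;
};

struct toy_queue {
    struct toy_work *items[TOY_QUEUE_MAX];
    size_t count;
};

/* Callback used by every item in this toy: just record completion. */
static void mark_done(struct toy_work *w)
{
    w->done = true;
}

static void toy_push(struct toy_queue *q, struct toy_work *w)
{
    assert(q->count < TOY_QUEUE_MAX);
    q->items[q->count++] = w;
}

/* Insert the barrier directly behind the target; false if target is not queued. */
static bool toy_insert_barrier(struct toy_queue *q, struct toy_work *target,
                               struct toy_work *barrier)
{
    for (size_t i = 0; i < q->count; i++) {
        if (q->items[i] != target)
            continue;
        assert(q->count < TOY_QUEUE_MAX);
        for (size_t j = q->count; j > i + 1; j--)
            q->items[j] = q->items[j - 1];   /* shift the tail right by one */
        q->items[i + 1] = barrier;
        q->count++;
        return true;
    }
    return false;
}

/* Run items in FIFO order until the barrier has executed; returns how many ran. */
static size_t toy_run_until(struct toy_queue *q, struct toy_work *barrier)
{
    size_t ran = 0;

    while (ran < q->count && !barrier->done) {
        struct toy_work *w = q->items[ran++];

        w->func(w);
    }
    return ran;
}
```

Because items run in order, the moment the barrier's `done` flag is set the target in front of it has already run, while items queued behind the barrier may still be pending. The kernel version waits on a completion instead of polling a flag.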

kernel/workqueue.c: flush_work() -> start_flush_work() -> insert_wq_barrier()

    /**
     * flush_work - wait for a work to finish executing the last queueing instance
     * @work: the work to flush
     *
     * Wait until @work has finished execution. @work is guaranteed to be idle
     * on return if it hasn't been requeued since flush started.
     *
     * Return:
     * %true if flush_work() waited for the work to finish execution,
     * %false if it was already idle.
     */
    bool flush_work(struct work_struct *work)
    {
        struct wq_barrier barr;

        lock_map_acquire(&work->lockdep_map);
        lock_map_release(&work->lockdep_map);
        if (start_flush_work(work, &barr)) {
            // Wait for the completion signalled by the barrier work
            wait_for_completion(&barr.done);
            destroy_work_on_stack(&barr.work);
            return true;
        } else {
            return false;
        }
    }

    // flush_work() -> start_flush_work()
    static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
    {
        struct worker *worker = NULL;
        struct worker_pool *pool;
        struct pool_workqueue *pwq;

        might_sleep();
        // (1) If the work has no worker_pool (NULL), it has already finished
        local_irq_disable();
        pool = get_work_pool(work);
        if (!pool) {
            local_irq_enable();
            return false;
        }
        spin_lock(&pool->lock);
        /* see the comment in try_to_grab_pending() with the same code */
        pwq = get_work_pwq(work);
        if (pwq) {
            // (2) If the pwq's worker_pool differs from the pool found in
            // the previous step, the work has already finished
            if (unlikely(pwq->pool != pool))
                goto already_gone;
        } else {
            // (3) If the work has no pwq and no worker is currently
            // executing it, the work has already finished
            worker = find_worker_executing_work(pool, work);
            if (!worker)
                goto already_gone;
            pwq = worker->current_pwq;
        }
        // (4) The work has not finished: insert the barrier work behind it
        insert_wq_barrier(pwq, barr, work, worker);
        spin_unlock_irq(&pool->lock);
        /*
         * If @max_active is 1 or rescuer is in use, flushing another work
         * item on the same workqueue may lead to deadlock. Make sure the
         * flusher is not running on the same workqueue by verifying write
         * access.
         */
        if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)
            lock_map_acquire(&pwq->wq->lockdep_map);
        else
            lock_map_acquire_read(&pwq->wq->lockdep_map);
        lock_map_release(&pwq->wq->lockdep_map);
        return true;
    already_gone:
        spin_unlock_irq(&pool->lock);
        return false;
    }

    // start_flush_work() -> insert_wq_barrier()
    static void insert_wq_barrier(struct pool_workqueue *pwq,
                                  struct wq_barrier *barr,
                                  struct work_struct *target, struct worker *worker)
    {
        struct list_head *head;
        unsigned int linked = 0;

        /*
         * debugobject calls are safe here even with pool->lock locked
         * as we know for sure that this will not trigger any of the
         * checks and call back into the fixup functions where we
         * might deadlock.
         */
        // (4.1) The barrier work's function is wq_barrier_func()
        INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
        __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
        init_completion(&barr->done);
        /*
         * If @target is currently being executed, schedule the
         * barrier to the worker; otherwise, put it after @target.
         */
        // (4.2) If the target is being executed by a worker, insert the
        // barrier work into that worker's scheduled list
        if (worker)
            head = worker->scheduled.next;
        // Otherwise insert the barrier work into the regular worklist,
        // right after the target work, and set WORK_STRUCT_LINKED
        else {
            unsigned long *bits = work_data_bits(target);

            head = target->entry.next;
            /* there can already be other linked works, inherit and set */
            linked = *bits & WORK_STRUCT_LINKED;
            __set_bit(WORK_STRUCT_LINKED_BIT, bits);
        }
        debug_work_activate(&barr->work);
        insert_work(pwq, &barr->work, head,
                    work_color_to_flags(WORK_NO_COLOR) | linked);
    }

    // Barrier callback installed by insert_wq_barrier()
    static void wq_barrier_func(struct work_struct *work)
    {
        struct wq_barrier *barr = container_of(work, struct wq_barrier, work);

        // (4.1.1) The barrier work has run: signal completion
        complete(&barr->done);
    }

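Workqueue external interface functions

The workqueue mechanism implemented by CMWQ is wrapped in a set of external interface functions.

schedule_work()

Pushes work onto the default system workqueue, system_wq. WORK_CPU_UNBOUND means the work is handled by a worker from the normal worker_pool bound to the current CPU.

include/linux/workqueue.h, kernel/workqueue.c: schedule_work() -> queue_work() -> queue_work_on() -> __queue_work()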

    static inline bool schedule_work(struct work_struct *work)
    {
        return queue_work(system_wq, work);
    }

    // schedule_work() -> queue_work()
    static inline bool queue_work(struct workqueue_struct *wq,
                                  struct work_struct *work)
    {
        return queue_work_on(WORK_CPU_UNBOUND, wq, work);
    }

schedule_work_on() 

Like schedule_work(), but lets the caller specify the CPU the work runs on.

kernel/workqueue.c: schedule_work_on() -> queue_work_on() -> __queue_work()

    static inline bool schedule_work_on(int cpu, struct work_struct *work)
    {
        return queue_work_on(cpu, system_wq, work);
    }

schedule_delayed_work()

Starts a timer; when the timer expires, delayed_work_timer_fn() pushes the work onto the default system workqueue, system_wq.

kernel/workqueue.c: schedule_delayed_work() -> queue_delayed_work() -> queue_delayed_work_on() -> __queue_delayed_work()

    static inline bool schedule_delayed_work(struct delayed_work *dwork,
                                             unsigned long delay)
    {
        return queue_delayed_work(system_wq, dwork, delay);
    }

    // schedule_delayed_work() -> queue_delayed_work()
    static inline bool queue_delayed_work(struct workqueue_struct *wq,
                                          struct delayed_work *dwork,
                                          unsigned long delay)
    {
        return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
    }

    // queue_delayed_work() -> queue_delayed_work_on()
    bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
                               struct delayed_work *dwork, unsigned long delay)
    {
        struct work_struct *work = &dwork->work;
        bool ret = false;
        unsigned long flags;

        /* read the comment in __queue_work() */
        local_irq_save(flags);
        if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
            __queue_delayed_work(cpu, wq, dwork, delay);
            ret = true;
        }
        local_irq_restore(flags);
        return ret;
    }

    // queue_delayed_work_on() -> __queue_delayed_work()
    static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
                                     struct delayed_work *dwork, unsigned long delay)
    {
        struct timer_list *timer = &dwork->timer;
        struct work_struct *work = &dwork->work;

        WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
                     timer->data != (unsigned long)dwork);
        WARN_ON_ONCE(timer_pending(timer));
        WARN_ON_ONCE(!list_empty(&work->entry));
        /*
         * If @delay is 0, queue @dwork->work immediately. This is for
         * both optimization and correctness. The earliest @timer can
         * expire is on the closest next tick and delayed_work users depend
         * on that there's no such delay when @delay is 0.
         */
        if (!delay) {
            __queue_work(cpu, wq, &dwork->work);
            return;
        }
        timer_stats_timer_set_start_info(&dwork->timer);
        dwork->wq = wq;
        dwork->cpu = cpu;
        timer->expires = jiffies + delay;
        if (unlikely(cpu != WORK_CPU_UNBOUND))
            add_timer_on(timer, cpu);
        else
            add_timer(timer);
    }

    // Timer callback: runs when the delay has expired
    void delayed_work_timer_fn(unsigned long __data)
    {
        struct delayed_work *dwork = (struct delayed_work *)__data;

        /* should have been called from irqsafe timer with irq already off */
        __queue_work(dwork->cpu, dwork->wq, &dwork->work);
    }
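The branch on delay in __queue_delayed_work() and the later timer callback can be followed with a small state model. This is a user-space sketch only: `struct toy_delayed_work`, `toy_queue_delayed()` and `toy_tick()` are hypothetical, time is a plain tick counter rather than jiffies, and "queued" stands for the call to __queue_work().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct delayed_work: state flags instead of list/timer. */
struct toy_delayed_work {
    bool queued;            /* the work has been handed to the workqueue */
    bool timer_armed;       /* a timer is pending for this work */
    unsigned long expires;  /* tick at which the timer fires */
};

/* Mirrors the decision in __queue_delayed_work(): a zero delay queues the
 * work immediately, anything else arms a timer for now + delay. */
static void toy_queue_delayed(struct toy_delayed_work *dw,
                              unsigned long now, unsigned long delay)
{
    assert(!dw->queued && !dw->timer_armed); /* must not already be pending */
    if (!delay) {
        dw->queued = true;
        return;
    }
    dw->timer_armed = true;
    dw->expires = now + delay;
}

/* Advance the clock to `now`; an expired timer queues the work, which is
 * the role delayed_work_timer_fn() plays in the kernel. */
static void toy_tick(struct toy_delayed_work *dw, unsigned long now)
{
    /* Signed difference keeps the comparison correct across wraparound. */
    if (dw->timer_armed && (long)(now - dw->expires) >= 0) {
        dw->timer_armed = false;
        dw->queued = true;
    }
}
```

A work item queued at tick 100 with a delay of 5 stays off the queue at tick 104 and is queued at tick 105, while a delay of 0 queues it at once without touching the timer.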
