
C++ Multithreaded Programming: #pragma omp parallel


Conventionally, threads are created with pthread_create.

Creating a thread

The following call creates a POSIX thread:

  #include <pthread.h>
  pthread_create(thread, attr, start_routine, arg)

Here, pthread_create creates a new thread and makes it runnable. The parameters are described below:

thread — Pointer to the identifier of the new thread.
attr — An opaque attribute object that can be used to set thread attributes. You can pass a thread attribute object, or NULL for the defaults.
start_routine — The function the thread runs; it starts executing as soon as the thread is created.
arg — The argument passed to start_routine. It must be passed as a pointer cast to void *; pass NULL if there is no argument.

On success, pthread_create returns 0; a nonzero return value means thread creation failed.

Terminating a thread

The following call terminates a POSIX thread:

  #include <pthread.h>
  pthread_exit(status)

Here, pthread_exit explicitly exits a thread. It is typically called once a thread has finished its work and no longer needs to exist.

Creating threads this way is relatively verbose. The rest of this article shows how #pragma omp parallel creates threads simply and efficiently.

Creating threads with #pragma omp parallel

#pragma omp parallel creates multiple threads for a marked code block. The following example shows how to mark which part of the code runs in parallel:

  #include <stdio.h>
  #include <omp.h>

  int main() {
      printf("The output:\n");
      #pragma omp parallel    /* define multi-threaded section */
      {
          printf("Hello World\n");
      }
      /* resume serial section */
      printf("Done\n");
      return 0;
  }

Here is a larger example that creates multiple threads:

  #include <stdio.h>
  #include <stdlib.h>
  #include <omp.h>

  int main(int argc, char *argv[]) {
      int width = 1280;
      int height = 1280;
      float *imageBuffer = new float[3 * width * height];

      /* parallel for must be followed directly by the loop,
         so its iterations can be divided among the 3 threads */
      #pragma omp parallel for num_threads(3)
      for (int i = 0; i < width * height; i++) {
          imageBuffer[i] = 0;
          imageBuffer[width * height + i] = 255;
          imageBuffer[width * height * 2 + i] = 0;
      }

      delete[] imageBuffer;
      return 0;
  }

This way of creating threads is simple and efficient, but one thing must be noted: code using #pragma omp parallel must be compiled with the -fopenmp option, otherwise the pragmas are ignored and nothing runs in parallel:

g++ a.cc -fopenmp

First, how do we make a section of code run in parallel? OpenMP marks a parallel region with the parallel directive, in the form:

  #pragma omp parallel
  {
      /* every thread executes the code inside the braces */
  }

To execute a for loop with multiple threads, use the for directive. It distributes the loop iterations among the threads, which requires that the iterations have no data dependences on each other.

It can be used in two forms:

(1) Combined with parallel:

  #pragma omp parallel for
  for (...)

(2) Inside an existing parallel region:

  #pragma omp parallel
  {   /* note: the opening brace must be on its own line */
      #pragma omp for
      for (...)
  }

The sections directive splits the code into blocks, each executed by one thread, for example:

  #pragma omp parallel sections   // starts a new team
  {
      { Work1(); }
      #pragma omp section
      { Work2();
        Work3(); }
      #pragma omp section
      { Work4(); }
  }

or, equivalently:

  #pragma omp parallel            // starts a new team
  {
      //Work0();                  // this function would be run by all threads
      #pragma omp sections        // divides the team into sections
      {
          // everything herein is run only once
          { Work1(); }
          #pragma omp section
          { Work2();
            Work3(); }
          #pragma omp section
          { Work4(); }
      }
      //Work5();                  // this function would be run by all threads
  }

The following example illustrates the shared and private clauses:

  #include <stdlib.h>  // malloc and free
  #include <stdio.h>   // printf
  #include <omp.h>     // OpenMP

  // Very small values for this simple illustrative example
  #define ARRAY_SIZE 8   // size of arrays whose elements will be added together
  #define NUM_THREADS 4  // number of threads to use for vector addition

  /*
   * Classic vector addition using OpenMP default data decomposition.
   *
   * Compile using gcc like this:
   *     gcc -o va-omp-simple VA-OMP-simple.c -fopenmp
   *
   * Execute:
   *     ./va-omp-simple
   */
  int main(int argc, char *argv[])
  {
      // elements of arrays a and b will be added
      // and placed in array c
      int *a;
      int *b;
      int *c;
      int n = ARRAY_SIZE;              // number of array elements
      int n_per_thread;                // elements per thread
      int total_threads = NUM_THREADS; // number of threads to use
      int i;                           // loop index

      // allocate space for the arrays
      a = (int *) malloc(sizeof(int) * n);
      b = (int *) malloc(sizeof(int) * n);
      c = (int *) malloc(sizeof(int) * n);

      // initialize arrays a and b with consecutive integer values
      // as a simple example
      for (i = 0; i < n; i++) {
          a[i] = i;
      }
      for (i = 0; i < n; i++) {
          b[i] = i;
      }

      // Additional work to set the number of threads.
      // We hard-code to 4 for illustration purposes only.
      omp_set_num_threads(total_threads);
      // determine how many elements each thread will work on
      n_per_thread = n / total_threads;

      // Compute the vector addition.
      // Here is where the 4 threads are specifically 'forked' to
      // execute in parallel. This is directed by the pragma, and
      // thread forking is compiled into the resulting executable.
      // Here we use a 'static schedule' so each thread works on
      // a 2-element chunk of the original 8-element arrays.
      #pragma omp parallel for shared(a, b, c) private(i) schedule(static, n_per_thread)
      for (i = 0; i < n; i++) {
          c[i] = a[i] + b[i];
          // Which thread am I? Show who works on what for this small example.
          printf("Thread %d works on element %d\n", omp_get_thread_num(), i);
      }

      // Check for correctness (only plausible for small vector size);
      // a test we would eventually leave out.
      printf("i\ta[i]\t+\tb[i]\t=\tc[i]\n");
      for (i = 0; i < n; i++) {
          printf("%d\t%d\t\t%d\t\t%d\n", i, a[i], b[i], c[i]);
      }

      // clean up memory
      free(a); free(b); free(c);
      return 0;
  }

Recursive functions can also be parallelized with the task directive.

References:

- http://akira.ruc.dk/~keld/teaching/IPDC_f10/Slides/pdf/4_Performance.pdf
- OpenMP: https://www.cnblogs.com/mfryf/p/12744547.html
- https://scc.ustc.edu.cn/zlsc/cxyy/200910/W020121113517997951933.pdf
- https://blog.csdn.net/zhongkejingwang/article/details/40350027
- https://stackoverflow.com/questions/24417145/pragma-omp-parallel-num-threads-is-not-working
- https://people.cs.pitt.edu/~melhem/courses/xx45p/OpenMp.pdf
