Performance Analysis of Spinlocks and Mutexes on Multi-core, Multi-threaded Systems

Author: 小蓝xlanll | 2024-05-30 22:35:09


Multi-core, multi-threaded: `spinlock` vs. `mutex`

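Mutex approach (sleep-wait):

A mutex suits scenarios where the lock is taken very frequently and the critical section is large, and it adapts better to varied workloads.

Very frequent lock operations with a large critical section:
For example, in the key code of the test program, each lock acquisition performs 100000 increments (a large critical section), which takes a relatively long time.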

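               //-- each lock acquisition performs 100000 increments
               //-- so the critical section takes a relatively long time
               for (j = 0; j < 100000; j++) {

                    //-- print the ID of the thread that reaches the target count first
                   if (g_count++ == 123456789){
                           printf("Thread %lu wins!\n", (unsigned long)gettid());
                   }
               }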

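Compared with a spin lock, a mutex carries more overhead (mainly context switches), but it fits the complex scenarios found in real development and offers more flexibility while still delivering reasonable performance.

Where the time goes: system calls. When the lock is contended, a mutex makes a system call to put the waiting thread to sleep.
Context switching also affects the thread that already holds the lock, because when it releases the lock it has to notify the operating system to wake the blocked threads.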
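
Since a mutex only pays the system-call cost when the lock is contended, it can help to measure how often that actually happens. The following is a minimal sketch, not part of the original test programs: `ContentionStats`, `WorkerArg`, `worker` and `run_two_workers` are hypothetical names, and the approach assumes that a failed `pthread_mutex_trylock` (return value `EBUSY`) is a fair proxy for a contended lock request.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical helper types for this sketch; none of these names come
// from the test programs in this article.
struct ContentionStats {
    long acquisitions;  // total successful lock acquisitions
    long contended;     // acquisitions whose first attempt found the lock held
};

struct WorkerArg {
    long iters;
    ContentionStats stats;
};

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;

static void *worker(void *p)
{
    WorkerArg *arg = static_cast<WorkerArg *>(p);
    for (long i = 0; i < arg->iters; ++i) {
        // Try once without blocking; EBUSY means another thread holds the lock.
        int rc = pthread_mutex_trylock(&g_lock);
        if (rc == EBUSY) {
            ++arg->stats.contended;
            // Contended path: this call may put the thread to sleep.
            rc = pthread_mutex_lock(&g_lock);
        }
        assert(rc == 0);
        ++arg->stats.acquisitions;
        ++g_counter;  // very short critical section
        pthread_mutex_unlock(&g_lock);
    }
    return nullptr;
}

// Runs two workers and returns their combined statistics.
ContentionStats run_two_workers(long iters)
{
    WorkerArg args[2] = {{iters, {0, 0}}, {iters, {0, 0}}};
    pthread_t tid[2];
    g_counter = 0;
    for (int t = 0; t < 2; ++t) {
        int rc = pthread_create(&tid[t], nullptr, worker, &args[t]);
        assert(rc == 0);
        (void)rc;
    }
    ContentionStats total = {0, 0};
    for (int t = 0; t < 2; ++t) {
        pthread_join(tid[t], nullptr);
        total.acquisitions += args[t].stats.acquisitions;
        total.contended += args[t].stats.contended;
    }
    return total;
}
```

Calling `run_two_workers(100000)` and comparing `contended` with `acquisitions` gives a rough contention ratio. The exact numbers vary from run to run and from machine to machine.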

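Spinlock approach (busy-wait):

A spin lock performs better (it costs fewer CPU instructions), but it is only suitable when the critical section runs for a very short time.

Very short critical section:
For example, in the key thread code of the test program, each lock acquisition performs one assignment and one pop_front, which takes very little time.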

        //-- take the first value from the list
        //-- each lock acquisition performs one assignment and one pop_front
        //-- so the critical section is very short
        i = the_list.front();
        the_list.pop_front();

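In real software development, however, using a spin lock is not a good idea unless the programmer understands the program's locking behavior very well.
A multi-threaded program typically performs tens of thousands of lock operations, and if too many of them fail (contended lock requests), a lot of time is wasted on busy waiting.

Where the time goes:
The two threads run on two separate cores, and most of the time only one of them can hold the lock, so the other thread busy-waits on its own core and keeps that core's CPU usage at 100%.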
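
To make the busy-wait concrete, here is a minimal sketch of a spin lock built on `std::atomic_flag`. It is for illustration only and is not how `pthread_spin_lock` is implemented; `SpinLock` and `count_with_spinlock` are hypothetical names introduced here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock, for illustration only.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value: true means another
        // thread holds the lock, so keep retrying on this core.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Busy-wait: the thread never sleeps, it just burns CPU.
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Starts `threads` workers that each add 1 to a shared counter `iters`
// times under the spin lock, then returns the final counter value.
long count_with_spinlock(int threads, long iters) {
    assert(threads > 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (long i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // very short critical section
                lock.unlock();
            }
        });
    }
    for (auto &th : pool) th.join();
    return counter;
}
```

While one thread is inside the critical section, every other thread stays in the `while` loop of `lock()`, which is why the waiting core shows up as fully busy.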

Test code (very short critical section):

1 Build the spin lock version: g++ -o spin_version -DUSE_SPINLOCK spinlockvsmutex1.cc -lpthread
2 Build the mutex version: g++ -o mutex_version spinlockvsmutex1.cc -lpthread

// Name: spinlockvsmutex1.cc
// Source: http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock
// Compiler(spin lock version): g++ -o spin_version -DUSE_SPINLOCK spinlockvsmutex1.cc -lpthread
// Compiler(mutex version): g++ -o mutex_version spinlockvsmutex1.cc -lpthread
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>
#include <sys/time.h>
#include <list>
#include <pthread.h>

#define LOOPS 50000000

using namespace std;

list<int> the_list;


//-- spinlock or mutex
#ifdef USE_SPINLOCK
pthread_spinlock_t spinlock;
#else
pthread_mutex_t mutex;
#endif


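// Get the thread ID. Named get_tid to avoid clashing with the gettid()
// that glibc 2.30 and later declare in <unistd.h>.
pid_t get_tid() { return syscall( __NR_gettid ); }

void *consumer(void *ptr)
{
    int i;

    //-- print the thread ID
    printf("Consumer TID %lu\n", (unsigned long)get_tid());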

    while (1)
    {
#ifdef USE_SPINLOCK
        pthread_spin_lock(&spinlock);
#else
        pthread_mutex_lock(&mutex);
#endif

        //-- the list is empty, so stop
        if (the_list.empty())
        {
#ifdef USE_SPINLOCK
            pthread_spin_unlock(&spinlock);
#else
            pthread_mutex_unlock(&mutex);
#endif
            break;
        }

        //-- take the first value from the list
        //-- each lock acquisition performs one assignment and one pop_front
        //-- so the critical section is very short
        i = the_list.front();
        the_list.pop_front();

#ifdef USE_SPINLOCK
        pthread_spin_unlock(&spinlock);
#else
        pthread_mutex_unlock(&mutex);
#endif
    }

    return NULL;
}

int main()
{
    int i;
    pthread_t thr1, thr2;
    struct timeval tv1, tv2;

#ifdef USE_SPINLOCK
    pthread_spin_init(&spinlock, 0);
#else
    pthread_mutex_init(&mutex, NULL);
#endif

    // Creating the list content...
    //-- producer: fill the list
    for (i = 0; i < LOOPS; i++)
        the_list.push_back(i);

    // Measuring time before starting the threads...
    //-- timestamp before the threads start
    gettimeofday(&tv1, NULL);

    //-- create two consumer threads
    pthread_create(&thr1, NULL, consumer, NULL);
    pthread_create(&thr2, NULL, consumer, NULL);

    //-- the main thread waits for both consumer threads to finish
    pthread_join(thr1, NULL);
    pthread_join(thr2, NULL);

    // Measuring time after threads finished...
    //-- timestamp after the threads finish
    gettimeofday(&tv2, NULL);

    if (tv1.tv_usec > tv2.tv_usec)
    {
        tv2.tv_sec--;
        tv2.tv_usec += 1000000;
    }

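    //-- print the elapsed time; %06ld zero-pads the microseconds so that,
    //-- for example, 43041 us prints as .043041 rather than .43041
    printf("Result - %ld.%06ld\n", (long)(tv2.tv_sec - tv1.tv_sec),
        (long)(tv2.tv_usec - tv1.tv_usec));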

#ifdef USE_SPINLOCK
    pthread_spin_destroy(&spinlock);
#else
    pthread_mutex_destroy(&mutex);
#endif

    return 0;
}
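
The elapsed-time arithmetic in `main()` (borrowing one second when the microsecond field would go negative) can also be written as a small helper. This is a sketch under the assumption that the end sample is not earlier than the start sample; `elapsed_usec` and `format_elapsed` are hypothetical names, not part of the original program.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/time.h>

// Hypothetical helper: microseconds elapsed between two gettimeofday()
// samples. Working in a single unit avoids the manual borrow step.
long long elapsed_usec(const timeval &start, const timeval &end) {
    return (end.tv_sec - start.tv_sec) * 1000000LL
         + (end.tv_usec - start.tv_usec);
}

// Formats a non-negative microsecond count as "seconds.micros" with the
// fractional part zero-padded to six digits.
std::string format_elapsed(long long usec) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lld.%06lld",
                  usec / 1000000, usec % 1000000);
    return std::string(buf);
}
```

With this helper, the final report would be `printf("Result - %s\n", format_elapsed(elapsed_usec(tv1, tv2)).c_str());`.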


Run


wmx@wmx-ubuntu:~/workspace/Multi-core scheduling/spinlockvsmutex1$ time ./mutex_version 
Consumer TID 17599
Consumer TID 17600
Result - 23.863041

real	0m26.455s
user	0m30.596s
sys	0m19.712s


wmx@wmx-ubuntu:~/workspace/Multi-core scheduling/spinlockvsmutex1$ time ./spin_version 
Consumer TID 17606
Consumer TID 17607
Result - 3.743531

real	0m6.293s
user	0m9.719s
sys	0m0.317s


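The spin lock version clearly performs better in this program (3.74 s versus 23.86 s). The sys time is also worth noting: the mutex version spent far more time in system calls (19.712 s versus 0.317 s), because a mutex makes a system call to wait whenever the lock is contended.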

Test code (an example program with very heavy lock contention):

//Name: svm2.c
//Source: http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Locks
//Compile(spin lock version): gcc -o spin -DUSE_SPINLOCK svm2.c -lpthread
//Compile(mutex version): gcc -o mutex svm2.c -lpthread
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>      /* declares syscall() */
#include <sys/syscall.h>

//-- number of threads
#define        THREAD_NUM     2

pthread_t g_thread[THREAD_NUM];


//-- spinlock or mutex
#ifdef USE_SPINLOCK
pthread_spinlock_t g_spin;
#else
pthread_mutex_t g_mutex;
#endif


__uint64_t g_count;

pid_t gettid()
{
    return syscall(SYS_gettid);
}

void *run_amuck(void *arg)
{
       int i, j;

       //-- print the thread ID
       printf("Thread %lu started.\n", (unsigned long)gettid());

       //-- request the lock 10000 times
       for (i = 0; i < 10000; i++) {
#ifdef USE_SPINLOCK
           pthread_spin_lock(&g_spin);
#else
               pthread_mutex_lock(&g_mutex);
#endif
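               //-- each lock acquisition performs 100000 increments
               //-- so the critical section takes a relatively long time
               for (j = 0; j < 100000; j++) {

                    //-- print the ID of the thread that reaches the target count first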
                   if (g_count++ == 123456789){
                           printf("Thread %lu wins!\n", (unsigned long)gettid());
                   }
               }
#ifdef USE_SPINLOCK
           pthread_spin_unlock(&g_spin);
#else
               pthread_mutex_unlock(&g_mutex);
#endif
       }

       //-- print the thread ID
       printf("Thread %lu finished!\n", (unsigned long)gettid());

       return (NULL);
}

int main(int argc, char *argv[])
{
       int i, threads = THREAD_NUM;

       //-- print the number of threads
       printf("Creating %d threads...\n", threads);

       //-- spinlock or mutex
#ifdef USE_SPINLOCK
       pthread_spin_init(&g_spin, 0);
#else
       pthread_mutex_init(&g_mutex, NULL);
#endif

       for (i = 0; i < threads; i++)
               pthread_create(&g_thread[i], NULL, run_amuck, (void *)(long) i);


        /*!
        * @brief pthread_join
        *   The main thread waits until each joined thread has finished,
        *   then reclaims that thread's resources.
        */
       for (i = 0; i < threads; i++)
               pthread_join(g_thread[i], NULL);

       printf("Done.\n");

       return (0);
}



Test

wmx@wmx-ubuntu:~/workspace/Multi-core scheduling/spinlockvsmutex2$ time ./mutex  
Creating 2 threads...
Thread 18389 started.
Thread 18390 started.
Thread 18389 wins!
Thread 18389 finished!
Thread 18390 finished!
Done.

real	0m3.150s
user	0m3.202s
sys	0m0.020s


wmx@wmx-ubuntu:~/workspace/Multi-core scheduling/spinlockvsmutex2$ time ./spin 
Creating 2 threads...
Thread 18403 started.
Thread 18404 started.
Thread 18403 wins!
Thread 18403 finished!
Thread 18404 finished!
Done.

real	0m3.107s
user	0m4.641s
sys	0m0.004s


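This program is characterized by a very large critical section, so lock contention between the two threads is very heavy. This is of course an extreme case: in real applications the critical section is not this large and contention is not this intense. The wall-clock times are almost the same (3.150 s for mutex, 3.107 s for spin), but the mutex version uses noticeably less CPU, so overall it is the better performer here.
The spin lock version consumed more user time (4.641 s versus 3.202 s). The two threads run on two separate cores, and most of the time only one of them can hold the lock, so the other thread busy-waits on its own core and keeps that core's CPU usage at 100%.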
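
The user and sys figures printed by `time` can also be read from inside a program, which makes it easier to attribute CPU time to a particular phase. The sketch below uses `getrusage(RUSAGE_SELF, ...)`; `CpuTimes`, `cpu_times` and `burn_cpu` are hypothetical names, and the busy loop only stands in for a thread spinning on a lock.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/time.h>

// Hypothetical helper type: CPU time consumed by the whole process so far.
struct CpuTimes {
    double user_sec;  // time spent executing user-space code
    double sys_sec;   // time spent in the kernel on the process's behalf
};

// Reads the process-wide user and system CPU time via getrusage().
CpuTimes cpu_times() {
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    CpuTimes t;
    t.user_sec = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    t.sys_sec  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    return t;
}

// Spins in user space for `spins` iterations and returns how much user
// and system CPU time the process consumed meanwhile.
CpuTimes burn_cpu(unsigned long spins) {
    CpuTimes before = cpu_times();
    volatile unsigned long sink = 0;  // volatile keeps the loop from being optimized away
    for (unsigned long i = 0; i < spins; ++i) {
        sink = sink + i;
    }
    CpuTimes after = cpu_times();
    CpuTimes delta;
    delta.user_sec = after.user_sec - before.user_sec;
    delta.sys_sec  = after.sys_sec - before.sys_sec;
    return delta;
}
```

A pure busy loop like this one is expected to add mostly to `user_sec`, which matches the higher user time seen for the spin lock version above.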