Pandas data operations, numeric data (2): cumulative statistics functions [cumsum, cumprod, cummax, cummin] [sum, product, maximum, and minimum of the first 1/2/3/…/n values]

I. Cumulative statistics functions

Function    Purpose
cumsum      Sum of the first 1/2/3/…/n values
cummax      Maximum of the first 1/2/3/…/n values
cummin      Minimum of the first 1/2/3/…/n values
cumprod     Product of the first 1/2/3/…/n values

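As a quick check of the table above, here is a minimal sketch on a small hand-picked Series. The values and the name `s` are illustrative and not part of the original example; they are chosen so that each running result can be verified by hand.

```python
import pandas as pd

# Illustrative values chosen so the results are easy to verify by hand
s = pd.Series([3, 1, 4, 1, 5])

running_sum = s.cumsum()    # 3, 4, 8, 9, 14
running_prod = s.cumprod()  # 3, 3, 12, 12, 60
running_max = s.cummax()    # 3, 3, 4, 4, 5
running_min = s.cummin()    # 3, 1, 1, 1, 1

# Show the input and all four running results side by side
print(pd.DataFrame({'value': s,
                    'cumsum': running_sum,
                    'cumprod': running_prod,
                    'cummax': running_max,
                    'cummin': running_min}))
```

Each result has the same length and index as the input: position n holds the statistic over the first n values.
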
import numpy as np
import pandas as pd

# key1 holds the integers 0..9; key2 holds random floats in [0, 10), so its values differ on every run
df = pd.DataFrame({'key1': np.arange(10),
                   'key2': np.random.rand(10) * 10})
print("df = \n", df)
print('-' * 200)

key1_cumsum = df['key1'].cumsum()
key2_cumsum = df['key2'].cumsum()

print("key1_cumsum = \n{0} \ntype(key1_cumsum) = {1}".format(key1_cumsum, type(key1_cumsum)))
print('-' * 50)
print("key2_cumsum = \n{0} \ntype(key2_cumsum) = {1}".format(key2_cumsum, type(key2_cumsum)))
print('-' * 50)
df['key1_cumsum'] = df['key1'].cumsum()
df['key2_cumsum'] = df['key2'].cumsum()
print("After adding the cumsum columns: df = \n", df)
print('-' * 200)

key1_cumprod = df['key1'].cumprod()
key2_cumprod = df['key2'].cumprod()

print("key1_cumprod = \n{0} \ntype(key1_cumprod) = {1}".format(key1_cumprod, type(key1_cumprod)))
print('-' * 50)
print("key2_cumprod = \n{0} \ntype(key2_cumprod) = {1}".format(key2_cumprod, type(key2_cumprod)))
print('-' * 50)
df['key1_cumprod'] = key1_cumprod
df['key2_cumprod'] = key2_cumprod
print("After adding the cumprod columns: df = \n", df)
print('-' * 200)

# cummax and cummin compute the running maximum and running minimum of every column;
# each call returns a new DataFrame and leaves df unchanged
df1 = df.cummax()
df2 = df.cummin()

print("df = \n", df)
print('-' * 50)
print("df1 = df.cummax() = \n", df1)
print('-' * 50)
print("df2 = df.cummin() = \n", df2)
print('-' * 200)

Output:

df = 
    key1      key2
0     0  5.946567
1     1  6.500338
2     2  0.517269
3     3  6.888832
4     4  0.029891
5     5  6.908777
6     6  4.522801
7     7  6.755125
8     8  6.676930
9     9  3.002233
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
key1_cumsum = 
0     0
1     1
2     3
3     6
4    10
5    15
6    21
7    28
8    36
9    45
Name: key1, dtype: int32 
type(key1_cumsum) = <class 'pandas.core.series.Series'>
--------------------------------------------------
key2_cumsum = 
0     5.946567
1    12.446905
2    12.964174
3    19.853006
4    19.882897
5    26.791673
6    31.314474
7    38.069599
8    44.746529
9    47.748762
Name: key2, dtype: float64 
type(key2_cumsum) = <class 'pandas.core.series.Series'>
--------------------------------------------------
After adding the cumsum columns: df = 
    key1      key2  key1_cumsum  key2_cumsum
0     0  5.946567            0     5.946567
1     1  6.500338            1    12.446905
2     2  0.517269            3    12.964174
3     3  6.888832            6    19.853006
4     4  0.029891           10    19.882897
5     5  6.908777           15    26.791673
6     6  4.522801           21    31.314474
7     7  6.755125           28    38.069599
8     8  6.676930           36    44.746529
9     9  3.002233           45    47.748762
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
key1_cumprod = 
0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0
Name: key1, dtype: int32 
type(key1_cumprod) = <class 'pandas.core.series.Series'>
--------------------------------------------------
key2_cumprod = 
0        5.946567
1       38.654696
2       19.994865
3      137.741271
4        4.117176
5       28.444652
6      128.649488
7      869.043329
8     5802.541623
9    17420.580379
Name: key2, dtype: float64 
type(key2_cumprod) = <class 'pandas.core.series.Series'>
--------------------------------------------------
After adding the cumprod columns: df = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     1  6.500338            1    12.446905             0     38.654696
2     2  0.517269            3    12.964174             0     19.994865
3     3  6.888832            6    19.853006             0    137.741271
4     4  0.029891           10    19.882897             0      4.117176
5     5  6.908777           15    26.791673             0     28.444652
6     6  4.522801           21    31.314474             0    128.649488
7     7  6.755125           28    38.069599             0    869.043329
8     8  6.676930           36    44.746529             0   5802.541623
9     9  3.002233           45    47.748762             0  17420.580379
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
df = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     1  6.500338            1    12.446905             0     38.654696
2     2  0.517269            3    12.964174             0     19.994865
3     3  6.888832            6    19.853006             0    137.741271
4     4  0.029891           10    19.882897             0      4.117176
5     5  6.908777           15    26.791673             0     28.444652
6     6  4.522801           21    31.314474             0    128.649488
7     7  6.755125           28    38.069599             0    869.043329
8     8  6.676930           36    44.746529             0   5802.541623
9     9  3.002233           45    47.748762             0  17420.580379
--------------------------------------------------
df1 = df.cummax() = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     1  6.500338            1    12.446905             0     38.654696
2     2  6.500338            3    12.964174             0     38.654696
3     3  6.888832            6    19.853006             0    137.741271
4     4  6.888832           10    19.882897             0    137.741271
5     5  6.908777           15    26.791673             0    137.741271
6     6  6.908777           21    31.314474             0    137.741271
7     7  6.908777           28    38.069599             0    869.043329
8     8  6.908777           36    44.746529             0   5802.541623
9     9  6.908777           45    47.748762             0  17420.580379
--------------------------------------------------
df2 = df.cummin() = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     0  5.946567            0     5.946567             0      5.946567
2     0  0.517269            0     5.946567             0      5.946567
3     0  0.517269            0     5.946567             0      5.946567
4     0  0.029891            0     5.946567             0      4.117176
5     0  0.029891            0     5.946567             0      4.117176
6     0  0.029891            0     5.946567             0      4.117176
7     0  0.029891            0     5.946567             0      4.117176
8     0  0.029891            0     5.946567             0      4.117176
9     0  0.029891            0     5.946567             0      4.117176
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Process finished with exit code 0
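
Two details of this output are worth spelling out. key1_cumprod is 0 in every row because key1 starts at 0, so every later product includes that zero. Separately, the cumulative functions skip missing values by default (skipna=True). The sketch below uses made-up values, not data from the original example, to illustrate both points.

```python
import numpy as np
import pandas as pd

# Starting the sequence at 1 instead of 0 gives a non-trivial running product
key1_from_one = pd.Series(np.arange(1, 6))
factorials = key1_from_one.cumprod()   # 1, 2, 6, 24, 120

# By default (skipna=True) the NaN position stays NaN in the result,
# and the accumulation continues with the next valid value
s = pd.Series([1.0, np.nan, 2.0, 3.0])
skipped = s.cumsum()                   # 1.0, NaN, 3.0, 6.0

# With skipna=False, every position from the first NaN onward is NaN
propagated = s.cumsum(skipna=False)    # 1.0, NaN, NaN, NaN

print(factorials.tolist())
print(skipped.tolist())
print(propagated.tolist())
```
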

II. How are the cumulative statistics functions used?

All of the functions above work on both Series and DataFrame objects.
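
For a DataFrame, the accumulation runs down each column by default (axis=0); passing axis=1 accumulates across each row instead. A minimal sketch with made-up values (the frame `small` is illustrative, not from the original):

```python
import pandas as pd

small = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Default axis=0: accumulate down each column
down_columns = small.cumsum()        # a -> 1, 3, 6; b -> 10, 30, 60

# axis=1: accumulate from left to right within each row
across_rows = small.cumsum(axis=1)   # a unchanged; b -> 11, 22, 33

# The same method is available on a single column (a Series)
series_result = small['a'].cummax()  # 1, 2, 3

print(down_columns)
print(across_rows)
```
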

Here we accumulate in chronological order, from the earliest date to the latest.

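  • Sort
    # data is assumed to be a DataFrame of daily stock records indexed by date
    # sort by the index first so that the cumulative sum runs in chronological order
    data = data.sort_index()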
    
  • Cumulatively sum p_change
    stock_rise = data['p_change']
    # running total of p_change, one value per date
    stock_rise.cumsum()
    
    2015-03-02      2.62
    2015-03-03      4.06
    2015-03-04      5.63
    2015-03-05      7.65
    2015-03-06     16.16
    2015-03-09     16.37
    2015-03-10     18.75
    2015-03-11     16.36
    2015-03-12     15.03
    2015-03-13     17.58
    2015-03-16     20.34
    2015-03-17     22.42
    2015-03-18     23.28
    2015-03-19     23.74
    2015-03-20     23.48
    2015-03-23     23.74
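
The two steps above can be tried end to end with a small stand-in for data. The original data DataFrame is not shown in this section, so the dates and p_change values below are invented for illustration; only the column name p_change and the date index follow the text.

```python
import pandas as pd

# Hypothetical stand-in for `data`: rows are deliberately out of date order
data = pd.DataFrame(
    {'p_change': [1.5, 2.0, -0.5, 1.0]},
    index=pd.to_datetime(['2015-03-04', '2015-03-02', '2015-03-05', '2015-03-03']),
)

# Step 1: sort by the date index so the accumulation is chronological
data = data.sort_index()

# Step 2: running total of p_change
stock_rise = data['p_change']
cumulative = stock_rise.cumsum()
print(cumulative)
```

Without the sort, cumsum would accumulate in storage order, and the running total would not line up with the timeline.
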
    

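The cumulative sum can be plotted with matplotlib. Series.plot draws through matplotlib, so matplotlib must be installed; importing matplotlib.pyplot gives access to plt.show().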

import matplotlib.pyplot as plt
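# Series.plot draws a line chart by default; its kind parameter also covers histograms, bar charts, and pie charts
stock_rise.cumsum().plot()
# when run as a script, show() must be called for the figure to appear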
plt.show()