
DataFrame iteration and access methods


1. Data preparation

(1) Test data

Build a DataFrame with a date index.

import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(10), index=pd.date_range('2020-1-1', periods=10))
df = pd.DataFrame(np.random.randn(10, 4), index=ts.index, columns=list('ABCD')) 
df
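
Because np.random.randn is called without a seed, every run produces different numbers, so a fresh run will not reproduce the values printed in the following sections. A minimal sketch of a reproducible variant, assuming NumPy's Generator API; the name df_seeded and the seed 42 are illustrative choices, not part of the original setup:

```python
import numpy as np
import pandas as pd

# Seeded generator so that repeated runs build the same frame.
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed

dates = pd.date_range('2020-1-1', periods=10)
df_seeded = pd.DataFrame(rng.standard_normal((10, 4)),
                         index=dates, columns=list('ABCD'))

print(df_seeded.shape)     # (10, 4)
print(df_seeded.index[0])  # 2020-01-01 00:00:00
```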

(2) pandas version

Check the pandas version.

print(pd.__version__)

2.0.3

2. Access methods

There are five commonly used ways to iterate over a DataFrame.

(1) iterrows

The iterrows method yields each row as an (index, row) pair.

for index, row in df.iterrows():
    print(index, row['A'], row['D'])

2020-01-01 00:00:00 0.3641823474478886 0.7420267293577939
2020-01-02 00:00:00 -0.9086858514122141 -0.21529516253391381
2020-01-03 00:00:00 1.0707335521425283 -0.8495555020555525
2020-01-04 00:00:00 -0.9104436159077746 -1.7704251732279581
2020-01-05 00:00:00 1.6091084193842462 0.5594481402153169
2020-01-06 00:00:00 0.04828934029765889 -2.078443945278677
2020-01-07 00:00:00 -0.7111418530010771 -1.29587734532037
2020-01-08 00:00:00 0.20754578301393778 -0.39078747556747734
2020-01-09 00:00:00 1.0997255380859803 0.4272308690661768
2020-01-10 00:00:00 0.28544790543277 -0.37501666198259165

Look at the data types: index is a pandas Timestamp, and row is a Series, so its values can be accessed by column name.

print(type(index))
print(type(row['A']))
print(type(row))

<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'numpy.float64'>
<class 'pandas.core.series.Series'>
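
Because each row arrives as a Series, a row that spans columns of different dtypes is upcast to one common dtype. A small sketch of that behaviour, using a hypothetical two-column frame (mixed, with columns n and x) that is not part of the test data above:

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
mixed = pd.DataFrame({'n': [1, 2], 'x': [1.5, 2.5]})

for index, row in mixed.iterrows():
    # The row Series has to hold both values, so the integer is upcast to float64.
    print(index, row['n'], type(row['n']))

print(mixed['n'].dtype)  # the column itself still has an integer dtype
```

In the all-float test data this makes no visible difference, but it matters when the columns have mixed types.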

(2) loc

Access values by combining an index label with a column name through loc.

for row in df.index:
    print(df.loc[row]['A'])

0.3641823474478886
-0.9086858514122141
1.0707335521425283
-0.9104436159077746
1.6091084193842462
0.04828934029765889
-0.7111418530010771
0.20754578301393778
1.0997255380859803
0.28544790543277    
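
The loop above selects the row first and then the column. loc also accepts the row label and column name in a single call, and at is meant for single-value access. A sketch comparing the three forms on a stand-in frame (df_demo and its values are illustrative assumptions, chosen so the result is easy to check):

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same index and columns as the test data (values differ).
df_demo = pd.DataFrame(np.arange(40.0).reshape(10, 4),
                       index=pd.date_range('2020-1-1', periods=10),
                       columns=list('ABCD'))

for label in df_demo.index:
    chained = df_demo.loc[label]['A']  # two steps: select the row, then the column
    direct = df_demo.loc[label, 'A']   # one step: row label and column name together
    scalar = df_demo.at[label, 'A']    # at is intended for single-value access
    assert chained == direct == scalar

print(df_demo.loc[df_demo.index[0], 'A'])  # 0.0
```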

(3) iloc

Get the number of rows from shape, then use iloc with a positional row index, combined with a column name, to traverse the data.

for row_id in range(df.shape[0]):
    print(df.iloc[row_id]['B'])

0.2437495579604519
0.2828630441432169
0.5036532101096077
-0.9921045754369142
-0.18953453071322154
-0.17631832794049856
-1.1557403411733949
-1.9230766108049244
0.9827603665898592
1.5838796545007081
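
If the column should also be addressed by position, its integer position can be looked up once with columns.get_loc and passed to iloc together with the row position. A sketch on a stand-in frame (df_demo and its values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same index and columns as the test data (values differ).
df_demo = pd.DataFrame(np.arange(40.0).reshape(10, 4),
                       index=pd.date_range('2020-1-1', periods=10),
                       columns=list('ABCD'))

col_pos = df_demo.columns.get_loc('B')  # integer position of column B

for row_id in range(df_demo.shape[0]):
    by_name = df_demo.iloc[row_id]['B']          # positional row, then column name
    by_position = df_demo.iloc[row_id, col_pos]  # positional row and column in one call
    assert by_name == by_position

print(col_pos)                   # 1
print(df_demo.iloc[0, col_pos])  # 1.0
```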

(4) itertuples

The itertuples method converts each row to a tuple, which can then be indexed.
Position 0 is the index, position 1 corresponds to column A, and position 3 to column C.

for tup in df.itertuples():
    print(tup[0],tup[1],tup[3])

2020-01-01 00:00:00 0.3641823474478886 -0.5538779087811666
2020-01-02 00:00:00 -0.9086858514122141 -1.7114951319715501
2020-01-03 00:00:00 1.0707335521425283 -0.48885052901155274
2020-01-04 00:00:00 -0.9104436159077746 -0.9516150263977505
2020-01-05 00:00:00 1.6091084193842462 -1.0851994280481798
2020-01-06 00:00:00 0.04828934029765889 0.9085265155873162
2020-01-07 00:00:00 -0.7111418530010771 2.1446364650140746
2020-01-08 00:00:00 0.20754578301393778 0.4748462568719993
2020-01-09 00:00:00 1.0997255380859803 -1.0555296783745742
2020-01-10 00:00:00 0.28544790543277 2.288507229443556

Printing each tuple directly gives the following:

for tup in df.itertuples():
    print(tup)

Pandas(Index=Timestamp('2020-01-01 00:00:00'), A=0.3641823474478886, B=0.2437495579604519, C=-0.5538779087811666, D=0.7420267293577939)
Pandas(Index=Timestamp('2020-01-02 00:00:00'), A=-0.9086858514122141, B=0.2828630441432169, C=-1.7114951319715501, D=-0.21529516253391381)
Pandas(Index=Timestamp('2020-01-03 00:00:00'), A=1.0707335521425283, B=0.5036532101096077, C=-0.48885052901155274, D=-0.8495555020555525)
Pandas(Index=Timestamp('2020-01-04 00:00:00'), A=-0.9104436159077746, B=-0.9921045754369142, C=-0.9516150263977505, D=-1.7704251732279581)
Pandas(Index=Timestamp('2020-01-05 00:00:00'), A=1.6091084193842462, B=-0.18953453071322154, C=-1.0851994280481798, D=0.5594481402153169)
Pandas(Index=Timestamp('2020-01-06 00:00:00'), A=0.04828934029765889, B=-0.17631832794049856, C=0.9085265155873162, D=-2.078443945278677)
Pandas(Index=Timestamp('2020-01-07 00:00:00'), A=-0.7111418530010771, B=-1.1557403411733949, C=2.1446364650140746, D=-1.29587734532037)
Pandas(Index=Timestamp('2020-01-08 00:00:00'), A=0.20754578301393778, B=-1.9230766108049244, C=0.4748462568719993, D=-0.39078747556747734)
Pandas(Index=Timestamp('2020-01-09 00:00:00'), A=1.0997255380859803, B=0.9827603665898592, C=-1.0555296783745742, D=0.4272308690661768)
Pandas(Index=Timestamp('2020-01-10 00:00:00'), A=0.28544790543277, B=1.5838796545007081, C=2.288507229443556, D=-0.37501666198259165)    
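
The printed tuples are named tuples, so the fields Index, A, B, C and D can also be read by name instead of by position. A sketch on a stand-in frame (df_demo and its values are illustrative assumptions); it also shows the index and name parameters of itertuples:

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same index and columns as the test data (values differ).
df_demo = pd.DataFrame(np.arange(40.0).reshape(10, 4),
                       index=pd.date_range('2020-1-1', periods=10),
                       columns=list('ABCD'))

for tup in df_demo.itertuples():
    # Field names match the ones shown in the printed tuples.
    assert tup.Index == tup[0]
    assert tup.A == tup[1]
    assert tup.C == tup[3]

# index=False drops the index field; name=None yields plain tuples.
first_plain = next(df_demo.itertuples(index=False, name=None))
print(first_plain)
```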

(5) values

Access the data through the DataFrame's values attribute.
Positions 0, 1, 2 and 3 correspond to columns A, B, C and D:

for row in df.values:
    print(row[0], '  ', row[1], '  ', row[2], '  ', row[3])

0.3641823474478886    0.2437495579604519    -0.5538779087811666    0.7420267293577939
-0.9086858514122141    0.2828630441432169    -1.7114951319715501    -0.21529516253391381
1.0707335521425283    0.5036532101096077    -0.48885052901155274    -0.8495555020555525
-0.9104436159077746    -0.9921045754369142    -0.9516150263977505    -1.7704251732279581
1.6091084193842462    -0.18953453071322154    -1.0851994280481798    0.5594481402153169
0.04828934029765889    -0.17631832794049856    0.9085265155873162    -2.078443945278677
-0.7111418530010771    -1.1557403411733949    2.1446364650140746    -1.29587734532037
0.20754578301393778    -1.9230766108049244    0.4748462568719993    -0.39078747556747734
1.0997255380859803    0.9827603665898592    -1.0555296783745742    0.4272308690661768
0.28544790543277    1.5838796545007081    2.288507229443556    -0.37501666198259165 

Note: row is not a list; it is a numpy.ndarray.

for row in df.values:
    print(row)
print(type(row))   

[ 0.36418235  0.24374956 -0.55387791  0.74202673]
[-0.90868585  0.28286304 -1.71149513 -0.21529516]
[ 1.07073355  0.50365321 -0.48885053 -0.8495555 ]
[-0.91044362 -0.99210458 -0.95161503 -1.77042517]
[ 1.60910842 -0.18953453 -1.08519943  0.55944814]
[ 0.04828934 -0.17631833  0.90852652 -2.07844395]
[-0.71114185 -1.15574034  2.14463647 -1.29587735]
[ 0.20754578 -1.92307661  0.47484626 -0.39078748]
[ 1.09972554  0.98276037 -1.05552968  0.42723087]
[ 0.28544791  1.58387965  2.28850723 -0.37501666]
<class 'numpy.ndarray'>
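
The rows obtained from values carry no index labels. A sketch of two related points on a stand-in frame (df_demo and its values are illustrative assumptions): to_numpy(), which the pandas documentation recommends over values, returns the same array, and zip can pair each row with its index label.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same index and columns as the test data (values differ).
df_demo = pd.DataFrame(np.arange(40.0).reshape(10, 4),
                       index=pd.date_range('2020-1-1', periods=10),
                       columns=list('ABCD'))

arr = df_demo.to_numpy()
assert isinstance(arr, np.ndarray)
assert (arr == df_demo.values).all()

# Pair each array row with its index label.
for label, row in zip(df_demo.index, arr):
    print(label.date(), row.tolist())
# first line printed: 2020-01-01 [0.0, 1.0, 2.0, 3.0]
```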

(6) iteritems

Many online tutorials also describe the iteritems method, but calling it raises an error.

for index, col in df.iteritems():
    print(index,col.iloc[0])

The error message is:

AttributeError: 'DataFrame' object has no attribute 'iteritems'

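According to online sources, iteritems existed in older pandas versions. It was deprecated in pandas 1.5 and removed in 2.0, so it is not available in the 2.0.3 used here; items() is its replacement.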
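
A minimal sketch of column-wise iteration with items(), which takes the place of iteritems in pandas 2.x. The frame df_demo and its values are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same index and columns as the test data (values differ).
df_demo = pd.DataFrame(np.arange(40.0).reshape(10, 4),
                       index=pd.date_range('2020-1-1', periods=10),
                       columns=list('ABCD'))

# items() yields (column name, column Series) pairs, one per column.
for name, col in df_demo.items():
    print(name, col.iloc[0])
# prints: A 0.0, B 1.0, C 2.0, D 3.0 (one pair per line)
```

Unlike the five methods above, this iterates over columns rather than rows.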
