
The header parameter of pandas.read_csv()


pandas.read_csv() official documentation

 

header : int, list of int, default ‘infer’

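Row number(s), counted from 0, to use as the column names; this also determines where the data starts. The default, 'infer', depends on the names argument: if names is not passed, it behaves like header=0 and the column names are taken from the first line of the file; if names is passed explicitly, it behaves like header=None. Pass header=0 explicitly together with names to replace the column names already present in the file. header can also be a list of integers, for example [0,1,3], in which case the listed lines together form a multi-level column header (every column carries several labels). Lines in between that are not listed are skipped (line 2 in this example). Counting from 1 instead, lines 1, 2 and 4 of the file form the multi-level header, line 3 is discarded, and the DataFrame data starts at line 5.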

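Note: when skip_blank_lines=True, header ignores blank lines and commented lines (lines are only treated as comments when the comment argument is set), so header=0 refers to the first line of data rather than the first line of the file.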
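The note can be checked with a small in-memory file. This is a minimal sketch: the contents of raw, including the comment line, are made up for illustration, and comment="#" is passed because read_csv has no comment character by default.

```python
import io
import pandas as pd

# Hypothetical file contents: a comment line, a blank line, then the real header.
raw = (
    "# exported for the header demo\n"
    "\n"
    "Age,Gender,Income\n"
    "37,Male,5993\n"
    "54,Female,10502\n"
)

# skip_blank_lines=True is the default; comment="#" makes the first line a comment.
# Both leading lines are ignored, so header=0 lands on "Age,Gender,Income".
df = pd.read_csv(io.StringIO(raw), comment="#", header=0)

print(list(df.columns))  # ['Age', 'Gender', 'Income']
print(len(df))           # 2
```

The header line is the third physical line of the text, yet header=0 still selects it.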

Examples:

Import pandas:

import pandas as pd  

1 Data with column names

        Age  Gender  Education    EducationField  MaritalStatus  Income  OverTime
     0   37    Male          4     Life Sciences       Divorced    5993        No
     1   54  Female          4     Life Sciences       Divorced   10502        No
     2   34    Male          3     Life Sciences         Single    6074       Yes
     3   39  Female          1     Life Sciences        Married   12742        No
     4   28    Male          3           Medical       Divorced    2596        No
     5   24  Female          1           Medical        Married    4162       Yes
     6   29    Male          5             Other         Single    3983        No
     7   36    Male          2           Medical        Married    7596        No
     8   33  Female          4           Medical        Married    2622        No
     9   34  Female          4  Technical Degree         Single    6687        No
    10   24    Male          1   Human Resources        Married    1555        No

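1.1 Default header ('infer'): because names is not passed, this behaves like header=0 and the first line of the file supplies the column names.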

    data = pd.read_csv('./train.csv')
    print(data.head(5))

Output:

       Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
    0   37    Male          4   Life Sciences       Divorced    5993        No
    1   54  Female          4   Life Sciences       Divorced   10502        No
    2   34    Male          3   Life Sciences         Single    6074       Yes
    3   39  Female          1   Life Sciences        Married   12742        No
    4   28    Male          3         Medical       Divorced    2596        No
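As a sketch of how 'infer' resolves, the snippet below reads a three-column excerpt of the sample data from memory instead of ./train.csv. The comma delimiter and the lowercase labels passed to names are assumptions made for the example.

```python
import io
import pandas as pd

# Three-column excerpt of the sample data; the comma delimiter is an assumption.
CSV_TEXT = "Age,Gender,Income\n37,Male,5993\n54,Female,10502\n"

# No names passed: 'infer' behaves like header=0.
inferred = pd.read_csv(io.StringIO(CSV_TEXT))
explicit = pd.read_csv(io.StringIO(CSV_TEXT), header=0)
print(inferred.equals(explicit))  # True

# names passed without header: 'infer' behaves like header=None,
# so the original header line is read as an ordinary data row.
named = pd.read_csv(io.StringIO(CSV_TEXT), names=["age", "gender", "income"])
print(named.iloc[0].tolist())  # ['Age', 'Gender', 'Income']
print(len(named))              # 3
```

In the second call the old header line ends up inside the data, which is usually not what is wanted when the file already has a header.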

1.2 header=0: with header=n, line n of the file (counting from 0) becomes the column names and the DataFrame data starts at line n+1.

    data = pd.read_csv('./train.csv', header=0)
    print(data.head(5))

Output:

       Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
    0   37    Male          4   Life Sciences       Divorced    5993        No
    1   54  Female          4   Life Sciences       Divorced   10502        No
    2   34    Male          3   Life Sciences         Single    6074       Yes
    3   39  Female          1   Life Sciences        Married   12742        No
    4   28    Male          3         Medical       Divorced    2596        No
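Writing header=0 explicitly matters most in combination with names, where it tells pandas to drop the existing header line and use the supplied labels instead. A minimal sketch, assuming a comma-delimited excerpt of the sample data and hypothetical replacement names:

```python
import io
import pandas as pd

# Three-column excerpt of the sample data; the comma delimiter is an assumption.
CSV_TEXT = "Age,Gender,Income\n37,Male,5993\n54,Female,10502\n"

# header=0 consumes the original header line; names supplies the new labels.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    names=["age", "gender", "income"],  # hypothetical replacement names
)

print(list(df.columns))    # ['age', 'gender', 'income']
print(len(df))             # 2
print(df["age"].tolist())  # [37, 54]
```

Because the old header line is consumed rather than read as data, the numeric columns keep their numeric type.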

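1.3 header=1: line 1 of the file (here the first data record) becomes the column names, the original header line is discarded, and the DataFrame data starts at line 2.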

    data = pd.read_csv('./train.csv', header=1)
    print(data.head(5))

Output:

        37    Male          4   Life Sciences       Divorced    5993        No
    0   54  Female          4   Life Sciences       Divorced   10502        No
    1   34    Male          3   Life Sciences         Single    6074       Yes
    2   39  Female          1   Life Sciences        Married   12742        No
    3   28    Male          3         Medical       Divorced    2596        No
    4   24  Female          1         Medical        Married    4162       Yes
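The effect on the column labels is easy to miss in printed output, so here is a small sketch that inspects them directly. It assumes a comma-delimited, three-column excerpt of the sample data read from memory.

```python
import io
import pandas as pd

# Excerpt of the sample data; the comma delimiter is an assumption.
CSV_TEXT = (
    "Age,Gender,Income\n"
    "37,Male,5993\n"
    "54,Female,10502\n"
    "34,Male,6074\n"
)

# Line 1 ("37,Male,5993") becomes the header; line 0 is dropped entirely.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=1)

# Column labels are strings, even when they look like numbers.
print(list(df.columns))   # ['37', 'Male', '5993']
print(len(df))            # 2
print(df["37"].tolist())  # [54, 34]
```

One record has been lost to the header and the real column names are gone, so header=1 only makes sense when line 0 is something to throw away, such as a title line.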

1.4 header=[2] gives the same result as header=2

    data = pd.read_csv('./train.csv', header=[2])
    print(data.head(5))

Output:

        54  Female          4   Life Sciences       Divorced   10502        No
    0   34    Male          3   Life Sciences         Single    6074       Yes
    1   39  Female          1   Life Sciences        Married   12742        No
    2   28    Male          3         Medical       Divorced    2596        No
    3   24  Female          1         Medical        Married    4162       Yes
    4   29    Male          5           Other         Single    3983        No
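The equivalence can be checked programmatically rather than by comparing printed output. This is a sketch on a comma-delimited excerpt of the sample data (the delimiter is an assumption); it shows the two calls agree for this simple file and is not a claim about every combination of other read_csv options.

```python
import io
import pandas as pd

# Excerpt of the sample data; the comma delimiter is an assumption.
CSV_TEXT = (
    "Age,Gender,Income\n"
    "37,Male,5993\n"
    "54,Female,10502\n"
    "34,Male,6074\n"
    "39,Female,12742\n"
)

by_int = pd.read_csv(io.StringIO(CSV_TEXT), header=2)
by_list = pd.read_csv(io.StringIO(CSV_TEXT), header=[2])

# Same labels, same data.
print(list(by_list.columns))    # ['54', 'Female', '10502']
print(by_int.equals(by_list))   # True
```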

1.5 header=[0, 2, 3]: lines 0, 2 and 3 of the file together become the column header, so every column carries three labels. Line 1 is discarded and the DataFrame data starts at line 4.

    data = pd.read_csv('./train.csv', header=[0,2,3])
    print(data.head(5))

Output:

       Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
        54  Female          4   Life Sciences       Divorced   10502        No
        34    Male          3   Life Sciences         Single    6074       Yes
    0   39  Female          1   Life Sciences        Married   12742        No
    1   28    Male          3         Medical       Divorced    2596        No
    2   24  Female          1         Medical        Married    4162       Yes
    3   29    Male          5           Other         Single    3983        No
    4   36    Male          2         Medical        Married    7596        No
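With a list of several lines, the columns become a pandas MultiIndex, which changes how a column is selected. A minimal sketch, again assuming a comma-delimited, three-column excerpt of the sample data:

```python
import io
import pandas as pd

# Excerpt of the sample data; the comma delimiter is an assumption.
CSV_TEXT = (
    "Age,Gender,Income\n"   # line 0: header level 0
    "37,Male,5993\n"        # line 1: not listed, so it is skipped
    "54,Female,10502\n"     # line 2: header level 1
    "34,Male,6074\n"        # line 3: header level 2
    "39,Female,12742\n"     # line 4: first data row
    "28,Male,2596\n"
)

df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 2, 3])

print(df.columns.nlevels)  # 3
print(df.columns[0])       # ('Age', '54', '34')
print(len(df))             # 2

# A single column is addressed by its full tuple of labels.
print(df[("Income", "10502", "6074")].tolist())  # [12742, 2596]
```

For a file like this one, where lines 2 and 3 are really data, the result mainly illustrates the mechanics; the option is meant for files whose first lines are genuine multi-row headers.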

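1.6 header=None: no line is used as a header. The columns are labeled with integers starting at 0, and the original header line becomes the first data row.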

    data = pd.read_csv('./train.csv', header=None)
    print(data.head(5))

Output:

         0       1          2               3              4       5         6
    0  Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
    1   37    Male          4   Life Sciences       Divorced    5993        No
    2   54  Female          4   Life Sciences       Divorced   10502        No
    3   34    Male          3   Life Sciences         Single    6074       Yes
    4   39  Female          1   Life Sciences        Married   12742        No
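header=None is the right choice for files that have no header line at all. The sketch below contrasts such a file with one that does have a header, using comma-delimited excerpts of the sample data (the delimiter and the variable names are assumptions for the example).

```python
import io
import pandas as pd

# Two in-memory variants of the same records; the comma delimiter is an assumption.
NO_HEADER = "37,Male,5993\n54,Female,10502\n"
WITH_HEADER = "Age,Gender,Income\n" + NO_HEADER

# File without a header line: integer labels, numeric columns stay numeric.
clean = pd.read_csv(io.StringIO(NO_HEADER), header=None)
print(list(clean.columns))  # [0, 1, 2]
print(clean[0].tolist())    # [37, 54]

# File with a header line: the labels land in the data and the
# column can no longer be parsed as numbers.
mixed = pd.read_csv(io.StringIO(WITH_HEADER), header=None)
print(mixed[0].tolist())                        # ['Age', '37', '54']
print(pd.api.types.is_numeric_dtype(mixed[0]))  # False
```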

 
