header : int, list of int, default 'infer'
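Row number(s) to use as the column names, and therefore where the data starts. The default 'infer' works like this: if no `names` are passed, the behavior is identical to header=0 and the column names are taken from the first line of the file; if column names are passed explicitly via `names`, the behavior is identical to header=None. To replace column names that already exist in the file, pass header=0 explicitly together with `names`. header can also be a list of integers such as [0,1,3]: the rows at those positions are used as column labels (a MultiIndex, so each column has several labels), and intervening rows that are not listed are skipped. In this example, rows 0, 1 and 3 (the 1st, 2nd and 4th lines of the file) form a multi-level header, row 2 (the 3rd line) is dropped, and the DataFrame data starts at row 4 (the 5th line).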
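A minimal sketch of how the default header='infer' interacts with the `names` argument. It uses a tiny in-memory CSV via io.StringIO; the data and the labels `x`, `y` are made up purely for illustration:

```python
from io import StringIO

import pandas as pd

CSV = "a,b\n1,2\n3,4\n"  # hypothetical two-column file with a header line

# names not passed: header='infer' acts like header=0, so "a,b" becomes the column labels
df_default = pd.read_csv(StringIO(CSV))

# names passed, header left at its default: acts like header=None,
# so the line "a,b" is kept as an ordinary data row
df_names = pd.read_csv(StringIO(CSV), names=["x", "y"])

# names passed together with header=0: the existing header line is replaced
df_replaced = pd.read_csv(StringIO(CSV), header=0, names=["x", "y"])

print(df_default.columns.tolist())   # ['a', 'b']
print(df_names.shape)                # (3, 2)
print(df_replaced.columns.tolist())  # ['x', 'y']
print(df_replaced.shape)             # (2, 2)
```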
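Note: when skip_blank_lines=True (the default), the header parameter ignores commented lines and empty lines, so header=0 denotes the first line of data rather than the first line of the file.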
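A small sketch of that rule, assuming a made-up file that starts with a fully commented line and a blank line (the `comment` argument must be set for pandas to recognize the comment):

```python
from io import StringIO

import pandas as pd

# A fully commented line and a blank line come before the real header
CSV = "# exported by some tool\n\na,b,c\n1,2,3\n"

# With comment="#" and the default skip_blank_lines=True, header=0 points at
# the first line that is neither commented out nor blank, i.e. "a,b,c"
df = pd.read_csv(StringIO(CSV), comment="#", header=0)

print(df.columns.tolist())  # ['a', 'b', 'c']
print(df.shape)             # (1, 3)
```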
Examples:
Import the pandas library:
    import pandas as pd
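1 Data with a header row. The file ./train.csv looks like this (the numbers in the first column are row labels and are not part of the file, which has 7 columns):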
   | Age | Gender | Education | EducationField   | MaritalStatus | Income | OverTime |
0  | 37  | Male   | 4         | Life Sciences    | Divorced      | 5993   | No       |
1  | 54  | Female | 4         | Life Sciences    | Divorced      | 10502  | No       |
2  | 34  | Male   | 3         | Life Sciences    | Single        | 6074   | Yes      |
3  | 39  | Female | 1         | Life Sciences    | Married       | 12742  | No       |
4  | 28  | Male   | 3         | Medical          | Divorced      | 2596   | No       |
5  | 24  | Female | 1         | Medical          | Married       | 4162   | Yes      |
6  | 29  | Male   | 5         | Other            | Single        | 3983   | No       |
7  | 36  | Male   | 2         | Medical          | Married       | 7596   | No       |
8  | 33  | Female | 4         | Medical          | Married       | 2622   | No       |
9  | 34  | Female | 4         | Technical Degree | Single        | 6687   | No       |
10 | 24  | Male   | 1         | Human Resources  | Married       | 1555   | No       |
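To reproduce the examples below without the original dataset, one option is to write the 11 rows shown in the table to ./train.csv. This is only a stand-in built from the visible rows; the real train.csv presumably contains more data:

```python
import pandas as pd

# The 11 rows shown in the table above; the real train.csv may well have more rows
columns = ["Age", "Gender", "Education", "EducationField",
           "MaritalStatus", "Income", "OverTime"]
rows = [
    (37, "Male", 4, "Life Sciences", "Divorced", 5993, "No"),
    (54, "Female", 4, "Life Sciences", "Divorced", 10502, "No"),
    (34, "Male", 3, "Life Sciences", "Single", 6074, "Yes"),
    (39, "Female", 1, "Life Sciences", "Married", 12742, "No"),
    (28, "Male", 3, "Medical", "Divorced", 2596, "No"),
    (24, "Female", 1, "Medical", "Married", 4162, "Yes"),
    (29, "Male", 5, "Other", "Single", 3983, "No"),
    (36, "Male", 2, "Medical", "Married", 7596, "No"),
    (33, "Female", 4, "Medical", "Married", 2622, "No"),
    (34, "Female", 4, "Technical Degree", "Single", 6687, "No"),
    (24, "Male", 1, "Human Resources", "Married", 1555, "No"),
]

# index=False keeps the row labels 0..10 out of the file, giving a 7-column CSV
pd.DataFrame(rows, columns=columns).to_csv("./train.csv", index=False)
```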
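1.1 header left at its default ('infer'): because `names` is not passed, this behaves like header=0 and the first line of the file becomes the column names.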
    data = pd.read_csv('./train.csv')
    print(data.head(5))
Output:
    Age Gender Education EducationField MaritalStatus Income OverTime
    0 37 Male 4 Life Sciences Divorced 5993 No
    1 54 Female 4 Life Sciences Divorced 10502 No
    2 34 Male 3 Life Sciences Single 6074 Yes
    3 39 Female 1 Life Sciences Married 12742 No
    4 28 Male 3 Medical Divorced 2596 No
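1.2 header=0. In general, header=n (counting from 0) uses line n of the file as the column names, and the DataFrame data starts at line n+1. Since `names` is not passed, the result is the same as in 1.1.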
    data = pd.read_csv('./train.csv', header=0)
    print(data.head(5))
Output:
    Age Gender Education EducationField MaritalStatus Income OverTime
    0 37 Male 4 Life Sciences Divorced 5993 No
    1 54 Female 4 Life Sciences Divorced 10502 No
    2 34 Male 3 Life Sciences Single 6074 Yes
    3 39 Female 1 Life Sciences Married 12742 No
    4 28 Male 3 Medical Divorced 2596 No
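The outputs of 1.1 and 1.2 are identical. A quick way to confirm this programmatically is to compare the two frames; the sketch below uses the first few sample rows as an in-memory CSV (io.StringIO stands in for ./train.csv):

```python
from io import StringIO

import pandas as pd

# First rows of the sample data as an in-memory CSV
CSV = """Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime
37,Male,4,Life Sciences,Divorced,5993,No
54,Female,4,Life Sciences,Divorced,10502,No
34,Male,3,Life Sciences,Single,6074,Yes
"""

df_default = pd.read_csv(StringIO(CSV))            # header='infer'
df_header0 = pd.read_csv(StringIO(CSV), header=0)  # explicit header=0

# Without names=..., both calls use line 0 as the header and produce the same frame
print(df_default.equals(df_header0))  # True
```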
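1.3 header=1. Line 1 (the first data row, 37 Male ...) becomes the column names, the original header line above it is discarded, and the DataFrame data starts at line 2.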
    data = pd.read_csv('./train.csv', header=1)
    print(data.head(5))
Output:
    37 Male 4 Life Sciences Divorced 5993 No
    0 54 Female 4 Life Sciences Divorced 10502 No
    1 34 Male 3 Life Sciences Single 6074 Yes
    2 39 Female 1 Life Sciences Married 12742 No
    3 28 Male 3 Medical Divorced 2596 No
    4 24 Female 1 Medical Married 4162 Yes
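One way to read header=1 is "skip the first line, then use the next line as the header". The sketch below checks that reading against skiprows on a few sample rows (in-memory CSV standing in for ./train.csv); note that the labels taken from a data row are strings:

```python
from io import StringIO

import pandas as pd

CSV = """Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime
37,Male,4,Life Sciences,Divorced,5993,No
54,Female,4,Life Sciences,Divorced,10502,No
34,Male,3,Life Sciences,Single,6074,Yes
"""

# header=1: line 1 supplies the labels and line 0 (the real header) is dropped
by_header = pd.read_csv(StringIO(CSV), header=1)
# Equivalent view: skip the first line, then treat the next one as the header
by_skip = pd.read_csv(StringIO(CSV), skiprows=1, header=0)

print(by_header.columns.tolist())
# ['37', 'Male', '4', 'Life Sciences', 'Divorced', '5993', 'No']
print(by_header.equals(by_skip))  # True
```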
1.4 header=[2] has the same effect as header=2: line 2 becomes the column names and the data starts at line 3.
    data = pd.read_csv('./train.csv', header=[2])
    print(data.head(5))
Output:
    54 Female 4 Life Sciences Divorced 10502 No
    0 34 Male 3 Life Sciences Single 6074 Yes
    1 39 Female 1 Life Sciences Married 12742 No
    2 28 Male 3 Medical Divorced 2596 No
    3 24 Female 1 Medical Married 4162 Yes
    4 29 Male 5 Other Single 3983 No
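A sketch that checks the equivalence on a few sample rows (in-memory CSV standing in for ./train.csv). The point to notice is that a one-element list does not produce a MultiIndex:

```python
from io import StringIO

import pandas as pd

CSV = """Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime
37,Male,4,Life Sciences,Divorced,5993,No
54,Female,4,Life Sciences,Divorced,10502,No
34,Male,3,Life Sciences,Single,6074,Yes
39,Female,1,Life Sciences,Married,12742,No
"""

as_int = pd.read_csv(StringIO(CSV), header=2)
as_list = pd.read_csv(StringIO(CSV), header=[2])

# A one-element list still yields ordinary single-level column labels
print(isinstance(as_list.columns, pd.MultiIndex))  # False
print(as_list.equals(as_int))                       # True
```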
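1.5 header=[0, 2, 3] uses lines 0, 2 and 3 of the file as column labels, so each column gets a three-level header (a MultiIndex). Line 1, which lies between the listed rows, is discarded, and the DataFrame data starts at line 4.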
    data = pd.read_csv('./train.csv', header=[0, 2, 3])
    print(data.head(5))
Output:
    Age Gender Education EducationField MaritalStatus Income OverTime
    54 Female 4 Life Sciences Divorced 10502 No
    34 Male 3 Life Sciences Single 6074 Yes
    0 39 Female 1 Life Sciences Married 12742 No
    1 28 Male 3 Medical Divorced 2596 No
    2 24 Female 1 Medical Married 4162 Yes
    3 29 Male 5 Other Single 3983 No
    4 36 Male 2 Medical Married 7596 No
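The multi-level header can be inspected directly. A sketch on the first seven lines of the sample data (in-memory CSV standing in for ./train.csv) showing that each column label is a tuple built from lines 0, 2 and 3, and that line 1 ("37,Male,...") never appears:

```python
from io import StringIO

import pandas as pd

CSV = """Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime
37,Male,4,Life Sciences,Divorced,5993,No
54,Female,4,Life Sciences,Divorced,10502,No
34,Male,3,Life Sciences,Single,6074,Yes
39,Female,1,Life Sciences,Married,12742,No
28,Male,3,Medical,Divorced,2596,No
24,Female,1,Medical,Married,4162,Yes
"""

df = pd.read_csv(StringIO(CSV), header=[0, 2, 3])

print(df.columns.nlevels)  # 3
print(df.columns[0])       # ('Age', '54', '34')
print(df.iloc[0, 0])       # 39, taken from line 4 of the file
print(len(df))             # 3
```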
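1.6 header=None: pandas treats the file as having no header row, so the first line (Age Gender ...) is read as ordinary data and the columns are labeled with the integers 0, 1, 2, ...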
    data = pd.read_csv('./train.csv', header=None)
    print(data.head(5))
Output:
    0 1 2 3 4 5 6
    0 Age Gender Education EducationField MaritalStatus Income OverTime
    1 37 Male 4 Life Sciences Divorced 5993 No
    2 54 Female 4 Life Sciences Divorced 10502 No
    3 34 Male 3 Life Sciences Single 6074 Yes
    4 39 Female 1 Life Sciences Married 12742 No
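A side effect worth knowing: because the header text ends up in the same column as the numbers, pandas cannot parse those columns as numeric and keeps every value as a string. A sketch on a few sample rows (in-memory CSV standing in for ./train.csv):

```python
from io import StringIO

import pandas as pd

CSV = """Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime
37,Male,4,Life Sciences,Divorced,5993,No
54,Female,4,Life Sciences,Divorced,10502,No
"""

df = pd.read_csv(StringIO(CSV), header=None)

print(df.columns.tolist())  # [0, 1, 2, 3, 4, 5, 6]
print(df.iloc[0, 0])        # Age
# "Age" shares column 0 with the ages, so 37 is kept as the string '37'
print(repr(df.iloc[1, 0]))  # '37'
```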
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.