当前位置:   article > 正文

pandas操作excel-基础部分_pandas创建excel

pandas创建excel

B站链接【Python自动化办公--Pandas玩转Excel(全30集)-哔哩哔哩】 https://b23.tv/Xk1r46d

1 创建文件 

DataFrame:数据帧,相当于excel的sheet

to_excel:写入表格

2 读取文件 
print(df)

 输出df

 

 设置行索引为id,避免自动生成索引

  1. import pandas as pd
  2. df = pd.DataFrame({'id':[1,2,3], 'name':['zhang', 'liu', 'qin']})
  3. # df.to_excel('D:\output.xlsx')
  4. df.set_index('id')
  5. print(df)
  6. print('Down!')

  1. import pandas as pd
  2. df = pd.DataFrame(pd.read_excel('D:\数据解析需求\需求.xlsx'))
  3. print(df.shape)
  4. print(df.columns)
  5. print(df.head(2))
  6. print(df.tail())

shape:文件大小

columns:列title

head:打印前5行数据,可自定义

tail:打印后五行数据

3 行、列、单元格 

 如果第一行为无效数据,可指定数据第一行的位置

如果第一行为空,可以不用指定

df = pd.DataFrame(pd.read_excel('D:\数据解析需求\需求.xlsx', header=1))
  1. df = pd.DataFrame(pd.read_excel('D:\output.xlsx', header=None))
  2. df.columns = ['sum', 'id', 'time']
  3. print(df.columns)
  4. df.to_excel('D:\output.xlsx')

 如果没有header,可以调用columns创建,但保存的时候会自动生成索引

df.set_index('id', inplace=True)

在原来的dataframe上改,不生成索引

df = pd.DataFrame(pd.read_excel('D:\output.xlsx', index_col='id'))

设置id为索引,不会再自动生成

  1. import pandas as pd
  2. d = {'x':2, 'y':34, 'z':99}
  3. s1 = pd.Series(d)
  4. print(s1.index)

生成序列对象,属性有index

  1. import pandas as pd
  2. s1 = pd.Series([1,2,3], index=[1,2,3], name='A')
  3. s2 = pd.Series([10,20,30], index=[1,2,3], name='B')
  4. s3 = pd.Series([100,200,300], index=[1,2,3], name='C')
  5. df = pd.DataFrame([s1, s2, s3])
  6. print(df)

 

可以分别把值和索引告诉它

  1. import pandas as pd
  2. s1 = pd.Series([1,2,3], index=[1, 2, 3], name='A')
  3. s2 = pd.Series([10,20,30], index=[1, 2, 3], name='B')
  4. s3 = pd.Series([100,200,300], index=[1, 2, 3], name='C')
  5. df = pd.DataFrame({s1.name: s1, s2.name: s2, s3.name: s3})
  6. print(df)

如果要以列的方式加进去,要用dict的方式

  1. df = pd.DataFrame(pd.read_excel('D:\output.xlsx'))
  2. print(df)

 

数据不靠边的输出形式

  1. df = pd.DataFrame(pd.read_excel('D:\output.xlsx', skiprows=4, usecols='D: F'))
  2. print(df)
  3. print(type(df['sum']))

skiprows跳过行数

usrcols使用的列

4 数据区域读取填充数字 
  1. df = pd.DataFrame(pd.read_excel('D:\output.xlsx', skiprows=4, usecols='C: F'))
  2. for i in df.index:
  3. df['id'].at[i] = i+1
  4. print(df)

 

使用at[]方法自动填充

print(df['id'])

 

df = pd.DataFrame(pd.read_excel('D:\output.xlsx', skiprows=4, usecols='C: F', index_col=None, dtype={'id': str}))

 

把id类型设置成str

5 填充日期序列 
  1. import pandas as pd
  2. from datetime import date, timedelta
  3. df = pd.DataFrame(pd.read_excel('D:\output.xlsx', skiprows=4, usecols='C: F', index_col=None, dtype={'id': str}))
  4. start = date(2018, 1, 1)
  5. for i in df.index:
  6. df['id'].at[i] = i+1
  7. df['score'].at[i] = 'Yes' if i%2==0 else 'No'
  8. df['date'].at[i] = start + timedelta(days=i)
  9. print(df)

 

自动填充score和日期

  1. def add_month(d,md):
  2. yd = md // 12
  3. m = d.month + md % 12
  4. if m != 12:
  5. yd += m//12
  6. m = m%12
  7. return date(d.year + yd, m, d.day)
  8. df = pd.DataFrame(pd.read_excel('D:\output.xlsx', skiprows=4, usecols='C: F', index_col=None,
  9. dtype={'id': str}))
  10. start = date(2018, 1, 31)
  11. for i in df.index:
  12. df['id'].at[i] = i+1
  13. df['score'].at[i] = 'Yes' if i%2==0 else 'No'
  14. df['date'].at[i] = add_month(start, i)
  15. print(df)

填充月份的时候注意超出界限会报错

6 函数填充,计算列 
  1. import pandas as pd
  2. books = pd.read_excel('d:/test.xlsx', index_col='id')
  3. # 操作列
  4. books['price'] = books['score'] * books['discount']
  5. books['price'] = books['score'] * 0.8
  6. books['price'] = 0.8 * books['score']
  7. # 操作单元格
  8. # for i in books.index:
  9. # books['price'].at[i] = books['score'].at[i] * books['discount'].at[i]
  10. print(books)

books['score'] += 50

  1. def add_2(x):
  2. return x+2
  3. # books['score'] = books['score'].apply(add_2)
  4. books['score'] = books['score'].apply(lambda x: x+2)

7 排序,多重排序 
  1. # products.sort_values(by='score', inplace=True)
  2. # 降序排序:
  3. products.sort_values(by='score', inplace=True, ascending=False)

  1. products.sort_values(by=['price', 'score'], inplace=True, ascending=False)
  2. # 两列按不同方法排序
  3. products.sort_values(by=['price', 'score'], inplace=True, ascending=[True, False])

 

8 数据筛选、过滤 
  1. import pandas as pd
  2. def score_a(s):
  3. return 60<=s<=100
  4. def price_y(p):
  5. return p=='yes'
  6. df = pd.read_excel('d:/test.xlsx', index_col='id')
  7. # df = df.loc[df['score'].apply(score_a)].loc[df['price'].apply(price_y)]
  8. # df = df.loc[df.score.apply(score_a)].loc[df.price.apply(price_y)]
  9. df = df.loc[df.score.apply(lambda s:60<=s<=100)].loc[df.price.apply(lambda p:p=='yes')]
  10. print(df)

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/知新_RL/article/detail/444167
推荐阅读
相关标签
  

闽ICP备14008679号