title: D08|Pandas DataFrame: Insert, Join, Modify, Delete
author: Adolph Lee
categories: Data Mining Basics
tags:
`insert(self, loc, column, value, allow_duplicates=False)`
Inserts a new column named `column` at integer position `loc`, modifying the DataFrame in place.
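```python
import pandas as pd
import numpy as np

main_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5),
                       index=['a', 'c', 'd', 'e', 'f', 'b'],
                       columns=['A', 'B', 'C', 'D', 'E'])
sub_df = pd.DataFrame(np.arange(100, 118).reshape(6, 3), columns=['A', 'B', 'C'])
# Insert sub_df's column 'A' as new column 'F' at position 3 (before 'D').
# .to_numpy() passes the raw values by position; passing the Series itself would
# align on the index ('a'...'f' vs 0..5) and fill the new column with NaN.
main_df.insert(3, 'F', sub_df['A'].to_numpy())
print(main_df)
```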
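The example above passes raw values rather than the Series itself. This matters because `insert` aligns a Series on the row index. A minimal sketch with small hypothetical frames (`df` and `s` are illustration names only) shows the difference, and also that inserting a column name that already exists raises `ValueError` unless `allow_duplicates=True`:

```python
import pandas as pd

# Hypothetical frames: df uses letter labels, s uses integer labels 0, 1, 2
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
s = pd.Series([10, 20, 30])

# Passing the Series aligns on index labels; none match, so every value is NaN
aligned = df.copy()
aligned.insert(1, 'S', s)

# Passing a NumPy array inserts the values by position
positional = df.copy()
positional.insert(1, 'S', s.to_numpy())

# Inserting an existing column name raises unless allow_duplicates=True
try:
    df.insert(0, 'A', s.to_numpy())
    duplicate_error = False
except ValueError:
    duplicate_error = True

print(aligned['S'].isna().all())   # True
print(positional['S'].tolist())    # [10, 20, 30]
print(duplicate_error)             # True
```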
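`append(self, other, ignore_index=False, verify_integrity=False, sort=None)`
Appends the rows of `other` to the end and returns a new DataFrame. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; the same result is obtained with `pd.concat`.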
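```python
import pandas as pd
import numpy as np

main_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5),
                       index=['a', 'c', 'd', 'e', 'f', 'b'],
                       columns=['A', 'B', 'C', 'D', 'E'])
sub_df = pd.DataFrame(np.arange(100, 118).reshape(6, 3), columns=['A', 'B', 'C'])
# Append row 5 of sub_df to the end of main_df (replacement for the removed
# DataFrame.append). Columns missing from sub_df ('D', 'E') become NaN.
main_df = pd.concat([main_df, sub_df.loc[[5], :]], ignore_index=True, sort=False)
print(main_df)
```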
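`pd.concat` also accepts the `verify_integrity` flag from the old `append` signature. A short sketch with hypothetical frames `top` and `bottom`: with `verify_integrity=True` a repeated index label raises `ValueError`, while `ignore_index=True` sidesteps the problem by renumbering the rows.

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
bottom = pd.DataFrame({'A': [3], 'B': [4]}, index=[1])  # label 1 repeats

# verify_integrity=True refuses to build a result with duplicate index labels
try:
    pd.concat([top, bottom], verify_integrity=True)
    raised = False
except ValueError:
    raised = True

# Renumber rows 0..n-1 instead; 'B' is missing in top, so those cells are NaN
combined = pd.concat([top, bottom], ignore_index=True, sort=False)

print(raised)                      # True
print(combined.index.tolist())     # [0, 1, 2]
print(combined['B'].isna().sum())  # 2
```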
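`join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)`
Joins the columns of another DataFrame (or a named Series) via the row index or a specified column and returns a new DataFrame, much like a SQL JOIN. In short, it matches either a column or the row index of the calling DataFrame against the row index of the other DataFrame.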
```python
import pandas as pd
import numpy as np

left_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5), columns=['A', 'B', 'C', 'D', 'E'])
right_df = pd.DataFrame(np.arange(0, 18).reshape(6, 3), columns=['A', 'F', 'G'])
print(left_df, right_df, sep='\n\n')

# Join the left DataFrame's column 'A' to the right DataFrame's row index
new_df = left_df.join(right_df, on='A', how='left', lsuffix='_left', rsuffix='_right')
print(new_df)  # only right rows 0 and 5 match (left 'A' holds 0, 5, 10, ...)

# Join the left DataFrame's row index to the right DataFrame's row index
new_df = left_df.join(right_df, lsuffix='_left', rsuffix='_right')
print(new_df)

# Join left column 'A' to right column 'A': just make right 'A' the index first
new_df = left_df.join(right_df.set_index('A'), on='A')
print(new_df)
# Or set the join column as the row index of both DataFrames
new_df = left_df.set_index('A').join(right_df.set_index('A'))
print(new_df)
```
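The `how` parameter behaves as in SQL. A minimal sketch on hypothetical frames `emp` and `dept` compares a `left` and an `inner` join of a key column against the other frame's index, and shows that a Series with a `name` can be joined directly:

```python
import pandas as pd

emp = pd.DataFrame({'name': ['Ann', 'Bob', 'Cid'], 'dept_id': [1, 2, 9]})
dept = pd.DataFrame({'dept': ['Sales', 'IT']}, index=[1, 2])

# left: keep every row of emp; dept_id 9 has no match, so its 'dept' is NaN
left = emp.join(dept, on='dept_id', how='left')
# inner: keep only rows whose key exists in dept's index (emp's order is kept)
inner = emp.join(dept, on='dept_id', how='inner')

# A named Series joins like a one-column DataFrame, aligned on the row index
bonus = pd.Series([100, 200, 300], name='bonus')
with_bonus = emp.join(bonus)

print(left)
print(inner)
print(with_bonus)
```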
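`merge(self, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)`
This method has many parameters, but don't be intimidated: it works much like `join`. Its advantage over `join` is that it can join on any columns directly, without first turning them into an index, and it can still do everything `join` does (via `left_index`/`right_index`).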
```python
import pandas as pd
import numpy as np

left_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5), columns=['F', 'B', 'A', 'D', 'E'])
right_df = pd.DataFrame(np.arange(0, 18).reshape(6, 3), columns=['A', 'F', 'G'])
print(left_df, right_df, sep='\n\n')

# Join the left DataFrame's column 'B' to the right DataFrame's column 'F'
new_df = left_df.merge(right_df, left_on='B', right_on='F', how='left',
                       suffixes=('_left', '_right'))
print(new_df)

# Join the left DataFrame's column 'A' to the right DataFrame's column 'A'.
# When the key has the same name on both sides (on='A'), only one 'A' column is
# kept: a left join keeps the left side's key, a right join the right side's.
new_df = left_df.merge(right_df, on='A', how='left', suffixes=('_left', '_right'))
print(new_df)
```
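Two parameters from the signature above help when checking a merge: `indicator=True` adds a `_merge` column telling whether each row is `left_only`, `right_only` or `both`, and `validate` raises `pandas.errors.MergeError` if the key relationship is not what you expect. A sketch with hypothetical frames `orders` and `customers`:

```python
import pandas as pd

orders = pd.DataFrame({'cust': [1, 1, 2, 4], 'amount': [10, 20, 30, 40]})
customers = pd.DataFrame({'cust': [1, 2, 3], 'city': ['Oslo', 'Rome', 'Lima']})

# Outer join with a _merge column recording where each row came from
res = orders.merge(customers, on='cust', how='outer', indicator=True)
print(res)

# Each customer appears at most once on the right, so many_to_one passes
checked = orders.merge(customers, on='cust', how='left', validate='many_to_one')

# one_to_one fails because cust 1 appears twice in orders
try:
    orders.merge(customers, on='cust', validate='one_to_one')
    failed = False
except pd.errors.MergeError:
    failed = True
print(failed)  # True
```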
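`update(self, other, join='left', overwrite=True, filter_func=None, errors='ignore')`
Aligns with another DataFrame on both row index and column labels, and replaces the caller's values in place wherever both labels match.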
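```python
import pandas as pd
import numpy as np

left_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5), columns=['F', 'B', 'A', 'D', 'E'])
right_df = pd.DataFrame(np.arange(100, 118).reshape(6, 3),
                        index=[0, 2, 4, 6, 8, 10], columns=['A', 'F', 'G'])
print(left_df, right_df, sep='\n\n')
# Only rows 0, 2, 4 and columns 'A', 'F' exist in both, so only those cells change
left_df.update(right_df)
# Try changing the parameters yourself and watch how the result changes
print(left_df)
```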
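Two details of `update` are easy to miss: NaN values in `other` never overwrite existing data, and `overwrite=False` only fills cells that are NaN in the caller. It also returns `None`, since it works in place. A sketch with hypothetical frames `base` and `patch`:

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [10.0, 20.0, 30.0]})
patch = pd.DataFrame({'A': [100.0, 200.0, np.nan]})  # same index 0..2

# Default overwrite=True: non-NaN patch values replace existing ones;
# the NaN in patch row 2 leaves base's 3.0 untouched
df1 = base.copy()
result = df1.update(patch)
print(result)             # None
print(df1['A'].tolist())  # [100.0, 200.0, 3.0]

# overwrite=False: only cells that are NaN in the caller get filled
df2 = base.copy()
df2.update(patch, overwrite=False)
print(df2['A'].tolist())  # [1.0, 200.0, 3.0]
```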
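Modifying values: `at` sets a single cell by row and column label, while `iloc` assigns to a block selected by integer position.

```python
import pandas as pd
import numpy as np

left_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5), columns=['F', 'B', 'A', 'D', 'E'])
print(left_df)
# Set one cell by label: row 1, column 'A'
left_df.at[1, 'A'] = 999
print(left_df)
# Set a block by position: rows 1-5, columns at positions 3 and 4 ('D', 'E')
left_df.iloc[1:6, [3, 4]] = 999
print(left_df)
```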
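The `iloc[1:6, ...]` slice above covers positions 1 through 5 because the stop position is excluded. Label-based `loc` slices include both ends, and `loc` also accepts a boolean mask for conditional updates. A sketch on a hypothetical frame `df`:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 5, 10, 15], 'B': [1, 2, 3, 4]})

# iloc: position slice 1:3 covers positions 1 and 2 (stop excluded)
df.iloc[1:3, 1] = -1
# loc: label slice 1:2 covers labels 1 and 2 (stop included)
df.loc[1:2, 'A'] = 0
# Conditional update: set B to 99 wherever A is still greater than 0
df.loc[df['A'] > 0, 'B'] = 99

print(df)
```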
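Deleting a column with `pop`: it removes the column from the DataFrame and returns it as a Series.

```python
import pandas as pd
import numpy as np

left_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5), columns=['F', 'B', 'A', 'D', 'E'])
# Remove column 'A' from left_df and get it back as a Series
pop_A = left_df.pop('A')
print(pop_A)
print(left_df)  # column 'A' is gone
```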
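`pop` amounts to reading a column and then deleting it. A sketch on a hypothetical frame showing that it changes the frame in place, that `del` drops a column without returning it, and that popping a missing column raises `KeyError`:

```python
import pandas as pd

df = pd.DataFrame({'F': [1, 2], 'A': [3, 4], 'B': [5, 6]})

col_a = df.pop('A')   # removes 'A' and returns it as a Series
del df['B']           # removes 'B' and returns nothing

try:
    df.pop('missing')
    missing_raised = False
except KeyError:
    missing_raised = True

print(col_a.tolist())    # [3, 4]
print(list(df.columns))  # ['F']
print(missing_raised)    # True
```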
`drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')`
Deletes rows or columns by specifying their row or column labels.
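```python
import pandas as pd
import numpy as np

left_df = pd.DataFrame(np.arange(0, 30).reshape(6, 5), columns=['F', 'B', 'A', 'D', 'E'])
# Drop column 'A'; inplace=True modifies left_df directly and returns None
left_df.drop(columns='A', inplace=True)
print(left_df)
```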
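`drop` removes rows just as easily via `index=` (equivalent to passing `labels` with `axis=0`), and `errors='ignore'` skips labels that do not exist instead of raising `KeyError`. Without `inplace=True` it returns a new DataFrame and leaves the original untouched. A sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

rows_dropped = df.drop(index=['x', 'z'])  # same as df.drop(['x', 'z'], axis=0)
cols_dropped = df.drop(columns=['B', 'C'], errors='ignore')  # missing 'C' is skipped

# With the default errors='raise', a missing label is an error
try:
    df.drop(columns='C')
    raised = False
except KeyError:
    raised = True

print(rows_dropped)
print(cols_dropped)
print(df.shape)  # (3, 2): the original is unchanged
print(raised)    # True
```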