- import pandas as pd
- df = pd.read_csv('./data/CDNOW_master.txt', header=None, sep=r'\s+', names=['user_id', 'order_dt', 'order_product', 'order_amount'])  # sep=r'\s+' splits fields on one or more whitespace characters
- df.head()
- df.shape
- (69659, 4)
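To try the whitespace-delimited read without the CDNOW file at hand, the same call can be pointed at an in-memory buffer. This is only a minimal sketch: the three sample rows and the name `demo` are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same four-column layout,
# separated by a varying number of spaces.
sample = io.StringIO(
    "00001 19970101  1   11.77\n"
    "00002 19970112  1   12.00\n"
    "00002 19970112  5   77.00\n"
)

cols = ['user_id', 'order_dt', 'order_product', 'order_amount']

# header=None because the data has no header row; names supplies the column labels.
demo = pd.read_csv(sample, header=None, sep=r'\s+', names=cols)

print(demo.shape)
print(demo.dtypes)
```

Note that `user_id` is parsed as an integer, so any leading zeros in the raw text are dropped.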
- # Inspect the column data types
- df.info()
-
- <class 'pandas.core.frame.DataFrame'>
- RangeIndex: 69659 entries, 0 to 69658
- Data columns (total 4 columns):
- user_id 69659 non-null int64
- order_dt 69659 non-null int64
- order_product 69659 non-null int64
- order_amount 69659 non-null float64
- dtypes: float64(1), int64(3)
- memory usage: 2.1 MB
- # Convert order_dt to datetime and add a column holding the month of each purchase
- df['order_dt'] = pd.to_datetime(df['order_dt'],format="%Y%m%d")
- df.head()
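- df['month'] = df['order_dt'].dt.to_period('M').dt.to_timestamp()  # truncate each date to the first day of its month; astype('datetime64[M]') is rejected by recent pandas versions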
- df.head()
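The date conversion and month truncation can be checked on a tiny frame. This is a sketch under assumptions: the three dates and the name `demo` are invented for illustration, and the month is derived with `dt.to_period('M').dt.to_timestamp()`, which is one way to get the first day of each month.

```python
import pandas as pd

# Hypothetical integer dates in YYYYMMDD form, like the raw order_dt column.
demo = pd.DataFrame({'order_dt': [19970101, 19970112, 19970330]})

# Parse the integers as dates.
demo['order_dt'] = pd.to_datetime(demo['order_dt'], format='%Y%m%d')

# Map every date to the first day of its month.
demo['month'] = demo['order_dt'].dt.to_period('M').dt.to_timestamp()

print(demo)
```

Keeping `month` as a timestamp rather than a period means it groups and plots like any other datetime column.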
- df.describe()  # Summary statistics for the numeric columns
- # Total amount spent by all users in each month
- df.groupby(by='month')['order_amount'].sum()
- month
- 1997-01-01 299060.17
- 1997-02-01 379590.03
- 1997-03-01 393155.27
- 1997-04-01 142824.49
- 1997-05-01 107933.30
- 1997-06-01 108395.87
- 1997-07-01 122078.88
- 1997-08-01 88367.69
- 1997-09-01 81948.80
- 1997-10-01
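The monthly total above is a plain group-and-sum. A minimal sketch on made-up rows (the values and the names `demo` and `monthly` are illustrative, not from the dataset) shows what the call returns:

```python
import pandas as pd

# Hypothetical orders: two in January 1997, one in February 1997.
demo = pd.DataFrame({
    'month': pd.to_datetime(['1997-01-01', '1997-01-01', '1997-02-01']),
    'order_amount': [11.77, 12.00, 77.00],
})

# One row per month, holding the sum of order_amount for that month.
monthly = demo.groupby(by='month')['order_amount'].sum()

print(monthly)
```

The result is a Series indexed by month, so it can be passed directly to `monthly.plot()` to view the trend.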