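Instacart is an app for ordering groceries and everyday goods online. The dataset provides about 3 million order records, and this write-up gives a simple analysis in two parts.
Part 1: Descriptive statistics
Part 2: Association analysis (Market-Basket). See 肖月, "Mining Kaggle Datasets · Instacart Order Analysis (2)", zhuanlan.zhihu.com
Part 1: Descriptive statistics
First, preview and organize the data, compute some basic descriptive statistics, and plot several factors that affect product sales and reorders for comparison.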
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os,sys
from itertools import combinations, groupby
from collections import Counter
color = sns.color_palette()
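Dataset contents:
This is a relational dataset describing customers' purchasing behavior across different time periods. First, load the tables, look at their basic information, and clean the data.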
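To make the relational structure concrete before loading the real files, here is a minimal sketch using tiny made-up tables. The column names (`product_id`, `product_name`, `aisle_id`, `department_id`, `aisle`, `department`) are assumptions based on the public Kaggle release and are not shown in the text above; the rows are invented purely for illustration.

```python
import pandas as pd

# Toy stand-ins for products.csv, aisles.csv and departments.csv.
# Column names are assumed; the rows are invented for illustration.
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Banana", "Whole Milk", "Sparkling Water"],
    "aisle_id": [10, 20, 30],
    "department_id": [100, 200, 300],
})
aisles = pd.DataFrame({
    "aisle_id": [10, 20, 30],
    "aisle": ["fresh fruits", "milk", "water seltzer"],
})
departments = pd.DataFrame({
    "department_id": [100, 200, 300],
    "department": ["produce", "dairy eggs", "beverages"],
})

# Resolve the foreign keys: each product row picks up its aisle and
# department names. validate="many_to_one" raises an error if a key
# is duplicated on the lookup side.
catalog = (
    products
    .merge(aisles, on="aisle_id", how="left", validate="many_to_one")
    .merge(departments, on="department_id", how="left", validate="many_to_one")
)
print(catalog[["product_name", "aisle", "department"]])
```

The same pattern would presumably apply to the order tables, with `order_id` and `product_id` acting as the linking keys.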
from subprocess import check_output
print(check_output(["ls","../market_sells_orders/input"]).decode("utf8"))
df_order_products = pd.read_csv('../market_sells_orders/input/order_products__prior.csv')
print('order_products contains %s orders with columns:'%len(df_order_products))
print(' '+', '.join(df_order_products.columns.values))
df_orders = pd.read_csv('../market_sells_orders/input/orders.csv')
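# Keep only the 'prior' orders; drop eval_set without inplace to avoid a possible SettingWithCopyWarning on the filtered slice
df_orders = df_orders[df_orders['eval_set']=='prior'].drop(columns=['eval_set'])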
print('orders contains %s orders with columns:'%len(df_orders))
print(' '+', '.join(df_orders.columns.values))
df_aisles = pd.read_csv('../market_sells_orders/input/aisles.csv')
print('aisles contains %s aisles with columns:'%len(df_aisles))
print(' '+', '.join(df_aisles.columns.values))
df_department = pd.read_csv('../market_sells_orders/input/departments.csv')
print('department contains %s departments with columns:'%len(df_department))
print(' '+', '.join(df_department.columns.values))
df_products = pd.read_csv('../market_sells_orders/input/products.csv')
print('products contains %s products with columns:'%len(df_products))
print(' '+', '.join(df_products.columns.values))
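The text mentions data cleaning, while the loading code itself only filters on `eval_set`. One possible first cleaning check is to count missing values and duplicate keys per table, sketched below. The helper name `summarize_table` and the toy `orders` frame are hypothetical; `days_since_prior_order` is assumed to be a column of orders.csv, where a user's first order would have no earlier order to measure from.

```python
import numpy as np
import pandas as pd


def summarize_table(df, key=None):
    """Return basic quality numbers for one table (hypothetical helper)."""
    summary = {
        "rows": len(df),
        # Missing-value count per column, keeping only columns that have any.
        "missing": {col: int(n) for col, n in df.isna().sum().items() if n > 0},
    }
    if key is not None:
        # Rows whose key value repeats an earlier row.
        summary["duplicate_keys"] = int(df.duplicated(subset=key).sum())
    return summary


# Toy stand-in for orders.csv; values are invented for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "user_id": [7, 7, 8, 8],
    "order_number": [1, 2, 1, 2],
    "days_since_prior_order": [np.nan, 14.0, np.nan, 30.0],
})

report = summarize_table(orders, key="order_id")
print(report)
```

On the real tables, the same call could be made as `summarize_table(df_orders, key='order_id')`, and likewise for the other frames with their own key columns.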