Python Data Analysis: White Wine Lab

Rewriting "White Wine Quality Exploration" with numpy and pandas

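I: Data Collection

Variable name: meaning
fixed acidity: fixed (non-volatile) acidity
volatile acidity: volatile acidity
citric acid: citric acid
residual sugar: residual sugar
chlorides: chlorides
free sulfur dioxide: free sulfur dioxide
total sulfur dioxide: total sulfur dioxide
density: density
pH: pH value
sulphates: sulfates
alcohol: alcohol content
quality: quality grade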

Full dataset
Before starting the lab, download the data. In this write-up the downloaded file is named white_wine.csv.

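II: Reading the Data

1: Display the first 5 rows

import csv

# Read every row (header included) into a list of lists.
# The with block closes the file even if reading fails.
with open("white_wine.csv", "r", newline="") as f:
    data = list(csv.reader(f))

# data[0] is the header, so this prints the header plus 4 data rows.
for row in data[:5]:
    print(row)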

First 5 rows of data
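Since the goal is to redo this with pandas, here is a minimal sketch of the same step using pandas.read_csv and DataFrame.head. It assumes the file is comma-separated with a header row, as the csv.reader code above implies. SAMPLE_CSV and load_wine are hypothetical names, and the sample values are made up so the snippet runs without the real file.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for white_wine.csv (illustrative values only).
SAMPLE_CSV = """fixed acidity,alcohol,quality
7.0,8.8,6
6.3,9.5,6
8.1,10.1,5
7.2,9.9,7
6.2,9.6,5
8.1,11.0,6
"""


def load_wine(source):
    """Read a comma-separated wine file (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


df = load_wine(io.StringIO(SAMPLE_CSV))

# pandas treats the first line as column names, so head(5) is 5 data rows.
first_five = df.head(5)
print(first_five)
```

With the real file, load_wine("white_wine.csv") should behave the same way. Note that head(5) returns 5 data rows, whereas the csv version prints the header plus 4 data rows.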

III: Data Processing

1: Find how many quality grades the white wines fall into

import csv

with open("white_wine.csv", "r", newline="") as f:
    data = list(csv.reader(f))

# Quality is the last column; skip the header row and convert to int.
quality_list = [int(row[-1]) for row in data[1:]]

# A set keeps only the distinct grades.
quality_count = set(quality_list)

print("White wine has %s quality grades: %r"
      % (len(quality_count), quality_count))

White wine has 7 quality grades: {3, 4, 5, 6, 7, 8, 9}
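The same question can be answered with numpy. The sketch below assumes every column is numeric and that quality is the last column, as in the code above. It loads the table with np.loadtxt and takes the distinct grades with np.unique. SAMPLE_CSV and quality_levels are hypothetical names, and the sample values are made up.

```python
import io

import numpy as np

# Tiny made-up sample standing in for white_wine.csv (illustrative values only).
SAMPLE_CSV = """fixed acidity,alcohol,quality
7.0,8.8,6
6.3,9.5,6
8.1,10.1,5
7.2,9.9,7
6.2,9.6,5
8.1,11.0,6
"""


def quality_levels(source):
    """Return the sorted distinct values of the last column (quality)."""
    # skiprows=1 drops the header line; the file is assumed comma-separated.
    table = np.loadtxt(source, delimiter=",", skiprows=1)
    return np.unique(table[:, -1].astype(int))


levels = quality_levels(io.StringIO(SAMPLE_CSV))
print("White wine has %d quality grades: %s" % (len(levels), levels.tolist()))
```

Unlike a set, np.unique returns the grades already sorted, which makes the printed result stable from run to run.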

2: Count the samples in each grade

import csv

with open("white_wine.csv", "r", newline="") as f:
    data = list(csv.reader(f))

# Group the data rows by quality grade (last column).
content_dict = {}
for row in data[1:]:
    quality = int(row[-1])
    content_dict.setdefault(quality, []).append(row)

for key in content_dict:
    print('Grade %d, count %d' % (key, len(content_dict[key])))

Grade 6, count 1539
Grade 5, count 1020
Grade 7, count 616
Grade 8, count 123
Grade 4, count 115
Grade 3, count 14
Grade 9, count 4
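In pandas the grouping dictionary is not needed for counting. A minimal sketch, assuming the header names the last column quality: Series.value_counts returns one count per grade, sorted from most to least frequent. SAMPLE_CSV and count_by_quality are hypothetical names, and the sample values are made up.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for white_wine.csv (illustrative values only).
SAMPLE_CSV = """fixed acidity,alcohol,quality
7.0,8.8,6
6.3,9.5,6
8.1,10.1,5
7.2,9.9,7
6.2,9.6,5
8.1,11.0,6
"""


def count_by_quality(source):
    """Return the number of rows per quality grade, most frequent first."""
    df = pd.read_csv(source)
    return df["quality"].value_counts()


counts = count_by_quality(io.StringIO(SAMPLE_CSV))
for grade, n in counts.items():
    print("Grade %d, count %d" % (grade, n))
```

To list the grades in ascending order instead, counts.sort_index() can be applied to the result.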

Bar chart

import csv
import matplotlib.pyplot as plt

with open("white_wine.csv", "r", newline="") as f:
    data = list(csv.reader(f))

# Group the data rows by quality grade (last column).
content_dict = {}
for row in data[1:]:
    quality = int(row[-1])
    content_dict.setdefault(quality, []).append(row)

# x holds the grades, y the number of samples in each grade.
x = list(content_dict)
y = [len(rows) for rows in content_dict.values()]

plt.bar(x, y)
plt.show()

[Figure: bar chart of the number of samples in each quality grade]

3: Compute the mean fixed acidity for each grade

import csv

with open("white_wine.csv", "r", newline="") as f:
    data = list(csv.reader(f))

# Group the data rows by quality grade (last column).
content_dict = {}
for row in data[1:]:
    quality = int(row[-1])
    content_dict.setdefault(quality, []).append(row)

# fixed acidity is the first column; average it within each grade.
mean_list = []
for key, value in content_dict.items():
    total = 0  # named total to avoid shadowing the built-in sum()
    for row in value:
        total += float(row[0])
    mean_list.append((key, total / len(value)))

for item in mean_list:
    print(item[0], ":", item[1])

6 : 6.812085769980511
5 : 6.907843137254891
7 : 6.755844155844158
8 : 6.708130081300811
4 : 7.052173913043476
3 : 7.535714285714286
9 : 7.5
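The per-grade mean also fits numpy well. The sketch below assumes an all-numeric file with fixed acidity in the first column and quality in the last, as in the code above. It picks out the rows of each grade with a boolean mask and averages them. SAMPLE_CSV and mean_fixed_acidity are hypothetical names, and the sample values are made up.

```python
import io

import numpy as np

# Tiny made-up sample standing in for white_wine.csv (illustrative values only).
SAMPLE_CSV = """fixed acidity,alcohol,quality
7.0,8.8,6
6.3,9.5,6
8.1,10.1,5
7.2,9.9,7
6.2,9.6,5
8.1,11.0,6
"""


def mean_fixed_acidity(source):
    """Return {grade: mean of the first column} for each distinct grade."""
    table = np.loadtxt(source, delimiter=",", skiprows=1)
    fixed_acidity = table[:, 0]
    quality = table[:, -1].astype(int)
    # quality == q is a boolean mask selecting the rows of one grade.
    return {int(q): float(fixed_acidity[quality == q].mean())
            for q in np.unique(quality)}


means = mean_fixed_acidity(io.StringIO(SAMPLE_CSV))
for grade, mean in means.items():
    print(grade, ":", mean)
```

The pandas equivalent would be a groupby on the quality column followed by mean() on the fixed acidity column, assuming the header uses those names.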