Pitfalls of drawing box plots in Python: why is the lower whisker of the box plot missing

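I have five groups of data that I want to plot in the form shown in the figure below: I want to specify an upper and a lower edge for each box, and have every value outside that range drawn as an outlier. I searched online for a long time without finding a way to do it. The real problem was that I had not understood how a box plot works.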

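[Figure]
As the figure below shows, every box plot is made up of an upper edge, a lower edge, a box, and outliers. The top of the box is the upper quartile, the bottom is the lower quartile, and the line inside is the median.
A box plot has 5 parameters (the labels Q1 to Q5 below are this article's own numbering of the five values, not the usual Q1/Q2/Q3 quartile names):
Lower edge (Q1);
Lower quartile (Q2), also called the "first quartile": the value at the 25% position when all values in the sample are sorted in ascending order;
Median (Q3), also called the "second quartile": the value at the 50% position when all values in the sample are sorted in ascending order;
Upper quartile (Q4), also called the "third quartile": the value at the 75% position when all values in the sample are sorted in ascending order;
Upper edge (Q5).
Outliers: values above the upper edge or below the lower edge.
Do not assume, as I did, that the upper edge is the maximum and the lower edge is the minimum.
The edges are determined by Q2-1.5IQR and Q4+1.5IQR, where IQR=Q4-Q2. These two values are limits: the whisker itself is drawn at the most extreme data point that still lies within them.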
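The definitions above can be sketched in a few lines of NumPy. This is only an illustration: the function name `box_stats` and the small sample are made up for this example, and `np.percentile` is used with its default linear interpolation.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Compute box-plot quantities for a 1-D sample (illustrative helper)."""
    x = np.asarray(values, dtype=float)
    lower_q, median, upper_q = np.percentile(x, [25, 50, 75])
    iqr = upper_q - lower_q
    # Limits beyond which a value counts as an outlier.
    lower_limit = lower_q - whis * iqr
    upper_limit = upper_q + whis * iqr
    inside = x[(x >= lower_limit) & (x <= upper_limit)]
    outliers = x[(x < lower_limit) | (x > upper_limit)]
    return {
        "lower_quartile": lower_q,
        "median": median,
        "upper_quartile": upper_q,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        # Whiskers end at the most extreme data points inside the limits.
        "lower_whisker": inside.min(),
        "upper_whisker": inside.max(),
        "outliers": np.sort(outliers),
    }

# Made-up sample: nine ordinary values and one extreme value.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
for name, value in stats.items():
    print(name, value)
```

In this sample the upper limit is 14.5, but the upper whisker ends at 9, the largest value that is still inside the limit, and 30 is reported as an outlier.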

[Figure]
Let's verify this with a program.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data={'neutral':[55,52,52,52,51,51,50,50,50,48,48,48,47,47,47,47,47,46,46,46,46,45,45,45,45,44,44,44,44,44,44,44,43,43,43,43,43,42,42,42,42,42,42,41,41,41,41,41,41,41,40,40,40,40,40,40,40,40,39,39,39,38,38,38,38,38,38,38,38,38,38,37,37,37,37,37,37,37,37,37,37,37,37,37,36,36,36,36,36,36,36,36,36,36,36,36,35,35,35,35,35,35,35,35,35,35,34,34,34,34,34,34,34,34,34,34,34,33,33,33,33,33,33,33,33,32,32,32,32,32,32,32,32,32,32,31,31,31,31,31,31,31,30,30,30,30,30,30,30,30,30,29,29,29,29,29,29,28,28,28,28,28,28,28,27,27,27,27,27,27,27,27,27,26,26,26,26,26,25,25,25,25,25,25,24,24,24,24,23,22,21,21,20,20,20,20,20,18,16,12]}
df = pd.DataFrame(data)
print(df.describe())
df.plot.box(title="Consumer spending in each country",whis=1.5)
plt.grid(linestyle="--", alpha=0.3)
plt.show()

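Result
[Figure]
The plot shows one outlier (judging by the axis, it should be 12). The upper edge is the maximum of the data, but the lower edge is not the minimum. Let's work through the numbers.
IQR=Q4-Q2=40.25-30=10.25
Q1=Q2-1.5IQR=30-1.5×10.25=14.625, so the lower limit is 14.625. The value 12 falls below it and is therefore classified as an outlier; the lower whisker is drawn at 16, the smallest value that is still within the limit.

Q5=Q4+1.5IQR=40.25+15.375=55.625. No value in the data exceeds this upper limit (the maximum is only 55), so there are no outliers at the top and the upper whisker sits at 55.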
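To check where the whiskers actually end without reading them off the plot, one option is `matplotlib.cbook.boxplot_stats`, which returns the statistics a box plot is drawn from. The sketch below uses a small made-up sample rather than the data above, so the numbers are only illustrative.

```python
from matplotlib import cbook

# Made-up sample with one low extreme value.
sample = [2, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# boxplot_stats returns one dict per data set; we passed a single data set.
stats = cbook.boxplot_stats(sample, whis=1.5)[0]

# The limit computed from the quartiles...
lower_limit = stats["q1"] - 1.5 * stats["iqr"]
# ...versus the position where the whisker is really drawn.
print("lower limit  :", lower_limit)
print("lower whisker:", stats["whislo"])
print("upper whisker:", stats["whishi"])
print("outliers     :", list(stats["fliers"]))
```

Here the lower limit works out to 4.5, yet the whisker ends at 10 because no data point lies between 4.5 and 10; the value 2 is listed under `fliers`.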

Now let's change the data and look at the result again.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data={'neutral':[80,56,52,52,51,51,50,50,50,48,48,48,47,47,47,47,47,46,46,46,46,45,45,45,45,44,44,44,44,44,44,44,43,43,43,43,43,42,42,42,42,42,42,41,41,41,41,41,41,41,40,40,40,40,40,40,40,40,39,39,39,38,38,38,38,38,38,38,38,38,38,37,37,37,37,37,37,37,37,37,37,37,37,37,36,36,36,36,36,36,36,36,36,36,36,36,35,35,35,35,35,35,35,35,35,35,34,34,34,34,34,34,34,34,34,34,34,33,33,33,33,33,33,33,33,32,32,32,32,32,32,32,32,32,32,31,31,31,31,31,31,31,30,30,30,30,30,30,30,30,30,29,29,29,29,29,29,28,28,28,28,28,28,28,27,27,27,27,27,27,27,27,27,26,26,26,26,26,25,25,25,25,25,25,24,24,24,24,23,22,21,21,20,20,20,20,20,18,16,13]}
df = pd.DataFrame(data)
print(df.describe())
df.plot.box(title="Consumer spending in each country",whis=1.5)
plt.grid(linestyle="--", alpha=0.3)
plt.show()

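As the figure below shows, 80 and 56 are outliers. Only the extreme values were changed, so the quartiles and the limits (14.625 and 55.625) stay the same; by that calculation 13 is also below the lower limit and is an outlier too, even though it is larger than the 12 it replaced.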

[Figure]
However, the multiplier can be changed; try 2.0, for example.

The limits then become Q1=Q2-2IQR

Q5=Q4+2IQR
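The effect of the multiplier can be worked out by hand before plotting. The sketch below assumes the quartiles of the modified data are still 30 and 40.25 (only the extreme values were edited) and checks just the three most extreme values; the helper name `limits` is made up for this example.

```python
def limits(lower_q, upper_q, whis):
    """Return (lower limit, upper limit) for a given IQR multiplier."""
    iqr = upper_q - lower_q
    return lower_q - whis * iqr, upper_q + whis * iqr

# Assumed quartiles of the modified data (25% = 30, 75% = 40.25).
LOWER_Q, UPPER_Q = 30.0, 40.25
# The three most extreme values in the modified data.
candidates = [13, 56, 80]

results = {}
for whis in (1.5, 2.0):
    low, high = limits(LOWER_Q, UPPER_Q, whis)
    flagged = [v for v in candidates if v < low or v > high]
    results[whis] = (low, high, flagged)
    print(f"whis={whis}: limits=({low}, {high}), outliers={flagged}")
```

With 1.5 the limits are 14.625 and 55.625, so all three values are flagged; with 2.0 they widen to 9.5 and 60.75, and only 80 remains an outlier.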

df.plot.box(title="Consumer spending in each country",whis=2.0)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data={'neutral':[80,56,52,52,51,51,50,50,50,48,48,48,47,47,47,47,47,46,46,46,46,45,45,45,45,44,44,44,44,44,44,44,43,43,43,43,43,42,42,42,42,42,42,41,41,41,41,41,41,41,40,40,40,40,40,40,40,40,39,39,39,38,38,38,38,38,38,38,38,38,38,37,37,37,37,37,37,37,37,37,37,37,37,37,36,36,36,36,36,36,36,36,36,36,36,36,35,35,35,35,35,35,35,35,35,35,34,34,34,34,34,34,34,34,34,34,34,33,33,33,33,33,33,33,33,32,32,32,32,32,32,32,32,32,32,31,31,31,31,31,31,31,30,30,30,30,30,30,30,30,30,29,29,29,29,29,29,28,28,28,28,28,28,28,27,27,27,27,27,27,27,27,27,26,26,26,26,26,25,25,25,25,25,25,24,24,24,24,23,22,21,21,20,20,20,20,20,18,16,13]}
df = pd.DataFrame(data)
print(df.describe())
df.plot.box(title="Consumer spending in each country",whis=2.0)
plt.grid(linestyle="--", alpha=0.3)
plt.show()

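[Figure]
I wondered whether the upper and lower limits could use different multipliers.
https://www.bilibili.com/video/BV1Jt4y1i76Q?from=search&seid=12376886745584686578
This video talks about it, but I did not really follow its explanation, and I could not work out the relationship by experimenting either. For reference, matplotlib interprets a pair passed as whis as percentiles rather than as two IQR multipliers, so the call below puts the whiskers at the 20th and 100th percentiles of the data.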

df.plot.box(title="Consumer spending in each country",whis=(20,100))
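If different multipliers above and below are really what is wanted, one possible route is to compute the statistics yourself and hand them to matplotlib's `Axes.bxp`, which draws a box plot from precomputed values. The sketch below is an assumption about how that could be done, not something from the video; `asymmetric_stats`, its parameters, and the sample are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def asymmetric_stats(values, low_mult, high_mult, label="neutral"):
    """Box-plot statistics with separate IQR multipliers (illustrative helper)."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_limit = q1 - low_mult * iqr
    high_limit = q3 + high_mult * iqr
    inside = x[(x >= low_limit) & (x <= high_limit)]
    fliers = x[(x < low_limit) | (x > high_limit)]
    # Keys follow the dict layout that Axes.bxp expects.
    return {
        "label": label,
        "med": med,
        "q1": q1,
        "q3": q3,
        "whislo": inside.min(),
        "whishi": inside.max(),
        "fliers": fliers,
    }

# Made-up sample with one low and one high extreme value.
sample = [-20, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = asymmetric_stats(sample, low_mult=1.0, high_mult=5.0)

fig, ax = plt.subplots()
ax.bxp([stats])  # draw the box from the precomputed statistics
ax.grid(linestyle="--", alpha=0.3)
# Call plt.show() here to display the figure.
```

With a multiplier of 1.0 below and 5.0 above, -20 is drawn as an outlier while 30 stays inside the upper whisker. The same dictionary could also be filled with fixed edge values chosen by hand, which is closer to the original goal of specifying the edges for each box.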