
Visualizing a Footballer's Shot Data with Python + matplotlib (Scatter Plots)


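Visualizing shot data comes down to a scatter plot in which the size of each point scales with the shot's expected goals value (xG, the predicted probability of scoring), which makes the chart easier to read at a glance.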

1. The https://understat.com league data site

The shot data comes from https://understat.com. Open the home page and search for "Mbappe" (see Figure 1).

Figure 1. Searching from the https://understat.com home page

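Open the Kylian Mbappé page. His player_id is 3423, so his page URL is https://understat.com/player/3423. The site provides league data from the 2014/2015 season to the present. The page to scrape is https://understat.com/player/{player_id}; for example, Cristiano Ronaldo's player_id is 2371, Messi's is 2097, Neymar's is 2099, and Mbappé's is 3423. Each shot record includes the shot location (X, Y), expected goals (xG, the probability of scoring), the shot result (result), the shot type (shotType), and the season (season).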
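As a small sketch of the URL pattern just described, the player pages can be built from a name-to-id mapping. The names PLAYER_IDS and player_url are illustrative and not part of understat; the ids are the ones listed in the paragraph above.

```python
# Hypothetical helper: builds understat player-page URLs from the ids given in the text.
PLAYER_IDS = {
    "Cristiano Ronaldo": 2371,
    "Messi": 2097,
    "Neymar": 2099,
    "Mbappe": 3423,
}


def player_url(player_id: int) -> str:
    """Return the player-page URL for a given understat player_id."""
    return f"https://understat.com/player/{player_id}"


for name, pid in PLAYER_IDS.items():
    print(name, player_url(pid))
```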

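The shot result (result) takes one of five values: 1) Goal (scored); 2) ShotOnPost (hit the post); 3) SavedShot (saved by the goalkeeper); 4) BlockedShot (blocked by an outfield player); 5) MissedShots (off target).

The shot type (shotType) covers headers, left-footed shots, right-footed shots, and shots with any other part of the body.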

Mbappé's data starts with the 2015/2016 season; the current season shown is 2022/2023 (see Figure 2).

Figure 2. The Kylian Mbappé page

2. Analyzing the page

Right-click the page and view its source. Several very long string variables sit inside <script>...</script> tags.

The fourth <script> tag in document order holds the shot data (see Figure 3).

Figure 3. Page source (excerpt)

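The target is the quoted content in this structure:

<script>

    var shotsData = JSON.parse('...')

</script>

The content is JSON data. Note that JSON is text: it looks much like a Python dictionary, but to Python it is just a string until the json module converts it.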

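json.loads() ==> converts a JSON string into a dictionary or a list of dictionaries

json.dumps() ==> converts a dictionary or a list of dictionaries into a JSON string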

JSON has two structural forms: objects and arrays.

An object starts with "{" and ends with "}". In between are key/value pairs separated by ",", written as:

{  

     key1:value1,     

     key2:value2,   

         ...  

}  

Keys must be strings (written in double quotes); a value can be a string, a number, a boolean, an object, an array, or null.

An array starts with "[" and ends with "]". In between are elements separated by ",", here objects, written as:

[

  {

     key1:value1,

     key2:value2

  },

  {

     key3:value3,

     key4:value4

  }

]

In Python this maps to a list whose elements are dictionaries (two-dimensional data).
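The conversion in both directions can be sketched with the standard json module. The records below are made up for illustration; only json.loads() and json.dumps() come from the text above.

```python
import json

# A JSON array of objects, held as an ordinary Python string.
text = '[{"player": "A", "xG": "0.25"}, {"player": "B", "assist": null}]'

records = json.loads(text)           # JSON string -> list of dicts
print(type(records), type(records[0]))
print(records[1]["assist"])          # JSON null becomes Python None

back = json.dumps(records)           # list of dicts -> JSON string
print(type(back))
print(json.loads(back) == records)   # the round trip preserves the data
```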

3. Extracting and decoding the data

The page scraped here uses the JSON array structure; after conversion it becomes a Python list whose elements are dictionaries.

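Below, the first and last records of the variable (Cristiano Ronaldo's data) are excerpted for a preliminary analysis. The data is a string in which punctuation and spaces are written as single-byte hexadecimal escapes of the form \xHH (ASCII codes with decimal values from 32 to 127), mixed with plain text. It must first be turned into a Python bytes object, then decoded into a JSON string, and finally converted into a list of dictionaries with json.loads().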

>>> a = r'\x5B\x7B\x22id\x22\x3A\x2232535\x22,\x22minute\x22\x3A\x2218\x22,\x22result\x22\x3A\x22SavedShot\x22,\x22X\x22\x3A\x220.845\x22,\x22Y\x22\x3A\x220.49900001525878906\x22,\x22xG\x22\x3A\x220.06659495085477829\x22,\x22player\x22\x3A\x22Cristiano\x20Ronaldo\x22,\x22h_a\x22\x3A\x22h\x22,\x22player_id\x22\x3A\x222371\x22,\x22situation\x22\x3A\x22SetPiece\x22,\x22season\x22\x3A\x222014\x22,\x22shotType\x22\x3A\x22RightFoot\x22,\x22match_id\x22\x3A\x225834\x22,\x22h_team\x22\x3A\x22Real\x20Madrid\x22,\x22a_team\x22\x3A\x22Cordoba\x22,\x22h_goals\x22\x3A\x222\x22,\x22a_goals\x22\x3A\x220\x22,\x22date\x22\x3A\x222014\x2D08\x2D25\x2019\x3A00\x3A00\x22,\x22player_assisted\x22\x3A\x22Luka\x20Modric\x22,\x22lastAction\x22\x3A\x22Pass\x22\x7D,\x7B\x22id\x22\x3A\x22422004\x22,\x22minute\x22\x3A\x2223\x22,\x22result\x22\x3A\x22SavedShot\x22,\x22X\x22\x3A\x220.885\x22,\x22Y\x22\x3A\x220.5\x22,\x22xG\x22\x3A\x220.7612988352775574\x22,\x22player\x22\x3A\x22Cristiano\x20Ronaldo\x22,\x22h_a\x22\x3A\x22h\x22,\x22player_id\x22\x3A\x222371\x22,\x22situation\x22\x3A\x22Penalty\x22,\x22season\x22\x3A\x222020\x22,\x22shotType\x22\x3A\x22RightFoot\x22,\x22match_id\x22\x3A\x2215790\x22,\x22h_team\x22\x3A\x22Juventus\x22,\x22a_team\x22\x3A\x22Inter\x22,\x22h_goals\x22\x3A\x223\x22,\x22a_goals\x22\x3A\x222\x22,\x22date\x22\x3A\x222021\x2D05\x2D15\x2016\x3A00\x3A00\x22,\x22player_assisted\x22\x3Anull,\x22lastAction\x22\x3A\x22Standard\x22\x7D\x5D'

>>> b = eval("b'" + a + "'")                      # wrap the string in b'...' and use eval() to turn it into bytes

>>> b

b'[{"id":"32535","minute":"18","result":"SavedShot","X":"0.845","Y":"0.49900001525878906","xG":"0.06659495085477829","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"SetPiece","season":"2014","shotType":"RightFoot","match_id":"5834","h_team":"Real Madrid","a_team":"Cordoba","h_goals":"2","a_goals":"0","date":"2014-08-25 19:00:00","player_assisted":"Luka Modric","lastAction":"Pass"},{"id":"422004","minute":"23","result":"SavedShot","X":"0.885","Y":"0.5","xG":"0.7612988352775574","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"Penalty","season":"2020","shotType":"RightFoot","match_id":"15790","h_team":"Juventus","a_team":"Inter","h_goals":"3","a_goals":"2","date":"2021-05-15 16:00:00","player_assisted":null,"lastAction":"Standard"}]'

>>> type(b)                                     # the result is a bytes object

<class 'bytes'>

>>> b.decode()                               # decode() to a string; the content is pure ASCII, so any ASCII-compatible encoding works

'[{"id":"32535","minute":"18","result":"SavedShot","X":"0.845","Y":"0.49900001525878906","xG":"0.06659495085477829","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"SetPiece","season":"2014","shotType":"RightFoot","match_id":"5834","h_team":"Real Madrid","a_team":"Cordoba","h_goals":"2","a_goals":"0","date":"2014-08-25 19:00:00","player_assisted":"Luka Modric","lastAction":"Pass"},{"id":"422004","minute":"23","result":"SavedShot","X":"0.885","Y":"0.5","xG":"0.7612988352775574","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"Penalty","season":"2020","shotType":"RightFoot","match_id":"15790","h_team":"Juventus","a_team":"Inter","h_goals":"3","a_goals":"2","date":"2021-05-15 16:00:00","player_assisted":null,"lastAction":"Standard"}]'

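The important fields are the shot location (X, Y), expected goals (xG), the shot result (result), and the season (season). Expected goals is the predicted probability of scoring, so xG=1 means a certain goal. X and Y are relative values between 0 and 1, whereas the pitch coordinates used for plotting run from 0 to 100, so they must be multiplied by 100. result=Goal marks a goal, and season=2014 means the 2014/2015 season.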
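To make these conversions concrete, here is a minimal sketch in plain Python. The two records are made up in the same shape as the scraped data (every value arrives as a string), and the function name convert and its output keys are illustrative.

```python
# Two made-up shot records in the same shape as the scraped data.
shots = [
    {"X": "0.845", "Y": "0.5", "xG": "0.25", "result": "SavedShot", "season": "2014"},
    {"X": "0.9", "Y": "0.4", "xG": "0.5", "result": "Goal", "season": "2020"},
]


def convert(shot):
    """Convert one raw record to numbers and 0-100 pitch coordinates."""
    season = int(shot["season"])
    return {
        "x": float(shot["X"]) * 100,              # relative 0-1 -> 0-100
        "y": float(shot["Y"]) * 100,
        "xg": float(shot["xG"]),
        "is_goal": shot["result"] == "Goal",
        "season_label": f"{season}/{season + 1}",  # 2014 -> "2014/2015"
    }


converted = [convert(s) for s in shots]
for row in converted:
    print(row)
```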

>>> import json                           # import the json module

>>> json.loads(b.decode())          # convert the JSON string into a list of dictionaries

[{'id': '32535', 'minute': '18', 'result': 'SavedShot', 'X': '0.845', 'Y': '0.49900001525878906', 'xG': '0.06659495085477829', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'SetPiece', 'season': '2014', 'shotType': 'RightFoot', 'match_id': '5834', 'h_team': 'Real Madrid', 'a_team': 'Cordoba', 'h_goals': '2', 'a_goals': '0', 'date': '2014-08-25 19:00:00', 'player_assisted': 'Luka Modric', 'lastAction': 'Pass'}, {'id': '422004', 'minute': '23', 'result': 'SavedShot', 'X': '0.885', 'Y': '0.5', 'xG': '0.7612988352775574', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'Penalty', 'season': '2020', 'shotType': 'RightFoot', 'match_id': '15790', 'h_team': 'Juventus', 'a_team': 'Inter', 'h_goals': '3', 'a_goals': '2', 'date': '2021-05-15 16:00:00', 'player_assisted': None, 'lastAction': 'Standard'}]

>>> json.loads(b)                         # json.loads() also accepts bytes, so decoding first is optional

[{'id': '32535', 'minute': '18', 'result': 'SavedShot', 'X': '0.845', 'Y': '0.49900001525878906', 'xG': '0.06659495085477829', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'SetPiece', 'season': '2014', 'shotType': 'RightFoot', 'match_id': '5834', 'h_team': 'Real Madrid', 'a_team': 'Cordoba', 'h_goals': '2', 'a_goals': '0', 'date': '2014-08-25 19:00:00', 'player_assisted': 'Luka Modric', 'lastAction': 'Pass'}, {'id': '422004', 'minute': '23', 'result': 'SavedShot', 'X': '0.885', 'Y': '0.5', 'xG': '0.7612988352775574', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'Penalty', 'season': '2020', 'shotType': 'RightFoot', 'match_id': '15790', 'h_team': 'Juventus', 'a_team': 'Inter', 'h_goals': '3', 'a_goals': '2', 'date': '2021-05-15 16:00:00', 'player_assisted': None, 'lastAction': 'Standard'}]

>>> type(json.loads(b))                # the result is a list

<class 'list'>

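With that analysis and background in place, it is time to scrape the page. Use the get() function of the requests module to fetch it, and the find_all() method of the BeautifulSoup class (from the BeautifulSoup4 package) to extract the contents of the <script>...</script> tags.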
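Two steps of this pipeline are fragile: picking the <script> tag by position, and calling eval() on scraped text, which would execute whatever the page contains. The sketch below shows one possible alternative using only the standard library. It finds the payload by variable name with re, and decodes it with ast.literal_eval(), which accepts only Python literals. The HTML fragment is made up, and the function names extract_payload and decode_payload are illustrative; the pattern assumes the var NAME = JSON.parse('...') form described earlier.

```python
import ast
import json
import re

# Made-up page fragment in the same shape as the real source.
html = r"""
<script>var groupsData = JSON.parse('\x7B\x7D')</script>
<script>
    var shotsData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x221\x22,\x22player\x22\x3A\x22Cristiano\x20Ronaldo\x22\x7D\x5D')
</script>
"""


def extract_payload(page: str, var_name: str) -> str:
    """Return the text between the quotes of var NAME = JSON.parse('...')."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)"
    match = re.search(pattern, page, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"{var_name} not found in page")
    return match.group(1)


def decode_payload(payload: str):
    """Turn the hex-escaped text into Python objects without eval()."""
    # literal_eval parses the b'...' literal but cannot run arbitrary code.
    as_bytes = ast.literal_eval("b'" + payload + "'")
    return json.loads(as_bytes)  # json.loads accepts bytes as well as str


shots = decode_payload(extract_payload(html, "shotsData"))
print(shots)
```

With a real page, the string passed to extract_payload() would be the text of the response returned by requests.get().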

4. Drawing scatter plots in matplotlib: the scatter() function

The scatter() function in the pyplot module draws a scatter plot. Its signature is:

matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None,

       norm=None, vmin=None, vmax=None, alpha=None, linewidths=None,

       edgecolors=None, data=None, **kwargs)

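The commonly used parameters are:

x, y: the data for the x axis and the y axis.

s: the marker size, in points squared. If a one-dimensional array is passed, it gives the size of each point.

c: the marker color. If a one-dimensional array is passed, it gives the color of each point.

marker: the marker style (the shape of the points); see Table 1.

alpha: the opacity of the points, a float between 0 and 1. With large datasets, a small alpha combined with an adjusted s lets overlapping points show where the data clusters.

cmap: the colormap that maps numeric c values to colors.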

Table 1. marker settings with their symbols and descriptions
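The parameters above can be tried out in a few lines. This is a sketch on made-up data rather than the article's chart: the size factor of 300, the viridis colormap, and the Agg backend (chosen so the code runs without a display) are all assumptions for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: five points with an xG-like value between 0 and 1.
x = np.array([10, 30, 50, 70, 90])
y = np.array([20, 60, 40, 80, 50])
xg = np.array([0.05, 0.2, 0.4, 0.6, 0.9])

fig, ax = plt.subplots(figsize=(5, 4))
points = ax.scatter(
    x, y,
    s=np.sqrt(xg) * 300,   # per-point sizes (points squared)
    c=xg,                  # per-point values, mapped to colors through cmap
    cmap="viridis",
    marker="o",
    alpha=0.6,
    edgecolors="black",
)
fig.colorbar(points, ax=ax, label="xG")

# Render to an in-memory PNG; use fig.savefig("name.png") to write a file instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```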

5. Complete code

The complete code is as follows:

#############################################
# Author: Zhang Ruilin, created 2021-01-10 18:35
# Revised 2022-12-28 10:13
# Shot map for a footballer, drawn with Matplotlib
#############################################
import ast  # safely evaluates the bytes literal (used instead of eval)
import json  # JSON <-> Python objects

import matplotlib as mpl
import matplotlib.pyplot as plt  # MATLAB-style plotting interface
import mplsoccer  # draws football pitches
import numpy as np  # numerical functions
import pandas as pd  # tabular data handling
import requests  # fetches the web page
from bs4 import BeautifulSoup  # parses the page and extracts content

# Kylian Mbappé's player_id is 3423
url = 'https://understat.com/player/3423'
html = requests.get(url)  # fetch the page
# Parse the page
soup_parse = BeautifulSoup(html.content, 'lxml')
scripts = soup_parse.find_all('script')  # list of all <script> tags
strings = scripts[3].string  # text of the script that defines shotsData
_start = strings.index("('") + 2  # first character after JSON.parse('
_end = strings.index("')")  # closing quote after ...\x5D (excluded)
json_data = strings[_start:_end]  # the JSON payload between the quotes
json_data = ast.literal_eval("b'" + json_data + "'")  # turn \xHH escapes into bytes
data = json.loads(json_data)  # list of dictionaries
# Collect shot location (X, Y), expected goals (xG), result and season
x, y, xg, result, season = [], [], [], [], []
for _dic in data:
    x.append(_dic['X'])
    y.append(_dic['Y'])
    xg.append(_dic['xG'])
    result.append(_dic['result'])
    season.append(_dic['season'])
columns = ['X', 'Y', 'xG', 'Result', 'Season']
df_data = pd.DataFrame([x, y, xg, result, season], index=columns)
df_data = df_data.T  # transpose so that each shot is a row
# Convert the numeric strings to numbers
df_data = df_data.astype({'X': float, 'Y': float, 'xG': float, 'Season': int})
df_data['X'] *= 100  # source coordinates are relative (0 to 1);
df_data['Y'] *= 100  # the opta pitch runs from 0 to 100
# df_data.to_csv(r'd:/Mbappé_shooting.csv')  # save to a file
background, text_color = 'lightgray', 'black'  # background and text colors
mpl.rcParams['text.color'] = text_color
mpl.rcParams['font.sans-serif'] = ['simsun']  # SimSun; needed only if labels contain Chinese text
mpl.rcParams['legend.fontsize'] = 15  # legend font size, in points
fig, ax = plt.subplots(figsize=(7, 5.6))  # 7 x 5.6 inch figure
ax.axis('off')  # hide the axes
fig.set_facecolor(background)
pitch = mplsoccer.VerticalPitch(half=True, pitch_type='opta', line_zorder=3,
                                pitch_color='grass')  # vertical half pitch
# Plot area: lower-left corner at (0.05, 0.06), 90% of the width and height
axes = fig.add_axes((0.05, 0.06, 0.9, 0.9))
axes.patch.set_facecolor(background)
pitch.draw(ax=axes)
season = 2021  # season to plot, from 2014 up to (current year - 1)
df = df_data.loc[df_data['Season'] == season]  # rows for the chosen season
goals = df[df['Result'] == 'Goal']
misses = df[df['Result'] != 'Goal']
# Shots that did not score: cyan, alpha 0.5
pitch.scatter(misses['X'], misses['Y'], s=np.sqrt(misses['xG']) * 100,
              marker='o', alpha=0.5, edgecolor='black', facecolor='cyan',
              ax=axes, label='No goal')
# Shots that scored: crimson, alpha 0.7
pitch.scatter(goals['X'], goals['Y'], s=np.sqrt(goals['xG']) * 100,
              marker='o', alpha=0.7, edgecolor='black', facecolor='crimson',
              ax=axes, label='Goal')
axes.legend(loc='lower right')
# Summary text
axes.text(25, 64, f"Expected goals: {df['xG'].sum():.2f}", weight='bold',
          size=14)  # sum of xG over the season
axes.text(25, 61, f"Goals: {len(goals)}", weight='bold', size=14)
axes.text(25, 58, f"Shots: {len(df)}", weight='bold', size=14)  # rows in the season
axes.text(95, 60, f'{season}-{season+1} season', weight='bold', size=18)
plt.show()

The result is shown in Figure 4.

Figure 4. Shot map for Kylian Mbappé
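The three numbers printed on the chart (total expected goals, goals, and shots) are simple aggregates over one season's rows, and the marker size is sqrt(xG) * 100. A minimal sketch of those calculations in plain Python, on made-up records that are assumed to be already converted to numbers; the function name season_summary is illustrative.

```python
import math

# Made-up records, already converted to numbers.
shots = [
    {"season": 2021, "xG": 0.10, "result": "MissedShots"},
    {"season": 2021, "xG": 0.75, "result": "Goal"},
    {"season": 2021, "xG": 0.25, "result": "SavedShot"},
    {"season": 2020, "xG": 0.50, "result": "Goal"},
]


def season_summary(rows, season):
    """Aggregate one season: shot count, goal count, total xG, marker sizes."""
    selected = [r for r in rows if r["season"] == season]
    return {
        "shots": len(selected),
        "goals": sum(r["result"] == "Goal" for r in selected),
        "xg_total": sum(r["xG"] for r in selected),
        # Same size rule as the chart: the square root keeps low-xG shots visible.
        "sizes": [math.sqrt(r["xG"]) * 100 for r in selected],
    }


summary = season_summary(shots, 2021)
print(f"Expected goals: {summary['xg_total']:.2f}")
print(f"Goals: {summary['goals']}")
print(f"Shots: {summary['shots']}")
```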
