
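2024 MCM Problem C (Momentum in Tennis) in depth | Detailed modeling + code implementation (decision trees + time series + support vector machines)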

First, a recap of Problem C from this year's MCM:

The approach to Problem 1 is as follows:

Python example code:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, confusion_matrix

    # Load the dataset
    data = pd.read_csv('Wimbledon_featured_matches.csv')

    # Select one match (the match_id is an example value)
    match_id = '2023-wimbledon-1701'
    selected_match = data[data['match_id'] == match_id].copy()

    # The original listing referred to 'receiver' and 'winner' columns.
    # Assumption: as in the official data dictionary, the file provides
    # 'server' (1 or 2) and 'point_victor' (1 or 2) instead, so the
    # receiver is derived here as the player who is not serving.
    selected_match['receiver'] = 3 - selected_match['server']

    # Extract the key time-series features
    time_series_features = ['set_no', 'game_no', 'point_no', 'server', 'receiver', 'point_victor']
    time_series_data = selected_match[time_series_features]

    # Sort the points in chronological order
    time_series_data = time_series_data.sort_values(by=['set_no', 'game_no', 'point_no'])

    # Feature engineering: position in the match as input, point winner as target
    X = time_series_data[['set_no', 'game_no', 'point_no']]
    y = time_series_data['point_victor']

    # Split into training and test sets (a random split, so point order is ignored)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Build the random forest model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = model.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    conf_matrix = confusion_matrix(y_test, y_pred)
    print(f'Accuracy: {accuracy}')
    print(f'Confusion Matrix:\n{conf_matrix}')

    # Time-series visualization: server, receiver, and predicted point winner
    plt.figure(figsize=(12, 6))
    plt.plot(time_series_data['point_no'], time_series_data['server'], label='Server')
    plt.plot(time_series_data['point_no'], time_series_data['receiver'], label='Receiver')
    plt.scatter(X_test['point_no'], y_pred, color='red', marker='x', label='Predicted Winner')
    plt.title('Match Performance Time Series')
    plt.xlabel('Point Number')
    plt.ylabel('Player')
    plt.legend()
    plt.show()
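
The listing above feeds only the position of a point (set, game, point number) into the model and splits the points at random. As a hedged illustration of how the time-series side could be made more explicit, the sketch below builds one possible "recent form" feature and splits the data chronologically. It is not the author's method: the synthetic `points` table, the helper `rolling_win_rate`, and the 10-point window are all assumptions made for the example, and the random point winners stand in for real match data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one match (assumption): the winner of each point is 1 or 2.
rng = np.random.default_rng(0)
points = pd.DataFrame({
    "point_no": np.arange(1, 201),
    "point_victor": rng.choice([1, 2], size=200),
})


def rolling_win_rate(victor: pd.Series, player: int, window: int = 10) -> pd.Series:
    """Share of the previous `window` points won by `player`.

    The current point is excluded (shift by one), so the feature only
    uses information that was available before the point was played.
    """
    won = (victor == player).astype(float)
    return won.shift(1).rolling(window, min_periods=1).mean()


# Hypothetical feature name; the first point has no history, so its value is NaN.
points["p1_recent_win_rate"] = rolling_win_rate(points["point_victor"], player=1)

# Chronological split: train on the first 80% of points, test on the last 20%.
split = int(len(points) * 0.8)
train, test = points.iloc[:split], points.iloc[split:]

print(points.head())
print(f"train points: {len(train)}, test points: {len(test)}")
```

With real data, the same helper could be applied to the sorted `point_victor` column of a single match; the window length would need to be chosen and validated rather than taken from this sketch.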

MATLAB example code:

    % Load the dataset
    data = readtable('Wimbledon_featured_matches.csv');
    % Select one match (the match_id is an example value)
    match_id = '2023-wimbledon-1701';
    selected_match = data(strcmp(data.match_id, match_id), :);
    % The original listing referred to 'receiver' and 'winner' columns.
    % Assumption: as in the official data dictionary, the file provides
    % 'server' (1 or 2) and 'point_victor' (1 or 2) instead, so the
    % receiver is derived here as the player who is not serving.
    selected_match.receiver = 3 - selected_match.server;
    % Extract the key time-series features
    time_series_features = {'set_no', 'game_no', 'point_no', 'server', 'receiver', 'point_victor'};
    time_series_data = selected_match(:, time_series_features);
    % Sort the points in chronological order
    time_series_data = sortrows(time_series_data, {'set_no', 'game_no', 'point_no'});
    % Feature engineering
    X = time_series_data{:, {'set_no', 'game_no', 'point_no'}};
    y = time_series_data.point_victor;
    % Split into training and test sets
    rng(42); % fix the random seed for reproducibility
    cv = cvpartition(height(time_series_data), 'HoldOut', 0.2);
    trainIdx = training(cv);
    testIdx = test(cv);
    X_train = X(trainIdx, :);
    y_train = y(trainIdx);
    X_test = X(testIdx, :);
    y_test = y(testIdx);
    % Build the random forest model
    model = TreeBagger(100, X_train, y_train, 'Method', 'classification', 'OOBPrediction', 'on');
    % Predict; TreeBagger returns class labels as a cell array of character vectors
    y_pred = str2double(predict(model, X_test));
    % Evaluate the model
    accuracy = mean(y_pred == y_test);
    conf_matrix = confusionmat(y_test, y_pred);
    fprintf('Accuracy: %.4f\n', accuracy);
    disp('Confusion Matrix:');
    disp(conf_matrix);
    % Time-series visualization
    figure;
    plot(time_series_data.point_no, time_series_data.server, 'DisplayName', 'Server');
    hold on;
    plot(time_series_data.point_no, time_series_data.receiver, 'DisplayName', 'Receiver');
    scatter(X_test(:, 3), y_pred, 50, 'r', 'Marker', 'x', 'DisplayName', 'Predicted Winner');
    hold off;
    title('Match Performance Time Series');
    xlabel('Point Number');
    ylabel('Player');
    legend('show');

Problem 3 can be modeled with a support vector machine (SVM):

Python example code:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    # match_data is assumed to be a DataFrame prepared beforehand that holds the
    # feature columns and a 'fluctuation' label column. 'feature1', 'feature2', ...
    # are placeholders: replace them with the real column names before running.
    features = match_data[['feature1', 'feature2', ...]]
    target = match_data['fluctuation']

    # Label handling: treat the fluctuation as the target and turn it into a binary class
    target_binary = (target == '波动')  # placeholder label ("fluctuation"); adjust to the actual labels

    # Split into training and test sets
    train_features, test_features, train_target, test_target = train_test_split(
        features, target_binary, test_size=0.2, random_state=42)

    # Feature scaling with z-score standardization (fitted on the training set only)
    scaler = StandardScaler()
    train_features_scaled = scaler.fit_transform(train_features)
    test_features_scaled = scaler.transform(test_features)

    # Train the SVM model
    svm_model = SVC(kernel='linear', C=1)
    svm_model.fit(train_features_scaled, train_target)

    # Predict on the test set
    predictions = svm_model.predict(test_features_scaled)

    # Evaluate the model
    accuracy = accuracy_score(test_target, predictions)
    print(f'SVM Model Accuracy: {accuracy:.4f}')
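
Because the listing above depends on placeholder columns, it cannot be run as it stands. The following is a minimal, self-contained sketch of the same workflow (z-score scaling followed by a linear SVM) on synthetic data. The data from `make_classification` are an assumption standing in for real features and fluctuation labels, and the use of a pipeline with 5-fold cross-validation is an illustrative choice rather than part of the original solution.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 5 numeric features and a binary label
# playing the role of "fluctuation / no fluctuation". Not real match data.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline refits the scaler inside every cross-validation fold,
# so no statistics from held-out data leak into training.
svm_pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1))

cv_scores = cross_val_score(svm_pipeline, X_train, y_train, cv=5)
print(f"Cross-validated accuracy: {cv_scores.mean():.4f}")

svm_pipeline.fit(X_train, y_train)
test_accuracy = svm_pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")

# With a linear kernel, the weight of each standardized feature can be inspected.
coefficients = svm_pipeline.named_steps["svc"].coef_
print("Feature weights:", coefficients.round(3))
```

With a linear kernel the fitted weights give a rough indication of which standardized features push the prediction towards each class, which may help when discussing which indicators matter.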

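MATLAB example code:

    % match_data is assumed to be a table prepared beforehand. 'feature1' and
    % 'feature2' are placeholder names (the original listing elides the rest
    % with "..."); replace them with the real feature columns.
    features = match_data{:, {'feature1', 'feature2'}};
    target = match_data.fluctuation;
    % Label handling: treat the fluctuation as the target and turn it into a binary class
    target_binary = strcmp(target, '波动'); % placeholder label ("fluctuation"); adjust to the actual labels
    % Split into training and test sets (80/20 hold-out). The original listing
    % called splitData, which is not a built-in function; cvpartition is used instead.
    rng(42); % fix the random seed so the results are repeatable
    cv = cvpartition(numel(target_binary), 'HoldOut', 0.2);
    train_features = features(training(cv), :);
    train_target = target_binary(training(cv));
    test_features = features(test(cv), :);
    test_target = target_binary(test(cv));
    % Feature scaling with z-score standardization, reusing the training-set
    % mean and standard deviation for the test set
    [train_features_scaled, mu, sigma] = zscore(train_features);
    test_features_scaled = (test_features - mu) ./ sigma;
    % Train the SVM model
    svm_model = fitcsvm(train_features_scaled, train_target, 'KernelFunction', 'linear', 'BoxConstraint', 1);
    % Predict on the test set
    predictions = predict(svm_model, test_features_scaled);
    % Evaluate the model
    accuracy = mean(predictions == test_target);
    fprintf('SVM Model Accuracy: %.4f\n', accuracy);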

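The full write-up is available here:

[Tencent Docs] In-depth analysis of all 2024 MCM/ICM problems (modeling process + code implementation + paper guidance)
https://docs.qq.com/doc/DSG1LQWtOQ3lFWHNj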
