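The May Day Mathematical Modeling Contest has started on schedule. It runs for 74 hours, and submissions close at 12:00 noon on May 4. To help participants, here is a detailed walkthrough of the solution approach for Problem C.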
Shared materials for May Day contest Problem C: https://pan.baidu.com/s/1n3wU-XCX5HH2x7LKpmxEyw
Extraction code: sxjm
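Statistical features | Time features |
Electromagnetic radiation (EMR) statistics: | Measurement-interval statistics: |
Mean: 77.96 | Mean interval: about 5709 s (about 95 min) |
Standard deviation: 90.72 | Standard deviation: about 119216 s (most intervals are 31 s or 62 s, so a standard deviation this large indicates extreme values) |
Minimum: 0 | Minimum interval: 0 s (back-to-back measurements) |
First quartile: 31.62 | First quartile: 31 s |
Median: 45.00 | Median: 31 s |
Third quartile: 75.00 | Third quartile: 62 s |
Maximum: 500 | Maximum interval: about 5815425 s (about 67 days) |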
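The figures in this table are ordinary summary statistics, so they can be recomputed in a few lines. The sketch below is a minimal illustration using only the Python standard library. The helper names `summarize` and `interval_seconds` and the toy readings are assumptions made for this example and are not taken from the attachment.

```python
import statistics
from datetime import datetime


def summarize(values):
    """Return mean, sample std, min, quartiles and max of a numeric list."""
    # method="inclusive" interpolates linearly between the sorted data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def interval_seconds(timestamps):
    """Gaps in seconds between consecutive timestamps, after sorting."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


# Hypothetical toy data: regular 31 s / 62 s sampling plus one long outage.
times = [
    datetime(2022, 5, 1, 0, 0, 0),
    datetime(2022, 5, 1, 0, 0, 31),
    datetime(2022, 5, 1, 0, 1, 2),
    datetime(2022, 5, 1, 0, 2, 4),
    datetime(2022, 5, 1, 1, 0, 0),
]
emr = [30, 45, 60, 75, 500]

emr_stats = summarize(emr)
gaps = interval_seconds(times)
gap_stats = summarize(gaps)

print(emr_stats)
print(gap_stats)
```

In the toy data a single long gap pulls the mean interval far above the median, which is the same pattern the table reports for the real measurement intervals.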
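Feature extraction: extract statistical features of the electromagnetic radiation signal from "附件一EMR" (Attachment 1, EMR).
Model training: train a classification model on the extracted features and the class labels.
Interference identification: apply the model to the "问题一EMR检测时间" (Problem 1, EMR detection time) data to identify and record interference signals.
Time interval determination: find the time intervals containing the five earliest interference signals.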
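The post does not say which classification model was used, so the following is only a sketch of the train-then-classify pattern in the steps above, with a nearest-centroid classifier swapped in because it fits in a few lines of standard-library Python. The function names, the class labels `"normal"` and `"interference"`, and all readings are invented for illustration.

```python
import math
import statistics


def window_features(values):
    """Feature vector for one window of readings: (mean, sample std, max)."""
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return (statistics.fmean(values), std, max(values))


def train_centroids(labelled_windows):
    """Average the feature vectors of each class into one centroid per label."""
    grouped = {}
    for label, values in labelled_windows:
        grouped.setdefault(label, []).append(window_features(values))
    return {
        label: tuple(statistics.fmean(column) for column in zip(*feats))
        for label, feats in grouped.items()
    }


def classify(values, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    feats = window_features(values)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Hypothetical labelled training windows (labels and readings are invented).
training = [
    ("normal", [40, 45, 42, 44]),
    ("normal", [38, 41, 43, 40]),
    ("interference", [45, 300, 50, 480]),
    ("interference", [60, 420, 55, 390]),
]
centroids = train_centroids(training)

# Hypothetical unlabelled windows standing in for the detection-period data.
new_windows = [[41, 43, 39, 44], [50, 450, 48, 310]]
predicted = [classify(window, centroids) for window in new_windows]
print(predicted)
```

The features here are unscaled, so the largest-valued feature dominates the distance. With real data it would be reasonable to standardize the features first, or to replace the centroid rule with whichever classifier performs best in validation.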
Table 1 Time intervals of EMR interference signals [classification model]
No. | Interval start | Interval end |
1 | 2022-05-01 00:01:12 | 2022-05-01 05:15:45 |
2 | 2022-05-01 07:01:49 | 2022-05-01 13:49:45 |
3 | 2022-05-01 20:50:30 | 2022-05-01 21:39:53 |
4 | 2022-05-02 00:00:43 | 2022-05-02 08:45:36 |
5 | 2022-05-02 11:00:59 | 2022-05-02 13:36:28 |
Table 1 Time intervals of EMR interference signals [threshold model]
No. | Interval start | Interval end |
1 | 2022-05-01 10:57:50 | 2022-05-01 11:08:48 |
2 | 2022-05-03 08:23:06 | 2022-05-03 08:23:06 |
3 | 2022-05-07 18:04:53 | 2022-05-07 18:04:53 |
4 | 2022-05-09 07:29:02 | 2022-05-09 07:29:02 |
5 | 2022-05-10 06:27:57 | 2022-05-10 08:30:29 |
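Several rows in the threshold-model table have identical start and end times, which is what one would expect if individually flagged samples are grouped into runs and an isolated sample forms a run of its own. The post does not show that grouping step, so the sketch below is one possible way to do it. The function name `merge_into_intervals`, the 62-second `max_gap` (chosen because most sampling gaps are 31 s or 62 s), and the timestamps are all assumptions.

```python
from datetime import datetime, timedelta


def merge_into_intervals(flagged_times, max_gap=timedelta(seconds=62)):
    """Group flagged timestamps into (start, end) runs.

    A new interval starts whenever the gap to the previous flagged
    sample exceeds max_gap; an isolated sample yields start == end.
    """
    intervals = []
    for t in sorted(flagged_times):
        if intervals and t - intervals[-1][1] <= max_gap:
            intervals[-1][1] = t  # extend the current run
        else:
            intervals.append([t, t])  # open a new run
    return [tuple(pair) for pair in intervals]


# Hypothetical flagged samples (invented timestamps, not contest results).
flagged = [
    datetime(2022, 5, 1, 10, 0, 0),
    datetime(2022, 5, 1, 10, 0, 31),
    datetime(2022, 5, 1, 10, 1, 2),
    datetime(2022, 5, 1, 12, 30, 0),
    datetime(2022, 5, 2, 8, 0, 0),
    datetime(2022, 5, 2, 8, 1, 2),
]

intervals = merge_into_intervals(flagged)
earliest_five = intervals[:5]  # intervals are already in chronological order

for start, end in earliest_five:
    print(f"{start} -> {end}")
```

Because the input is sorted before merging, taking the first five intervals gives the earliest five directly.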
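Prediction and validation: apply the model to the data for the specified time period to predict and identify the time intervals of precursor feature signals.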
MATLAB data preprocessing code
% Load the data
data = readtable('path_to/附件一.xlsx');

% Group by class and test the distribution of each group
categories = unique(data.类别_class);
numCategories = numel(categories);
distribution_tests = cell(numCategories, 3);

for i = 1:numCategories
    category = categories{i};
    group = data.电磁辐射_EMR(strcmp(data.类别_class, category));
    group = rmmissing(group); % drop any NaN values
    [statistic, p_value, is_normal] = checkNormalDistribution(group);
    distribution_tests{i, 1} = category;
    distribution_tests{i, 2} = p_value;
    distribution_tests{i, 3} = is_normal;
end

% Print the normality test results
disp('K-S Test Results:');
disp(distribution_tests);

% Detect and visualize outliers for each class
outliers_results = cell(numCategories, 2);
for i = 1:numCategories
    category = categories{i};
    group = data.电磁辐射_EMR(strcmp(data.类别_class, category));
    group = rmmissing(group);
    outliers = detectAndVisualizeOutliers(group, category);
    outliers_results{i, 1} = category;
    outliers_results{i, 2} = sum(outliers); % number of outliers in this class
end

% Print the number of outliers per class
disp('Number of Outliers by Category:');
disp(outliers_results);

% Local functions must appear at the end of a script file.

% Check whether one class of data follows a normal distribution
function [statistic, p_value, is_normal] = checkNormalDistribution(group_data)
    z = (group_data - mean(group_data)) / std(group_data); % z-score standardization
    % The mean and std are estimated from the data, so the K-S p-value is
    % only approximate; lillietest is the usual alternative in that case.
    [~, p, ksstat] = kstest(z);
    statistic = ksstat; % K-S test statistic (the original returned the decision flag h here)
    p_value = p;
    is_normal = p > 0.05;
end

% Flag outliers with the 1.5 * IQR rule and plot them
function outliers = detectAndVisualizeOutliers(data, category)
    Q1 = quantile(data, 0.25);
    Q3 = quantile(data, 0.75);
    IQR = Q3 - Q1;
    outliers = (data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR));

    % Visualization
    figure;
    boxplot(data, 'Orientation', 'horizontal', 'Labels', {category});
    hold on;
    scatter(data(outliers), ones(sum(outliers), 1), 'r', 'filled');
    title(sprintf('Boxplot and Outliers for Category %s', category));
    grid on;
end
Python threshold-model code
import pandas as pd
import matplotlib.pyplot as plt

# Input files
emr_data_path = '附件一EMR.xlsx'
emr_detection_path = '问题一EMR检测时间.xlsx'

# Read the Excel files
emr_data = pd.read_excel(emr_data_path)
emr_detection_data = pd.read_excel(emr_detection_path)


def extract_features(data):
    """Aggregate EMR readings into 30-second windows and compute per-window statistics."""
    data = data.copy()  # do not modify the caller's DataFrame
    data['时间 (time)'] = pd.to_datetime(data['时间 (time)'])
    data_sorted = data.sort_values(by='时间 (time)')
    # Seconds since the previous measurement (0 for the first row)
    data_sorted['duration'] = data_sorted['时间 (time)'].diff().dt.total_seconds().fillna(0)

    # '30s' replaces the deprecated '30S' frequency alias
    features = data_sorted.groupby(pd.Grouper(key='时间 (time)', freq='30s')).agg({
        '电磁辐射 (EMR)': ['mean', 'std', 'min', 'max', 'count'],
        'duration': 'sum'  # sum of the gaps within each window
    }).dropna()  # note: this also drops windows with fewer than two readings (std is NaN)

    # Rename the columns
    features.columns = ['EMR_mean', 'EMR_std', 'EMR_min', 'EMR_max', 'EMR_count', 'total_duration']

    return features


# Extract features from the Attachment 1 data
emr_features = extract_features(emr_data)

# Use the statistics of the Attachment 1 features to set the thresholds
emr_stats = emr_features.describe()
emr_stats_mean_threshold = emr_stats.loc['mean', 'EMR_mean'] + 2 * emr_stats.loc['std', 'EMR_mean']
emr_stats_std_threshold = emr_stats.loc['mean', 'EMR_std']
emr_stats_duration_threshold = emr_stats.loc['mean', 'total_duration']


def identify_interferences(features, mean_threshold, std_threshold, duration_threshold):
    """Flag a window as potential interference if any one feature exceeds its threshold."""
    potential_interferences = features[(features['EMR_mean'] > mean_threshold) |
                                       (features['EMR_std'] > std_threshold) |
                                       (features['total_duration'] > duration_threshold)]
    return potential_interferences


# Extract features from the Problem 1 EMR detection-time data
emr_detection_features = extract_features(emr_detection_data)

# Apply the thresholds to identify interference signals
identified_interferences = identify_interferences(emr_detection_features,
                                                  emr_stats_mean_threshold,
                                                  emr_stats_std_threshold,
                                                  emr_stats_duration_threshold)

# Take the five earliest windows flagged as interference
earliest_interferences = identified_interferences.sort_index().head(5)

# Each window starts at its index label and lasts 30 seconds
intervals = [(index, index + pd.Timedelta(seconds=30)) for index in earliest_interferences.index]

# Print the time intervals
print("Time Intervals for Identified Interferences:")
for start, end in intervals:
    print(f"Start: {start}, End: {end}")

# Visualize the results
plt.figure(figsize=(12, 8))
plt.plot(emr_detection_features.index, emr_detection_features['EMR_mean'], label='EMR Mean')
plt.scatter(earliest_interferences.index, earliest_interferences['EMR_mean'],
            color='red', label='Identified Interferences')
plt.xlabel('Time')
plt.ylabel('EMR Mean Value')
plt.title('EMR Mean Values and Identified Interferences Over Time')
plt.legend()
plt.show()