, and it can use some nn.Module interfaces, such as state_dict and load_state_dict, so a Metric can be serialized and deserialized. It also supports higher_is_better and fp16.
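torcheval is PyTorch's official implementation of evaluation metrics. It is quite similar to torchmetrics, to the point of looking somewhat like a copy. Its Metric base class resembles a simplified version of the torchmetrics Metric base class. The difference is that torcheval moves process synchronization out into a separate sync_and_compute function. As a result, the Metric itself carries little synchronization logic, which makes it easy to understand and maintain.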
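A minimal pure-Python sketch of that separation follows. It is loosely modeled on an update / merge-state / compute split. The class CountingAccuracy and the function sync_and_compute_sketch are hypothetical. The "processes" are just a list of metric objects rather than a real process group, so this is an illustration of the design, not torcheval's code.

```python
from typing import Iterable, List


class CountingAccuracy:
    """Hypothetical metric that only manages its own local state."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, preds: List[int], target: List[int]) -> "CountingAccuracy":
        # accumulate counts for this replica only
        self.correct += sum(p == t for p, t in zip(preds, target))
        self.total += len(target)
        return self

    def merge_state(self, others: Iterable["CountingAccuracy"]) -> "CountingAccuracy":
        # fold the counts of other replicas into this one
        for other in others:
            self.correct += other.correct
            self.total += other.total
        return self

    def compute(self) -> float:
        return self.correct / self.total if self.total else 0.0


def sync_and_compute_sketch(replicas: List[CountingAccuracy]) -> float:
    # synchronization lives outside the metric: merge into a fresh object, then compute
    return CountingAccuracy().merge_state(replicas).compute()


# two simulated "processes", each seeing a different shard of the data
rank0 = CountingAccuracy().update([0, 1, 2, 2], [0, 1, 1, 2])
rank1 = CountingAccuracy().update([1, 0], [1, 1])
print(rank0.compute(), rank1.compute())  # local results: 0.75 0.5
print(sync_and_compute_sketch([rank0, rank1]))  # global: 4 correct out of 6
```

Because merging happens on a fresh object, the per-process metrics keep their local state and can continue accumulating.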
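huggingface/evaluate divides evaluation into three categories: metrics / comparisons / measurements. These correspond to evaluating algorithms, comparing model outputs, and computing dataset statistics. Each evaluation module is a separate repo that implements an app.py, so it can also be used on a Hugging Face Space. A few observations:
- The Metric base class is fairly simple. Each process writes its cached inputs to a file. Before the final computation, huggingface/datasets reads and concatenates those files, and this is how the processes are synchronized for distributed evaluation. In my view this is a shortcut. In every case it simply caches the raw model predictions and ground truth and communicates through files, so multi-node distributed evaluation is not supported.
- Most of the implemented metrics are related to …, and many of them simply call a third-party library. For example, Accuracy directly calls sklearn.metrics.accuracy_score.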
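To make the file-based synchronization concrete, here is a stdlib-only sketch under stated assumptions. Each rank dumps its raw predictions and references to a JSON file in a shared directory. One process then reads them all back, concatenates them, and computes the metric. The helper names are hypothetical, and the real library goes through huggingface/datasets cache files rather than JSON. The point is that every raw prediction has to reach one filesystem.

```python
import json
import os
import tempfile
from typing import List


def write_shard(cache_dir: str, rank: int, preds: List[int], refs: List[int]) -> None:
    # each process dumps its raw predictions and references to its own file
    with open(os.path.join(cache_dir, f"rank{rank}.json"), "w") as f:
        json.dump({"predictions": preds, "references": refs}, f)


def gather_and_compute(cache_dir: str, world_size: int) -> float:
    # the main process reads every shard, concatenates, then computes once
    preds: List[int] = []
    refs: List[int] = []
    for rank in range(world_size):
        with open(os.path.join(cache_dir, f"rank{rank}.json")) as f:
            shard = json.load(f)
        preds.extend(shard["predictions"])
        refs.extend(shard["references"])
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


with tempfile.TemporaryDirectory() as cache_dir:
    write_shard(cache_dir, 0, [0, 1, 2], [0, 1, 1])
    write_shard(cache_dir, 1, [2, 2], [2, 0])
    acc = gather_and_compute(cache_dir, world_size=2)
print(acc)
```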
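mmeval is positioned as a cross-framework evaluation library. The aim is for different codebases, and even different training frameworks, to share the same evaluation tool. mmeval extends torchmetrics.

torchmetrics provides: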
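- A standardized interface to improve reproducibility
- Support for distributed training
- Automatic accumulation across batches
- Automatic synchronization across multiple devices
- Consistency: it gives the same results wherever you run it (CPU, GPU, or TPU)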
Installation: pip install torchmetrics, or conda install -c conda-forge torchmetrics
Dependencies for the plotting interface: pip install matplotlib, or pip install 'torchmetrics[visual]'
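Almost every functional metric has a corresponding class-based version, built on the underlying Metric class. A metric object keeps internal state, similar to the parameters of a PyTorch module. Metric objects also support most of Python's built-in arithmetic, logical, and bitwise operators.

You pass the ground truth and the predictions Y_PRED to a torchmetrics metric object. The object computes the metric for the batch and stores it in its internal state until an epoch is complete. Every metric object inherits from the Metric class, which has 4 key methods: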
- metric.forward(preds, target): updates the metric state and returns the metric computed on the current batch. You can also call it as metric(preds, target).
- metric.update(preds, target): works like forward but returns nothing; it only accumulates into the state. If you don't need the value for the current batch, prefer this method. It skips computing a result, so it is faster.
- metric.compute(): returns the final result computed over all batches. forward is therefore roughly update plus a compute restricted to the current batch; the result over all data still comes from compute().
- metric.reset(): resets the state so the metric is ready for the next validation phase.

```python
import torch
import torchmetrics

# initialize metric
metric = torchmetrics.Accuracy(task="multiclass", num_classes=5)

# move the metric to the device you want computations to take place on
device = "cuda" if torch.cuda.is_available() else "cpu"
metric.to(device)

n_batches = 10
for i in range(n_batches):
    # simulate a classification problem
    # preds: (10, 5) class probabilities; Accuracy applies argmax internally to get labels
    preds = torch.randn(10, 5).softmax(dim=-1).to(device)
    target = torch.randint(5, (10,)).to(device)  # (10,)

    # metric on current batch
    acc = metric(preds, target)
    print(f"Accuracy on batch {i}: {acc.item()}")

# metric on all batches using custom accumulation
acc = metric.compute()
print(f"Accuracy on all data: {acc.item()}")

# Resetting internal state such that metric is ready for new data
metric.reset()

# Example output (inputs are random, so the numbers differ on every run):
# Accuracy on batch 0: 0.30000001192092896
# Accuracy on batch 1: 0.20000000298023224
# Accuracy on batch 2: 0.30000001192092896
# Accuracy on batch 3: 0.10000000149011612
# Accuracy on batch 4: 0.10000000149011612
# Accuracy on batch 5: 0.10000000149011612
# Accuracy on batch 6: 0.10000000149011612
# Accuracy on batch 7: 0.30000001192092896
# Accuracy on batch 8: 0.10000000149011612
# Accuracy on batch 9: 0.4000000059604645
# Accuracy on all data: 0.20000000298023224
```
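The relationship between the four methods can be sketched without torchmetrics. The class below is a hypothetical pure-Python imitation; torchmetrics' real forward is more involved. In the sketch, update only accumulates. forward accumulates and also returns the value for the current batch. compute uses everything seen since the last reset.

```python
from typing import List


class MiniAccuracy:
    """Hypothetical imitation of the forward / update / compute / reset contract."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # clear accumulated state, e.g. at the start of a new epoch or phase
        self.correct = 0
        self.total = 0

    def update(self, preds: List[int], target: List[int]) -> None:
        # only accumulate; nothing is computed or returned
        self.correct += sum(p == t for p, t in zip(preds, target))
        self.total += len(target)

    def compute(self) -> float:
        # result over everything accumulated since the last reset
        return self.correct / self.total

    def forward(self, preds: List[int], target: List[int]) -> float:
        # accumulate into the global state AND return the value for this batch only
        self.update(preds, target)
        return sum(p == t for p, t in zip(preds, target)) / len(target)

    __call__ = forward


metric = MiniAccuracy()
batch1 = metric([0, 1, 1, 2], [0, 1, 2, 2])  # 0.75
batch2 = metric([1, 1], [0, 0])              # 0.0
total = metric.compute()                     # 3 correct out of 6 -> 0.5
metric.reset()
```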
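The internal state must be reset between epochs, and it must not be mixed across training, validation, and testing. It is therefore strongly recommended to use a separate metric instance for each mode, as in the following example:

```python
import torch
from torch import nn
from torchmetrics.classification import Accuracy

# toy setup so the loop runs: a linear classifier and random data
num_classes, epochs = 5, 2
model = nn.Linear(10, num_classes)
train_data = [(torch.randn(8, 10), torch.randint(num_classes, (8,))) for _ in range(3)]
valid_data = [(torch.randn(8, 10), torch.randint(num_classes, (8,))) for _ in range(2)]

# one metric instance per mode, so training and validation states never mix
train_accuracy = Accuracy(task="multiclass", num_classes=num_classes)
valid_accuracy = Accuracy(task="multiclass", num_classes=num_classes)

for epoch in range(epochs):
    for i, (x, y) in enumerate(train_data):
        y_hat = model(x)
        # training step accuracy
        batch_acc = train_accuracy(y_hat, y)
        print(f"Accuracy of batch {i} is {batch_acc}")

    with torch.no_grad():
        for x, y in valid_data:
            y_hat = model(x)
            valid_accuracy.update(y_hat, y)

    # total accuracy over all training batches
    total_train_accuracy = train_accuracy.compute()
    # total accuracy over all validation batches
    total_valid_accuracy = valid_accuracy.compute()
    print(f"Training acc for epoch {epoch}: {total_train_accuracy}")
    print(f"Validation acc for epoch {epoch}: {total_valid_accuracy}")

    # Reset metric states after each epoch
    train_accuracy.reset()
    valid_accuracy.reset()
```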
To implement a custom metric, subclass torchmetrics.Metric and implement the following methods:
- __init__(): call self.add_state for every internal state the metric computation needs
- update(): update the metric states from the inputs
- compute(): perform the final metric computation

```python
import torch
from torchmetrics import Metric


class MyAccuracy(Metric):
    def __init__(self):
        # remember to call super
        super().__init__()
        # call `self.add_state` for every internal state that is needed for the metric computations;
        # dist_reduce_fx indicates the function that should be used to reduce
        # the state from multiple processes
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # extract predicted class index for computing accuracy
        preds = preds.argmax(dim=-1)
        assert preds.shape == target.shape
        # update metric states
        self.correct += torch.sum(preds == target)
        self.total += target.numel()

    def compute(self) -> torch.Tensor:
        # compute final result
        return self.correct.float() / self.total


my_metric = MyAccuracy()
preds = torch.randn(10, 5).softmax(dim=-1)
target = torch.randint(5, (10,))
print(my_metric(preds, target))
```
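Why reduce the counts with dist_reduce_fx="sum" rather than averaging per-process accuracies? Here is a small arithmetic sketch with made-up per-process states; the numbers are illustrative, not from a real run. Counts are additive, so summing them and dividing once gives the true global accuracy. Averaging the ratios is biased when processes see different numbers of samples.

```python
from typing import Dict, List

# illustrative per-process states after update() (made-up numbers)
states: List[Dict[str, int]] = [
    {"correct": 9, "total": 10},   # rank 0 saw 10 samples
    {"correct": 10, "total": 40},  # rank 1 saw 40 samples
]

# dist_reduce_fx="sum": reduce every state across processes, then compute once
reduced = {key: sum(s[key] for s in states) for key in ("correct", "total")}
global_acc = reduced["correct"] / reduced["total"]  # 19 / 50

# naive alternative: average the per-process accuracies
naive_acc = sum(s["correct"] / s["total"] for s in states) / len(states)  # (0.9 + 0.25) / 2

print(global_acc, naive_acc)
```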
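For comparison, here is a metric implemented by hand by subclassing nn.Module: a CTC greedy-decoding accuracy. get_gt_labels and cal_acc are the author's helper functions and are not shown. Unlike a torchmetrics Metric, this module keeps no state, so accumulating across batches and synchronizing across processes has to be done manually.

```python
from torch import nn


class CTCGreedyDecode(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, preds, labels, label_lengths):
        # tensor (T, N, C) --> numpy (N, T, C)
        preds = preds.permute(1, 0, 2).detach().cpu().numpy()
        labels = labels.cpu().numpy()
        label_lengths = label_lengths.cpu().numpy()
        # get_gt_labels / cal_acc are the author's helpers (not shown here)
        gt_labels = get_gt_labels(labels, label_lengths)
        acc = cal_acc(preds, gt_labels)
        return acc
```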
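Since get_gt_labels and cal_acc are not shown, here is a hedged pure-Python sketch of what greedy (best-path) CTC decoding plus exact-match sequence accuracy typically looks like. It assumes blank index 0 and a (T, C) score layout per sample. The function names are hypothetical, and these are not the author's helpers.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax per frame, collapse repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded: List[int] = []
    prev = None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def sequence_accuracy(batch_scores: Sequence[Sequence[Sequence[float]]],
                      gt_labels: List[List[int]]) -> float:
    """Fraction of samples whose decoded sequence exactly matches the ground truth."""
    hits = sum(greedy_ctc_decode(s) == gt for s, gt in zip(batch_scores, gt_labels))
    return hits / len(gt_labels)


# one sample, T=5 frames, C=3 classes (0 = blank); best path 1,1,0,1,2 -> [1, 1, 2]
sample = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
]
print(greedy_ctc_decode(sample), sequence_accuracy([sample], [[1, 1, 2]]))
```

The blank between the two 1s is what allows a repeated label to survive the collapse step.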
Several metrics can be bundled with MetricCollection and computed in a single call:

```python
import torch
from torchmetrics import MetricCollection, Accuracy, Precision, Recall

target = torch.tensor([0, 2, 0, 2, 0, 1, 0, 2])
preds = torch.tensor([2, 1, 2, 0, 1, 2, 2, 2])
metric_collection = MetricCollection([
    Accuracy(task="multiclass", num_classes=3),
    Precision(task="multiclass", num_classes=3, average="macro"),
    Recall(task="multiclass", num_classes=3, average="macro"),
])
print(metric_collection(preds, target))
# Output:
# {'MulticlassAccuracy': tensor(0.1250),
#  'MulticlassPrecision': tensor(0.0667),
#  'MulticlassRecall': tensor(0.1111)}
```
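The three numbers returned by the collection can be checked by hand. The sketch below is plain Python, not torchmetrics' code. It derives them from per-class TP/FP/FN counts and assumes that a zero denominator yields 0.

```python
from typing import Dict, List


def collection_sketch(preds: List[int], target: List[int], num_classes: int) -> Dict[str, float]:
    """Micro accuracy plus macro precision/recall from per-class counts."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(preds, target):
        if p == t:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p, but wrongly
            fn[t] += 1  # true class t was missed

    def safe_div(a: int, b: int) -> float:
        return a / b if b else 0.0

    precision = [safe_div(tp[c], tp[c] + fp[c]) for c in range(num_classes)]
    recall = [safe_div(tp[c], tp[c] + fn[c]) for c in range(num_classes)]
    return {
        "accuracy": sum(tp) / len(target),
        "macro_precision": sum(precision) / num_classes,
        "macro_recall": sum(recall) / num_classes,
    }


result = collection_sketch([2, 1, 2, 0, 1, 2, 2, 2], [0, 2, 0, 2, 0, 1, 0, 2], num_classes=3)
print(result)
```

Only class 2 has a true positive: precision 1/5 and recall 1/3. The macro averages are therefore (1/5)/3 ≈ 0.0667 and (1/3)/3 ≈ 0.1111.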
Metric states are created on the CPU, so a metric has to end up on the same device as its inputs. You can move it explicitly, or register it as a child module so that calling .to() on the model moves it as well:

```python
import torch
from torch import nn
from torchmetrics import MetricCollection
from torchmetrics.classification import BinaryAccuracy

device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")
target = torch.tensor([1, 1, 0, 0], device=device)
preds = torch.tensor([0, 1, 0, 0], device=device)

# Metric states are always initialized on cpu, and need to be moved to the correct device
metric = BinaryAccuracy().to(device)
out = metric(preds, target)
print(out.device)  # cuda:0 (or cpu when no GPU is available)


# When properly defined inside a Module or LightningModule, the metric is automatically moved
# to the same device as the module, because it is identified as a child module of the model
# (check the model's .children()).
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 1)  # placeholder backbone for this example
        # valid ways for metrics to be identified as child modules
        self.metric1 = BinaryAccuracy()
        self.metric2 = nn.ModuleList([BinaryAccuracy()])
        self.metric3 = nn.ModuleDict({"accuracy": BinaryAccuracy()})
        self.metric4 = MetricCollection([BinaryAccuracy()])  # torchmetrics built-in collection class

    def forward(self, batch):
        data, target = batch
        preds = self.net(data).squeeze(-1).sigmoid()  # probabilities in [0, 1]
        val1 = self.metric1(preds, target)
        val2 = self.metric2[0](preds, target)
        val3 = self.metric3["accuracy"](preds, target)
        val4 = self.metric4(preds, target)
        return val1, val2, val3, val4
```
Distributed evaluation with DDP: each process updates its own metric states, and compute() synchronizes them across processes.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP
import torchmetrics


def metric_ddp(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"

    # one GPU per process if there are enough of them, otherwise fall back to CPU
    use_cuda = torch.cuda.device_count() >= world_size
    device = torch.device("cuda", rank) if use_cuda else torch.device("cpu")

    # create default process group
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)

    # initialize metric
    metric = torchmetrics.classification.Accuracy(task="multiclass", num_classes=5)

    # define a model and append your metric to it;
    # this allows metric states to be placed on the correct accelerator when
    # .to(device) is called on the model
    model = nn.Linear(10, 10)
    model.metric = metric
    model = model.to(device)

    # initialize DDP
    model = DDP(model, device_ids=[rank] if use_cuda else None)

    n_epochs = 5
    # this shows iteration over multiple training epochs
    for n in range(n_epochs):
        # this will be replaced by a DataLoader with a DistributedSampler
        n_batches = 10
        for i in range(n_batches):
            # simulate a classification problem (inputs on the same device as the metric)
            preds = torch.randn(10, 5, device=device).softmax(dim=-1)
            target = torch.randint(5, (10,), device=device)

            # metric on current batch
            acc = metric(preds, target)
            if rank == 0:  # print only for rank 0
                print(f"Accuracy on batch {i}: {acc}")

        # metric on all batches and all accelerators using custom accumulation;
        # the accuracy is the same across all processes
        acc = metric.compute()
        print(f"Accuracy on all data: {acc}, accelerator rank: {rank}")

        # Resetting internal state such that metric is ready for new data
        metric.reset()

    # cleanup
    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # number of processes (GPUs) to parallelize over
    mp.spawn(metric_ddp, args=(world_size,), nprocs=world_size, join=True)
```
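The comment in the loop says the random batches would be replaced by a DataLoader with a DistributedSampler. The pure-Python sketch below shows why sum-reduced states then reproduce the single-process result. It assumes an interleaved split similar to DistributedSampler with shuffle=False. The real sampler also pads the index list with repeated samples when the dataset size is not divisible by the number of processes, so the global metric can then differ slightly; this sketch does not model that.

```python
from typing import List, Tuple


def shard(indices: List[int], rank: int, world_size: int) -> List[int]:
    # interleaved split, similar in spirit to DistributedSampler(shuffle=False) without padding
    return indices[rank::world_size]


def local_state(preds: List[int], target: List[int], idx: List[int]) -> Tuple[int, int]:
    # (correct, total) for the samples this rank sees
    return sum(preds[i] == target[i] for i in idx), len(idx)


preds = [0, 1, 2, 3, 4, 0, 1, 2, 3]
target = [0, 1, 1, 3, 0, 0, 2, 2, 3]
world_size = 2
all_indices = list(range(len(target)))
states = [local_state(preds, target, shard(all_indices, r, world_size)) for r in range(world_size)]

# all-reduce with "sum": every rank ends up with the same totals, hence the same result
correct = sum(c for c, _ in states)
total = sum(n for _, n in states)
single_process = sum(p == t for p, t in zip(preds, target)) / len(target)
print(states, correct / total, single_process)
```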
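- In multi-class classification, each instance belongs to exactly one class. For example, in handwritten digit recognition, each image is assigned exactly one digit (one of 0 to 9). The problem can therefore be viewed as a discrete choice. The binary and multi-class classification tasks discussed above both fall into this category.
- In multi-label classification, by contrast, each instance can be assigned several labels. For example, in music classification, a song can belong to several genres at once, such as "rock" and "classical".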
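The difference shows up in the shape of each sample's label. The pure-Python sketch below reuses the scores from the Accuracy demo further down. It assumes a 0.5 threshold for the multi-label case and argmax for the multi-class case.

```python
# Multi-class: exactly one class per sample, so each label is a single integer
mc_scores = [[0.8, 0.2, 0.0], [0.1, 0.2, 0.7]]
mc_labels = [max(range(len(row)), key=row.__getitem__) for row in mc_scores]  # argmax per row

# Multi-label: any subset of labels per sample, so each label is a 0/1 vector
ml_scores = [[0.11, 0.22, 0.84], [0.73, 0.33, 0.92]]
threshold = 0.5  # every label is an independent yes/no decision
ml_labels = [[int(s > threshold) for s in row] for row in ml_scores]
ml_target = [[0, 1, 0], [1, 0, 1]]

# element-wise agreement over all (sample, label) pairs
pairs = [(p, t) for pr, tr in zip(ml_labels, ml_target) for p, t in zip(pr, tr)]
ml_acc = sum(p == t for p, t in pairs) / len(pairs)

print(mc_labels, ml_labels, ml_acc)
```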
The default parameters of the Accuracy class are shown below. You specify the task type, and Accuracy dispatches to the corresponding task-specific class. This is a signature excerpt:

```python
def __new__(  # type: ignore[misc]
    cls,
    task: Literal["binary", "multiclass", "multilabel"],
    threshold: float = 0.5,  # used for binary and multilabel; multiclass applies argmax internally
    num_classes: Optional[int] = None,
    num_labels: Optional[int] = None,
    average: Optional[Literal["micro", "macro", "weighted", "none"]] = "micro",
    multidim_average: Literal["global", "samplewise"] = "global",
    top_k: Optional[int] = 1,
    ignore_index: Optional[int] = None,
    validate_args: bool = True,
    **kwargs: Any,
) -> Metric:
    ...
```

Demo:

```python
import torch
from torchmetrics import Accuracy

# Binary inputs
binary_preds = torch.tensor([0, 1, 1])
binary_target = torch.tensor([1, 0, 1])
accuracy = Accuracy(task="binary")  # threshold: 0.5
binary_acc = accuracy(binary_preds, binary_target)
print(binary_acc)  # tensor(0.3333)

# Multi-class inputs
mc_preds = torch.tensor([0, 2, 1])
mc_target = torch.tensor([0, 1, 2])
mc_accuracy = Accuracy(task="multiclass", num_classes=3)
mc_acc = mc_accuracy(mc_preds, mc_target)
print(mc_acc)  # tensor(0.3333)

# Multi-class inputs with probabilities: top-k (or argmax) is applied internally first
mc_preds_probs = torch.tensor([[0.8, 0.2, 0], [0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
mc_target_probs = torch.tensor([0, 1, 2])
mc_accuracy = Accuracy(task="multiclass", num_classes=3, top_k=2)  # default top_k=1
mc_acc_probs = mc_accuracy(mc_preds_probs, mc_target_probs)
print(mc_acc_probs)  # tensor(0.6667)

# Multi-label inputs
ml_preds = torch.tensor([[0.11, 0.22, 0.84], [0.73, 0.33, 0.92]])
ml_target = torch.tensor([[0, 1, 0], [1, 0, 1]])
ml_accuracy = Accuracy(task="multilabel", num_labels=3)
ml_acc = ml_accuracy(ml_preds, ml_target)
print(ml_acc)  # tensor(0.6667)
```

The excerpt below shows how tp/fp/fn/tn are computed internally for multi-class inputs when average="micro":

```python
# ... (other `average` branches omitted)
elif average == "micro":
    preds = preds.flatten()
    target = target.flatten()
    if ignore_index is not None:
        idx = target != ignore_index
        preds = preds[idx]
        target = target[idx]
    tp = (preds == target).sum()
    fp = (preds != target).sum()
    fn = (preds != target).sum()
    tn = num_classes * preds.numel() - (fp + fn + tp)
```
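The tn line is easiest to read as bookkeeping. In one-vs-rest terms, each sample contributes num_classes cells. A wrong prediction is at once a false positive for the predicted class and a false negative for the true class, so fp == fn, and every other cell is a true negative. Below is a list-based sketch of the same micro branch; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def micro_stat_scores(preds: List[int], target: List[int], num_classes: int,
                      ignore_index: Optional[int] = None) -> Tuple[int, int, int, int]:
    # drop positions whose target equals ignore_index (keeps everything if it is None)
    pairs = [(p, t) for p, t in zip(preds, target) if t != ignore_index]
    tp = sum(p == t for p, t in pairs)
    fp = sum(p != t for p, t in pairs)  # a wrong prediction is a FP for the predicted class...
    fn = fp                             # ...and a FN for the true class
    tn = num_classes * len(pairs) - (fp + fn + tp)
    return tp, fp, fn, tn


print(micro_stat_scores([0, 2, 1], [0, 1, 2], num_classes=3))
```

Micro accuracy is tp / (tp + fn), which here reduces to the ordinary fraction of correct predictions, 1/3.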
When using forward and update, the metric accepts the following input:
- preds (Tensor): An int tensor of shape (N, ...) or a float tensor of shape (N, C, ...). If preds is a floating point tensor, torch.argmax is applied along the C dimension to automatically convert probabilities/logits into an int tensor.
- target (Tensor): An int tensor of shape (N, ...).

As output, the metric returns the following:
- A tensor with the accuracy score, whose shape depends on the average and multidim_average arguments:
  - If multidim_average is set to global:
    - If average is 'micro', 'macro' or 'weighted', the output will be a scalar tensor
    - If average is 'none', the shape will be (C,)
  - If multidim_average is set to samplewise:
    - If average is 'micro', 'macro' or 'weighted', the shape will be (N,)
    - If average is 'none', the shape will be (N, C)
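The two multidim_average modes differ only in where counts are pooled. Below is a pure-Python sketch for inputs with one extra dimension (shape (N, L) of class indices). It assumes average="micro"; "global" flattens everything into one scalar, while "samplewise" gives one score per sample.

```python
# preds/target with an extra dimension: shape (N=2, L=3) of class indices
preds = [[0, 1, 1], [2, 2, 0]]
target = [[0, 1, 2], [2, 0, 1]]

# multidim_average="global": flatten the extra dimension into the batch -> one scalar
flat = [(p, t) for pr, tr in zip(preds, target) for p, t in zip(pr, tr)]
global_acc = sum(p == t for p, t in flat) / len(flat)

# multidim_average="samplewise": one score per sample on the N axis -> shape (N,)
samplewise_acc = [sum(p == t for p, t in zip(pr, tr)) / len(tr) for pr, tr in zip(preds, target)]

print(global_acc, samplewise_acc)
```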
Parameters:
- num_classes: Integer specifying the number of classes
- average: Defines the reduction that is applied over labels. Should be one of the following:
  - micro: Sum statistics over all labels
  - macro: Calculate statistics for each label and average them
  - weighted: Calculate statistics for each label and compute a weighted average using their support
  - "none": Calculate statistics for each label and apply no reduction
- top_k: Number of highest probability or logit score predictions considered to find the correct label. Only works when preds contain probabilities/logits.
- multidim_average: Defines how additional dimensions (the ... in the shapes above) should be handled. Should be one of the following:
  - global: Additional dimensions are flattened along the batch dimension
  - samplewise: Statistics are calculated independently for each sample on the N axis, over the additional dimensions
- ignore_index: Specifies a target value that is ignored and does not contribute to the metric calculation
- validate_args: bool indicating whether input arguments and tensors should be validated for correctness. Set it to False for faster computation.
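A pure-Python sketch of the four average modes on a small imbalanced example. It assumes the per-class multi-class accuracy is TP / support, i.e. the recall of each class; the function name is hypothetical.

```python
from typing import Dict, List, Union


def accuracy_averages(preds: List[int], target: List[int],
                      num_classes: int) -> Dict[str, Union[float, List[float]]]:
    support = [target.count(c) for c in range(num_classes)]
    tp = [sum(p == t == c for p, t in zip(preds, target)) for c in range(num_classes)]
    # per-class score assumed to be TP / support, i.e. the recall of each class
    per_class = [tp[c] / support[c] if support[c] else 0.0 for c in range(num_classes)]
    return {
        "none": per_class,                      # no reduction: one value per class
        "macro": sum(per_class) / num_classes,  # unweighted mean over classes
        "weighted": sum(s * a for s, a in zip(support, per_class)) / sum(support),
        "micro": sum(tp) / len(target),         # pooled over all samples
    }


scores = accuracy_averages(preds=[0, 0, 1, 1, 0], target=[0, 0, 0, 1, 2], num_classes=3)
print(scores)
```

For accuracy defined this way, "weighted" always equals "micro", because support times recall is just each class's TP count. "macro" is the mode that gives rare classes an equal say: here class 2 is always wrong, which pulls macro down to 5/9 while micro stays at 3/5.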
MeanSquaredError (MSE):

```python
import torch
from torchmetrics import MeanSquaredError

target = torch.tensor([0., 1, 2, 3])
preds = torch.tensor([0., 1, 2, 1])
mean_squared_error = MeanSquaredError()
mse_error = mean_squared_error(preds, target)
print(mse_error)  # tensor(1.)
```
MeanAbsoluteError (MAE, i.e. L1 loss):

```python
import torch
from torchmetrics import MeanAbsoluteError

target = torch.tensor([3.0, -0.5, 2.0, 7.0])
preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
mean_absolute_error = MeanAbsoluteError()
mae_error = mean_absolute_error(preds, target)
print(mae_error)  # tensor(0.5000)
```
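The two regression results can be checked by hand. With $N$ samples, targets $y_i$ and predictions $\hat{y}_i$, the standard definitions and the arithmetic for the example tensors are:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2
= \frac{0^2 + 0^2 + 0^2 + (3 - 1)^2}{4} = 1,
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left\lvert y_i - \hat{y}_i \right\rvert
= \frac{0.5 + 0.5 + 0 + 1}{4} = 0.5
```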
CosineSimilarity:

```python
import torch
from torchmetrics import CosineSimilarity

target = torch.tensor([[0, 1], [1, 1]])
preds = torch.tensor([[0, 1], [0, 1]])
# reduction: how to reduce over the batch dimension using 'sum', 'mean' or 'none'
# ('none' keeps the individual scores); the default is 'sum'
cosine_similarity = CosineSimilarity(reduction="mean")
out = cosine_similarity(preds, target)
print(out)  # tensor(0.8536)
```
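What the reduction argument does, sketched in plain Python on the same two rows: one cosine score per row, then sum (the default), mean, or no reduction.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


target = [[0, 1], [1, 1]]
preds = [[0, 1], [0, 1]]
per_row = [cosine(p, t) for p, t in zip(preds, target)]  # reduction="none": 1.0 and 1/sqrt(2)
as_sum = sum(per_row)                                     # reduction="sum" (the default)
as_mean = as_sum / len(per_row)                           # reduction="mean"
print(per_row, as_sum, as_mean)
```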
KLDivergence:

```python
import torch
from torchmetrics import KLDivergence

p = torch.tensor([[0.36, 0.48, 0.16]])
q = torch.tensor([[1 / 3, 1 / 3, 1 / 3]])
kl_divergence = KLDivergence()
out = kl_divergence(p, q)
print(out)  # tensor(0.0853)
```
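The 0.0853 comes from KL(P‖Q) = Σ p_i log(p_i / q_i) with the natural log, treating the first argument as P. Below is a plain-Python sketch of that formula. Note that KL divergence is not symmetric, so swapping p and q gives a different number.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(P || Q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.36, 0.48, 0.16]
q = [1 / 3, 1 / 3, 1 / 3]
print(round(kl_divergence(p, q), 4))  # 0.0853
```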
mAP (mean Average Precision) can be rendered as "average precision over all classes". It is obtained by averaging the average precision (AP) of each detected class. AP itself is the area under the PR (precision-recall) curve.

MeanAveragePrecision constructor parameters (signature excerpt):

```python
def __init__(
    self,
    box_format: Literal["xyxy", "xywh", "cxcywh"] = "xyxy",
    iou_type: Union[Literal["bbox", "segm"], Tuple[str]] = "bbox",
    iou_thresholds: Optional[List[float]] = None,
    rec_thresholds: Optional[List[float]] = None,
    max_detection_thresholds: Optional[List[int]] = None,
    class_metrics: bool = False,
    extended_summary: bool = False,
    average: Literal["macro", "micro"] = "macro",
    backend: Literal["pycocotools", "faster_coco_eval"] = "pycocotools",
    **kwargs: Any,
) -> None:
    ...
```

Demo:

```python
from pprint import pprint

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision  # pip install pycocotools

# IoU-based detection metrics
from torchmetrics.detection.ciou import CompleteIntersectionOverUnion
from torchmetrics.detection.diou import DistanceIntersectionOverUnion
from torchmetrics.detection.giou import GeneralizedIntersectionOverUnion
from torchmetrics.detection.iou import IntersectionOverUnion

preds = [
    dict(
        boxes=torch.tensor([[258.0, 41.0, 606.0, 285.0]]),
        scores=torch.tensor([0.536]),
        labels=torch.tensor([0]),
    )
]
target = [
    dict(
        boxes=torch.tensor([[214.0, 41.0, 562.0, 285.0]]),
        labels=torch.tensor([0]),
    )
]
metric = MeanAveragePrecision()
out = metric(preds, target)
pprint(out)
# Output:
# {'classes': tensor(0, dtype=torch.int32),
#  'map': tensor(0.6000),
#  'map_50': tensor(1.),
#  'map_75': tensor(1.),
#  'map_large': tensor(0.6000),
#  'map_medium': tensor(-1.),
#  'map_per_class': tensor(-1.),
#  'map_small': tensor(-1.),
#  'mar_1': tensor(0.6000),
#  'mar_10': tensor(0.6000),
#  'mar_100': tensor(0.6000),
#  'mar_100_per_class': tensor(-1.),
#  'mar_large': tensor(0.6000),
#  'mar_medium': tensor(-1.),
#  'mar_small': tensor(-1.)}
```
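Why is map 0.6 while map_50 and map_75 are both 1? A plain-Python sketch follows, assuming COCO-style IoU thresholds 0.50, 0.55, ..., 0.95 (the torchmetrics default). With one prediction and one matching ground truth, AP at a given threshold is 1 if the IoU clears it and 0 otherwise. The mean over the 10 thresholds is therefore just the fraction of thresholds passed. The -1 entries in the output should correspond to values that could not be computed: there are no small or medium objects in this example, and per-class values are off because class_metrics=False.

```python
from typing import List


def iou_xyxy(a: List[float], b: List[float]) -> float:
    """IoU of two boxes in "xyxy" format: [x1, y1, x2, y2]."""
    def area(box: List[float]) -> float:
        return (box[2] - box[0]) * (box[3] - box[1])

    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    return inter / (area(a) + area(b) - inter)


pred_box = [258.0, 41.0, 606.0, 285.0]
gt_box = [214.0, 41.0, 562.0, 285.0]
iou = iou_xyxy(pred_box, gt_box)  # 74176 / 95648, roughly 0.776

# assumed COCO-style thresholds 0.50, 0.55, ..., 0.95
thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
# one prediction vs one ground truth: AP is 1 where the match passes, else 0
ap_per_threshold = [1.0 if iou >= t else 0.0 for t in thresholds]
map_estimate = sum(ap_per_threshold) / len(thresholds)
print(round(iou, 4), map_estimate)
```

The IoU of about 0.776 passes 0.50 through 0.75 (6 of 10 thresholds), which matches map = 0.6 and map_50 = map_75 = 1.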
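MMEval is a machine learning evaluation library. It provides efficient and accurate distributed evaluation and supports multiple machine learning framework backends. Its main features:
- A rich set of evaluation metrics for the various sub-fields of computer vision
- Support for multiple distributed communication libraries, for efficient and accurate distributed evaluation
- Support for multiple machine learning frameworks, automatically dispatching to the matching implementation based on the input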
Installation and usage example:

```python
# Install first: pip install mmeval
import numpy as np
from mmeval import Accuracy

accuracy = Accuracy()

# Option 1: call the Accuracy instance directly to compute the metric.
labels = np.asarray([0, 1, 2, 3])
preds = np.asarray([0, 2, 1, 3])
print(accuracy(preds, labels))  # {'top1': 0.5}

# Option 2: accumulate several batches of data, then compute the metric.
for i in range(10):
    labels = np.random.randint(0, 4, size=(100,))
    predicts = np.random.randint(0, 4, size=(100,))
    # `add` stores the intermediate results needed for the metric
    accuracy.add(predicts, labels)

# `compute` calculates the metric
print(accuracy.compute())  # {'top1': ...}
# `reset` clears the stored intermediate results
accuracy.reset()
```
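The idea of "automatically dispatching to the matching implementation based on the input" can be roughly sketched with the standard library's functools.singledispatch. In this sketch, FakeTensor is a hypothetical stand-in for a framework tensor type. mmeval's own dispatch machinery is not shown here and may work differently.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import List, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a framework tensor type (torch / paddle / ...)."""
    data: List[int]


@singledispatch
def accuracy_stats(preds, labels) -> Tuple[int, int]:
    # fallback when no implementation is registered for the input type
    raise TypeError(f"no implementation registered for {type(preds).__name__}")


@accuracy_stats.register
def _(preds: list, labels: list) -> Tuple[int, int]:
    # implementation chosen for plain Python lists
    return sum(p == y for p, y in zip(preds, labels)), len(labels)


@accuracy_stats.register
def _(preds: FakeTensor, labels: FakeTensor) -> Tuple[int, int]:
    # implementation chosen for the "tensor" type; a real library would call framework ops here
    return sum(p == y for p, y in zip(preds.data, labels.data)), len(labels.data)


print(accuracy_stats([0, 2, 1, 3], [0, 1, 2, 3]))                          # list path
print(accuracy_stats(FakeTensor([0, 2, 1, 3]), FakeTensor([0, 1, 2, 3])))  # "tensor" path
```

Dispatch is decided by the type of the first argument, so adding a backend means registering one more implementation without touching the callers.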
1. torchmetrics repository: https://github.com/Lightning-AI/torchmetrics
2. torchmetrics documentation: https://lightning.ai/docs/torchmetrics/stable/
3. torcheval repository: https://github.com/pytorch/torcheval
4. torcheval documentation: https://pytorch.org/torcheval/stable/
5. huggingface/evaluate repository: https://github.com/huggingface/evaluate
6. huggingface/evaluate documentation: https://huggingface.co/docs/evaluate/index
7. mmeval repository: https://github.com/open-mmlab/mmeval
8. mmeval documentation: https://mmeval.readthedocs.io/zh-cn/latest/