A Step-by-Step Tutorial: Building a Person Re-Identification (ReID) Prototype from Scratch with Python and PyTorch
2026/6/1 8:58:00


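Person re-identification (ReID) is becoming a core tool in intelligent security and video analytics. Picture a shopping mall where the camera system automatically traces the path of a family member you have lost track of, or a subway station where security staff need to quickly locate a suspicious person wearing particular clothing. Both are typical ReID applications. This tutorial walks you through building a complete ReID prototype from scratch with Python and PyTorch.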

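1. Environment Setup and Data Loading

Before writing any code, set up the development environment. We recommend creating a dedicated Python environment with Anaconda to avoid dependency conflicts: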

    conda create -n reid python=3.8
    conda activate reid
    pip install torch torchvision opencv-python scikit-learn

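Market-1501 is one of the most widely used ReID benchmarks. It contains 32,668 annotated images covering 1,501 person IDs. After downloading and extracting it, the directory layout typically looks like this: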

    Market-1501/
    ├── bounding_box_test/    # test set
    ├── bounding_box_train/   # training set
    ├── gt_bbox/              # hand-annotated bounding boxes
    └── query/                # query images
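Before training, it can help to confirm that the archive was extracted where you expect. The sketch below is one possible sanity check, not part of the dataset tooling: `count_images` and `EXPECTED_SPLITS` are hypothetical names, and the demo runs on a throwaway directory tree so it works without the real data.

```python
import tempfile
from pathlib import Path

# The three splits this tutorial reads from (names taken from the layout above)
EXPECTED_SPLITS = ("bounding_box_train", "bounding_box_test", "query")


def count_images(root):
    """Return the number of .jpg files per split, failing loudly if a split is missing."""
    root = Path(root)
    counts = {}
    for split in EXPECTED_SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        counts[split] = sum(1 for p in split_dir.iterdir() if p.suffix == ".jpg")
    return counts


# Demo on a temporary tree; replace `tmp` with your own Market-1501 path
with tempfile.TemporaryDirectory() as tmp:
    for split in EXPECTED_SPLITS:
        (Path(tmp) / split).mkdir()
    (Path(tmp) / "query" / "0002_c1s1_000451_03.jpg").touch()
    demo_counts = count_images(tmp)

print(demo_counts)
```

On the real dataset, a missing directory or a split with zero images usually means the path points one level too high or too low.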

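torchvision's ImageFolder expects one sub-directory per class, whereas Market-1501 encodes the person ID and camera ID in each file name, so we need a custom Dataset class to handle these ReID-specific annotations. The key data-loading code is shown below: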

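    import os
    import re

    from PIL import Image
    from torch.utils.data import Dataset


    class MarketDataset(Dataset):
        # File names look like 0002_c1s1_000451_03.jpg:
        # person ID first, then the camera ID right after the "c"
        _pattern = re.compile(r'^(-?\d+)_c(\d+)')

        def __init__(self, root_dir, transform=None, relabel=False):
            self.root_dir = root_dir
            self.transform = transform
            self.samples = []
            for img_name in sorted(os.listdir(root_dir)):
                match = self._pattern.match(img_name)
                if not img_name.endswith('.jpg') or match is None:
                    continue
                pid, camid = map(int, match.groups())
                if pid == -1:
                    continue  # junk images are labelled -1 and carry no usable identity
                self.samples.append((os.path.join(root_dir, img_name), pid, camid))
            if relabel:
                # Map raw person IDs to 0..N-1 so they can serve as classification targets
                pids = sorted({pid for _, pid, _ in self.samples})
                pid2label = {pid: label for label, pid in enumerate(pids)}
                self.samples = [(p, pid2label[pid], c) for p, pid, c in self.samples]

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, index):
            img_path, pid, camid = self.samples[index]
            img = Image.open(img_path).convert('RGB')
            if self.transform:
                img = self.transform(img)
            return img, pid, camid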

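Note: the images in Market-1501 come from 6 different cameras, and the same person may appear under several of them. Matching a person across cameras is precisely the problem ReID sets out to solve.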
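To see the cross-camera structure directly, you can group file names by person ID. The following is a small sketch under the assumption that names follow the `<pid>_c<camera>s<sequence>_...` pattern; the helper `cameras_per_identity` and the sample file names are made up for illustration.

```python
import re
from collections import defaultdict

# Person ID, then the camera number that follows the letter "c"
NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)")


def cameras_per_identity(file_names):
    """Map each person ID to the set of cameras it was seen by."""
    cams = defaultdict(set)
    for name in file_names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue
        pid, camid = int(match.group(1)), int(match.group(2))
        if pid > 0:  # skip junk (-1) and distractor (0) images
            cams[pid].add(camid)
    return dict(cams)


# Illustrative names only, not real dataset entries
names = [
    "0002_c1s1_000451_03.jpg",
    "0002_c3s1_000551_01.jpg",
    "0007_c2s3_070952_01.jpg",
    "-1_c1s1_000401_03.jpg",
]
result = cameras_per_identity(names)
print(result)  # {2: {1, 3}, 7: {2}}
```

Identities that appear under at least two cameras are the ones that can produce valid cross-camera query/gallery pairs.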

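2. Building the Feature Extraction Network

ResNet50 is one of the most commonly used backbones for ReID. We implement a modified version in PyTorch, adapting its head for the ReID task: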

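    import torch.nn as nn
    from torchvision.models import resnet50, ResNet50_Weights


    class ReIDResNet(nn.Module):
        def __init__(self, num_classes):
            super().__init__()
            # ImageNet weights; the `weights` argument needs torchvision >= 0.13
            # (older releases use resnet50(pretrained=True) instead)
            base = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
            # Drop the original average pooling and classification layers
            self.features = nn.Sequential(*list(base.children())[:-2])
            # Global average pooling
            self.global_avg_pool = nn.AdaptiveAvgPool2d((1, 1))
            # Identity classification layer (used during training only)
            self.classifier = nn.Linear(2048, num_classes)

        def forward(self, x):
            x = self.features(x)  # [batch, 2048, 8, 4] for a 256x128 input
            global_feat = self.global_avg_pool(x)  # [batch, 2048, 1, 1]
            global_feat = global_feat.view(global_feat.size(0), -1)  # [batch, 2048]
            if not self.training:
                # At inference time only the embedding is needed
                return global_feat
            cls_score = self.classifier(global_feat)
            return global_feat, cls_score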

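In practice, the following improvements are also worth considering:

  • Local features: add a horizontal stripe partitioning module on top of the global feature
  • Attention: introduce CBAM or SE modules to emphasize features from key regions
  • BNNeck: add a batch normalization layer between the global feature and the classification layer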
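As an illustration of the last item, here is a minimal BNNeck sketch. It follows the commonly described recipe (metric loss on the feature before batch normalization, classification on the feature after it, and the normalized feature at inference), but the class name `BNNeckHead` and its defaults are assumptions for this example rather than part of the model above.

```python
import torch
import torch.nn as nn


class BNNeckHead(nn.Module):
    """Hypothetical head: global feature -> BatchNorm1d -> bias-free classifier."""

    def __init__(self, in_dim=2048, num_classes=751):
        super().__init__()
        self.bn = nn.BatchNorm1d(in_dim)
        # The BN shift is usually frozen so the normalized feature stays centered
        self.bn.bias.requires_grad_(False)
        self.classifier = nn.Linear(in_dim, num_classes, bias=False)

    def forward(self, global_feat):
        bn_feat = self.bn(global_feat)
        if not self.training:
            # Inference: use the normalized feature for retrieval
            return bn_feat
        # Training: raw feature for the triplet loss, logits for the ID loss
        return global_feat, self.classifier(bn_feat)


head = BNNeckHead(in_dim=8, num_classes=5)
head.train()
raw_feat, logits = head(torch.randn(4, 8))
head.eval()
test_feat = head(torch.randn(4, 8))
print(raw_feat.shape, logits.shape, test_feat.shape)
```

To use it with the network above, the head would replace the plain `nn.Linear` classifier and receive the pooled 2048-dimensional feature.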

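The output dimension of the feature extractor directly affects how well the subsequent metric learning works. Typical choices are:

Dimension | Pros | Cons
512 | Low compute cost | Limited expressive power
1024 | Good balance | Needs more data
2048 | Strong representation | High compute cost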
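To put a number on the cost side of this trade-off, the sketch below estimates how much memory the stored gallery features take. It assumes float32 features and a hypothetical gallery of one million images; `gallery_memory_mib` is a helper introduced only for this example.

```python
def gallery_memory_mib(num_images, feat_dim, bytes_per_value=4):
    """Memory needed to hold every gallery feature, in MiB (float32 by default)."""
    return num_images * feat_dim * bytes_per_value / (1024 ** 2)


for dim in (512, 1024, 2048):
    print(dim, gallery_memory_mib(1_000_000, dim))
# 512 1953.125
# 1024 3906.25
# 2048 7812.5
```

Brute-force search time scales the same way, since every query is compared against every stored value.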

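3. Metric Learning and Loss Functions

Metric learning is at the heart of ReID: the goal is to pull features of the same ID closer together and push features of different IDs further apart. Triplet loss is one of the most basic metric learning objectives: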

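    import torch
    import torch.nn.functional as F


    def triplet_loss(anchor, positive, negative, margin=0.3):
        # Euclidean distance from each anchor to its positive and negative sample
        pos_dist = F.pairwise_distance(anchor, positive)
        neg_dist = F.pairwise_distance(anchor, negative)
        # Penalize triplets where the negative is not at least `margin` farther away
        loss = torch.clamp(pos_dist - neg_dist + margin, min=0.0)
        return loss.mean()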
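Written as a formula, the function above computes the following for a batch of N triplets, where m is the margin (0.3 by default) and the distances are Euclidean, up to the small epsilon that `F.pairwise_distance` adds for numerical stability:

```latex
$$
\mathcal{L}_{\text{tri}} = \frac{1}{N} \sum_{i=1}^{N}
\max\bigl( \lVert a_i - p_i \rVert_2 - \lVert a_i - n_i \rVert_2 + m,\; 0 \bigr)
$$
```

For example, with a positive distance of 0.5, a negative distance of 0.6 and m = 0.3, the triplet contributes 0.5 - 0.6 + 0.3 = 0.2. If the negative distance grows to 1.0, the term becomes negative and is clamped to 0, so that triplet no longer produces a gradient.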

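In actual training, we need efficient online hard example mining. The batch-hard variant below picks, for every anchor, the farthest positive and the closest negative within the batch: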

    import torch
    import torch.nn.functional as F


    def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
        # Pairwise Euclidean distances between all samples in the batch: [B, B]
        pairwise_dist = torch.cdist(embeddings, embeddings, p=2)
        # mask[i, j] is True when samples i and j share the same ID
        mask = labels.unsqueeze(0) == labels.unsqueeze(1)
        # Hardest positive: the farthest sample with the same ID
        hardest_positive = (pairwise_dist * mask.float()).max(dim=1)[0]
        # Hardest negative: the closest sample with a different ID
        # (same-ID entries are pushed out of the way by adding the maximum distance)
        max_dist = pairwise_dist.max()
        hardest_negative = (pairwise_dist + max_dist * mask.float()).min(dim=1)[0]
        return F.relu(hardest_positive - hardest_negative + margin).mean()
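Batch-hard mining only works if every batch contains several images of each identity, otherwise an anchor has no positive other than itself. A common way to guarantee this is to sample P identities and K images per identity. The sketch below is one simple way to build such batches with the standard library; `pk_batches` and its parameters are hypothetical names, not a PyTorch API.

```python
import random
from collections import defaultdict


def pk_batches(pids, num_ids=4, num_instances=4, seed=0):
    """Build batches of `num_ids` identities x `num_instances` images each.

    `pids` is the list of person IDs, one per dataset index.
    Returns a list of batches, each a list of dataset indices.
    """
    rng = random.Random(seed)
    index_by_pid = defaultdict(list)
    for idx, pid in enumerate(pids):
        index_by_pid[pid].append(idx)

    pid_list = list(index_by_pid)
    rng.shuffle(pid_list)

    batches = []
    # Walk over identities in groups of `num_ids`; an incomplete last group is dropped
    for start in range(0, len(pid_list) - num_ids + 1, num_ids):
        batch = []
        for pid in pid_list[start:start + num_ids]:
            pool = index_by_pid[pid]
            if len(pool) >= num_instances:
                batch.extend(rng.sample(pool, num_instances))
            else:
                # Too few images for this identity: sample with replacement
                batch.extend(rng.choices(pool, k=num_instances))
        batches.append(batch)
    return batches


demo_pids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
demo_batches = pk_batches(demo_pids, num_ids=2, num_instances=2)
print(demo_batches)
```

The returned list can be passed to `torch.utils.data.DataLoader(dataset, batch_sampler=batches)`; for real training you would regenerate it with a new seed every epoch.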

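Modern ReID systems usually combine several loss functions:

  • Cross-entropy loss: ensures classification accuracy
  • Triplet loss: shapes the distribution of the feature space
  • Circle loss: a smoother optimization objective

The example below combines the first two:

    def combined_loss(global_feat, cls_score, labels):
        # Classification loss
        ce_loss = F.cross_entropy(cls_score, labels)
        # Metric learning loss
        tri_loss = batch_hard_triplet_loss(global_feat, labels)
        # Weighted sum of the two terms
        return ce_loss + 0.5 * tri_loss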
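To show how such a combined loss plugs into an optimization step, here is a minimal sketch of one possible training step. Everything in it is illustrative: `TinyEmbedder` is a hypothetical stand-in for the ResNet model so the code runs on CPU in a moment, the random tensors replace real images, and the batch-hard loss is repeated inline to keep the block self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEmbedder(nn.Module):
    """Hypothetical stand-in for the ReID network: returns (feature, logits)."""

    def __init__(self, in_dim=32, feat_dim=16, num_classes=4):
        super().__init__()
        self.backbone = nn.Linear(in_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.backbone(x)
        return feat, self.classifier(feat)


def batch_hard(feat, labels, margin=0.3):
    # Same batch-hard mining as in the text, repeated to keep this block runnable
    dist = torch.cdist(feat, feat, p=2)
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    hardest_pos = (dist * same).max(dim=1)[0]
    hardest_neg = (dist + dist.max() * same).min(dim=1)[0]
    return F.relu(hardest_pos - hardest_neg + margin).mean()


def train_step(model, optimizer, images, labels, tri_weight=0.5):
    """One optimization step on a single batch; returns the scalar loss."""
    model.train()
    feat, logits = model(images)
    loss = F.cross_entropy(logits, labels) + tri_weight * batch_hard(feat, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = TinyEmbedder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
images = torch.randn(8, 32)                        # fake batch: 4 IDs x 2 samples
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])    # labels must lie in 0..num_classes-1
losses = [train_step(model, optimizer, images, labels) for _ in range(20)]
print(losses[0], losses[-1])
```

With a real model, `images` and `labels` would come from a data loader, and the labels must be remapped to a contiguous 0..N-1 range before they reach the cross-entropy term.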

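4. Image Retrieval and Evaluation

A complete ReID system needs an image retrieval pipeline: given a query image, the system should return the most similar gallery images: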

    import numpy as np
    import torch
    from sklearn.metrics.pairwise import cosine_similarity


    class ReIDSystem:
        def __init__(self, model, gallery_loader):
            self.model = model
            self.model.eval()
            # Run on whichever device the model already lives on
            self.device = next(model.parameters()).device
            # Precompute the gallery features
            gallery_feats, gallery_labels = [], []
            with torch.no_grad():
                for img, label, _ in gallery_loader:
                    feat = self.model(img.to(self.device))
                    gallery_feats.append(feat.cpu())
                    gallery_labels.append(label)
            self.gallery_feats = torch.cat(gallery_feats)
            self.gallery_labels = torch.cat(gallery_labels)

        def search(self, query_img, topk=10):
            with torch.no_grad():
                query_feat = self.model(query_img.unsqueeze(0).to(self.device)).cpu()
            # Cosine similarity between the query and every gallery feature: [1, num_gallery]
            sim = cosine_similarity(query_feat.numpy(), self.gallery_feats.numpy())
            # Indices of the top-k most similar gallery images, best match first
            topk_idx = np.argsort(sim[0])[::-1][:topk].tolist()
            return [(self.gallery_feats[i], self.gallery_labels[i]) for i in topk_idx]

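The main metrics for evaluating a ReID system are the CMC curve and mAP:

  • CMC@k: the probability that a correct match appears among the top k results
  • mAP: mean average precision, which takes the rank position of every correct match into account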
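A toy ranking makes the two metrics concrete. In the sketch below, `matches` is the ranked gallery list for a single query (1 means same ID, 0 means different ID); the helper names `average_precision` and `cmc_hit` are chosen for this example only.

```python
def average_precision(matches):
    """AP for one query: mean of the precision values at each correct match."""
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def cmc_hit(matches, k):
    """1 if at least one correct match appears in the top k results, else 0."""
    return int(any(matches[:k]))


ranked = [0, 1, 0, 1, 0]  # correct matches at ranks 2 and 4
print(average_precision(ranked))               # 0.5  (= (1/2 + 2/4) / 2)
print(cmc_hit(ranked, 1), cmc_hit(ranked, 2))  # 0 1
```

Averaging `cmc_hit` over all queries gives CMC@k, and averaging `average_precision` over all queries gives mAP.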

The evaluation code:

    import torch
    import torch.nn.functional as F


    def extract_features(model, loader, device):
        # Collect features, person IDs and camera IDs for every image in the loader
        feats, pids, camids = [], [], []
        with torch.no_grad():
            for img, pid, camid in loader:
                feats.append(model(img.to(device)).cpu())
                pids.append(pid)
                camids.append(camid)
        return torch.cat(feats), torch.cat(pids), torch.cat(camids)


    def evaluate(model, query_loader, gallery_loader, max_rank=50):
        model.eval()
        device = next(model.parameters()).device
        q_feats, q_pids, q_cams = extract_features(model, query_loader, device)
        g_feats, g_pids, g_cams = extract_features(model, gallery_loader, device)

        # Cosine similarity matrix: [num_query, num_gallery]
        sim_matrix = F.normalize(q_feats, dim=1) @ F.normalize(g_feats, dim=1).t()

        cmc = torch.zeros(max_rank)
        aps = []
        for i in range(len(q_pids)):
            # Standard Market-1501 protocol: ignore junk images (pid == -1) and gallery
            # images that show the query person from the query's own camera
            keep = (g_pids != -1) & ~((g_pids == q_pids[i]) & (g_cams == q_cams[i]))
            order = torch.argsort(sim_matrix[i][keep], descending=True)
            matches = (g_pids[keep][order] == q_pids[i]).float()
            if matches.sum() == 0:
                continue  # this query has no valid match in the gallery
            # AP: mean of the precision values at the rank of each correct match
            hits = matches.cumsum(0)
            ranks = torch.arange(1, len(matches) + 1)
            aps.append((hits / ranks * matches).sum() / matches.sum())
            # CMC: the query counts as a hit at every rank from its first correct match on
            first_hit = int(torch.nonzero(matches)[0])
            cmc[first_hit:] += 1
        num_valid = max(len(aps), 1)
        mean_ap = torch.stack(aps).mean() if aps else torch.tensor(0.0)
        return cmc / num_valid, mean_ap

5. System Optimization and Deployment

Once the basic system works, it can be optimized from several angles:

Data augmentation strategies

  • Random erasing
  • Color jitter
  • Pose transformation

The transform pipeline below covers random erasing and color jitter:

    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.Resize((256, 128)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
        # RandomErasing operates on tensors, so it must come after ToTensor
        transforms.RandomErasing(p=0.5, scale=(0.02, 0.4), ratio=(0.3, 3.3)),
    ])

Model distillation: use a large model to guide the training of a small one

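    import torch.nn as nn
    import torch.nn.functional as F


    class DistillLoss(nn.Module):
        def __init__(self, T=3.0):
            super().__init__()
            self.T = T  # temperature: higher values give softer distributions
            self.kl_div = nn.KLDivLoss(reduction='batchmean')

        def forward(self, student_logits, teacher_logits):
            # KLDivLoss expects log-probabilities as input and probabilities as target
            soft_teacher = F.softmax(teacher_logits / self.T, dim=1)
            soft_student = F.log_softmax(student_logits / self.T, dim=1)
            # Scale by T^2 to offset the 1/T^2 factor the temperature puts on the gradients
            return self.kl_div(soft_student, soft_teacher) * (self.T ** 2)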
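In formula form, the loss above is the temperature-scaled KL divergence from the teacher's softened distribution to the student's, averaged over a batch of B samples with C classes. The symbols below are introduced for this note: z^t and z^s are the teacher and student logits, and T is the temperature.

```latex
$$
\mathcal{L}_{\text{KD}} = \frac{T^{2}}{B} \sum_{b=1}^{B} \sum_{c=1}^{C}
p^{t}_{b,c} \, \log \frac{p^{t}_{b,c}}{p^{s}_{b,c}},
\qquad
p^{t}_{b} = \operatorname{softmax}\!\left(\frac{z^{t}_{b}}{T}\right),
\quad
p^{s}_{b} = \operatorname{softmax}\!\left(\frac{z^{s}_{b}}{T}\right)
$$
```

The 1/B factor corresponds to `reduction='batchmean'`, and the loss is zero exactly when the student's softened distribution matches the teacher's. In practice this term is usually added to the student's regular training loss with a weighting factor.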

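Deployment optimization

  1. Quantize the model to reduce its size
  2. Use TensorRT to accelerate inference
  3. Build an efficient vector retrieval system

    import torch
    import torch.nn as nn

    # Dynamic quantization example. Only the nn.Linear layers are converted to int8;
    # the convolutional backbone keeps its float weights
    quantized_model = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )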
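For the third item, the simplest starting point is a flat (brute-force) index over L2-normalized features, which can later be swapped for an approximate nearest-neighbor library once the gallery grows. The sketch below uses only NumPy; the class `FlatCosineIndex` is a hypothetical name for this example, and the toy vectors stand in for real gallery features.

```python
import numpy as np


class FlatCosineIndex:
    """Brute-force cosine-similarity search over pre-normalized gallery features."""

    def __init__(self, feats):
        feats = np.asarray(feats, dtype=np.float32)
        norms = np.linalg.norm(feats, axis=1, keepdims=True)
        # Normalize once so that search reduces to a single matrix-vector product
        self.feats = feats / np.clip(norms, 1e-12, None)

    def search(self, query, topk=10):
        q = np.asarray(query, dtype=np.float32)
        q = q / max(float(np.linalg.norm(q)), 1e-12)
        scores = self.feats @ q
        topk = min(topk, len(scores))
        # argpartition finds the top-k candidates without sorting the whole gallery
        cand = np.argpartition(-scores, topk - 1)[:topk]
        order = cand[np.argsort(-scores[cand])]
        return order, scores[order]


index = FlatCosineIndex([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
idx, scores = index.search([1.0, 0.1], topk=2)
print(idx.tolist())  # [0, 2]
```

Normalizing at insertion time rather than at query time keeps the per-query cost to one matrix-vector product plus a partial sort.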

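When deploying in real-world scenarios, you also need to consider:

  • Color calibration across cameras
  • Joint optimization of person detection and ReID
  • Balancing accuracy against speed under real-time constraints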
