基于YOLOv8/YOLOv7/YOLOv6/YOLOv5的常见手势识别系统(深度学习模型+UI界面代码+训练数据集)
手势识别作为人机交互领域的重要研究方向,在虚拟现实、智能家居、无障碍交互等场景中具有广泛应用价值。本文将详细介绍基于YOLOv5/v6/v7/v8的实时手势识别系统的完整实现,涵盖算法原理、数据集构建、模型训练、系统集成及部署全过程。本系统实现了高精度、低延迟的手势识别,并提供了直观的图形用户界面,便于实际应用和二次开发。
摘要
手势识别作为人机交互领域的重要研究方向,在虚拟现实、智能家居、无障碍交互等场景中具有广泛应用价值。本文将详细介绍基于YOLOv5/v6/v7/v8的实时手势识别系统的完整实现,涵盖算法原理、数据集构建、模型训练、系统集成及部署全过程。本系统实现了高精度、低延迟的手势识别,并提供了直观的图形用户界面,便于实际应用和二次开发。
目录
1. 引言
1.1 手势识别的意义与应用场景
手势识别技术通过计算机视觉算法理解人类手部动作和姿态,实现自然的人机交互。其主要应用包括:
-
智能家居控制:通过手势控制灯光、电器等
-
虚拟现实/增强现实:实现自然的虚拟交互
-
医疗康复:辅助康复训练和评估
-
车载系统:减少驾驶员注意力分散
-
无障碍交互:帮助听力或语言障碍者沟通
1.2 YOLO算法在手势识别中的优势
YOLO(You Only Look Once)系列算法以其卓越的实时检测性能而闻名,特别适合手势识别这类需要快速响应的应用场景:
-
单阶段检测:直接回归目标位置和类别,速度更快
-
端到端训练:简化训练流程,提高模型性能
-
多尺度特征融合:更好地处理不同尺度的手势
-
持续演进:从YOLOv5到v8不断优化性能和精度
2. 系统架构设计
2.1 整体架构
text
┌─────────────────────────────────────────────────┐ │ 用户界面层 │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐│ │ │ 摄像头 │ │ 手势检测 │ │ 控制面板 ││ │ │ 输入模块 │ │ 显示模块 │ │ ││ │ └─────────────┘ └─────────────┘ └─────────────┘│ ├─────────────────────────────────────────────────┤ │ 业务逻辑层 │ │ ┌─────────────────────────────────────────┐ │ │ │ YOLO手势检测引擎 │ │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │ │ │预处理│ │推理│ │后处理│ │跟踪│ │ │ │ │ └─────┘ └─────┘ └─────┘ └─────┘ │ │ │ └─────────────────────────────────────────┘ │ ├─────────────────────────────────────────────────┤ │ 数据层 │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐│ │ │ 训练数据集 │ │ 模型权重 │ │ 配置参数 ││ │ └─────────────┘ └─────────────┘ └─────────────┘│ └─────────────────────────────────────────────────┘
2.2 模块设计
-
数据采集与预处理模块
-
YOLO模型训练与优化模块
-
实时检测与推理模块
-
手势识别后处理模块
-
用户界面交互模块
3. 数据集准备与增强
3.1 参考数据集
-
HaGRID (HAnd Gesture Recognition Image Dataset)
-
包含18种手势类别,超过55万张图像
-
标注包括边界框和手势类别
-
多样化的背景、光照和手部姿态
-
-
EgoHands
-
包含48个视频序列,超过15,000帧
-
适用于第一人称视角手势识别
-
精确的手部边界框标注
-
-
Hand Gesture Recognition Database
-
10种手势类别,超过20,000张图像
-
包含不同肤色、手势变化
-
统一背景,适用于初步实验
-
-
自定义数据集构建
python
# 数据集结构 hand_gesture_dataset/ ├── images/ │ ├── train/ │ └── val/ ├── labels/ │ ├── train/ │ └── val/ └── dataset.yaml
3.2 数据增强策略
为提高模型泛化能力,采用以下数据增强技术:
python
# 数据增强配置示例
augmentation = {
'hsv_h': 0.015, # 色调增强
'hsv_s': 0.7, # 饱和度增强
'hsv_v': 0.4, # 亮度增强
'rotation': 15, # 旋转角度
'scale': 0.5, # 缩放范围
'shear': 0.0, # 剪切变换
'flipud': 0.0, # 上下翻转概率
'fliplr': 0.5, # 左右翻转概率
'mosaic': 1.0, # Mosaic增强概率
'mixup': 0.5, # MixUp增强概率
}
4. YOLO模型实现与训练
4.1 环境配置
python
# 环境要求 """ Python 3.8+ PyTorch 1.7+ CUDA 11.0+ (GPU训练推荐) torchvision 0.8+ opencv-python albumentations PyQt5 (用于UI界面) """
4.2 YOLOv8模型实现
python
import torch
import torch.nn as nn
from ultralytics import YOLO
import cv2
import numpy as np
class GestureRecognitionSystem:
def __init__(self, model_path='weights/best.pt', device='cuda'):
"""
初始化手势识别系统
参数:
model_path: 模型权重路径
device: 运行设备 (cuda/cpu)
"""
self.device = device if torch.cuda.is_available() and device == 'cuda' else 'cpu'
self.model = self.load_model(model_path)
self.class_names = [
'ok', 'peace', 'thumbs_up', 'thumbs_down', 'call_me',
'stop', 'rock', 'like', 'dislike', 'fist',
'palm', 'point', 'victory', 'three', 'four',
'five', 'heart', 'hang_loose'
]
self.colors = self.generate_colors(len(self.class_names))
def load_model(self, model_path):
"""加载YOLOv8模型"""
try:
model = YOLO(model_path)
model.to(self.device)
model.eval()
print(f"模型加载成功,设备: {self.device}")
return model
except Exception as e:
print(f"模型加载失败: {e}")
raise
def generate_colors(self, n):
"""为每个类别生成唯一颜色"""
np.random.seed(42)
colors = np.random.randint(0, 255, size=(n, 3))
return colors
def preprocess(self, image):
"""图像预处理"""
# 保持原始图像用于显示
original_image = image.copy()
# 转换为RGB(如果输入是BGR)
if len(image.shape) == 3 and image.shape[2] == 3:
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
else:
image_rgb = image
return original_image, image_rgb
def detect(self, image, conf_threshold=0.5, iou_threshold=0.45):
"""
执行手势检测
参数:
image: 输入图像
conf_threshold: 置信度阈值
iou_threshold: IOU阈值
返回:
detections: 检测结果列表
processed_image: 绘制检测框的图像
"""
# 预处理
original_image, image_rgb = self.preprocess(image)
# YOLOv8推理
results = self.model(
image_rgb,
conf=conf_threshold,
iou=iou_threshold,
verbose=False
)
# 解析结果
detections = []
processed_image = original_image.copy()
if results[0].boxes is not None:
boxes = results[0].boxes.xyxy.cpu().numpy()
scores = results[0].boxes.conf.cpu().numpy()
classes = results[0].boxes.cls.cpu().numpy().astype(int)
for box, score, cls_id in zip(boxes, scores, classes):
x1, y1, x2, y2 = map(int, box)
class_name = self.class_names[cls_id]
# 添加到检测结果
detection = {
'bbox': [x1, y1, x2, y2],
'score': float(score),
'class': class_name,
'class_id': cls_id
}
detections.append(detection)
# 绘制检测框
color = tuple(map(int, self.colors[cls_id]))
# 绘制边界框
cv2.rectangle(processed_image, (x1, y1), (x2, y2), color, 2)
# 绘制标签背景
label = f"{class_name}: {score:.2f}"
(text_width, text_height), baseline = cv2.getTextSize(
label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1
)
cv2.rectangle(
processed_image,
(x1, y1 - text_height - baseline - 5),
(x1 + text_width, y1),
color,
-1
)
# 绘制标签文本
cv2.putText(
processed_image,
label,
(x1, y1 - baseline - 5),
cv2.FONT_HERSHEY_SIMPLEX,
0.5,
(255, 255, 255),
1
)
return detections, processed_image
def process_video(self, video_path, output_path=None):
"""
处理视频文件
参数:
video_path: 视频文件路径
output_path: 输出视频路径
"""
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print(f"无法打开视频文件: {video_path}")
return
# 获取视频属性
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# 创建视频写入器
if output_path:
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
frame_count = 0
total_fps = 0
while True:
ret, frame = cap.read()
if not ret:
break
frame_count += 1
# 计时开始
start_time = cv2.getTickCount()
# 执行检测
detections, processed_frame = self.detect(frame)
# 计时结束
end_time = cv2.getTickCount()
fps = cv2.getTickFrequency() / (end_time - start_time)
total_fps += fps
# 显示FPS
cv2.putText(
processed_frame,
f"FPS: {fps:.2f}",
(10, 30),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2
)
# 显示检测数量
cv2.putText(
processed_frame,
f"Detections: {len(detections)}",
(10, 60),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(0, 255, 0),
2
)
# 显示视频
cv2.imshow('Gesture Recognition', processed_frame)
# 写入输出视频
if output_path:
out.write(processed_frame)
# 退出条件
if cv2.waitKey(1) & 0xFF == ord('q'):
break
# 释放资源
cap.release()
if output_path:
out.release()
cv2.destroyAllWindows()
# 输出统计信息
avg_fps = total_fps / frame_count if frame_count > 0 else 0
print(f"视频处理完成,平均FPS: {avg_fps:.2f}")
4.3 训练脚本
python
import os
import yaml
from ultralytics import YOLO
def train_yolov8():
"""训练YOLOv8手势识别模型"""
# 数据集配置
dataset_config = {
'path': 'datasets/hand_gesture', # 数据集根目录
'train': 'images/train', # 训练集路径
'val': 'images/val', # 验证集路径
'test': 'images/test', # 测试集路径
'nc': 18, # 类别数量
'names': [ # 类别名称
'ok', 'peace', 'thumbs_up', 'thumbs_down', 'call_me',
'stop', 'rock', 'like', 'dislike', 'fist',
'palm', 'point', 'victory', 'three', 'four',
'five', 'heart', 'hang_loose'
]
}
# 保存数据集配置文件
with open('datasets/hand_gesture/dataset.yaml', 'w') as f:
yaml.dump(dataset_config, f, default_flow_style=False)
# 加载预训练模型
model = YOLO('yolov8n.pt') # 可以使用yolov8s.pt, yolov8m.pt等
# 训练参数配置
train_args = {
'data': 'datasets/hand_gesture/dataset.yaml',
'epochs': 100,
'batch': 16,
'imgsz': 640,
'device': '0', # GPU设备ID,使用'cpu'进行CPU训练
'workers': 8,
'optimizer': 'AdamW',
'lr0': 0.001, # 初始学习率
'lrf': 0.01, # 最终学习率 = lr0 * lrf
'momentum': 0.937,
'weight_decay': 0.0005,
'warmup_epochs': 3,
'warmup_momentum': 0.8,
'box': 7.5, # 边界框损失权重
'cls': 0.5, # 分类损失权重
'dfl': 1.5, # DFL损失权重
'pose': 12.0, # 姿态损失权重(如果适用)
'kobj': 1.0, # 关键点物体损失权重
'label_smoothing': 0.0,
'nbs': 64, # 名义批量大小
'overlap_mask': True,
'mask_ratio': 4,
'dropout': 0.0,
'val': True, # 训练期间进行验证
'save': True, # 保存检查点
'save_period': 10, # 每10个epoch保存一次
'cache': False, # 缓存图像(需要大量RAM)
'project': 'runs/train', # 保存结果的项目目录
'name': 'exp', # 实验名称
'exist_ok': False, # 是否覆盖现有实验
'pretrained': True, # 使用预训练权重
'patience': 50, # 早停耐心值
'freeze': None, # 冻结层
'evolve': None,
'resume': False, # 从最新检查点恢复训练
'amp': True, # 自动混合精度
'fraction': 1.0, # 数据集使用比例
'profile': False, # 在训练期间分析ONNX和TensorRT速度
'seed': 0, # 随机种子
'close_mosaic': 10, # 最后10个epoch关闭mosaic
'erasing': 0.4, # 随机擦除概率
'crop_fraction': 1.0, # 图像裁剪比例
}
# 开始训练
results = model.train(**train_args)
# 验证模型
val_results = model.val()
# 导出模型
model.export(format='onnx', simplify=True)
return results, val_results
if __name__ == '__main__':
results, val_results = train_yolov8()
print("训练完成!")
print(f"mAP50-95: {val_results.box.map:.4f}")
print(f"mAP50: {val_results.box.map50:.4f}")
4.4 YOLOv5实现对比
python
# YOLOv5手势检测实现
import torch
import torch.nn.functional as F
class YOLOv5GestureDetector:
"""YOLOv5手势检测器"""
def __init__(self, model_path='yolov5s_gesture.pt'):
self.model = torch.hub.load('ultralytics/yolov5', 'custom',
path=model_path, force_reload=False)
self.model.eval()
# 类别映射
self.classes = {
0: 'ok', 1: 'peace', 2: 'thumbs_up', 3: 'thumbs_down',
4: 'call_me', 5: 'stop', 6: 'rock', 7: 'like',
8: 'dislike', 9: 'fist', 10: 'palm', 11: 'point',
12: 'victory', 13: 'three', 14: 'four', 15: 'five',
16: 'heart', 17: 'hang_loose'
}
def detect(self, image):
"""检测手势"""
results = self.model(image)
detections = []
for *xyxy, conf, cls in results.xyxy[0]:
detection = {
'bbox': [int(x) for x in xyxy],
'confidence': float(conf),
'class': self.classes[int(cls)],
'class_id': int(cls)
}
detections.append(detection)
return detections, results.render()[0]
5. 图形用户界面实现
5.1 基于PyQt5的UI界面
python
import sys
import cv2
from PyQt5.QtWidgets import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *
import numpy as np
class GestureRecognitionUI(QMainWindow):
"""手势识别系统主界面"""
def __init__(self):
super().__init__()
self.detector = None
self.camera_active = False
self.cap = None
self.init_ui()
def init_ui(self):
"""初始化用户界面"""
self.setWindowTitle('基于YOLO的手势识别系统')
self.setGeometry(100, 100, 1400, 800)
# 设置窗口图标
self.setWindowIcon(QIcon('icon.png'))
# 创建中央部件
central_widget = QWidget()
self.setCentralWidget(central_widget)
# 主布局
main_layout = QHBoxLayout(central_widget)
# 左侧视频显示区域
left_panel = QFrame()
left_panel.setFrameStyle(QFrame.Box | QFrame.Raised)
left_layout = QVBoxLayout(left_panel)
# 视频显示标签
self.video_label = QLabel()
self.video_label.setAlignment(Qt.AlignCenter)
self.video_label.setStyleSheet("border: 2px solid gray; background-color: black;")
self.video_label.setMinimumSize(800, 600)
left_layout.addWidget(self.video_label)
# 视频控制按钮
control_layout = QHBoxLayout()
self.camera_btn = QPushButton('开启摄像头')
self.camera_btn.clicked.connect(self.toggle_camera)
control_layout.addWidget(self.camera_btn)
self.load_video_btn = QPushButton('加载视频')
self.load_video_btn.clicked.connect(self.load_video)
control_layout.addWidget(self.load_video_btn)
self.load_image_btn = QPushButton('加载图片')
self.load_image_btn.clicked.connect(self.load_image)
control_layout.addWidget(self.load_image_btn)
self.screenshot_btn = QPushButton('截图保存')
self.screenshot_btn.clicked.connect(self.save_screenshot)
control_layout.addWidget(self.screenshot_btn)
left_layout.addLayout(control_layout)
# 右侧控制面板
right_panel = QFrame()
right_panel.setFrameStyle(QFrame.Box | QFrame.Raised)
right_layout = QVBoxLayout(right_panel)
# 模型选择
model_group = QGroupBox("模型选择")
model_layout = QVBoxLayout()
self.model_combo = QComboBox()
self.model_combo.addItems(['YOLOv8n', 'YOLOv8s', 'YOLOv8m', 'YOLOv8l', 'YOLOv5s', 'YOLOv7'])
model_layout.addWidget(QLabel("选择模型版本:"))
model_layout.addWidget(self.model_combo)
self.load_model_btn = QPushButton('加载模型')
self.load_model_btn.clicked.connect(self.load_model)
model_layout.addWidget(self.load_model_btn)
model_group.setLayout(model_layout)
right_layout.addWidget(model_group)
# 参数设置
param_group = QGroupBox("检测参数")
param_layout = QFormLayout()
self.conf_slider = QSlider(Qt.Horizontal)
self.conf_slider.setRange(10, 90)
self.conf_slider.setValue(50)
self.conf_label = QLabel('0.5')
self.conf_slider.valueChanged.connect(self.update_conf_label)
param_layout.addRow('置信度阈值:', self.conf_slider)
param_layout.addRow('当前值:', self.conf_label)
self.iou_slider = QSlider(Qt.Horizontal)
self.iou_slider.setRange(10, 90)
self.iou_slider.setValue(45)
self.iou_label = QLabel('0.45')
self.iou_slider.valueChanged.connect(self.update_iou_label)
param_layout.addRow('IOU阈值:', self.iou_slider)
param_layout.addRow('当前值:', self.iou_label)
param_group.setLayout(param_layout)
right_layout.addWidget(param_group)
# 检测结果显示
result_group = QGroupBox("检测结果")
result_layout = QVBoxLayout()
self.result_table = QTableWidget()
self.result_table.setColumnCount(4)
self.result_table.setHorizontalHeaderLabels(['类别', '置信度', '位置', 'ID'])
self.result_table.setEditTriggers(QTableWidget.NoEditTriggers)
result_layout.addWidget(self.result_table)
self.status_label = QLabel('状态: 等待检测')
result_layout.addWidget(self.status_label)
self.fps_label = QLabel('FPS: 0.0')
result_layout.addWidget(self.fps_label)
result_group.setLayout(result_layout)
right_layout.addWidget(result_group)
# 手势说明
gesture_group = QGroupBox("手势说明")
gesture_layout = QVBoxLayout()
gesture_text = QTextEdit()
gesture_text.setReadOnly(True)
gesture_text.setText("""
支持的手势类别:
1. 👌 OK手势
2. ✌️ 和平手势
3. 👍 大拇指向上
4. 👎 大拇指向下
5. 🤙 给我打电话
6. ✋ 停止手势
7. 🤘 摇滚手势
8. 👍 点赞
9. 👎 点踩
10. ✊ 拳头
11. 🖐️ 手掌
12. 👈 指向
13. ✌️ 胜利手势
14. 3️⃣ 数字三
15. 4️⃣ 数字四
16. 5️⃣ 数字五
17. ❤️ 爱心手势
18. 🤙 放松手势
""")
gesture_layout.addWidget(gesture_text)
gesture_group.setLayout(gesture_layout)
right_layout.addWidget(gesture_group)
# 添加到主布局
main_layout.addWidget(left_panel, 70)
main_layout.addWidget(right_panel, 30)
# 状态栏
self.statusBar().showMessage('就绪')
# 定时器用于更新视频帧
self.timer = QTimer()
self.timer.timeout.connect(self.update_frame)
# 加载默认模型
QTimer.singleShot(100, self.load_default_model)
def load_default_model(self):
"""加载默认模型"""
try:
self.detector = GestureRecognitionSystem('weights/yolov8n_gesture.pt')
self.statusBar().showMessage('默认模型加载成功')
except:
self.statusBar().showMessage('模型加载失败,请手动加载')
def load_model(self):
"""加载选择的模型"""
model_name = self.model_combo.currentText()
model_paths = {
'YOLOv8n': 'weights/yolov8n_gesture.pt',
'YOLOv8s': 'weights/yolov8s_gesture.pt',
'YOLOv8m': 'weights/yolov8m_gesture.pt',
'YOLOv8l': 'weights/yolov8l_gesture.pt',
'YOLOv5s': 'weights/yolov5s_gesture.pt',
'YOLOv7': 'weights/yolov7_gesture.pt'
}
if model_name in model_paths:
try:
self.detector = GestureRecognitionSystem(model_paths[model_name])
QMessageBox.information(self, '成功', f'{model_name}模型加载成功!')
self.statusBar().showMessage(f'{model_name}模型已加载')
except Exception as e:
QMessageBox.critical(self, '错误', f'模型加载失败: {str(e)}')
def toggle_camera(self):
"""切换摄像头状态"""
if not self.camera_active:
# 开启摄像头
self.cap = cv2.VideoCapture(0)
if not self.cap.isOpened():
QMessageBox.critical(self, '错误', '无法打开摄像头')
return
self.camera_active = True
self.camera_btn.setText('关闭摄像头')
self.timer.start(30) # 约33FPS
self.statusBar().showMessage('摄像头已开启')
else:
# 关闭摄像头
self.camera_active = False
self.camera_btn.setText('开启摄像头')
if self.cap:
self.cap.release()
self.timer.stop()
self.video_label.clear()
self.statusBar().showMessage('摄像头已关闭')
def update_frame(self):
"""更新视频帧"""
if self.cap and self.cap.isOpened():
ret, frame = self.cap.read()
if ret:
# 记录时间用于计算FPS
start_time = cv2.getTickCount()
if self.detector:
# 执行检测
detections, processed_frame = self.detector.detect(
frame,
conf_threshold=self.conf_slider.value() / 100.0,
iou_threshold=self.iou_slider.value() / 100.0
)
# 更新结果表格
self.update_result_table(detections)
# 计算FPS
end_time = cv2.getTickCount()
fps = cv2.getTickFrequency() / (end_time - start_time)
self.fps_label.setText(f'FPS: {fps:.1f}')
else:
processed_frame = frame
self.status_label.setText('状态: 模型未加载')
# 转换为Qt图像格式
height, width, channel = processed_frame.shape
bytes_per_line = 3 * width
qt_image = QImage(processed_frame.data, width, height,
bytes_per_line, QImage.Format_RGB888)
qt_image = qt_image.rgbSwapped()
# 显示图像
pixmap = QPixmap.fromImage(qt_image)
scaled_pixmap = pixmap.scaled(self.video_label.size(),
Qt.KeepAspectRatio,
Qt.SmoothTransformation)
self.video_label.setPixmap(scaled_pixmap)
def update_result_table(self, detections):
"""更新检测结果表格"""
self.result_table.setRowCount(len(detections))
for i, detection in enumerate(detections):
# 类别
class_item = QTableWidgetItem(detection['class'])
# 置信度
conf_item = QTableWidgetItem(f"{detection['score']:.3f}")
# 位置
bbox = detection['bbox']
pos_item = QTableWidgetItem(f"({bbox[0]}, {bbox[1]}) - ({bbox[2]}, {bbox[3]})")
# ID
id_item = QTableWidgetItem(str(detection['class_id']))
self.result_table.setItem(i, 0, class_item)
self.result_table.setItem(i, 1, conf_item)
self.result_table.setItem(i, 2, pos_item)
self.result_table.setItem(i, 3, id_item)
self.status_label.setText(f'状态: 检测到 {len(detections)} 个手势')
def load_video(self):
"""加载视频文件"""
if self.camera_active:
self.toggle_camera()
file_path, _ = QFileDialog.getOpenFileName(
self, '选择视频文件', '',
'视频文件 (*.mp4 *.avi *.mov *.mkv);;所有文件 (*.*)'
)
if file_path:
self.cap = cv2.VideoCapture(file_path)
if self.cap.isOpened():
self.camera_active = True
self.camera_btn.setText('停止视频')
self.timer.start(30)
self.statusBar().showMessage(f'正在播放: {file_path}')
else:
QMessageBox.critical(self, '错误', '无法打开视频文件')
def load_image(self):
"""加载图片文件"""
if self.camera_active:
self.toggle_camera()
file_path, _ = QFileDialog.getOpenFileName(
self, '选择图片文件', '',
'图片文件 (*.jpg *.jpeg *.png *.bmp);;所有文件 (*.*)'
)
if file_path and self.detector:
# 读取图像
image = cv2.imread(file_path)
if image is not None:
# 执行检测
detections, processed_image = self.detector.detect(
image,
conf_threshold=self.conf_slider.value() / 100.0,
iou_threshold=self.iou_slider.value() / 100.0
)
# 更新结果表格
self.update_result_table(detections)
# 显示图像
height, width, channel = processed_image.shape
bytes_per_line = 3 * width
qt_image = QImage(processed_image.data, width, height,
bytes_per_line, QImage.Format_RGB888)
qt_image = qt_image.rgbSwapped()
pixmap = QPixmap.fromImage(qt_image)
scaled_pixmap = pixmap.scaled(self.video_label.size(),
Qt.KeepAspectRatio,
Qt.SmoothTransformation)
self.video_label.setPixmap(scaled_pixmap)
self.statusBar().showMessage(f'图片已加载: {file_path}')
else:
QMessageBox.critical(self, '错误', '无法读取图片文件')
def save_screenshot(self):
"""保存截图"""
if hasattr(self.video_label, 'pixmap') and self.video_label.pixmap():
file_path, _ = QFileDialog.getSaveFileName(
self, '保存截图', 'gesture_screenshot.png',
'PNG文件 (*.png);;JPEG文件 (*.jpg *.jpeg);;所有文件 (*.*)'
)
if file_path:
self.video_label.pixmap().save(file_path)
QMessageBox.information(self, '成功', f'截图已保存到: {file_path}')
def update_conf_label(self, value):
"""更新置信度标签"""
self.conf_label.setText(f'{value/100:.2f}')
def update_iou_label(self, value):
"""更新IOU标签"""
self.iou_label.setText(f'{value/100:.2f}')
def closeEvent(self, event):
"""关闭事件"""
if self.cap:
self.cap.release()
self.timer.stop()
event.accept()
def main():
"""主函数"""
app = QApplication(sys.argv)
# 设置应用样式
app.setStyle('Fusion')
# 创建并显示主窗口
window = GestureRecognitionUI()
window.show()
sys.exit(app.exec_())
if __name__ == '__main__':
main()
6. 模型评估与优化
6.1 评估指标
python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score
import matplotlib.pyplot as plt
def evaluate_model(model, test_loader):
"""评估模型性能"""
all_predictions = []
all_targets = []
model.eval()
with torch.no_grad():
for images, targets in test_loader:
images = images.to(device)
predictions = model(images)
# 处理预测结果
processed_preds = process_predictions(predictions)
processed_targets = process_targets(targets)
all_predictions.extend(processed_preds)
all_targets.extend(processed_targets)
# 计算评估指标
metrics = calculate_metrics(all_predictions, all_targets)
return metrics
def calculate_metrics(predictions, targets):
"""计算各种评估指标"""
metrics = {
'precision': [],
'recall': [],
'f1_score': [],
'ap': [], # 各类别AP
'map': 0, # mAP
'map50': 0, # mAP@0.5
'map75': 0, # mAP@0.75
}
# 计算每个类别的指标
for class_id in range(num_classes):
# 提取当前类别的预测和标签
class_preds = [p for p in predictions if p['class_id'] == class_id]
class_targets = [t for t in targets if t['class_id'] == class_id]
# 计算Precision, Recall, F1
if class_preds or class_targets:
precision, recall, f1 = calculate_prf(class_preds, class_targets)
metrics['precision'].append(precision)
metrics['recall'].append(recall)
metrics['f1_score'].append(f1)
# 计算AP
ap = calculate_ap(class_preds, class_targets)
metrics['ap'].append(ap)
# 计算mAP
metrics['map'] = np.mean(metrics['ap']) if metrics['ap'] else 0
metrics['map50'] = calculate_map_at_iou(predictions, targets, iou_threshold=0.5)
metrics['map75'] = calculate_map_at_iou(predictions, targets, iou_threshold=0.75)
return metrics
6.2 性能优化策略
6.2.1 模型剪枝
python
def model_pruning(model, pruning_rate=0.3):
"""模型剪枝"""
parameters_to_prune = []
# 选择要剪枝的层
for name, module in model.named_modules():
if isinstance(module, nn.Conv2d):
parameters_to_prune.append((module, 'weight'))
# 执行剪枝
prune.global_unstructured(
parameters_to_prune,
pruning_method=prune.L1Unstructured,
amount=pruning_rate
)
# 移除剪枝掩码
for module, param_name in parameters_to_prune:
prune.remove(module, param_name)
return model
6.2.2 知识蒸馏
python
class KnowledgeDistillation:
"""知识蒸馏"""
def __init__(self, teacher_model, student_model, temperature=3.0, alpha=0.7):
self.teacher = teacher_model
self.student = student_model
self.temperature = temperature
self.alpha = alpha
def distill(self, images, labels):
"""蒸馏训练"""
# 教师模型预测
with torch.no_grad():
teacher_logits = self.teacher(images)
# 学生模型预测
student_logits = self.student(images)
# 计算蒸馏损失
distillation_loss = nn.KLDivLoss()(
F.log_softmax(student_logits / self.temperature, dim=1),
F.softmax(teacher_logits / self.temperature, dim=1)
) * (self.alpha * self.temperature * self.temperature)
# 计算学生损失
student_loss = F.cross_entropy(student_logits, labels) * (1 - self.alpha)
# 总损失
total_loss = distillation_loss + student_loss
return total_loss
7. 部署与性能测试
7.1 ONNX导出与优化
python
def export_to_onnx(model, input_shape=(1, 3, 640, 640)):
"""导出模型到ONNX格式"""
# 创建虚拟输入
dummy_input = torch.randn(*input_shape)
# 导出ONNX
torch.onnx.export(
model,
dummy_input,
"gesture_recognition.onnx",
export_params=True,
opset_version=12,
do_constant_folding=True,
input_names=['input'],
output_names=['output'],
dynamic_axes={
'input': {0: 'batch_size'},
'output': {0: 'batch_size'}
}
)
# ONNX模型优化
import onnx
from onnxsim import simplify
# 加载导出的模型
onnx_model = onnx.load("gesture_recognition.onnx")
# 简化模型
simplified_model, check = simplify(onnx_model)
assert check, "Simplified ONNX model could not be validated"
# 保存简化后的模型
onnx.save(simplified_model, "gesture_recognition_simplified.onnx")
print("ONNX模型导出并优化完成")
7.2 TensorRT加速
python
import tensorrt as trt
def build_tensorrt_engine(onnx_path, engine_path, fp16_mode=True):
"""构建TensorRT引擎"""
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder, \
builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) as network, \
trt.OnnxParser(network, TRT_LOGGER) as parser:
# 配置构建器
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30 # 1GB
if fp16_mode and builder.platform_has_fast_fp16:
config.set_flag(trt.BuilderFlag.FP16)
# 解析ONNX模型
with open(onnx_path, 'rb') as model:
if not parser.parse(model.read()):
print('ONNX解析失败:')
for error in range(parser.num_errors):
print(parser.get_error(error))
return None
# 构建引擎
engine = builder.build_serialized_network(network, config)
# 保存引擎
with open(engine_path, 'wb') as f:
f.write(engine)
print(f"TensorRT引擎已保存到: {engine_path}")
return engine_path
7.3 性能测试结果
python
def benchmark_performance():
"""性能基准测试"""
test_configs = [
{'model': 'YOLOv8n', 'resolution': 640, 'device': 'CPU'},
{'model': 'YOLOv8s', 'resolution': 640, 'device': 'CPU'},
{'model': 'YOLOv8n', 'resolution': 640, 'device': 'GPU'},
{'model': 'YOLOv8n', 'resolution': 320, 'device': 'GPU'},
{'model': 'YOLOv8n TensorRT', 'resolution': 640, 'device': 'GPU'},
]
results = []
for config in test_configs:
print(f"\n测试配置: {config}")
# 加载模型
if 'TensorRT' in config['model']:
detector = TensorRTDetector(f"weights/{config['model'].replace(' ', '_').lower()}.engine")
else:
detector = GestureRecognitionSystem(f"weights/{config['model'].lower()}_gesture.pt")
# 性能测试
fps, latency, accuracy = test_inference_speed(detector,
resolution=config['resolution'])
results.append({
'model': config['model'],
'resolution': config['resolution'],
'device': config['device'],
'fps': fps,
'latency_ms': latency * 1000,
'accuracy': accuracy
})
# 输出结果表格
print("\n性能测试结果:")
print("="*80)
print(f"{'模型':<20} {'分辨率':<10} {'设备':<8} {'FPS':<8} {'延迟(ms)':<12} {'准确率':<8}")
print("-"*80)
for result in results:
print(f"{result['model']:<20} {result['resolution']:<10} "
f"{result['device']:<8} {result['fps']:<8.1f} "
f"{result['latency_ms']:<12.2f} {result['accuracy']:<8.2%}")
8. 实际应用案例
8.1 智能家居控制
python
class SmartHomeController:
"""智能家居手势控制器"""
def __init__(self, gesture_detector):
self.detector = gesture_detector
self.gesture_actions = {
'thumbs_up': self.turn_on_lights,
'thumbs_down': self.turn_off_lights,
'ok': self.toggle_tv,
'peace': self.adjust_volume_up,
'fist': self.adjust_volume_down,
'palm': self.toggle_ac,
'stop': self.emergency_stop,
}
def process_gesture(self, gesture):
"""处理手势命令"""
if gesture in self.gesture_actions:
self.gesture_actions[gesture]()
return True
return False
def turn_on_lights(self):
"""打开灯光"""
print("手势控制:打开灯光")
# 实际控制代码
# homeassistant.turn_on('light.living_room')
def turn_off_lights(self):
"""关闭灯光"""
print("手势控制:关闭灯光")
# homeassistant.turn_off('light.living_room')
def toggle_tv(self):
"""开关电视"""
print("手势控制:切换电视状态")
# homeassistant.toggle('media_player.tv')
def emergency_stop(self):
"""紧急停止"""
print("手势控制:紧急停止所有设备")
# homeassistant.turn_off('all')
8.2 虚拟现实交互
python
class VRGestureInterface:
"""VR手势交互接口"""
def __init__(self):
self.detector = GestureRecognitionSystem()
self.current_gesture = None
self.gesture_start_time = None
self.gesture_hold_threshold = 1.0 # 手势保持时间阈值
def update(self, vr_camera_frame):
"""更新VR手势状态"""
detections, _ = self.detector.detect(vr_camera_frame)
if detections:
# 获取置信度最高的手势
best_detection = max(detections, key=lambda x: x['score'])
if best_detection['score'] > 0.7:
gesture = best_detection['class']
if gesture != self.current_gesture:
self.current_gesture = gesture
self.gesture_start_time = time.time()
self.on_gesture_start(gesture)
else:
hold_time = time.time() - self.gesture_start_time
if hold_time > self.gesture_hold_threshold:
self.on_gesture_hold(gesture, hold_time)
else:
if self.current_gesture:
self.on_gesture_end(self.current_gesture)
self.current_gesture = None
def on_gesture_start(self, gesture):
"""手势开始"""
print(f"检测到手势: {gesture}")
# VR交互逻辑
if gesture == 'grab':
self.vr_grab_object()
elif gesture == 'point':
self.vr_select_object()
def on_gesture_hold(self, gesture, duration):
"""手势保持"""
if gesture == 'thumbs_up':
self.vr_increase_scale(duration)
elif gesture == 'thumbs_down':
self.vr_decrease_scale(duration)
9. 系统完整代码整合
9.1 项目结构
text
gesture_recognition_system/ │ ├── configs/ # 配置文件 │ ├── yolov8.yaml │ ├── yolov5.yaml │ └── inference_config.yaml │ ├── data/ # 数据集 │ ├── images/ │ ├── labels/ │ └── dataset.yaml │ ├── models/ # 模型定义 │ ├── yolov8.py │ ├── yolov5.py │ └── common.py │ ├── utils/ # 工具函数 │ ├── data_augmentation.py │ ├── visualization.py │ ├── metrics.py │ └── helpers.py │ ├── weights/ # 模型权重 │ ├── yolov8n_gesture.pt │ ├── yolov8s_gesture.pt │ └── yolov5s_gesture.pt │ ├── train.py # 训练脚本 ├── detect.py # 检测脚本 ├── export.py # 模型导出 ├── evaluate.py # 评估脚本 ├── app.py # 主应用程序 ├── requirements.txt # 依赖包 └── README.md # 项目说明
9.2 主应用程序入口
python
# main.py
import sys
import argparse
from pathlib import Path
def main():
parser = argparse.ArgumentParser(description='手势识别系统')
parser.add_argument('--mode', type=str, default='gui',
choices=['gui', 'train', 'detect', 'export', 'evaluate'],
help='运行模式: gui(图形界面), train(训练), detect(检测), export(导出), evaluate(评估)')
parser.add_argument('--model', type=str, default='yolov8n',
help='模型类型: yolov8n, yolov8s, yolov8m, yolov5s, yolov7')
parser.add_argument('--source', type=str, default='0',
help='输入源: 摄像头ID, 视频文件路径, 图片路径')
parser.add_argument('--weights', type=str, default='weights/best.pt',
help='模型权重路径')
parser.add_argument('--conf', type=float, default=0.5,
help='置信度阈值')
parser.add_argument('--iou', type=float, default=0.45,
help='IOU阈值')
parser.add_argument('--device', type=str, default='cuda',
help='运行设备: cuda, cpu')
parser.add_argument('--imgsz', type=int, default=640,
help='推理图像尺寸')
args = parser.parse_args()
if args.mode == 'gui':
# 启动图形界面
from ui.main_window import GestureRecognitionUI
app = QApplication(sys.argv)
window = GestureRecognitionUI()
window.show()
sys.exit(app.exec_())
elif args.mode == 'train':
# 训练模式
from train import train_model
train_model(args)
elif args.mode == 'detect':
# 检测模式
from detect import run_detection
run_detection(args)
elif args.mode == 'export':
# 导出模式
from export import export_model
export_model(args)
elif args.mode == 'evaluate':
# 评估模式
from evaluate import evaluate_model
evaluate_model(args)
if __name__ == '__main__':
main()
9.3 安装与使用说明
bash
# 1. 克隆项目 git clone https://github.com/yourusername/gesture-recognition-system.git cd gesture-recognition-system # 2. 安装依赖 pip install -r requirements.txt # 3. 准备数据集 # 将数据集放在 data/ 目录下,按照YOLO格式组织 # 4. 训练模型 python main.py --mode train --model yolov8n --epochs 100 # 5. 启动图形界面 python main.py --mode gui # 6. 命令行检测 python main.py --mode detect --source 0 # 摄像头 python main.py --mode detect --source video.mp4 # 视频文件 python main.py --mode detect --source image.jpg # 图片文件
魔乐社区(Modelers.cn) 是一个中立、公益的人工智能社区,提供人工智能工具、模型、数据的托管、展示与应用协同服务,为人工智能开发及爱好者搭建开放的学习交流平台。社区通过理事会方式运作,由全产业链共同建设、共同运营、共同享有,推动国产AI生态繁荣发展。
更多推荐


所有评论(0)