In computer vision, object detection is a foundational technology for many applications, from autonomous driving to security surveillance and industrial quality inspection. Building a practical detection system, however, typically faces two major challenges: the time and labor cost of high-quality data annotation, and the ease of deployment and use.

This article presents a YOLOv8-based multi-object detection and auto-annotation tool. It accurately detects a wide range of object classes, accepts images, videos, and live camera streams as input, and, most importantly, provides an auto-annotation feature that can multiply labeling throughput.

1 Core Features

  • Broad class coverage: built on YOLOv8, supports the 80 COCO classes and can be extended with custom classes

  • Multiple input sources: images (JPG/PNG/BMP), videos (MP4/AVI/MOV), and live camera feeds

  • One-click auto-annotation: converts detection results into YOLO-format label files

  • User-friendly interface: an intuitive GUI, no programming experience required

  • Batch processing: automatic detection and annotation for whole folders of images/videos

  • Flexible output: exports YOLO, VOC, and COCO annotation formats

2 Technology Stack and Architecture

2.1 Core Technologies

  • YOLOv8: Ultralytics' object detection model, balancing speed and accuracy

  • PyTorch: deep learning framework with GPU acceleration

  • OpenCV: image processing and video I/O

  • PyQt5: modern graphical user interface

  • SQLite: lightweight database for annotation results and history

2.2 System Architecture

┌────────────────────────────────────────────────────────────────────────┐
│ UI Layer (PyQt5: main window, controls, result view)                   │
├────────────────────────────────────────────────────────────────────────┤
│ Application Logic (detection controller, annotation manager, file I/O) │
├────────────────────────────────────────────────────────────────────────┤
│ Vision Engine (YOLOv8 detector, OpenCV video, preprocessing)           │
├────────────────────────────────────────────────────────────────────────┤
│ Data Storage (annotation files, model cache, history DB)               │
└────────────────────────────────────────────────────────────────────────┘

3 Core Code Implementation

3.1 YOLOv8 Detector Wrapper

import cv2
import torch
from ultralytics import YOLO
import numpy as np
from pathlib import Path
from typing import List, Tuple, Dict, Union
import json

class YOLOv8Detector:
    """Thin wrapper around an Ultralytics YOLOv8 model."""
    
    def __init__(self, model_path: str = 'yolov8n.pt', device: str = 'cuda'):
        """
        Initialize the detector.
        
        Args:
            model_path: model weights, either an official checkpoint or a custom-trained model
            device: compute device, 'cuda' or 'cpu'
        """
        # Fall back to CPU when CUDA is unavailable
        self.device = device if torch.cuda.is_available() and device == 'cuda' else 'cpu'
        self.model = YOLO(model_path)
        self.model.to(self.device)
        
        # COCO class names (80 classes)
        self.class_names = [
            'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck',
            'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench',
            'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
            'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
            'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
            'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
            'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
            'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
            'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse',
            'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
            'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
            'toothbrush'
        ]
        
    def detect(self, image: np.ndarray, conf_threshold: float = 0.25, 
               iou_threshold: float = 0.45) -> List[Dict]:
        """
        Run object detection on one image.
        
        Args:
            image: input image (BGR, as returned by cv2.imread)
            conf_threshold: confidence threshold
            iou_threshold: IoU threshold for NMS
            
        Returns:
            A list of detections, each a dict with bbox, confidence, and class info.
        """
        # Ultralytics interprets numpy arrays as BGR (the OpenCV convention),
        # so the frame is passed through without color conversion.
        results = self.model.predict(
            source=image,
            conf=conf_threshold,
            iou=iou_threshold,
            device=self.device,
            verbose=False
        )
        
        # Parse the results
        detections = []
        result = results[0]
        
        if result.boxes is not None:
            boxes = result.boxes.xyxy.cpu().numpy()  # boxes as (x1, y1, x2, y2)
            confidences = result.boxes.conf.cpu().numpy()  # confidence scores
            class_ids = result.boxes.cls.cpu().numpy().astype(int)  # class IDs
            
            for box, conf, cls_id in zip(boxes, confidences, class_ids):
                detection = {
                    'bbox': box.tolist(),  # [x1, y1, x2, y2]
                    'confidence': float(conf),
                    'class_id': int(cls_id),
                    'class_name': self.class_names[cls_id] if cls_id < len(self.class_names) else f'class_{cls_id}'
                }
                detections.append(detection)
        
        return detections
    
    def detect_and_annotate(self, image: np.ndarray, conf_threshold: float = 0.25) -> Tuple[np.ndarray, List[Dict]]:
        """
        Detect objects and draw the boxes on a copy of the image.
        
        Args:
            image: input image
            conf_threshold: confidence threshold
            
        Returns:
            annotated_image: the image with boxes drawn
            detections: detection results
        """
        detections = self.detect(image, conf_threshold)
        annotated_image = self.draw_detections(image, detections)
        
        return annotated_image, detections
    
    def draw_detections(self, image: np.ndarray, detections: List[Dict]) -> np.ndarray:
        """
        Draw detection boxes on the image.
        
        Args:
            image: original image
            detections: detection results
            
        Returns:
            A copy of the image with boxes and labels drawn.
        """
        annotated_image = image.copy()
        
        # Fixed BGR color palette, cycled by class ID
        colors = [
            (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
            (255, 0, 255), (0, 255, 255), (128, 0, 0), (0, 128, 0)
        ]
        
        for det in detections:
            bbox = det['bbox']
            class_name = det['class_name']
            confidence = det['confidence']
            class_id = det['class_id']
            
            # Pick a color for this class
            color = colors[class_id % len(colors)]
            
            # Draw the bounding box
            x1, y1, x2, y2 = map(int, bbox)
            cv2.rectangle(annotated_image, (x1, y1), (x2, y2), color, 2)
            
            # Draw the label background
            label = f"{class_name}: {confidence:.2f}"
            (label_width, label_height), baseline = cv2.getTextSize(
                label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1
            )
            
            cv2.rectangle(
                annotated_image,
                (x1, y1 - label_height - baseline - 5),
                (x1 + label_width, y1),
                color,
                -1
            )
            
            # Draw the label text
            cv2.putText(
                annotated_image,
                label,
                (x1, y1 - baseline - 5),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.5,
                (255, 255, 255),
                1
            )
        
        return annotated_image

3.2 Auto-Annotation and Format Conversion

class AutoAnnotator:
    """Converts detection results into various annotation formats."""
    
    def __init__(self, class_names: List[str]):
        """
        Initialize the annotator.
        
        Args:
            class_names: list of class names
        """
        self.class_names = class_names
        self.class_name_to_id = {name: idx for idx, name in enumerate(class_names)}
    
    def detections_to_yolo_format(self, detections: List[Dict], 
                                  image_width: int, image_height: int) -> List[str]:
        """
        Convert detections to YOLO format.
        
        YOLO format: class_id center_x center_y width height,
        with all coordinates normalized to [0, 1].
        
        Args:
            detections: detection results
            image_width: image width in pixels
            image_height: image height in pixels
            
        Returns:
            A list of YOLO-format annotation lines.
        """
        yolo_lines = []
        
        for det in detections:
            bbox = det['bbox']
            class_id = det['class_id']
            
            # Box corners
            x1, y1, x2, y2 = bbox
            
            # Normalized center point and size
            center_x = (x1 + x2) / 2 / image_width
            center_y = (y1 + y2) / 2 / image_height
            width = (x2 - x1) / image_width
            height = (y2 - y1) / image_height
            
            # Clamp to the valid range
            center_x = max(0, min(1, center_x))
            center_y = max(0, min(1, center_y))
            width = max(0, min(1, width))
            height = max(0, min(1, height))
            
            yolo_line = f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"
            yolo_lines.append(yolo_line)
        
        return yolo_lines
    
    def detections_to_voc_format(self, detections: List[Dict], 
                                 image_filename: str, image_width: int, 
                                 image_height: int) -> str:
        """
        Convert detections to a Pascal VOC XML string.
        
        Args:
            detections: detection results
            image_filename: image file name
            image_width: image width in pixels
            image_height: image height in pixels
            
        Returns:
            The VOC annotation as a pretty-printed XML string.
        """
        from xml.etree.ElementTree import Element, SubElement, tostring
        from xml.dom import minidom
        
        # Root element
        root = Element('annotation')
        
        # Folder and file name
        folder = SubElement(root, 'folder')
        folder.text = 'annotations'
        
        filename = SubElement(root, 'filename')
        filename.text = image_filename
        
        # Image dimensions
        size = SubElement(root, 'size')
        width_elem = SubElement(size, 'width')
        width_elem.text = str(image_width)
        height_elem = SubElement(size, 'height')
        height_elem.text = str(image_height)
        depth_elem = SubElement(size, 'depth')
        depth_elem.text = '3'
        
        # One <object> element per detection
        for det in detections:
            bbox = det['bbox']
            class_name = det['class_name']
            
            obj = SubElement(root, 'object')
            
            name = SubElement(obj, 'name')
            name.text = class_name
            
            # Bounding box
            bndbox = SubElement(obj, 'bndbox')
            xmin = SubElement(bndbox, 'xmin')
            xmin.text = str(int(bbox[0]))
            ymin = SubElement(bndbox, 'ymin')
            ymin.text = str(int(bbox[1]))
            xmax = SubElement(bndbox, 'xmax')
            xmax.text = str(int(bbox[2]))
            ymax = SubElement(bndbox, 'ymax')
            ymax.text = str(int(bbox[3]))
        
        # Pretty-print the XML
        xml_str = tostring(root, 'utf-8')
        parsed_xml = minidom.parseString(xml_str)
        pretty_xml = parsed_xml.toprettyxml(indent="  ")
        
        return pretty_xml
    
    def save_yolo_annotations(self, detections: List[Dict], image_path: str, 
                              output_dir: str):
        """
        Save a YOLO-format label file (same stem as the image).
        
        Args:
            detections: detection results
            image_path: path of the source image
            output_dir: output directory
        """
        # Read the image to get its dimensions
        image = cv2.imread(str(image_path))
        if image is None:
            raise ValueError(f"Cannot read image: {image_path}")
        
        image_height, image_width = image.shape[:2]
        
        # Convert to YOLO format
        yolo_lines = self.detections_to_yolo_format(detections, image_width, image_height)
        
        # Write the label file, creating the output directory if needed
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        output_path = Path(output_dir) / f"{Path(image_path).stem}.txt"
        
        with open(output_path, 'w') as f:
            for line in yolo_lines:
                f.write(line + '\n')
    
    def save_voc_annotations(self, detections: List[Dict], image_path: str, 
                             output_dir: str):
        """
        Save a Pascal VOC XML annotation file.
        
        Args:
            detections: detection results
            image_path: path of the source image
            output_dir: output directory
        """
        # Read the image to get its dimensions
        image = cv2.imread(str(image_path))
        if image is None:
            raise ValueError(f"Cannot read image: {image_path}")
        
        image_height, image_width = image.shape[:2]
        
        # Convert to VOC XML
        image_filename = Path(image_path).name
        voc_xml = self.detections_to_voc_format(detections, image_filename, 
                                               image_width, image_height)
        
        # Write the XML file, creating the output directory if needed
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        output_path = Path(output_dir) / f"{Path(image_path).stem}.xml"
        
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(voc_xml)
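The format dropdown in the UI also lists COCO export, which the code above leaves unimplemented. A minimal sketch of what such an exporter could look like (the function name and defaults are illustrative, not part of the tool's code; field names follow the COCO annotation layout, where boxes are `[x, y, width, height]` in absolute pixels):

```python
import json

def detections_to_coco_format(detections, image_filename, image_width,
                              image_height, image_id=1):
    """Build a minimal COCO-style dict from the detector's result list."""
    # Derive the category table from whatever classes actually appear
    categories = {(d["class_id"], d["class_name"]) for d in detections}
    coco = {
        "images": [{"id": image_id, "file_name": image_filename,
                    "width": image_width, "height": image_height}],
        "categories": [{"id": cid, "name": name}
                       for cid, name in sorted(categories)],
        "annotations": [],
    }
    for ann_id, det in enumerate(detections, start=1):
        x1, y1, x2, y2 = det["bbox"]
        w, h = x2 - x1, y2 - y1
        coco["annotations"].append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": det["class_id"],
            "bbox": [x1, y1, w, h],      # COCO uses [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
            "score": det["confidence"],  # nonstandard field, handy for review
        })
    return coco

# Serialize with json.dump(coco, fp, indent=2) to produce the .json file.
```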

3.3 PyQt5 GUI Implementation

import sys
from PyQt5.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout, 
                             QHBoxLayout, QLabel, QPushButton, QFileDialog,
                             QComboBox, QSlider, QSpinBox, QCheckBox, QTextEdit,
                             QTabWidget, QGroupBox, QProgressBar, QMessageBox,
                             QListWidget, QSplitter, QAction, QMenuBar, QMenu)
from PyQt5.QtCore import Qt, QThread, pyqtSignal, pyqtSlot, QTimer
from PyQt5.QtGui import QImage, QPixmap, QIcon, QFont
import qdarkstyle

class DetectionThread(QThread):
    """Background detection thread that keeps the UI responsive."""
    
    detection_finished = pyqtSignal(object, object)  # emitted when one image is done
    progress_updated = pyqtSignal(int)  # emitted with percent complete
    
    def __init__(self, detector, image_paths, output_dir, conf_threshold):
        super().__init__()
        self.detector = detector
        self.image_paths = image_paths
        self.output_dir = output_dir
        self.conf_threshold = conf_threshold
        self.annotator = AutoAnnotator(detector.class_names)
        self.is_running = True
        
    def run(self):
        """Run batch detection."""
        total_images = len(self.image_paths)
        
        for idx, image_path in enumerate(self.image_paths):
            if not self.is_running:
                break
                
            # Read the image
            image = cv2.imread(str(image_path))
            if image is None:
                continue
            
            # Run detection
            detections = self.detector.detect(image, self.conf_threshold)
            
            # Save annotations
            if self.output_dir:
                self.annotator.save_yolo_annotations(detections, image_path, self.output_dir)
            
            # Draw the boxes
            annotated_image = self.detector.draw_detections(image, detections)
            
            # Emit progress signals
            self.detection_finished.emit(annotated_image, detections)
            self.progress_updated.emit(int((idx + 1) / total_images * 100))
    
    def stop(self):
        """Request the thread to stop."""
        self.is_running = False

class MainWindow(QMainWindow):
    """Main application window"""
    
    def __init__(self):
        super().__init__()
        self.detector = None
        self.current_image = None
        self.current_detections = []
        self.detection_thread = None
        
        self.init_ui()
        self.init_detector()
        
    def init_ui(self):
        """Build the user interface."""
        self.setWindowTitle("YOLOv8 Object Detection & Auto-Annotation Tool")
        self.setGeometry(100, 100, 1400, 800)
        
        # Apply the dark stylesheet
        self.setStyleSheet(qdarkstyle.load_stylesheet_pyqt5())
        
        # Central widget
        central_widget = QWidget()
        self.setCentralWidget(central_widget)
        
        # Main layout
        main_layout = QHBoxLayout(central_widget)
        
        # Control panel on the left
        control_panel = self.create_control_panel()
        main_layout.addWidget(control_panel, 1)
        
        # Display area on the right
        display_panel = self.create_display_panel()
        main_layout.addWidget(display_panel, 3)
        
        # Menu bar
        self.create_menu_bar()
        
    def create_control_panel(self):
        """Create the control panel."""
        panel = QWidget()
        layout = QVBoxLayout(panel)
        
        # Model settings group
        model_group = QGroupBox("Model Settings")
        model_layout = QVBoxLayout()
        
        self.model_combo = QComboBox()
        self.model_combo.addItems(["yolov8n.pt", "yolov8s.pt", "yolov8m.pt", "yolov8l.pt", "yolov8x.pt"])
        self.model_combo.setCurrentIndex(0)
        model_layout.addWidget(QLabel("Model:"))
        model_layout.addWidget(self.model_combo)
        
        self.device_combo = QComboBox()
        self.device_combo.addItems(["Auto", "CPU", "GPU (CUDA)"])
        self.device_combo.setCurrentIndex(0)
        model_layout.addWidget(QLabel("Device:"))
        model_layout.addWidget(self.device_combo)
        
        model_group.setLayout(model_layout)
        layout.addWidget(model_group)
        
        # Detection parameters group
        params_group = QGroupBox("Detection Parameters")
        params_layout = QVBoxLayout()
        
        # Confidence threshold
        params_layout.addWidget(QLabel("Confidence threshold:"))
        self.conf_slider = QSlider(Qt.Horizontal)
        self.conf_slider.setRange(10, 90)
        self.conf_slider.setValue(25)
        self.conf_slider.valueChanged.connect(self.update_conf_label)
        params_layout.addWidget(self.conf_slider)
        
        self.conf_label = QLabel("0.25")
        params_layout.addWidget(self.conf_label)
        
        params_group.setLayout(params_layout)
        layout.addWidget(params_group)
        
        # Input source group
        input_group = QGroupBox("Input Source")
        input_layout = QVBoxLayout()
        
        self.btn_open_image = QPushButton("Open Image")
        self.btn_open_image.clicked.connect(self.open_image)
        input_layout.addWidget(self.btn_open_image)
        
        self.btn_open_folder = QPushButton("Open Folder")
        self.btn_open_folder.clicked.connect(self.open_folder)
        input_layout.addWidget(self.btn_open_folder)
        
        self.btn_open_video = QPushButton("Open Video")
        self.btn_open_video.clicked.connect(self.open_video)
        input_layout.addWidget(self.btn_open_video)
        
        self.btn_open_camera = QPushButton("Open Camera")
        self.btn_open_camera.clicked.connect(self.open_camera)
        input_layout.addWidget(self.btn_open_camera)
        
        input_group.setLayout(input_layout)
        layout.addWidget(input_group)
        
        # Annotation settings group
        annotation_group = QGroupBox("Annotation Settings")
        annotation_layout = QVBoxLayout()
        
        self.btn_save_annotations = QPushButton("Save Annotations")
        self.btn_save_annotations.clicked.connect(self.save_annotations)
        annotation_layout.addWidget(self.btn_save_annotations)
        
        self.format_combo = QComboBox()
        self.format_combo.addItems(["YOLO format (.txt)", "VOC format (.xml)", "COCO format (.json)"])
        annotation_layout.addWidget(QLabel("Annotation format:"))
        annotation_layout.addWidget(self.format_combo)
        
        annotation_group.setLayout(annotation_layout)
        layout.addWidget(annotation_group)
        
        # Progress bar
        self.progress_bar = QProgressBar()
        layout.addWidget(self.progress_bar)
        
        # Log area
        log_group = QGroupBox("Log")
        log_layout = QVBoxLayout()
        
        self.log_text = QTextEdit()
        self.log_text.setMaximumHeight(150)
        self.log_text.setReadOnly(True)
        log_layout.addWidget(self.log_text)
        
        log_group.setLayout(log_layout)
        layout.addWidget(log_group)
        
        layout.addStretch()
        
        return panel
    
    def create_display_panel(self):
        """Create the display panel."""
        panel = QWidget()
        layout = QVBoxLayout(panel)
        
        # Image display area
        self.image_label = QLabel()
        self.image_label.setAlignment(Qt.AlignCenter)
        self.image_label.setMinimumSize(800, 600)
        self.image_label.setStyleSheet("border: 2px solid #505050; background-color: #1e1e1e;")
        layout.addWidget(self.image_label)
        
        # Detection statistics
        stats_group = QGroupBox("Detection Stats")
        stats_layout = QHBoxLayout()
        
        self.total_objects_label = QLabel("Objects: 0")
        self.total_objects_label.setFont(QFont("Arial", 10, QFont.Bold))
        stats_layout.addWidget(self.total_objects_label)
        
        self.confidence_label = QLabel("Mean confidence: 0.00")
        stats_layout.addWidget(self.confidence_label)
        
        stats_group.setLayout(stats_layout)
        layout.addWidget(stats_group)
        
        # Detection results list
        results_group = QGroupBox("Detection Results")
        results_layout = QVBoxLayout()
        
        self.results_list = QListWidget()
        results_layout.addWidget(self.results_list)
        
        results_group.setLayout(results_layout)
        layout.addWidget(results_group)
        
        return panel
    
    def create_menu_bar(self):
        """Create the menu bar."""
        menubar = self.menuBar()
        
        # File menu
        file_menu = menubar.addMenu("File")
        
        open_image_action = QAction("Open Image", self)
        open_image_action.triggered.connect(self.open_image)
        file_menu.addAction(open_image_action)
        
        open_folder_action = QAction("Batch Process Folder", self)
        open_folder_action.triggered.connect(self.open_folder)
        file_menu.addAction(open_folder_action)
        
        file_menu.addSeparator()
        
        exit_action = QAction("Exit", self)
        exit_action.triggered.connect(self.close)
        file_menu.addAction(exit_action)
        
        # Tools menu
        tool_menu = menubar.addMenu("Tools")
        
        settings_action = QAction("Settings", self)
        tool_menu.addAction(settings_action)
        
        # Help menu
        help_menu = menubar.addMenu("Help")
        
        about_action = QAction("About", self)
        about_action.triggered.connect(self.show_about)
        help_menu.addAction(about_action)
    
    def init_detector(self):
        """Initialize the detector."""
        try:
            model_path = self.model_combo.currentText()
            # Auto-select tries CUDA first; the detector itself falls back to CPU
            device = "cpu" if self.device_combo.currentText() == "CPU" else "cuda"
            self.detector = YOLOv8Detector(model_path, device)
            self.log_message("Detector initialized.")
        except Exception as e:
            self.log_message(f"Detector initialization failed: {str(e)}")
            QMessageBox.critical(self, "Error", f"Failed to initialize detector: {str(e)}")
    
    def open_image(self):
        """Open a single image."""
        file_path, _ = QFileDialog.getOpenFileName(
            self, "Select Image", "", 
            "Image files (*.jpg *.jpeg *.png *.bmp *.tiff)"
        )
        
        if file_path:
            self.process_image(file_path)
    
    def process_image(self, image_path):
        """Process one image."""
        try:
            # Read the image
            self.current_image = cv2.imread(image_path)
            if self.current_image is None:
                raise ValueError("Cannot read image")
            
            # Show the original image while detection runs
            self.display_image(self.current_image)
            
            # Run detection
            conf_threshold = self.conf_slider.value() / 100.0
            annotated_image, detections = self.detector.detect_and_annotate(
                self.current_image, conf_threshold
            )
            
            # Show the detection result
            self.display_image(annotated_image)
            self.update_detection_results(detections)
            
            self.log_message(f"Done: {Path(image_path).name}, {len(detections)} object(s) detected")
            
        except Exception as e:
            self.log_message(f"Processing failed: {str(e)}")
            QMessageBox.critical(self, "Error", f"Error while processing image: {str(e)}")
    
    def display_image(self, image):
        """Render an image in the display label."""
        # Convert BGR -> RGB for Qt
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
        # Scale down to fit the display area
        h, w, ch = image_rgb.shape
        target_width = self.image_label.width() - 10
        target_height = self.image_label.height() - 10
        
        if w > target_width or h > target_height:
            scale = min(target_width / w, target_height / h)
            new_w = int(w * scale)
            new_h = int(h * scale)
            image_rgb = cv2.resize(image_rgb, (new_w, new_h))
            h, w = new_h, new_w
        
        # Wrap the buffer in a QImage
        bytes_per_line = ch * w
        qt_image = QImage(image_rgb.data, w, h, bytes_per_line, QImage.Format_RGB888)
        
        # Show it
        self.image_label.setPixmap(QPixmap.fromImage(qt_image))
    
    def update_detection_results(self, detections):
        """Refresh the statistics and result list."""
        self.current_detections = detections
        self.results_list.clear()
        
        if not detections:
            self.total_objects_label.setText("Objects: 0")
            self.confidence_label.setText("Mean confidence: 0.00")
            return
        
        # Update the statistics
        total_objects = len(detections)
        avg_confidence = sum(d['confidence'] for d in detections) / total_objects
        
        self.total_objects_label.setText(f"Objects: {total_objects}")
        self.confidence_label.setText(f"Mean confidence: {avg_confidence:.2f}")
        
        # Update the result list
        for det in detections:
            class_name = det['class_name']
            confidence = det['confidence']
            bbox = det['bbox']
            
            item_text = f"{class_name}: {confidence:.2f} [{bbox[0]:.0f}, {bbox[1]:.0f}, {bbox[2]:.0f}, {bbox[3]:.0f}]"
            self.results_list.addItem(item_text)
    
    def log_message(self, message):
        """Append a timestamped message to the log."""
        from datetime import datetime
        timestamp = datetime.now().strftime("%H:%M:%S")
        self.log_text.append(f"[{timestamp}] {message}")
    
    def update_conf_label(self):
        """Update the confidence-threshold readout."""
        conf_value = self.conf_slider.value() / 100.0
        self.conf_label.setText(f"{conf_value:.2f}")
    
    def save_annotations(self):
        """Save the current annotations to disk."""
        if not self.current_detections:
            QMessageBox.warning(self, "Warning", "No detection results to save")
            return
        
        # Choose the output directory
        save_dir = QFileDialog.getExistingDirectory(self, "Select Output Directory")
        if not save_dir:
            return
        
        try:
            # Save in the selected format
            format_index = self.format_combo.currentIndex()
            
            if format_index == 0:  # YOLO format
                # The annotator needs an image path to read dimensions from;
                # as a simplification, write the current image to a temp file
                temp_path = Path(save_dir) / "temp_image.jpg"
                cv2.imwrite(str(temp_path), self.current_image)
                
                annotator = AutoAnnotator(self.detector.class_names)
                annotator.save_yolo_annotations(
                    self.current_detections, temp_path, save_dir
                )
                
                # Remove the temp file
                temp_path.unlink()
                
                self.log_message(f"YOLO annotations saved to: {save_dir}")
                
            elif format_index == 1:  # VOC format
                # Analogous handling via save_voc_annotations (omitted)
                pass
            
            QMessageBox.information(self, "Success", "Annotations saved.")
            
        except Exception as e:
            self.log_message(f"Save failed: {str(e)}")
            QMessageBox.critical(self, "Error", f"Error while saving annotations: {str(e)}")
    
    def show_about(self):
        """Show the About dialog."""
        about_text = """
        <h2>YOLOv8 Object Detection & Auto-Annotation Tool</h2>
        <p>Version: 1.0.0</p>
        <p>Built on Ultralytics YOLOv8</p>
        <p>Features:</p>
        <ul>
            <li>Detection across a wide range of classes</li>
            <li>Image / video / camera input</li>
            <li>Auto-annotation and format conversion</li>
            <li>Batch processing</li>
        </ul>
        <p>© 2024 Tech Blog. All rights reserved.</p>
        """
        QMessageBox.about(self, "About", about_text)

def main():
    """Application entry point."""
    app = QApplication(sys.argv)
    app.setStyle('Fusion')
    
    window = MainWindow()
    window.show()
    
    sys.exit(app.exec_())

if __name__ == "__main__":
    main()

4 Usage Tutorial

4.1 Environment Setup

# Create a virtual environment
python -m venv yolov8_env
source yolov8_env/bin/activate  # Linux/macOS
# or
yolov8_env\Scripts\activate  # Windows

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install ultralytics opencv-python pyqt5 qdarkstyle
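After installing, you can sanity-check that the key packages resolve before launching the GUI. A minimal sketch (the package list mirrors the dependencies installed above):

```python
import importlib.util

def check_dependencies(packages=("torch", "cv2", "ultralytics", "PyQt5")):
    """Return {package: importable?} without actually importing the heavy modules."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```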

4.2 Quick Start

  1. Launch the app: run python main.py

  2. Pick a model: choose a YOLOv8 variant (n/s/m/l/x) from the dropdown

  3. Open an image: click "Open Image" and select a file to detect

  4. Tune parameters: adjust the confidence threshold with the slider

  5. Inspect results: detections appear on the image and in the results list

  6. Save annotations: click "Save Annotations" and choose a format and directory

4.3 Batch Processing

For dataset labeling, use the batch-processing workflow:

  • Click "Open Folder" and select a directory of images

  • The tool detects every image and writes a label file for each

  • Interrupt-and-resume is supported
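The interrupt-and-resume behavior can be approximated by skipping images that already have a label file. A minimal sketch (the helper name and directory layout are illustrative assumptions, not the tool's actual code):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def pending_images(image_dir, label_dir):
    """Images that do not yet have a matching YOLO .txt label file."""
    done = {p.stem for p in Path(label_dir).glob("*.txt")}
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in IMAGE_EXTS)
    return [p for p in images if p.stem not in done]
```

On restart, feeding only `pending_images(...)` to the detection thread resumes where the previous run stopped.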

5 Advanced Extensions

5.1 Custom Model Training

from ultralytics import YOLO

# Load a pretrained model
model = YOLO('yolov8n.pt')

# Train on a custom dataset
results = model.train(
    data='custom_dataset.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    name='custom_model'
)

# Export the trained model
model.export(format='onnx')
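The `custom_dataset.yaml` referenced above follows the standard Ultralytics dataset description. A minimal sketch (the paths and class names below are placeholders for your own data):

```yaml
# custom_dataset.yaml -- paths are resolved relative to the dataset root
path: datasets/custom        # dataset root directory (placeholder)
train: images/train          # training images, relative to path
val: images/val              # validation images, relative to path

names:                       # class ID -> name; replace with your own classes
  0: defect
  1: scratch
```

YOLO-format `.txt` label files, such as those produced by the auto-annotation feature, are expected in a sibling `labels/` directory mirroring `images/`.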

5.2 Video Stream Processing

class VideoProcessor:
    """Video file / stream processor"""
    
    def process_video(self, video_path, detector, output_path=None):
        cap = cv2.VideoCapture(video_path)
        fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30  # fall back if FPS is unavailable
        
        # Create a video writer if an output path was given
        out = None
        if output_path:
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
                          int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
            out = cv2.VideoWriter(output_path, fourcc, fps, frame_size)
        
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            
            # Detect and draw boxes
            annotated_frame, detections = detector.detect_and_annotate(frame)
            
            # Write to the output video
            if out is not None:
                out.write(annotated_frame)
            
            # Show live results; press 'q' to quit
            cv2.imshow('Video Detection', annotated_frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        
        cap.release()
        if out is not None:
            out.release()
        cv2.destroyAllWindows()

6 Performance Tuning Tips

  1. GPU acceleration: install the CUDA build of PyTorch

  2. Batching: group images into batches to raise inference throughput

  3. Model quantization: INT8 quantization shrinks the model and speeds up inference

  4. TensorRT deployment: for production, optimize further with TensorRT
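For tip 2, Ultralytics `model.predict` accepts a list of sources, so batched inference can be as simple as chunking the image list before each call. A sketch (the batch size of 16 is an assumption to tune per GPU):

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks of a list (the last chunk may be shorter)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Usage sketch with the Ultralytics API (not run here):
# for chunk in batched(image_paths, 16):
#     results = model.predict(source=chunk, verbose=False)
```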

7 Summary and Outlook

This article walked through a complete YOLOv8-based multi-object detection and auto-annotation tool. Beyond strong detection performance, its auto-annotation feature greatly simplifies data preparation. Key characteristics:

  • Ready-to-use GUI: no programming experience required

  • Multi-format support: YOLO, VOC, and COCO annotation formats

  • Efficient batch processing: handles large datasets quickly

  • Flexible extension: custom models and classes

Possible future improvements:

  1. Integrate additional detection models (DETR, RT-DETR, etc.)

  2. Semi-automatic annotation with manual correction

  3. Cloud storage and collaborative labeling

  4. Built-in model evaluation and visualization tools

With this tool you can quickly build your own object detection system and accelerate the development of computer vision projects.
