Training YOLOv5 on an Online Dataset
1. Dataset download and overview
1.1 Downloading the dataset
This walkthrough uses IP102 (the original authors' paper is cited below):
Wu, Xiaoping, et al. "IP102: A Large-Scale Benchmark Dataset for Insect Pest Recognition." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8787-8796.
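IP102 is an insect pest dataset intended for agricultural pest control.
Download package: "[Free] Artificial intelligence + pest detection + yolov5 resources" (CSDN文库)
The file tree after unpacking looks like this: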
Detection-20241229T163316Z-001
└─Detection
    └─VOC2007
        └─ImageSets
            └─Main
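Start with the VOC2007 folder.
It contains three things:
1. The image ID lists for the training and test sets

test.txt lists the image IDs of the test set; likewise, trainval.txt lists the image IDs of the training set.
2. The XML files, which hold the bounding boxes and labels used for object detection. Note that at this point the test-set and training-set annotations are stored together.

Each image has exactly one XML file; these files are what we convert in the next step.
3. The image folder, which holds every image (training and test sets alike).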
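Before converting anything, it can help to confirm that every ID in the two list files really has an annotation and an image. The sketch below is one way to do that; the helper names `read_ids` and `find_missing` are made up for illustration, and the folder names in the usage note are the ones the scripts later in this post use.

```python
from pathlib import Path


def read_ids(list_file):
    """Return the non-empty image IDs listed one per line in list_file."""
    with open(list_file, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def find_missing(ids, folder, suffix):
    """Return the IDs that have no file named <id><suffix> inside folder."""
    folder = Path(folder)
    return [i for i in ids if not (folder / f"{i}{suffix}").is_file()]
```

Run from inside VOC2007, a call such as find_missing(read_ids("ImageSets/Main/trainval.txt"), "Annotations/Annotations", ".xml") lists the training IDs without an XML file; swapping in "JPEGImages/JPEGImages" and ".jpg" does the same for images.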

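2. Dataset processing and conversion
2.1 Label processing (in short, converting the XML files to TXT files)
2.1.1 First, open PyCharm

Once it opens, you see the folder shown in the figure below.

First create a yolo_txt folder and a transform.py file, as shown below.

This is the resulting file tree:

Then write transform.py: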
import os
import xml.etree.ElementTree as ET


def convert(size, box):
    """Convert a VOC box to normalized YOLO (x_center, y_center, w, h).

    size = (width, height) in pixels
    box  = (xmin, xmax, ymin, ymax) in pixels
    """
    x_center = (box[0] + box[1]) / 2.0
    y_center = (box[2] + box[3]) / 2.0
    # Normalize the center by the image size
    x = x_center / size[0]
    y = y_center / size[1]
    # Normalize the box width and height by the image size
    w = (box[1] - box[0]) / size[0]
    h = (box[3] - box[2]) / size[1]
    return (x, y, w, h)
def convert_annotation(xml_files_path, save_txt_files_path, classes):
    """Write one YOLO label file per VOC XML file found in xml_files_path."""
    for xml_name in os.listdir(xml_files_path):
        xml_file = os.path.join(xml_files_path, xml_name)
        out_txt_path = os.path.join(save_txt_files_path, os.path.splitext(xml_name)[0] + '.txt')
        # The label file is created before the XML is parsed, so when parsing
        # fails, the txt file written last points at the offending input.
        with open(out_txt_path, 'w') as out_txt_f:
            tree = ET.parse(xml_file)
            root = tree.getroot()
            size = root.find('size')
            w = int(size.find('width').text)
            h = int(size.find('height').text)
            for obj in root.iter('object'):
                difficult = obj.find('difficult').text
                # IP102 stores the class as a numeric string such as '86'
                cls_id = int(obj.find('name').text)
                # Skip objects marked difficult or whose id has no class name
                if not 0 <= cls_id < len(classes) or int(difficult) == 1:
                    continue
                xmlbox = obj.find('bndbox')
                # b = (xmin, xmax, ymin, ymax)
                b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                     float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
                bb = convert((w, h), b)
                out_txt_f.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
if __name__ == "__main__":
    # Test run left over from another project (hats / hair / beards), kept for reference:
    # classes = ['hard_hat', 'other', 'regular', 'long_hair', 'braid', 'bald', 'beard']
    # xml_files = r'D:\ZF\1_ZF_proj\3_脚本程序\2_voc格式转yolo格式\voc_labels'
    # save_txt_files = r'D:\ZF\1_ZF_proj\3_脚本程序\2_voc格式转yolo格式\yolo_labels'
    # convert_annotation(xml_files, save_txt_files, classes)
    # ====================================================================================================
    # Convert the IP102 VOC XML label files into YOLO txt label files
    # 1. The IP102 pest classes, in label-id order
    classes1 = ['riceleafroller','riceleafcaterpillar','paddystemmaggot','asiaticriceborer','yellowriceborer','ricegallmidge','RiceStemfly',
                'brownplanthopper','whitebackedplanthopper','smallbrownplanthopper',
                'ricewaterweevil','riceleafhopper','grainspreaderthrips','riceshellpest','grub',
                'molecricket','wireworm','whitemarginedmoth','blackcutworm','largecutworm','yellowcutworm',
                'redspider','cornborer','armyworm','aphids','Potosiabrevitarsis','peachborer','englishgrainaphid','greenbug',
                'birdcherry-oataphid','wheatblossommidge','penthaleusmajor','longleggedspidermite','wheatphloeothrips','wheatsawfly',
                'cerodontadenticornis','beetfly','fleabeetle','cabbagearmyworm','beetarmyworm','Beetspotflies','meadowmoth','beetweevil',
                'sericaorientalismotschulsky','alfalfaweevil','flaxbudworm','alfalfaplantbug','tarnishedplantbug','Locustoidea','lyttapolita','legumeblisterbeetle',
                'blisterbeetle','therioaphismaculataBuckton','odontothripsloti','Thrips','alfalfaseedchalcid',
                'Pieriscanidia','Apolyguslucorum','Limacodidae','Viteusvitifoliae','Colomerusvitis','BrevipoalpuslewisiMcGregor','oidesdecempunctata',
                'Polyphagotarsonemuslatus','PseudococcuscomstockiKuwana','parathreneregalis','Ampelophaga','Lycormadelicatula','Xylotrechus','Cicadellaviridis','Miridae',
                'Trialeurodesvaporariorum','Erythroneuraapicalis','Papilioxuthus','PanonchuscitriMcGregor','Phyllocoptesoleiverusashmead','IceryapurchasiMaskell','Unaspisyanonensis','Ceroplastesrubens',
                'Chrysomphalusaonidum','ParlatoriazizyphusLucus','Nipaecoccusvastalor','Aleurocanthusspiniferus','TetradacuscBactroceraminax','Dacusdorsalis(Hendel)','Bactroceratsuneonis','Prodenialitura',
                'Adristyrannus','PhyllocnistiscitrellaStainton','Toxopteracitricidus','Toxopteraaurantii','AphiscitricolaVanderGoot','ScirtothripsdorsalisHood','Dasineurasp','LawanaimitataMelichar','SalurnismarginellaGuerr','DeporausmarginatusPascoe','Chlumetiatransversa','Mangoflatbeakleafhopper',
                'Rhytidoderabowriniiwhite','Sternochetusfrigidus','Cicadellidae']
    print(len(classes1))  # number of classes; the list above has 102 entries
    # 2. Folder holding the VOC-format XML label files
    xml_files1 = r'Annotations/Annotations'
    # 3. Folder where the YOLO-format txt label files are saved
    save_txt_files1 = r'Annotations/yolo_txt'
    convert_annotation(xml_files1, save_txt_files1, classes1)
If your folders are set up the same way as mine, the paths here match the figure below.
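To make the conversion concrete, here is the same arithmetic applied to one box: a 392x280 image with xmin=33, ymin=61, xmax=388, ymax=208 and class 86 (the values of the sample annotation for IP087000986 quoted later in this post). The function name `voc_box_to_yolo` is illustrative only; it repeats the formulas of convert so the snippet runs on its own.

```python
def voc_box_to_yolo(size, box):
    """size = (width, height); box = (xmin, xmax, ymin, ymax), all in pixels."""
    width, height = size
    xmin, xmax, ymin, ymax = box
    x = (xmin + xmax) / 2.0 / width    # normalized box center, x
    y = (ymin + ymax) / 2.0 / height   # normalized box center, y
    w = (xmax - xmin) / width          # normalized box width
    h = (ymax - ymin) / height         # normalized box height
    return (x, y, w, h)


yolo_box = voc_box_to_yolo((392, 280), (33, 388, 61, 208))
# Rounded to three decimals for display; transform.py writes full precision
line = "86 " + " ".join(f"{v:.3f}" for v in yolo_box)
print(line)  # 86 0.537 0.480 0.906 0.525
```

Note the field order: the tuple passed in is (xmin, xmax, ymin, ymax), which differs from the order of the tags inside bndbox in the XML.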

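2.1.2 Fixing file errors
Testing turned up the following errors:
Error 1: stray files in the annotation folder (the error message is shown below)

Fix for error 1:
Step 1: Open yolo_txt, find the last txt file, and copy its ID.
Step 2: Open the folder shown below and locate the matching XML file.

Step 3: At that location, as shown below, you will find that extra txt files are mixed in with the XML files.

Step 4: Do as shown in the figure below.

Step 5: Click into the folder.

In the search box, search for .txt and delete every result.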
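The same cleanup can be scripted instead of done through the file manager. This is a minimal sketch, not part of the original workflow: `find_stray_files` and `remove_stray_files` are hypothetical helpers, and the dry-run default is there so nothing is deleted until the list has been checked.

```python
import os


def find_stray_files(folder, keep_ext=".xml"):
    """List files in folder whose extension is not keep_ext (case-insensitive)."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and os.path.splitext(name)[1].lower() != keep_ext
    )


def remove_stray_files(folder, keep_ext=".xml", dry_run=True):
    """Return the stray files; delete them only when dry_run is False."""
    stray = find_stray_files(folder, keep_ext)
    if not dry_run:
        for name in stray:
            os.remove(os.path.join(folder, name))
    return stray
```

For the layout used here, remove_stray_files("Annotations/Annotations") reports the stray files, and passing dry_run=False removes them. The yolo_txt output folder sits next to that folder rather than inside it, so it is not touched.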

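With error 1 fixed, error 2 appears.
Error 2: bad file contents (as before, open yolo_txt, find the last txt file, copy its ID, and locate the XML file), as shown below:

Step 1: Delete the redundant content
Delete the content below. It appears twice in the file, so removing one copy is enough. Then run transform.py again.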
<annotation verified="no">
  <folder>IP103_final_new3</folder>
  <filename>IP087000986</filename>
  <path>C:\Users\dell\Desktop\IP103_final_new3\IP087000986.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>392</width>
    <height>280</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>86</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>33</xmin>
      <ymin>61</ymin>
      <xmax>388</xmax>
      <ymax>208</ymax>
    </bndbox>
  </object>
</annotation>
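Rather than rerunning transform.py and fixing one file at a time, all unparsable XML files can be listed in a single pass. The sketch below assumes the problem is a file that the parser rejects, for example one containing a second copy of the annotation element after the first; `find_unparsable_xml` is a hypothetical helper name.

```python
import os
import xml.etree.ElementTree as ET


def find_unparsable_xml(folder):
    """Return (file name, error message) for every XML file that fails to parse."""
    bad = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".xml"):
            continue  # other file types are a separate problem (error 1)
        try:
            ET.parse(os.path.join(folder, name))
        except ET.ParseError as err:
            bad.append((name, str(err)))
    return bad
```

Calling find_unparsable_xml("Annotations/Annotations") returns the file names together with the parser's message, which includes the line and column where parsing stopped.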
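2.2 Sorting the files (into the YOLO layout)
Create the folders shown below (at the VOC2007 level):

The file tree is as follows:

Inside the VOC2007 folder, create classfication_labels.py and classfication_images.py (the two files have different contents), as shown below: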

classfication_labels.py
import os
import shutil

# Source and target folders
source_folder = 'Annotations/yolo_txt'  # source folder holding the label files to sort
target_folder = 'mydata/labels'  # target folder for the sorted label files

# Make sure the target folder exists
os.makedirs(target_folder, exist_ok=True)

# Create the train and test subfolders
trainval_target_folder = os.path.join(target_folder, 'train')
test_target_folder = os.path.join(target_folder, 'test')
os.makedirs(trainval_target_folder, exist_ok=True)
os.makedirs(test_target_folder, exist_ok=True)

# Read trainval.txt and test.txt
with open('ImageSets/Main/trainval.txt', 'r') as trainval_file, open('ImageSets/Main/test.txt', 'r') as test_file:
    trainval_file_names = trainval_file.readlines()
    test_file_names = test_file.readlines()

# Strip surrounding whitespace from each image ID
trainval_file_names = [file_name.strip() for file_name in trainval_file_names]
test_file_names = [file_name.strip() for file_name in test_file_names]

# Copy each label file into its target folder
for file_name in trainval_file_names:
    source_file_path = os.path.join(source_folder, file_name + '.txt')  # label file is named <id>.txt
    if os.path.exists(source_file_path):
        target_file_path = os.path.join(trainval_target_folder, file_name + '.txt')
        shutil.copy(source_file_path, target_file_path)
        print(f"File {file_name}.txt copied to {trainval_target_folder}")
    else:
        print(f"File {file_name}.txt not found in {source_folder}")

for file_name in test_file_names:
    source_file_path = os.path.join(source_folder, file_name + '.txt')  # label file is named <id>.txt
    if os.path.exists(source_file_path):
        target_file_path = os.path.join(test_target_folder, file_name + '.txt')
        shutil.copy(source_file_path, target_file_path)
        print(f"File {file_name}.txt copied to {test_target_folder}")
    else:
        print(f"File {file_name}.txt not found in {source_folder}")

print("File classification completed.")
classfication_images.py
import os
import shutil

# Source and target folders
source_folder = 'JPEGImages/JPEGImages'  # source folder holding the images to sort
target_folder = 'mydata/images'  # target folder for the sorted images

# Make sure the target folder exists
os.makedirs(target_folder, exist_ok=True)

# Create the train and test subfolders
trainval_target_folder = os.path.join(target_folder, 'train')
test_target_folder = os.path.join(target_folder, 'test')
os.makedirs(trainval_target_folder, exist_ok=True)
os.makedirs(test_target_folder, exist_ok=True)

# Read trainval.txt and test.txt
with open('ImageSets/Main/trainval.txt', 'r') as trainval_file, open('ImageSets/Main/test.txt', 'r') as test_file:
    trainval_file_names = trainval_file.readlines()
    test_file_names = test_file.readlines()

# Strip surrounding whitespace from each image ID
trainval_file_names = [file_name.strip() for file_name in trainval_file_names]
test_file_names = [file_name.strip() for file_name in test_file_names]

# Copy each image into its target folder
for file_name in trainval_file_names:
    source_file_path = os.path.join(source_folder, file_name + '.jpg')  # image file is named <id>.jpg
    if os.path.exists(source_file_path):
        target_file_path = os.path.join(trainval_target_folder, file_name + '.jpg')
        shutil.copy(source_file_path, target_file_path)
        print(f"File {file_name}.jpg copied to {trainval_target_folder}")
    else:
        print(f"File {file_name}.jpg not found in {source_folder}")

for file_name in test_file_names:
    source_file_path = os.path.join(source_folder, file_name + '.jpg')  # image file is named <id>.jpg
    if os.path.exists(source_file_path):
        target_file_path = os.path.join(test_target_folder, file_name + '.jpg')
        shutil.copy(source_file_path, target_file_path)
        print(f"File {file_name}.jpg copied to {test_target_folder}")
    else:
        print(f"File {file_name}.jpg not found in {source_folder}")

print("File classification completed.")
Once the files are sorted, the result looks like the figure below.
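Since both scripts only print a message when a file is missing, it is easy to end up with images and labels that do not line up. A quick cross-check of the two trees can be sketched as follows; `split_summary` is a hypothetical helper, and it assumes the mydata/images and mydata/labels layout created above with .jpg images.

```python
from pathlib import Path


def split_summary(root, split):
    """Compare image and label file stems for one split ('train' or 'test')."""
    root = Path(root)
    images = {p.stem for p in (root / "images" / split).glob("*.jpg")}
    labels = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return {
        "images": len(images),
        "labels": len(labels),
        "images_without_label": sorted(images - labels),
        "labels_without_image": sorted(labels - images),
    }
```

Calling split_summary("mydata", "train") and split_summary("mydata", "test") from inside VOC2007 shows the counts for each split and names any unmatched files.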

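3. Importing the dataset and writing the YAML file
3.1 Copy the entire mydata folder into the folder that contains the yolov5 code
As shown below.

3.2 Create myip102.yaml in the data folder
As shown below.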

myip102.yaml
train: mydata/images/train # training images
val: mydata/images/test # the test images are used for validation
nc: 102
names: ['riceleafroller','riceleafcaterpillar','paddystemmaggot','asiaticriceborer','yellowriceborer','ricegallmidge','RiceStemfly',
'brownplanthopper','whitebackedplanthopper','smallbrownplanthopper','ricewaterweevil','riceleafhopper','grainspreaderthrips',
'riceshellpest','grub','molecricket','wireworm','whitemarginedmoth','blackcutworm','largecutworm','yellowcutworm',
'redspider','cornborer','armyworm','aphids','Potosiabrevitarsis','peachborer','englishgrainaphid','greenbug',
'birdcherry-oataphid','wheatblossommidge','penthaleusmajor','longleggedspidermite','wheatphloeothrips','wheatsawfly',
'cerodontadenticornis','beetfly','fleabeetle','cabbagearmyworm','beetarmyworm','Beetspotflies','meadowmoth','beetweevil',
'sericaorientalismotschulsky','alfalfaweevil','flaxbudworm','alfalfaplantbug','tarnishedplantbug','Locustoidea','lyttapolita',
'legumeblisterbeetle','blisterbeetle','therioaphismaculataBuckton','odontothripsloti','Thrips','alfalfaseedchalcid',
'Pieriscanidia','Apolyguslucorum','Limacodidae','Viteusvitifoliae','Colomerusvitis','BrevipoalpuslewisiMcGregor','oidesdecempunctata',
'Polyphagotarsonemuslatus','PseudococcuscomstockiKuwana','parathreneregalis','Ampelophaga','Lycormadelicatula','Xylotrechus','Cicadellaviridis','Miridae',
'Trialeurodesvaporariorum','Erythroneuraapicalis','Papilioxuthus','PanonchuscitriMcGregor','Phyllocoptesoleiverusashmead','IceryapurchasiMaskell',
'Unaspisyanonensis','Ceroplastesrubens','Chrysomphalusaonidum','ParlatoriazizyphusLucus','Nipaecoccusvastalor','Aleurocanthusspiniferus',
'TetradacuscBactroceraminax','Dacusdorsalis(Hendel)','Bactroceratsuneonis','Prodenialitura','Adristyrannus','PhyllocnistiscitrellaStainton','Toxopteracitricidus',
'Toxopteraaurantii','AphiscitricolaVanderGoot','ScirtothripsdorsalisHood','Dasineurasp','LawanaimitataMelichar','SalurnismarginellaGuerr','DeporausmarginatusPascoe',
'Chlumetiatransversa','Mangoflatbeakleafhopper','Rhytidoderabowriniiwhite','Sternochetusfrigidus','Cicadellidae']
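With a names list this long, a miscounted nc or a duplicated entry is easy to miss. The following sketch checks a loaded config for both; the helper names are illustrative, and loading relies on PyYAML, which is assumed to be installed (YOLOv5 lists it in its requirements).

```python
def check_class_config(cfg):
    """Return a list of problems found in a loaded dataset config dict."""
    problems = []
    names = cfg.get("names", [])
    nc = cfg.get("nc")
    if nc != len(names):
        problems.append(f"nc is {nc} but names has {len(names)} entries")
    if len(set(names)) != len(names):
        problems.append("names contains duplicate entries")
    return problems


def load_config(path):
    """Load a dataset YAML file into a dict (requires PyYAML)."""
    import yaml  # imported here so check_class_config works without PyYAML
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)
```

From the yolov5 folder, check_class_config(load_config("data/myip102.yaml")) should return an empty list for the file above, since nc is 102 and the list has 102 distinct names.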
4. Training
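The post does not record the exact training settings, so the following is only a sketch of how a run could be launched from the yolov5 folder. The flags are standard options of YOLOv5's train.py; the values (yolov5s.pt weights, 640-pixel images, batch size 16) are assumptions, and 300 epochs simply mirrors the figure mentioned below. `build_train_command` is a hypothetical helper.

```python
import sys


def build_train_command(data="data/myip102.yaml", weights="yolov5s.pt",
                        epochs=300, img=640, batch_size=16):
    """Build the argument list for YOLOv5's train.py (all values are assumptions)."""
    return [
        sys.executable, "train.py",
        "--data", data,                   # dataset YAML written in section 3.2
        "--weights", weights,             # pretrained checkpoint to start from
        "--epochs", str(epochs),
        "--img", str(img),                # training image size in pixels
        "--batch-size", str(batch_size),
    ]
```

Passing the result to subprocess.run(build_train_command(), check=True) from inside the yolov5 folder starts the run; building the list first makes it easy to print and adjust the settings before committing to a long training job.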

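Help wanted!
After about 300 epochs of training, my model still performs poorly. If you know of a better model, please discuss it with me in the comments. Thanks to everyone in advance; any guidance would be much appreciated.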