[TensorFlow] Selected tf.image methods explained + image augmentation based on the Random Subspace Method (RSM)


JinSu_ · Published 2021-02-03 15:02:09

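Image augmentation based on the Random Subspace Method (RSM)

In image data processing, Random Erasing randomly selects one or more regions of an image and erases them. The erased image is a random subspace of the original image in the RSM sense: it randomly retains part of the sample's features rather than all of them.

Combined with ensemble learning, where several networks are trained, this increases the number of training samples without any other form of data augmentation and can improve the model's generalization ability.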

import random

import cv2
import numpy as np
import tensorflow as tf


def RSM_DataGenerator(path, im_w=256, im_h=256, im_channels=3, divide=16,
                      lowerlimit=0.97, upperlimit=1.0, postfix='bmp'):
    # Draw the keep ratio alpha, then keep int(alpha * N) of the N = divide * divide cells.
    alpha = random.uniform(lowerlimit, upperlimit)
    n_cells = divide * divide
    part = int(n_cells * alpha)
    print('random_ratio: ', alpha)
    print('N: ', n_cells)

    # kb: sorted 0-based indices of the kept cells; sampling without replacement avoids repeats.
    kb = sorted(random.sample(range(n_cells), part))
    print('kb: ', kb)
    kept = set(kb)

    image = tf.io.read_file(path)  # read the raw file bytes
    if postfix == 'bmp':
        image = tf.image.decode_bmp(image, channels=im_channels)
    elif postfix in ('jpeg', 'jpg'):
        image = tf.image.decode_jpeg(image, channels=im_channels)
    else:
        raise ValueError('unsupported postfix: ' + postfix)
    # resize returns float32 with the default method, so the division below is safe.
    image = tf.image.resize(image, [im_h, im_w])

    rows, cols = image.shape[0], image.shape[1]
    # Integer division: if rows or cols is not a multiple of divide,
    # the trailing pixels are dropped from the output.
    h_cellsize = rows // divide
    w_cellsize = cols // divide

    img_list_h = []
    for i in range(divide):
        img_list_w = []
        for j in range(divide):
            img_temp = tf.image.crop_to_bounding_box(
                image, i * h_cellsize, j * w_cellsize, h_cellsize, w_cellsize)
            if i * divide + j in kept:
                img_list_w.append(img_temp / 255.0)  # keep the cell, scaled to [0, 1]
            else:
                img_list_w.append(tf.zeros_like(img_temp))  # erase the cell
        # Join the cells of one row along the width axis.
        img_list_h.append(tf.concat(img_list_w, axis=1))

    # Stack the rows along the height axis.
    return tf.concat(img_list_h, axis=0)


if __name__ == '__main__':
    image = RSM_DataGenerator('./v1/cat.jpg', im_w=300, im_h=400, divide=8,
                              lowerlimit=0.7, upperlimit=0.7, postfix='jpg')
    # OpenCV expects BGR uint8, so rescale to [0, 255] and reverse the channel order.
    cv2.imshow('rsm_image', np.array(image * 255, dtype='uint8')[:, :, [2, 1, 0]])
    cv2.waitKey(0)

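The input image is divided into N non-overlapping sub-images, which are used to build random subspaces. Each subspace contains α⋅N sub-images, with 0≤α≤1.

A random index vector $k_b \in \mathbb{Z}^{\alpha N}$ generates random subspace b (b=1,2,…,B). The elements of $k_b$ are all distinct and take values between 1 and N. (The code above uses 0-based indices, 0 to N−1, with N = divide × divide.)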
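As a small illustration of the index vectors described above, the sketch below draws B index vectors with the standard `random` module. The helper name `sample_subspace_indices` and its `seed` parameter are hypothetical and not part of the original code; indices are 0-based to match the function above.

```python
import random


def sample_subspace_indices(n_cells, alpha, n_subspaces, seed=None):
    """Return n_subspaces sorted index lists, each holding int(alpha * n_cells) distinct cells."""
    rng = random.Random(seed)
    part = int(alpha * n_cells)
    # sample() draws without replacement, so no index repeats within one vector.
    return [sorted(rng.sample(range(n_cells), part)) for _ in range(n_subspaces)]


# N = 8 * 8 cells, alpha = 0.7, B = 3 subspaces.
subspaces = sample_subspace_indices(n_cells=64, alpha=0.7, n_subspaces=3, seed=0)
for b, kb in enumerate(subspaces, start=1):
    print(b, len(kb), kb[:5])
```

Each vector here keeps int(0.7 × 64) = 44 cells; in an ensemble setting, each vector could feed a different network.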

Figure panels, left to right: original image; divide=16, alpha=0.7; divide=8, alpha=0.7.

Some of the tf.image methods, and related image-processing functions, are explained below:

tf.image.decode_jpeg or tf.image.decode_bmp

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/io/decode_jpeg

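channels is the number of color channels wanted in the decoded image.

ratio is the downscaling factor (decode_jpeg only; allowed values are 1, 2, 4, and 8). With ratio=2 the height and width are each halved, so the area shrinks by a factor of 4.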
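To see the effect of channels and ratio without needing a file on disk, the following sketch encodes a synthetic image in memory and decodes it three ways. The 64x48 size and the random pixel values are assumptions made only for this illustration.

```python
import tensorflow as tf

# Synthetic 64x48 RGB image, JPEG-encoded in memory.
rgb = tf.cast(tf.random.uniform([64, 48, 3], maxval=256, dtype=tf.int32), tf.uint8)
jpeg_bytes = tf.io.encode_jpeg(rgb)

full = tf.image.decode_jpeg(jpeg_bytes, channels=3)           # full size, RGB
gray = tf.image.decode_jpeg(jpeg_bytes, channels=1)           # converted to grayscale
half = tf.image.decode_jpeg(jpeg_bytes, channels=3, ratio=2)  # height and width halved

print(full.shape, gray.shape, half.shape, full.dtype)
```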

tf.cast: converting data types

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/cast

The operation supports data types (for x and dtype) of uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32, float64, complex64, complex128, bfloat16.

In case of casting from complex types (complex64, complex128) to real types, only the real part of x is returned.

In case of casting from real types to complex types (complex64, complex128), the imaginary part of the returned value is set to 0.

The handling of complex types here matches the behavior of numpy.

An image read with tf.image.decode_jpeg has dtype uint8; convert it to float32 before doing arithmetic on it.
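A minimal sketch of both points, using made-up values: casting uint8 pixels to float32 before scaling them to [0, 1], and the real-part behavior for complex inputs quoted above.

```python
import tensorflow as tf

# uint8 pixels, as returned by decode_jpeg; cast first, then scale.
pixels = tf.constant([[0, 128, 255]], dtype=tf.uint8)
scaled = tf.cast(pixels, tf.float32) / 255.0
print(scaled.dtype, scaled.numpy())

# Casting a complex tensor to a real dtype keeps only the real part.
z = tf.constant([1.5 + 2.0j], dtype=tf.complex64)
real_only = tf.cast(z, tf.float32)
print(real_only.numpy())
```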

Getting the size of a tf.Tensor: the get_shape().as_list() method or the shape attribute

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/Tensor

The get_shape method of tf.Tensor returns a tf.TensorShape object.

https://github.com/tensorflow/tensorflow/blob/v2.1.0/tensorflow/python/framework/ops.py#L572-L574

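https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/TensorShape (the tf.TensorShape class)

The as_list method of tf.TensorShape returns a list containing the size of each dimension.

The shape attribute of tf.Tensor

The shape attribute of tf.Tensor is itself a tf.TensorShape object.

Both the tf.TensorShape object and the list returned by tf.TensorShape.as_list can be indexed with square brackets.

For an image tensor the contents are [height, width, channels], that is, [number of rows, number of columns, number of channels].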
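The two ways of reading the size can be compared on a dummy tensor. The 400x300x3 shape is an assumption chosen to match the example call earlier in the post.

```python
import tensorflow as tf

image = tf.zeros([400, 300, 3])  # [height, width, channels]

shape_obj = image.shape                    # a tf.TensorShape
shape_list = image.get_shape().as_list()   # a plain Python list

# Both support square-bracket indexing.
rows, cols, channels = shape_list
print(type(shape_obj).__name__, shape_list)
print(shape_obj[0], shape_list[1])
```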

tf.image.resize

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/image/resize

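Note that the size argument is [height, width], the height and width of the output image.

By default resize returns a float32 tensor; only method='nearest' keeps the input dtype.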
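A short check of the argument order and the returned dtype, using a dummy uint8 image (the sizes are arbitrary and chosen only for illustration):

```python
import tensorflow as tf

image = tf.zeros([120, 80, 3], dtype=tf.uint8)

# size is [height, width]; the default method is bilinear.
resized = tf.image.resize(image, [400, 300])
# Nearest-neighbor interpolation keeps the input dtype.
nearest = tf.image.resize(image, [400, 300], method='nearest')

print(resized.shape, resized.dtype)
print(nearest.shape, nearest.dtype)
```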

tf.image.crop_to_bounding_box

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/image/crop_to_bounding_box

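offset_height and offset_width give the position of the bounding box's top-left corner in the image: offset_width is the coordinate along the width axis and offset_height is the coordinate along the height axis.

target_height and target_width are the height and width of the bounding box. Note that the box includes its top-left corner pixel.

The cropped size does not depend on the batch size, and the output has the same rank as the input:

For a 4-D tensor, it returns [batch, target_height, target_width, channels].

For a 3-D tensor, it returns [target_height, target_width, channels].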
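The offsets and the inclusive top-left corner are easiest to see on a tiny image whose pixel values encode their own position. The 4x6 image below is an assumption made for this sketch.

```python
import tensorflow as tf

# 4x6 single-channel image; the pixel at (row, col) holds row * 6 + col.
image = tf.reshape(tf.range(24), [4, 6, 1])

# A 2x3 box whose top-left corner sits at row 1, column 2.
patch = tf.image.crop_to_bounding_box(image, 1, 2, 2, 3)
print(patch[:, :, 0].numpy())

# The same call on a 4-D batch crops every image in the batch.
batch = tf.stack([image, image])  # shape [2, 4, 6, 1]
batch_patch = tf.image.crop_to_bounding_box(batch, 1, 2, 2, 3)
print(batch_patch.shape)
```

The first pixel of the patch is the one at (row 1, column 2), which confirms that the corner itself is part of the box.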

tf.concat

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/concat

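Concatenates several tensors along the given dimension; the sizes of all other dimensions must match. axis indexes the dimensions in the same order as the tf.TensorShape object.

In the Random Subspace Method (RSM) code in this post, I first concatenate along the width (axis=1) and then along the height (axis=0).

axis=-1 concatenates along the tensor's last dimension.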
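The three axis choices can be compared on small dummy tensors shaped like image cells, [height, width, channels]. The sizes are arbitrary.

```python
import tensorflow as tf

a = tf.zeros([2, 3, 3])
b = tf.ones([2, 5, 3])

row = tf.concat([a, b], axis=1)          # widths add: 3 + 5
stacked = tf.concat([row, row], axis=0)  # heights add: 2 + 2
last = tf.concat([a, a], axis=-1)        # last dimension (channels) adds: 3 + 3

print(row.shape, stacked.shape, last.shape)
```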
