python array太慢_python - gdal ReadAsarray对于vrt非常慢 - 堆栈内存溢出

迟到了，但这是我根据Rutger的优秀答案写的剧本。它以某种方式优化了磁贴大小，以便您可以读取最少的块。这几乎肯定不是你能做的最好的，但我注意到它在处理大小为[1440000 560000]的地理栅格时大大提高了我的运行时间。这里有两个函数：optimal_tile_size和split_raster_into_tiles。后者将为给定的输入栅格对象提供(优化的wrt块大小)瓦片的...

weixin_39936403

444人浏览 · 2020-12-07 13:08:08

weixin_39936403 · 2020-12-07 13:08:08 发布

import numpy as np

import gdal

def split_raster_into_tiles(rast_obj, N, overlap=0, aspect=0):

"""

Given an input raster object (created via gdal) and the number of tiles

you wish to break it into, returns an Nx4 array of coordinates corresponding

to the corners of boxes that tile the input. Tiles are created to minimize

the number of blocks read from the raster file. An optional

overlap value will adjust the tiles such that they each overlap by the

specified pixel amount.

INPUTS:

rast_obj - obj: gdal raster object created by gdal.Open()

N - int: number of tiles to split raster into

overlap - int: (optional) number of pixels each tile should overlap

aspect - float or str: (optional) if set to 0, aspect is not considered. If

set to 'use_raster', aspect of raster is respected.

Otherwise can provide an aspect = dx/dy.

OUTPUTS:

tiles - np array: Nx4 array where N is number of tiles and the four

columns correspond to: [xmin, xmax, ymin, ymax]

"""

# Get optimal tile size

dx, dy = optimal_tile_size(rast_obj, N, aspect=aspect)

ncols = rast_obj.RasterXSize

nrows = rast_obj.RasterYSize

# Generate nonoverlapping tile edge coordinates

ulx = np.array([0])

uly = np.array([0])

while ulx[-1] < ncols:

ulx = np.append(ulx, ulx[-1]+dx)

while uly[-1] < nrows:

uly = np.append(uly, uly[-1]+dy)

# In case of overshoots, remove the last points and replace with raster extent

ulx = ulx[:-1]

uly = uly[:-1]

ulx = np.append(ulx, ncols)

uly = np.append(uly, nrows)

# Ensure final tile is large enough; if not, delete second-from-last point

if ulx[-1] - ulx[-2] < 2*overlap:

ulx = np.delete(ulx, -2)

if uly[-1] - uly[-2] < 2*overlap:

uly = np.delete(uly, -2)

# Create tiles array where each row corresponds to [xmin, xmax, ymin, ymax]

tiles = np.empty(((len(ulx)-1)*(len(uly)-1),4), dtype=int)

rowct = 0

for i in np.arange(0,len(ulx[:-1])):

for j in np.arange(0,len(uly[:-1])):

tiles[rowct,0] = ulx[i]

tiles[rowct,1] = ulx[i+1]

tiles[rowct,2] = uly[j]

tiles[rowct,3] = uly[j+1]

rowct = rowct + 1

# Adjust tiles for overlap

if overlap > 0:

tiles[tiles[:,0] > overlap, 0] = tiles[tiles[:,0] > overlap, 0] - overlap

tiles[tiles[:,1] < (ncols - overlap), 1] = tiles[tiles[:,1] < (ncols - overlap), 1] + overlap

tiles[tiles[:,2] > overlap, 2] = tiles[tiles[:,2] > overlap, 2] - overlap

tiles[tiles[:,3] < (nrows - overlap), 3] = tiles[tiles[:,3] < (nrows - overlap), 3] + overlap

print('Tile size X, Y is {}, {}.'.format(dx, dy))

return tiles

def optimal_tile_size(rast_obj, N, aspect=0):

"""

Returns a tile size that optimizes reading a raster by considering the

blocksize of the raster. The raster is divided into (roughly) N tiles. If

the shape of the tiles is unimportant (aspect=0), optimization

considers only the blocksize. If an aspect ratio is provided, optimization

tries to respect it as much as possible.

INPUTS:

rast_obj - obj: gdal raster object created by gdal.Open()

N - int: number of tiles to split raster into

aspect - float or str: (optional) - If no value is provided, the

aspect ratio is set only by the blocksize. If aspect is set

to 'use_raster', aspect is obtained from the aspect of the

given raster. Optionally, an aspect may be provided where

aspect = dx/dy.

OUTPUTS:

dx - np.int: optimized number of columns of each tile

dy - np.int: optimized number of rows of each tile

"""

# # If a vrt, try to get the underlying raster blocksize

# filepath = rast_obj.GetDescription()

# extension = filepath.split('.')[-1]

# if extension == 'vrt':

# sample_tif = rast_obj.GetFileList()[-1]

# st = gdal.Open(sample_tif)

# blocksize = st.GetRasterBand(1).GetBlockSize()

# else:

# blocksize = rast_obj.GetRasterBand(1).GetBlockSize()

blocksize = rast_obj.GetRasterBand(1).GetBlockSize()

ncols = rast_obj.RasterXSize

nrows = rast_obj.RasterYSize

# Compute ratios for sizing

totalpix = ncols * nrows

pix_per_block = blocksize[0] * blocksize[1]

pix_per_tile = totalpix / N

if aspect == 0: # optimize tile size for fastest I/O

n_blocks_per_tile = np.round(pix_per_tile / pix_per_block)

if n_blocks_per_tile >= 1:

# This assumes the larger dimension of the block size should be retained for sizing tiles

if blocksize[0] > blocksize[1] or blocksize[0] == blocksize[1]:

dx = blocksize[0]

dy = np.round(pix_per_tile / dx)

ndy = dy / nrows

if ndy > 1.5:

dx = dx * np.round(ndy)

dy = np.round((pix_per_tile / dx) / blocksize[1]) * blocksize[1]

dy = np.min((dy, nrows))

if dy == 0:

dy = blocksize[1]

else:

dy = blocksize[1]

dx = np.round(pix_per_tile / dy)

ndx = dx / ncols

if ndx > 1.5:

dy = dy * np.round(ndx)

dx = np.round((pix_per_tile / dy) / blocksize[0]) * blocksize[0]

dx = np.min((dx, ncols))

if dx == 0:

dx = blocksize[0]

else:

print('Block size is smaller than tile size; setting tile size to block size.')

dy = blocksize[0]

dx = blocksize[1]

else: # optimize but respect the aspect ratio as much as possible

if aspect == 'use_raster':

aspect = ncols / nrows

dya = np.round(np.sqrt(pix_per_tile / aspect))

dxa = np.round(aspect * dya)

dx = np.round(dxa / blocksize[0]) * blocksize[0]

dx = np.min((dx, ncols))

dy = np.round(dya / blocksize[1]) * blocksize[1]

dy = np.min((dy, nrows))

# Set dx,dy to blocksize if they're zero

if dx == 0:

dx = blocksize[0]

if dy == 0:

dy = blocksize[1]

return dx, dy

作为一个快速的基准测试，我尝试将42000 x 36000虚拟光栅(带有一些额外的计算)平铺到30个图块中。启用优化后，运行时间为120秒。没有它，运行时间为596秒。如果您要对大文件进行平铺，那么将块大小考虑在内将是值得的。

魔乐社区

魔乐社区（Modelers.cn) 是一个中立、公益的人工智能社区，提供人工智能工具、模型、数据的托管、展示与应用协同服务，为人工智能开发及爱好者搭建开放的学习交流平台。社区通过理事会方式运作，由全产业链共同建设、共同运营、共同享有，推动国产AI生态繁荣发展。

更多推荐

量化挑战赛冠军专访：4小时啃下W4A8量化，我靠的是这些经验

魔乐社区

「极限压缩量化未来」Modelers GeekDay上海站圆满落幕

魔乐社区

替你试过了，消费级显卡可以跑的开源文生图SOTA模型，顶级渲染、高密度文本绘图

魔乐社区

所有评论(0)

查看更多评论

weixin_39936403

@weixin_39936403

已为社区贡献10条内容

python array太慢_python - gdal ReadAsarray对于vrt非常慢 - 堆栈内存溢出

weixin_39936403

所有评论(0)

温馨提示：您尚未绑定手机号

weixin_39936403