python array太慢_python - gdal ReadAsarray对于vrt非常慢 - 堆栈内存溢出
迟到了,但这是我根据Rutger的优秀答案写的剧本。它以某种方式优化了磁贴大小,以便您可以读取最少的块。这几乎肯定不是你能做的最好的,但我注意到它在处理大小为[1440000 560000]的地理栅格时大大提高了我的运行时间。这里有两个函数:optimal_tile_size和split_raster_into_tiles。后者将为给定的输入栅格对象提供(优化的wrt块大小)瓦片的...
迟到了,但这是我根据Rutger的优秀答案写的剧本。 它以某种方式优化了磁贴大小,以便您可以读取最少的块。 这几乎肯定不是你能做的最好的,但我注意到它在处理大小为[1440000 560000]的地理栅格时大大提高了我的运行时间。 这里有两个函数:optimal_tile_size和split_raster_into_tiles。 后者将为给定的输入栅格对象提供(优化的wrt块大小)瓦片的坐标。
import numpy as np
import gdal
def split_raster_into_tiles(rast_obj, N, overlap=0, aspect=0):
"""
Given an input raster object (created via gdal) and the number of tiles
you wish to break it into, returns an Nx4 array of coordinates corresponding
to the corners of boxes that tile the input. Tiles are created to minimize
the number of blocks read from the raster file. An optional
overlap value will adjust the tiles such that they each overlap by the
specified pixel amount.
INPUTS:
rast_obj - obj: gdal raster object created by gdal.Open()
N - int: number of tiles to split raster into
overlap - int: (optional) number of pixels each tile should overlap
aspect - float or str: (optional) if set to 0, aspect is not considered. If
set to 'use_raster', aspect of raster is respected.
Otherwise can provide an aspect = dx/dy.
OUTPUTS:
tiles - np array: Nx4 array where N is number of tiles and the four
columns correspond to: [xmin, xmax, ymin, ymax]
"""
# Get optimal tile size
dx, dy = optimal_tile_size(rast_obj, N, aspect=aspect)
ncols = rast_obj.RasterXSize
nrows = rast_obj.RasterYSize
# Generate nonoverlapping tile edge coordinates
ulx = np.array([0])
uly = np.array([0])
while ulx[-1] < ncols:
ulx = np.append(ulx, ulx[-1]+dx)
while uly[-1] < nrows:
uly = np.append(uly, uly[-1]+dy)
# In case of overshoots, remove the last points and replace with raster extent
ulx = ulx[:-1]
uly = uly[:-1]
ulx = np.append(ulx, ncols)
uly = np.append(uly, nrows)
# Ensure final tile is large enough; if not, delete second-from-last point
if ulx[-1] - ulx[-2] < 2*overlap:
ulx = np.delete(ulx, -2)
if uly[-1] - uly[-2] < 2*overlap:
uly = np.delete(uly, -2)
# Create tiles array where each row corresponds to [xmin, xmax, ymin, ymax]
tiles = np.empty(((len(ulx)-1)*(len(uly)-1),4), dtype=int)
rowct = 0
for i in np.arange(0,len(ulx[:-1])):
for j in np.arange(0,len(uly[:-1])):
tiles[rowct,0] = ulx[i]
tiles[rowct,1] = ulx[i+1]
tiles[rowct,2] = uly[j]
tiles[rowct,3] = uly[j+1]
rowct = rowct + 1
# Adjust tiles for overlap
if overlap > 0:
tiles[tiles[:,0] > overlap, 0] = tiles[tiles[:,0] > overlap, 0] - overlap
tiles[tiles[:,1] < (ncols - overlap), 1] = tiles[tiles[:,1] < (ncols - overlap), 1] + overlap
tiles[tiles[:,2] > overlap, 2] = tiles[tiles[:,2] > overlap, 2] - overlap
tiles[tiles[:,3] < (nrows - overlap), 3] = tiles[tiles[:,3] < (nrows - overlap), 3] + overlap
print('Tile size X, Y is {}, {}.'.format(dx, dy))
return tiles
def optimal_tile_size(rast_obj, N, aspect=0):
"""
Returns a tile size that optimizes reading a raster by considering the
blocksize of the raster. The raster is divided into (roughly) N tiles. If
the shape of the tiles is unimportant (aspect=0), optimization
considers only the blocksize. If an aspect ratio is provided, optimization
tries to respect it as much as possible.
INPUTS:
rast_obj - obj: gdal raster object created by gdal.Open()
N - int: number of tiles to split raster into
aspect - float or str: (optional) - If no value is provided, the
aspect ratio is set only by the blocksize. If aspect is set
to 'use_raster', aspect is obtained from the aspect of the
given raster. Optionally, an aspect may be provided where
aspect = dx/dy.
OUTPUTS:
dx - np.int: optimized number of columns of each tile
dy - np.int: optimized number of rows of each tile
"""
# # If a vrt, try to get the underlying raster blocksize
# filepath = rast_obj.GetDescription()
# extension = filepath.split('.')[-1]
# if extension == 'vrt':
# sample_tif = rast_obj.GetFileList()[-1]
# st = gdal.Open(sample_tif)
# blocksize = st.GetRasterBand(1).GetBlockSize()
# else:
# blocksize = rast_obj.GetRasterBand(1).GetBlockSize()
blocksize = rast_obj.GetRasterBand(1).GetBlockSize()
ncols = rast_obj.RasterXSize
nrows = rast_obj.RasterYSize
# Compute ratios for sizing
totalpix = ncols * nrows
pix_per_block = blocksize[0] * blocksize[1]
pix_per_tile = totalpix / N
if aspect == 0: # optimize tile size for fastest I/O
n_blocks_per_tile = np.round(pix_per_tile / pix_per_block)
if n_blocks_per_tile >= 1:
# This assumes the larger dimension of the block size should be retained for sizing tiles
if blocksize[0] > blocksize[1] or blocksize[0] == blocksize[1]:
dx = blocksize[0]
dy = np.round(pix_per_tile / dx)
ndy = dy / nrows
if ndy > 1.5:
dx = dx * np.round(ndy)
dy = np.round((pix_per_tile / dx) / blocksize[1]) * blocksize[1]
dy = np.min((dy, nrows))
if dy == 0:
dy = blocksize[1]
else:
dy = blocksize[1]
dx = np.round(pix_per_tile / dy)
ndx = dx / ncols
if ndx > 1.5:
dy = dy * np.round(ndx)
dx = np.round((pix_per_tile / dy) / blocksize[0]) * blocksize[0]
dx = np.min((dx, ncols))
if dx == 0:
dx = blocksize[0]
else:
print('Block size is smaller than tile size; setting tile size to block size.')
dy = blocksize[0]
dx = blocksize[1]
else: # optimize but respect the aspect ratio as much as possible
if aspect == 'use_raster':
aspect = ncols / nrows
dya = np.round(np.sqrt(pix_per_tile / aspect))
dxa = np.round(aspect * dya)
dx = np.round(dxa / blocksize[0]) * blocksize[0]
dx = np.min((dx, ncols))
dy = np.round(dya / blocksize[1]) * blocksize[1]
dy = np.min((dy, nrows))
# Set dx,dy to blocksize if they're zero
if dx == 0:
dx = blocksize[0]
if dy == 0:
dy = blocksize[1]
return dx, dy
作为一个快速的基准测试,我尝试将42000 x 36000虚拟光栅(带有一些额外的计算)平铺到30个图块中。 启用优化后,运行时间为120秒。 没有它,运行时间为596秒。 如果您要对大文件进行平铺,那么将块大小考虑在内将是值得的。
魔乐社区(Modelers.cn) 是一个中立、公益的人工智能社区,提供人工智能工具、模型、数据的托管、展示与应用协同服务,为人工智能开发及爱好者搭建开放的学习交流平台。社区通过理事会方式运作,由全产业链共同建设、共同运营、共同享有,推动国产AI生态繁荣发展。
更多推荐



所有评论(0)