Merging multiple subtitle files with Python
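Downloaded movie subtitles are often split into files such as cd1.srt and cd2.srt even though the movie itself is a single file. To deal with this awkward situation, I wrote the code below to merge several such files into one.
Reading a subtitle file
def read_srt(path):
    # Return the whole subtitle file as a single string
    with open(path) as f:
        return f.read()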
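Subtitle files found online are not always saved in the platform's default encoding, so a plain open(path) may raise UnicodeDecodeError. The sketch below is one possible way to handle that, not part of the original script: read_srt_with_fallback is a hypothetical helper, and the encoding list (UTF-8 with an optional BOM, then GB18030) is only an assumption about what the files might use.

```python
def read_srt_with_fallback(path, encodings=("utf-8-sig", "gb18030")):
    """Hypothetical helper: try each encoding in turn, return the first clean decode."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            # "utf-8-sig" also strips a leading byte-order mark if one is present
            with open(path, encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError as error:
            last_error = error
    # None of the candidate encodings could decode the file
    raise last_error
```

The order matters: a permissive encoding placed first may decode bytes it was never meant for, so the strictest candidate goes first.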
Quick test
content = read_srt('1.srt')
print(content)

content = read_srt('2.srt')
print(content)

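As the output shows, merging subtitles comes down to two things: the content has to be concatenated, and the sequence numbers have to be realigned, because the numbering in 2.srt also starts from 1.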
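Splitting the subtitles
def get_sequences(content):
    # Subtitle entries are separated by a blank line
    sequences = content.split('\n\n')
    sequences = [sequence.split('\n') for sequence in sequences]
    # Drop empty lines inside each entry
    sequences = [list(filter(None, sequence)) for sequence in sequences]
    # Drop entries that ended up completely empty
    return list(filter(None, sequences))
Quick test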
sequences = get_sequences(content)
sequences
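To make the result of the split concrete without needing a real file, here is a small self-contained run. The function body mirrors get_sequences from the post; the sample text is invented purely for illustration.

```python
def get_sequences(content):
    # Same logic as in the post: split on blank lines, then on single newlines
    sequences = [block.split('\n') for block in content.split('\n\n')]
    sequences = [list(filter(None, sequence)) for sequence in sequences]
    return list(filter(None, sequences))

# Made-up two-entry subtitle text (index, time range, one or more text lines)
sample = (
    "1\n00:00:01,000 --> 00:00:02,500\nHello\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nHow are you?\nFine.\n"
)

sequences = get_sequences(sample)
print(sequences)
# [['1', '00:00:01,000 --> 00:00:02,500', 'Hello'],
#  ['2', '00:00:03,000 --> 00:00:04,000', 'How are you?', 'Fine.']]
```

Each entry becomes a list whose first element is the sequence number, which is what the renumbering step relies on.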

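Renumbering the subtitles
def change_sequences(sequences, start_index):
    # Overwrite each entry's sequence number in place, counting up from start_index
    for sequence in sequences:
        sequence[0] = str(start_index)
        start_index += 1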
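A short illustration of the renumbering, again with invented data. It assumes the first file contributed three entries, so the second file should start at 4. Note that the function modifies the lists in place and returns nothing.

```python
def change_sequences(sequences, start_index):
    # Same logic as in the post: rewrite the first element of every entry
    for sequence in sequences:
        sequence[0] = str(start_index)
        start_index += 1

# Made-up entries from a second file, still numbered from 1
second_file = [
    ['1', '00:00:01,000 --> 00:00:02,000', 'Part two, line one'],
    ['2', '00:00:03,000 --> 00:00:04,000', 'Part two, line two'],
]

change_sequences(second_file, 4)  # assumption: the first file had 3 entries
print([sequence[0] for sequence in second_file])  # ['4', '5']
```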
Generating the new subtitle file
def save_srt(names):
    new_content = []
    start_index = 1
    for name in names:
        content = read_srt(name)
        sequences = get_sequences(content)
        change_sequences(sequences, start_index)
        # Continue numbering where this file left off (also correct for 3+ files)
        start_index += len(sequences)
        new_content += sequences
    new_content = ['\n'.join(sequence) for sequence in new_content]
    new_content = '\n\n'.join(new_content)
    print(new_content)
    # 'w' overwrites result.srt, so rerunning does not append a second copy
    with open('result.srt', 'w') as f:
        f.write(new_content)

save_srt(['1.srt', '2.srt'])
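The script above merges content and sequence numbers only. If the timestamps in the second file restart from zero, they would also need to be shifted by the running time of the first part. The post does not cover this, so the following is only a rough sketch under that assumption; TIMESTAMP, shift_timestamp and shift_sequences are hypothetical names, and the offset has to be supplied by hand because an .srt file does not record the length of the video.

```python
import re

# Matches SRT-style timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms milliseconds."""
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
    total = max(total, 0)  # clamp so a negative offset never yields a negative time
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def shift_sequences(sequences, offset_ms):
    """Shift the time range (second element) of every entry in place."""
    for sequence in sequences:
        if len(sequence) > 1:
            sequence[1] = TIMESTAMP.sub(
                lambda match: shift_timestamp(match, offset_ms), sequence[1]
            )

# Made-up entry, shifted by one second
demo = [['1', '00:59:59,500 --> 01:00:00,250', 'Hello']]
shift_sequences(demo, 1000)
print(demo[0][1])  # 01:00:00,500 --> 01:00:01,250
```

In the merge loop, such a call would go next to change_sequences for every file after the first, with the offset set to the combined length of the preceding parts.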