Paraphrased from: Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano

These notes mainly record the loss calculation and the SGD training procedure described in the article above.

The author uses the cross-entropy loss; the formula is as follows.
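Following the original tutorial, with N training examples (words), correct words y_n, and predictions o_n, the loss is

L(y, o) = -\frac{1}{N} \sum_{n \in N} y_n \log o_n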

Here each word serves as one training example and each sentence as one mini-batch. The code block that computes the loss over the whole training corpus is:

import numpy as np

def calculate_total_loss(self, x, y):
    L = 0
    # For each sentence...
    for i in np.arange(len(y)):
        o, s = self.forward_propagation(x[i])
        # We only care about our prediction of the "correct" words
        correct_word_predictions = o[np.arange(len(y[i])), y[i]]
        # Add to the loss based on how off we were
        L += -1 * np.sum(np.log(correct_word_predictions))
    return L

def calculate_loss(self, x, y):
    # Divide the total loss by the number of training examples
    # (use the builtin sum here: np.sum over a generator is unreliable)
    N = sum(len(y_i) for y_i in y)
    return self.calculate_total_loss(x, y) / N

RNNNumpy.calculate_total_loss = calculate_total_loss
RNNNumpy.calculate_loss = calculate_loss
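The tutorial also includes a sanity check for this loss that is worth keeping in mind: with random predictions, every correct word receives probability about 1/vocabulary_size, so the per-word loss should be close to ln(vocabulary_size). A minimal sketch, assuming the tutorial's vocabulary size of 8000:

import numpy as np

vocabulary_size = 8000  # assumption: the vocabulary size used in the tutorial
# Random predictions assign each correct word probability ~1/vocabulary_size,
# so the expected cross-entropy loss per word is -log(1/V) = log(V):
print(np.log(vocabulary_size))  # ~8.987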

Here x is a 2-D matrix in which each row represents a sentence; y has the same shape as x, except that each row y[i] takes its values starting from the second element of x[i], i.e. x[i] shifted left by one.
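A minimal sketch of one (x[i], y[i]) pair, with hypothetical word indices (0 and 1 standing in for the tutorial's SENTENCE_START and SENTENCE_END tokens):

x_i = [0, 51, 27, 16, 10]   # input: SENTENCE_START followed by the sentence's word indices
y_i = [51, 27, 16, 10, 1]   # target: x_i shifted left by one, ending with SENTENCE_END

So at every position the model is asked to predict the next word.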

o[np.arange(len(y[i])), y[i]]

This expression is a bit hard to read; an explanation follows:

o is a matrix whose rows are the predicted probability distributions over the vocabulary, one per position, so each row has length vocabulary_size. y[i] is a vector of integer indices; each entry says which position in the corresponding row of o holds the true label. The expression therefore picks one value out of each row of a 2-D matrix, yielding a vector of the probabilities the model assigned to the correct words.
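A minimal sketch of this NumPy fancy indexing, with hypothetical toy numbers:

import numpy as np

# One row of predicted probabilities per position (vocabulary size 3 here)
o = np.array([[0.10, 0.70, 0.20],
              [0.30, 0.30, 0.40],
              [0.50, 0.25, 0.25]])
y = np.array([1, 2, 0])  # index of the correct word at each position

# Pairs row k with column y[k], returning the probability of each correct word
print(o[np.arange(len(y)), y])  # [0.7  0.4  0.5]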

The function that starts training. X_train is a 2-D matrix in which each row represents a sentence; y_train has the same shape as X_train. The SGD loop inside trains on one sentence at a time, and every evaluate_loss_after epochs it computes the total loss over X_train with calculate_loss, which is really just an eval step.

import sys
from datetime import datetime

# Outer SGD loop.
# - model: The RNN model instance
# - X_train: The training data set
# - y_train: The training data labels
# - learning_rate: Initial learning rate for SGD
# - nepoch: Number of times to iterate through the complete dataset
# - evaluate_loss_after: Evaluate the loss after this many epochs
def train_with_sgd(model, X_train, y_train, learning_rate=0.005, nepoch=100, evaluate_loss_after=5):
    # We keep track of the losses so we can plot them later
    losses = []
    num_examples_seen = 0
    for epoch in range(nepoch):
        # Optionally evaluate the loss
        if epoch % evaluate_loss_after == 0:
            loss = model.calculate_loss(X_train, y_train)
            losses.append((num_examples_seen, loss))
            time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            print("%s: Loss after num_examples_seen=%d epoch=%d: %f" % (time, num_examples_seen, epoch, loss))
            # Halve the learning rate if the loss increased
            if len(losses) > 1 and losses[-1][1] > losses[-2][1]:
                learning_rate = learning_rate * 0.5
                print("Setting learning rate to %f" % learning_rate)
            sys.stdout.flush()
        # For each training example...
        for i in range(len(y_train)):
            # One SGD step
            model.sgd_step(X_train[i], y_train[i], learning_rate)
            num_examples_seen += 1
    return losses
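A hedged usage sketch, assuming the RNNNumpy class, vocabulary_size, and the preprocessed X_train/y_train from the original tutorial are in scope:

import numpy as np

np.random.seed(10)
model = RNNNumpy(vocabulary_size)
# Train on a small subset first to verify the loss actually decreases
losses = train_with_sgd(model, X_train[:100], y_train[:100], nepoch=10, evaluate_loss_after=1)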
