Paraphrased from: Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano

These notes mainly record the loss calculation and the SGD training procedure described in the article above.

The author uses the cross-entropy loss; the formula is as follows.
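Following the original tutorial, with N training examples (words), correct words y_n, and predictions o_n, the loss is

L(y, o) = -\frac{1}{N} \sum_{n \in N} y_n \log o_n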

Here each word serves as one training example and each sentence as one mini-batch. The code block that computes the loss over the whole training corpus is:

import numpy as np

def calculate_total_loss(self, x, y):
    L = 0
    # For each sentence...
    for i in np.arange(len(y)):
        o, s = self.forward_propagation(x[i])
        # We only care about our prediction of the "correct" words
        correct_word_predictions = o[np.arange(len(y[i])), y[i]]
        # Add to the loss based on how off we were
        L += -1 * np.sum(np.log(correct_word_predictions))
    return L

def calculate_loss(self, x, y):
    # Divide the total loss by the number of training examples
    # (use the builtin sum here: np.sum over a generator is unreliable)
    N = sum(len(y_i) for y_i in y)
    return self.calculate_total_loss(x, y) / N

RNNNumpy.calculate_total_loss = calculate_total_loss
RNNNumpy.calculate_loss = calculate_loss
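The tutorial also includes a sanity check for this loss that is worth keeping in mind: with random predictions, every correct word receives probability about 1/vocabulary_size, so the per-word loss should be close to ln(vocabulary_size). A minimal sketch, assuming the tutorial's vocabulary size of 8000:

import numpy as np

vocabulary_size = 8000  # assumption: the vocabulary size used in the tutorial
# Random predictions assign each correct word probability ~1/vocabulary_size,
# so the expected cross-entropy loss per word is -log(1/V) = log(V):
print(np.log(vocabulary_size))  # ~8.987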

Here x is a 2-D matrix in which each row represents a sentence; y has the same shape as x, except that each row y[i] takes its values starting from the second element of x[i], i.e. x[i] shifted left by one.
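A minimal sketch of one (x[i], y[i]) pair, with hypothetical word indices (0 and 1 standing in for the tutorial's SENTENCE_START and SENTENCE_END tokens):

x_i = [0, 51, 27, 16, 10]   # input: SENTENCE_START followed by the sentence's word indices
y_i = [51, 27, 16, 10, 1]   # target: x_i shifted left by one, ending with SENTENCE_END

So at every position the model is asked to predict the next word.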

o[np.arange(len(y[i])), y[i]]

This expression is a bit hard to read; an explanation follows:

o is a matrix whose rows are the predicted probability distributions over the vocabulary, one per position, so each row has length vocabulary_size. y[i] is a vector of integer indices; each entry says which position in the corresponding row of o holds the true label. The expression therefore picks one value out of each row of a 2-D matrix, yielding a vector of the probabilities the model assigned to the correct words.
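A minimal sketch of this NumPy fancy indexing, with hypothetical toy numbers:

import numpy as np

# One row of predicted probabilities per position (vocabulary size 3 here)
o = np.array([[0.10, 0.70, 0.20],
              [0.30, 0.30, 0.40],
              [0.50, 0.25, 0.25]])
y = np.array([1, 2, 0])  # index of the correct word at each position

# Pairs row k with column y[k], returning the probability of each correct word
print(o[np.arange(len(y)), y])  # [0.7  0.4  0.5]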

The function that starts training. X_train is a 2-D matrix in which each row represents a sentence; y_train has the same shape as X_train. The SGD loop inside trains on one sentence at a time, and every evaluate_loss_after epochs it computes the total loss over X_train with calculate_loss, which is really just an eval step.

import sys
from datetime import datetime

# Outer SGD loop.
# - model: The RNN model instance
# - X_train: The training data set
# - y_train: The training data labels
# - learning_rate: Initial learning rate for SGD
# - nepoch: Number of times to iterate through the complete dataset
# - evaluate_loss_after: Evaluate the loss after this many epochs
def train_with_sgd(model, X_train, y_train, learning_rate=0.005, nepoch=100, evaluate_loss_after=5):
    # We keep track of the losses so we can plot them later
    losses = []
    num_examples_seen = 0
    for epoch in range(nepoch):
        # Optionally evaluate the loss
        if epoch % evaluate_loss_after == 0:
            loss = model.calculate_loss(X_train, y_train)
            losses.append((num_examples_seen, loss))
            time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            print("%s: Loss after num_examples_seen=%d epoch=%d: %f" % (time, num_examples_seen, epoch, loss))
            # Halve the learning rate if the loss increased
            if len(losses) > 1 and losses[-1][1] > losses[-2][1]:
                learning_rate = learning_rate * 0.5
                print("Setting learning rate to %f" % learning_rate)
            sys.stdout.flush()
        # For each training example...
        for i in range(len(y_train)):
            # One SGD step
            model.sgd_step(X_train[i], y_train[i], learning_rate)
            num_examples_seen += 1
    return losses
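A hedged usage sketch, assuming the RNNNumpy class, vocabulary_size, and the preprocessed X_train/y_train from the original tutorial are in scope:

import numpy as np

np.random.seed(10)
model = RNNNumpy(vocabulary_size)
# Train on a small subset first to verify the loss actually decreases
losses = train_with_sgd(model, X_train[:100], y_train[:100], nepoch=10, evaluate_loss_after=1)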
