@ConnorJL Thanks for the great work. Unfortunately, I found that my training on OpenWebTextCorpus is too slow, even for the 117M model. With a batch size of 64, the cross-entropy loss decreases rapidly for the first 10k steps, but after that it stays around 3.0. Is this a known phenomenon, or is it a dataset problem? I also noticed that the loss in model_fns is not shifted. Shouldn't it be `loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output["logits"][:, :-1], labels=features[:, 1:])`?
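For reference, here is a minimal sketch of the shift being proposed, assuming `output["logits"]` has shape `[batch, seq_len, vocab]` and `features` holds the corresponding token IDs (names are taken from the snippet above, not verified against the repo):

```python
import tensorflow as tf

def shifted_lm_loss(logits, features):
    """Cross-entropy with the usual next-token shift.

    Assumes logits has shape [batch, seq_len, vocab_size] and features
    holds the matching token IDs with shape [batch, seq_len]. The logits
    at position t are scored against the token at position t + 1.
    """
    loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits[:, :-1],   # drop the prediction after the last token
        labels=features[:, 1:])  # drop the first token as a label
    return tf.reduce_mean(loss_batch)
```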
Unfortunately, this is a known phenomenon, and I haven't been able to fix it. I perform the shifting of the labels in the input function (it's done in an ugly way, and I'd do it differently now, but the effect should be the same). If I didn't shift, the model would converge to 0 loss very rapidly, since it would just be copying the input. I'm very open to any other ideas about what may be causing this problem. Maybe it is the dataset after all?
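To illustrate what shifting in the input function means (a hedged sketch only, not the repo's actual `input_fn`; the function name and parameters here are made up), the labels can be emitted one token ahead of the features, so the unshifted loss in model_fns is already aligned:

```python
import tensorflow as tf

def example_input_fn(token_ids, batch_size=64, seq_len=1024):
    """Illustrative input pipeline: shift labels here instead of in the loss.

    Cuts chunks of seq_len + 1 tokens and splits each chunk into
    (features, labels) where labels[t] == features[t + 1], so the model_fn
    can compute the loss directly on (logits, labels) without slicing.
    """
    ds = tf.data.Dataset.from_tensor_slices(token_ids)
    ds = ds.batch(seq_len + 1, drop_remainder=True)
    ds = ds.map(lambda chunk: (chunk[:-1], chunk[1:]))  # features, shifted labels
    return ds.batch(batch_size, drop_remainder=True)
```

With a pipeline like this, `tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)` in the model function compares position t's prediction to the token at t + 1, which should be equivalent to shifting inside the loss itself.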