@ConnorJL Thanks for the great work. Unfortunately, I found that my training on OpenWebTextCorpus is too slow, even for the 117M model. With a batch size of 64, the cross-entropy loss decreases rapidly for the first 10k steps, but after that it stays around 3.0. Is this a known phenomenon, or is it a dataset problem? I also noticed that the loss in model_fns is not shifted. Shouldn't it be `loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output["logits"][:, :-1], labels=features[:, 1:])`?
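For reference, here is a minimal sketch of the shift being proposed, assuming `output["logits"]` has shape `[batch, seq_len, vocab]` and `features` holds the corresponding token IDs (names are taken from the snippet above, not verified against the repo):

```python
import tensorflow as tf

def shifted_lm_loss(logits, features):
    """Cross-entropy with the usual next-token shift.

    Assumes logits has shape [batch, seq_len, vocab_size] and features
    holds the matching token IDs with shape [batch, seq_len]. The logits
    at position t are scored against the token at position t + 1.
    """
    loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits[:, :-1],   # drop the prediction after the last token
        labels=features[:, 1:])  # drop the first token as a label
    return tf.reduce_mean(loss_batch)
```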
Unfortunately, this is a known phenomenon, and I haven't been able to fix it. I perform the shifting of the labels in the input function (it's done in an ugly way, and I'd do it differently now, but the effect should be the same). If I didn't shift, the model would converge to 0 loss very rapidly, since it would just be copying the input. I'm very open to any other ideas about what may be causing this problem. Maybe it is the dataset after all?
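To illustrate what shifting in the input function means (a hedged sketch only, not the repo's actual `input_fn`; the function name and parameters here are made up), the labels can be emitted one token ahead of the features, so the unshifted loss in model_fns is already aligned:

```python
import tensorflow as tf

def example_input_fn(token_ids, batch_size=64, seq_len=1024):
    """Illustrative input pipeline: shift labels here instead of in the loss.

    Cuts chunks of seq_len + 1 tokens and splits each chunk into
    (features, labels) where labels[t] == features[t + 1], so the model_fn
    can compute the loss directly on (logits, labels) without slicing.
    """
    ds = tf.data.Dataset.from_tensor_slices(token_ids)
    ds = ds.batch(seq_len + 1, drop_remainder=True)
    ds = ds.map(lambda chunk: (chunk[:-1], chunk[1:]))  # features, shifted labels
    return ds.batch(batch_size, drop_remainder=True)
```

With a pipeline like this, `tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)` in the model function compares position t's prediction to the token at t + 1, which should be equivalent to shifting inside the loss itself.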