Issue with saving model checkpoints #1

Open
AdityaNikhil opened this issue May 3, 2021 · 1 comment

AdityaNikhil commented May 3, 2021

I am trying to save checkpoints every 5 epochs during training.
Here is the checkpoint code inside train.py:

model_checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=args.checkpoints,
    save_weights_only=True,
    save_freq=5,
    verbose=1,
)

And I changed the model.fit call to:

model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=args.epochs,
    callbacks=[model_checkpoint_cb],
    verbose=2,
)

I am getting the following error:

Epoch 00001: saving model to ./checkpoints
Traceback (most recent call last):
  File "train.py", line 69, in <module>
    main(args)
  File "train.py", line 60, in main
    model.fit(train_generator,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 1103, in fit
    callbacks.on_train_batch_end(end_step, logs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/callbacks.py", line 440, in on_train_batch_end
    self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/callbacks.py", line 289, in _call_batch_hook
    self._call_batch_end_hook(mode, batch, logs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/callbacks.py", line 309, in _call_batch_end_hook
    self._call_batch_hook_helper(hook_name, batch, logs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/callbacks.py", line 342, in _call_batch_hook_helper
    hook(batch, logs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/callbacks.py", line 1240, in on_train_batch_end
    self._save_model(epoch=self._current_epoch, logs=logs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/callbacks.py", line 1310, in _save_model
    self.model.save_weights(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2101, in save_weights
    self._trackable_saver.save(filepath, session=session, options=options)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/util.py", line 1199, in save
    save_path, new_feed_additions = self._save_cached_when_graph_building(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/util.py", line 1136, in _save_cached_when_graph_building
    feed_additions) = self._gather_saveables(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/util.py", line 1103, in _gather_saveables
    feed_additions) = self._graph_view.serialize_object_graph()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 386, in serialize_object_graph
    return self._serialize_gathered_objects(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 342, in _serialize_gathered_objects
    object_names[obj] = _object_prefix_from_path(path)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 62, in _object_prefix_from_path
    return "/".join(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 63, in <genexpr>
    (_escape_local_name(trackable.name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 57, in _escape_local_name
    return (name.replace(_ESCAPE_CHAR, _ESCAPE_CHAR + _ESCAPE_CHAR)
AttributeError: 'NoneType' object has no attribute 'replace'
2021-05-03 13:59:12.379732: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
	 [[{{node PyFunc}}]]

Can you please look into the issue? Also, I really admire your repository. I am trying to reproduce the PVT paper and learn how you implemented it in the first place.

@wangermeng2021
Owner

The bug is fixed. You can try it now. Let me know if there is still a problem.
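
A side note on the checkpoint snippet in the original report, separate from the AttributeError itself: in recent TensorFlow 2.x releases an integer save_freq is counted in batches, not epochs, which matches the traceback reaching _save_model through on_train_batch_end during the first epoch. A minimal sketch of one way to get "every 5 epochs" follows. The helper names save_freq_in_batches and build_checkpoint_callback are hypothetical and not part of this repository, and the sketch assumes the number of batches per epoch is known.

```python
def save_freq_in_batches(steps_per_epoch: int, every_n_epochs: int) -> int:
    """Convert "every N epochs" into the batch count an integer save_freq expects."""
    if steps_per_epoch <= 0 or every_n_epochs <= 0:
        raise ValueError("steps_per_epoch and every_n_epochs must be positive")
    return steps_per_epoch * every_n_epochs


def build_checkpoint_callback(checkpoint_path, steps_per_epoch, every_n_epochs=5):
    """Hypothetical helper: a weights-only checkpoint callback saving every N epochs."""
    # Imported lazily so the arithmetic helper above works without TensorFlow.
    import tensorflow as tf

    return tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path,
        save_weights_only=True,
        # Integer save_freq is measured in batches, so scale by steps per epoch.
        save_freq=save_freq_in_batches(steps_per_epoch, every_n_epochs),
        verbose=1,
    )
```

In train.py this might be called as build_checkpoint_callback(args.checkpoints, steps_per_epoch=len(train_generator)), assuming train_generator supports len(). Since the same filepath is reused, each save overwrites the previous one. ModelCheckpoint also accepts a filepath containing a placeholder such as {epoch:02d} if separate files per save are wanted.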
