[CodeCamp2023-325] Find the proper learning rate #1318
Conversation
Thank you for your contribution. Your PR message and docstring helped me understand your design. Before we delve into implementation details, let's consider the relationship between the `Tuner` and the `Runner`.
@HAOCHENYE Thank you for your feedback. Your suggestion resonated with me, and I believe you've raised a valid point. In line with your suggestion, I've added a class method `from_tuning` to the `Runner`. For example, users can now employ the following streamlined approach:

```python
runner = Runner.from_tuning(
    runner_cfg=runner_cfg,
    hparam_spec={
        'optim_wrapper.optimizer.lr': {
            'type': 'continuous',
            'lower': 1e-5,
            'upper': 1e-3
        }
    },
    monitor='loss',
    rule='less',
    num_trials=16,
    tuning_epoch=2,
    searcher_cfg=dict(type='NevergradSearcher'),
)
runner.train()
```

This design not only enhances code readability but also provides users with a seamless experience of automatic hyperparameter discovery integrated with end-to-end training using the `Runner`. I appreciate your valuable insights and would love to hear any further thoughts or feedback you might have.
Thank you for your contribution! I believe the current solution is reasonable. However, I'm curious whether it's possible to integrate the process of calling `from_tuning` into `Runner.train`, and have the `Runner` control whether or not to use the `Tuner` through a parameter. What do you think are the potential risks associated with this approach?
mmengine/tune/_report_hook.py
Outdated
```
report_op (str, optional): The operation to report the score.
    Options are 'latest', 'mean'. Defaults to 'latest'.
```
We need to describe the meaning of `latest` and `mean` here.
In line with your suggestion, I have added comments describing the meaning and role of `report_op`, explaining the options `latest` and `mean`.
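For illustration, here is a minimal sketch of how `latest` and `mean` could be interpreted over the recorded scores; the function name and the scoreboard representation are assumptions for this example, not the PR's exact implementation:

```python
from statistics import mean


def report_score(scoreboard: list, report_op: str = 'latest') -> float:
    """Collapse the recorded scores into the single reported value.

    'latest' reports only the most recent score, while 'mean' averages
    every score recorded so far.
    """
    if not scoreboard:
        raise ValueError('No scores have been recorded yet.')
    if report_op == 'latest':
        return scoreboard[-1]
    if report_op == 'mean':
        return mean(scoreboard)
    raise ValueError(f'Unsupported report_op: {report_op!r}')
```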
mmengine/tune/_report_hook.py
Outdated
```
max_scoreboard_len (int, optional):
    The maximum length of the scoreboard.
```
`scoreboard` is a new concept for users. We need to introduce it here.
To clarify the newly introduced concept of the `scoreboard`, I have incorporated additional comments in the relevant section to guide users regarding its purpose and usage.
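Conceptually, the scoreboard is just a bounded history of monitored scores, so `max_scoreboard_len` plays the role of a `maxlen`; a sketch, not the PR's actual data structure:

```python
from collections import deque

# A scoreboard capped at `max_scoreboard_len` entries: appending beyond
# the cap silently discards the oldest scores.
max_scoreboard_len = 1024
scoreboard: deque = deque(maxlen=max_scoreboard_len)

for loss in (0.9, 0.7, 0.6):  # placeholder monitored values
    scoreboard.append(loss)
```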
mmengine/tune/_report_hook.py
Outdated
```python
tag, _ = runner.log_processor.get_log_after_iter(
    runner, batch_idx, 'train')
score = tag.get(self.monitor, None)
```
```diff
- score = tag.get(self.monitor, None)
+ score = tag.get(self.monitor)
```
Suggest adding a prefix to `monitor`, like `train/loss` or `val/accuracy`. We would only check the monitor at a specific phase (train or validation) according to its prefix, and raise an error immediately if it is not defined in `tag`. Reporting an error to users immediately would be better than raising an error after the whole tuning round.
We also need to check that the monitored value is a number.
Following your advice, I have enhanced the monitoring process by requiring a phase prefix on `monitor`. Moreover, I've added logic to verify that the monitored values are numerical, to surface errors early.
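As a sketch of the kind of check described above (the helper name, and the assumption that `tag` holds un-prefixed keys, are illustrative rather than the PR's exact code):

```python
from numbers import Number


def extract_score(tag: dict, monitor: str, phase: str):
    """Validate and extract the monitored value for one phase.

    `monitor` carries a phase prefix such as 'train/loss' or
    'val/accuracy'; only the matching phase is checked.
    """
    prefix, _, key = monitor.partition('/')
    if prefix != phase:
        return None  # this hook call belongs to a different phase
    if key not in tag:
        # Fail fast instead of surfacing the typo after a whole trial.
        raise ValueError(f'{monitor!r} is not found in the logged tags: '
                         f'{sorted(tag)}')
    score = tag[key]
    if not isinstance(score, Number):
        raise TypeError(f'{monitor!r} must be a number, but got '
                        f'{type(score).__name__}')
    return score
```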
```python
broadcast_object_list(scores_to_broadcast, src=0)
score = scores_to_broadcast[0]
if is_main_process():
    self._searcher.record(hparam, score)
```
Currently, the `loss` is not synchronized across ranks in mmengine (which is convenient for watching the loss of different ranks). This means that searchers on different ranks could receive different scores and suggest different hparams. Maybe we should apply a reduce-mean op to `score`.
I have modified the process to communicate `score` through a reduce-mean operation, aligning with your observation to enhance consistency across different ranks.
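A sketch of that reduce-mean synchronization, assuming `mmengine.dist.all_reduce` with `op='mean'` (in a single-process run it degenerates to a no-op):

```python
import torch
from mmengine.dist import all_reduce


def reduce_mean_score(score: float) -> float:
    """Average a locally observed score across all ranks.

    Every rank then feeds the same value to the searcher, so stochastic
    searchers cannot drift apart between ranks.
    """
    score_tensor = torch.tensor(score, dtype=torch.float64)
    all_reduce(score_tensor, op='mean')
    return score_tensor.item()
```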
```python
if is_main_process():
    hparams_to_broadcast = [self._searcher.suggest()]
else:
    hparams_to_broadcast = [None]  # type: ignore
```
If I'm not mistaken, the reason for only calling `suggest` within the main process is that there might be randomness in the results from the searcher for the same scores. Maybe we should highlight this point (randomness) in a comment?
You were absolutely right regarding the randomness associated with the `suggest` method. To provide a clearer picture, I have appended comments highlighting that the method is executed solely in the main process due to the inherent randomness in its outcomes.
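For context, the full pattern might read as follows (a sketch using `mmengine.dist.broadcast_object_list`; variable names follow the diff above):

```python
from mmengine.dist import broadcast_object_list, is_main_process


def suggest_synchronized(searcher) -> dict:
    """Query the searcher on rank 0 only and share the result.

    The searcher may be stochastic: identical score histories can still
    yield different suggestions, so letting every rank call `suggest()`
    independently could desynchronize the trial.
    """
    if is_main_process():
        hparams_to_broadcast = [searcher.suggest()]
    else:
        hparams_to_broadcast = [None]
    broadcast_object_list(hparams_to_broadcast, src=0)
    return hparams_to_broadcast[0]
```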
mmengine/tune/tuner.py
Outdated
```python
for k in keys[:-1]:
    if isinstance(cfg, list):
        idx = int(k)
        if idx >= len(cfg) or idx < 0:
```
A negative index is allowed for `list`. Is it necessary to raise an error here?
To accommodate negative indices, I have removed the assertion that previously disallowed them, allowing more flexibility in list operations.
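A sketch of the dotted-key traversal under discussion, with negative list indices permitted (`set_nested` is a hypothetical helper, not the PR's exact code):

```python
def set_nested(cfg, key: str, value):
    """Set a value from a dotted key such as 'optim_wrapper.optimizer.lr'.

    Integer segments index into lists, and negative indices follow the
    usual Python semantics.
    """
    keys = key.split('.')
    for k in keys[:-1]:
        cfg = cfg[int(k)] if isinstance(cfg, list) else cfg[k]
    last = keys[-1]
    if isinstance(cfg, list):
        cfg[int(last)] = value
    else:
        cfg[last] = value


cfg = {'param_groups': [{'lr': 0.1}, {'lr': 0.2}]}
set_nested(cfg, 'param_groups.-1.lr', 1e-4)  # negative index: last group
```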
```python
self._logger.info(f'Best hyperparameters obtained: {best_hparam}')
self._logger.info(f'Best score obtained: {best_score}')
self._logger.info('Tuning completed.')
return dict(hparam=best_hparam, score=best_score)
```
In the Runner, several global variables are defined, including `Visualizer`, `Logger`, and `DefaultScope` (which inherits from `ManagerMixin`). We should clean these variables up after each trial. Furthermore, it might be more advantageous to conduct the tuning process within a temporary directory and delete it once tuning is completed.
Taking cues from your insights, I have incorporated a mechanism to clear global variables after each trial, referencing the testing utilities for the implementation. Additionally, I have configured the tuning process to run within a temporary directory, which is deleted upon completion to keep the workspace clean.
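A sketch of that per-trial cleanup; note that `_instance_dict` is an internal `ManagerMixin` attribute (the same one mmengine's own tests reset), so treat this as an implementation detail that may change:

```python
import tempfile

from mmengine.logging import MMLogger
from mmengine.registry import DefaultScope
from mmengine.visualization import Visualizer


def run_trial(build_and_train) -> float:
    """Run one tuning trial in a throwaway work_dir, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # `build_and_train` stands in for: construct a Runner with
        # work_dir=tmp_dir, train briefly, and return the score.
        score = build_and_train(tmp_dir)
    # Global singletons outlive Runner instances; drop them so the next
    # trial starts from a clean slate.
    for manager in (MMLogger, Visualizer, DefaultScope):
        manager._instance_dict.clear()
    return score
```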
Co-authored-by: Mashiro <[email protected]>
Hello @HAOCHENYE, thank you very much for your thoughtful suggestion. I appreciate the proactive idea of integrating the tuning into `Runner.train`. However, I think there is a potential risk in combining the two methods.

Firstly, at the time of invoking the tuning within the `train` method, the runner instance has already been instantiated. This means that we would have co-existing instances: the caller runner and the callee runner inside the tuner. Both of these instances maintain their own models, optimizers, and data loaders, potentially increasing memory usage considerably.

Furthermore, replacing the attributes of the caller runner with the tuned attributes after tuning could introduce considerable complexity. We might find ourselves having to define intricate logic to safely replace the attributes without adverse side effects. For instance, if we aim to tune the learning rate, we would need to alter the optimizer's state dict; if we intend to modify the number of data samples, we would need to rebuild the data loader, among other potential modifications. Pre-defining rules for attribute replacement is a challenging task given the numerous scenarios and combinations that would need to be accounted for.

If we are considering integrating tuning within the `train` method, these concerns would need to be addressed. I am eager to hear your opinion on whether my concerns are valid, and any further feedback on this matter.
I'm very sorry for the delayed response. I think your considerations are very reasonable. MMEngine introduced …. Currently, during initialization, Runner instantiates components like the model, visualizer, and logger. If you want to find the best learning rate during the training phase, ….

For me, both searching for the learning rate during the training phase and searching for it through the `from_tuning` interface are acceptable. You can implement it according to your preference 😄.
Hello @HAOCHENYE, first and foremost, I'd like to express my gratitude for your comprehensive feedback on the proposal. Your insights and the clarity with which you've approached the problem have been immensely beneficial. I am thankful for your dual-pronged suggestion, offering both an integration within the training phase and the separate `from_tuning` interface.

I sincerely hope my approach aligns well with the vision of MMEngine. Thank you once again for your guidance.
Codecov Report

Attention: Patch coverage is …
Additional details and impacted files

```
@@           Coverage Diff            @@
##             main    #1318    +/-  ##
=======================================
  Coverage        ?   70.65%
=======================================
  Files           ?      160
  Lines           ?    14498
  Branches        ?     2995
=======================================
  Hits            ?    10244
  Misses          ?     3780
  Partials        ?      474
```
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from the maintainers.
Motivation
The primary aim of this pull request is to introduce a tuning methodology for automatically determining the optimal learning rate for model training. Hyperparameter tuning, especially finding the optimal learning rate, is crucial for effective model training: a good learning rate serves as a starting point that can significantly reduce the time and resources required for broader hyperparameter-space exploration. Given the inherently expensive nature of such experiments, adopting a black-box optimization formulation, where the input is the hyperparameters and the output is the model performance, is a strategic choice.
Modification
In this PR, we've integrated a tuning module that focuses on black-box optimization strategies, such as evolutionary algorithms and Bayesian optimization, to discover the best learning rates. Recognizing the intricate nature of these strategies, instead of implementing them from scratch, we've incorporated external libraries such as Nevergrad (developed by Meta), ensuring robustness and efficiency in our search process.

Structure & Roles
Tuner
The Tuner serves as the main orchestrator for the hyperparameter tuning process.
Report Hook
This component acts as an intermediary, gathering results from the training and evaluation phases and formatting them for further analysis.
Searcher
The Searcher operates on the hyperparameter space. Using historical results and optimization techniques, it suggests the next set of hyperparameters to be evaluated. The sketch below shows how these three components fit together.
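To make the division of labor concrete, here is a high-level sketch of one tuning loop; every name and signature is illustrative, not the PR's exact API:

```python
def tune(searcher, run_one_trial, num_trials: int, rule: str = 'less'):
    """Drive `num_trials` trials and track the best hyperparameters.

    `run_one_trial(hparam)` stands in for: build a fresh Runner with the
    suggested hyperparameters, train briefly, and let the report hook
    return the monitored score.
    """
    best_hparam, best_score = None, None
    for _ in range(num_trials):
        hparam = searcher.suggest()     # Searcher proposes a candidate
        score = run_one_trial(hparam)   # Report Hook collects the score
        searcher.record(hparam, score)  # history guides the next proposal
        improved = best_score is None or (
            score < best_score if rule == 'less' else score > best_score)
        if improved:
            best_hparam, best_score = hparam, score
    return dict(hparam=best_hparam, score=best_score)
```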
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
Checklist
References
TODO