-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass backend_args to BaseDataset #1094
base: main
Are you sure you want to change the base?
Conversation
mmengine/dataset/base_dataset.py
Outdated
backend_args (dict, optional): Arguments to instantiate the | ||
corresponding backend. Defaults to None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new parameter in the middle position may cause a bc issue. Suggest moving it to the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Fixed.
Hi, thanks for your contribution. Please sign the CLA before we merge this PR. |
192bc3d
to
534173a
Compare
534173a
to
ef5b5f0
Compare
@@ -435,7 +439,7 @@ def load_data_list(self) -> List[dict]: | |||
""" # noqa: E501 | |||
# `self.ann_file` denotes the absolute annotation file path if | |||
# `self.root=None` or relative path if `self.root=/path/to/data/`. | |||
annotations = load(self.ann_file) | |||
annotations = load(self.ann_file, backend_args=self.backend_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding backend_args
to the load
function is a correct approach, but it may introduce bc-breaking. Assuming self.ann_file
was originally a local path, providing backend_args
could cause the load
method to use another file backend (not local backend) to read the file (self.ann_file), but the file may not exist on that backend.
Suggesting avoiding this bc-breaking
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment! It's great to see that you're interested in understanding the potential bc-breaking issue.
However, I'm not sure I understand how the bc-braking case may occur. Could you please give an example?
Thoughts:
In my world when we pass backend_args to the dataset we want this backend to be used and the annotation file is the only file that has to be read by the BaseDataset class by default.
If one has annotation file on one backend, but other files (let's say images) on a different backend - this may be defined for each of the pipeline steps separately.
One possible solution to the the file may not exist on that backend
issue might be to wrap FileNotFound exception inside the load function with a fallback to a local backend, but this option creates a biased decision towards local backend.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose the annotation file is an absolute path like '/home/username/data/annotation.txt'
.
After adding backend_args (such as AWSBackend)
for load
function, it will try to load the file from AWS but it can not be found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize, I am still unable to comprehend the difference. I fail to grasp why anyone would deliberately include wrong backend_args
being aware of their annotation file's existence on the local drive and expect a different behavior. The default behavior remains unchanged if the backend is not specified. I use absolute paths for path mapping in various backends as well, and it has not presented any problems.
Here is an example for 'aws' backend:
dict(
backend="aws",
path_mapping=dict(
{
"./data/dataset": "aws://your_bucket/dataset", # option one
"/code/data/dataset": "aws://your_bucket/dataset", # option two
}
)
)
I strongly feel that it would be beneficial if you could provide a commit that details your concept, since you possess knowledge of corner cases that I may not be aware of. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tyomj , I opened a PR in mmdet #10247. In this case, the annotation can not be found in petrel
backend.
# self.ann_file will be /home/username/annotations/instances_train2017.json
# Since backend_args is set, `load` reads `anno_file` from `petrel`, but this file is only in disk.
annotations = load(self.ann_file, backend_args=self.backend_args)
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
Resolves #1090
Modification
Passing
backend_args
to BaseDataset and using it forload
function as well.Checklist