Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass backend_args to BaseDataset #1094

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tyomj
Copy link

@tyomj tyomj commented Apr 21, 2023

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

Resolves #1090

Modification

Passing backend_args to BaseDataset and using it for load function as well.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@CLAassistant
Copy link

CLAassistant commented Apr 21, 2023

CLA assistant check
All committers have signed the CLA.

@tyomj tyomj changed the title Pass backend_args to BaseDataset Pass backend_args to BaseDataset [closes #1090] Apr 21, 2023
@tyomj tyomj changed the title Pass backend_args to BaseDataset [closes #1090] Pass backend_args to BaseDataset Apr 21, 2023
Comment on lines 176 to 177
backend_args (dict, optional): Arguments to instantiate the
corresponding backend. Defaults to None.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new parameter in the middle position may cause a bc issue. Suggest moving it to the end.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Fixed.

@zhouzaida zhouzaida added this to the 0.7.4 milestone Apr 22, 2023
@zhouzaida
Copy link
Collaborator

CLA assistant check Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Artem Kozlov seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Hi, thanks for your contribution. Please sign the CLA before we merge this PR.

@tyomj tyomj force-pushed the feature/backend_args branch from 192bc3d to 534173a Compare April 24, 2023 11:04
@tyomj tyomj force-pushed the feature/backend_args branch from 534173a to ef5b5f0 Compare April 24, 2023 11:09
@@ -435,7 +439,7 @@ def load_data_list(self) -> List[dict]:
""" # noqa: E501
# `self.ann_file` denotes the absolute annotation file path if
# `self.root=None` or relative path if `self.root=/path/to/data/`.
annotations = load(self.ann_file)
annotations = load(self.ann_file, backend_args=self.backend_args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding backend_args to the load function is a correct approach, but it may introduce bc-breaking. Assuming self.ann_file was originally a local path, providing backend_args could cause the load method to use another file backend (not local backend) to read the file (self.ann_file), but the file may not exist on that backend.

Suggesting avoiding this bc-breaking.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment! It's great to see that you're interested in understanding the potential bc-breaking issue.
However, I'm not sure I understand how the bc-braking case may occur. Could you please give an example?

Thoughts:
In my world when we pass backend_args to the dataset we want this backend to be used and the annotation file is the only file that has to be read by the BaseDataset class by default.
If one has annotation file on one backend, but other files (let's say images) on a different backend - this may be defined for each of the pipeline steps separately.
One possible solution to the the file may not exist on that backend issue might be to wrap FileNotFound exception inside the load function with a fallback to a local backend, but this option creates a biased decision towards local backend.

Thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose the annotation file is an absolute path like '/home/username/data/annotation.txt'.
After adding backend_args (such as AWSBackend) for load function, it will try to load the file from AWS but it can not be found.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize, I am still unable to comprehend the difference. I fail to grasp why anyone would deliberately include wrong backend_args being aware of their annotation file's existence on the local drive and expect a different behavior. The default behavior remains unchanged if the backend is not specified. I use absolute paths for path mapping in various backends as well, and it has not presented any problems.

Here is an example for 'aws' backend:

dict(
    backend="aws",
    path_mapping=dict(
        {
            "./data/dataset": "aws://your_bucket/dataset", # option one
            "/code/data/dataset": "aws://your_bucket/dataset", # option two
        }
    )
)

I strongly feel that it would be beneficial if you could provide a commit that details your concept, since you possess knowledge of corner cases that I may not be aware of. Thank you!

Copy link
Collaborator

@zhouzaida zhouzaida Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tyomj , I opened a PR in mmdet #10247. In this case, the annotation can not be found in petrel backend.

# self.ann_file will be /home/username/annotations/instances_train2017.json
# Since backend_args is set, `load` reads `anno_file` from `petrel`, but this file is only in disk.
annotations = load(self.ann_file, backend_args=self.backend_args)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] backend_args for BaseDataset
3 participants