Optimize Label Filtering #239

Open
jue-henry opened this issue Aug 8, 2024 · 3 comments
Labels
database optimization, question (further information is requested)

Comments

@jue-henry
Contributor

jue-henry commented Aug 8, 2024

We are still seeing query timeouts on MongoDB, and the affected requests seem to exclusively involve some form of label filtering. I am proposing two options:

  1. Optimize the existing label filtering. The way label filtering is currently done in the MongoDB query seems very unintuitive. I think some research could uncover why it was initially written this way and how to improve on it.
  2. Move label filtering server side. This is the next most intuitive way to address the issue, since filtering is fairly easy to do in code: we query the db with the other filters provided and then filter by label afterwards. The main worry with this option is breaking pagination, since we would have to re-request images whenever the server-side label filtering leaves a page short of the minimum.
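
To make the pagination concern in option 2 concrete, here is a minimal sketch of post-query label filtering that keeps requesting batches until a page is full. Everything in it is an assumption for illustration: the record shape (`id`, `labels`), the `fetch_batch` callback, and the use of the last examined `id` as the cursor are hypothetical and are not taken from this project's code.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical record shape: {"id": int, "labels": [str, ...]}
Image = dict


def fetch_filtered_page(
    fetch_batch: Callable[[Optional[int], int], List[Image]],
    wanted_labels: Set[str],
    page_size: int,
    cursor: Optional[int] = None,
    batch_size: int = 100,
) -> Tuple[List[Image], Optional[int]]:
    """Return (page, next_cursor), filtering by label after the db query.

    fetch_batch(after_id, limit) stands in for the db query that applies
    every filter except labels and returns records ordered by id.
    """
    page: List[Image] = []
    while len(page) < page_size:
        batch = fetch_batch(cursor, batch_size)
        if not batch:
            return page, None  # no more records in the db
        for image in batch:
            # Advance the cursor past every record we examine, matching or
            # not, so the next page never re-reads or skips a record.
            cursor = image["id"]
            if wanted_labels.intersection(image["labels"]):
                page.append(image)
                if len(page) == page_size:
                    break
    return page, cursor


# In-memory stand-in for the database, for demonstration only.
IMAGES = [
    {"id": i, "labels": ["deer"] if i % 3 == 0 else ["empty"]}
    for i in range(1, 31)
]


def fake_fetch_batch(after_id: Optional[int], limit: int) -> List[Image]:
    start = 0 if after_id is None else after_id
    return [img for img in IMAGES if img["id"] > start][:limit]


page, next_cursor = fetch_filtered_page(
    fake_fetch_batch, {"deer"}, page_size=4, batch_size=5
)
```

The key design choice is that the cursor tracks the last record examined rather than the last record returned, which is what keeps pages contiguous when most records are filtered out. The cost is that a sparse label can require many round trips to fill one page.
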
jue-henry added the question and database optimization labels Aug 8, 2024
jue-henry self-assigned this Aug 8, 2024
@nathanielrindlaub
Member

@jue-henry thanks for continuing to push on this! What do you mean by "label validation" exactly? Are you talking about image queries that include some kind of validation state check, or are you talking about actual label validation mutations (e.g. when a user validates a label)?

My recommendation would be to start with option 1. I think a fresh set of eyes would be super welcome here, and if we can make any of the label validation more efficient I'm all for it.

Number 2 is interesting, but I'd also add the concern that it could consume a huge amount of memory if we need to load potentially millions of image records into memory and then filter them.
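
One possible way to limit that memory cost, sketched here under assumptions, is to filter lazily instead of materializing the full result set. The sketch assumes the database driver can expose query results as a lazy iterator; the record shape and the `fake_cursor` generator are hypothetical stand-ins, not this project's code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set


def filter_by_labels(records: Iterable[dict], wanted_labels: Set[str]) -> Iterator[dict]:
    """Yield matching records one at a time; nothing is accumulated here."""
    for record in records:
        if wanted_labels.intersection(record["labels"]):
            yield record


def first_matches(records: Iterable[dict], wanted_labels: Set[str], limit: int) -> List[dict]:
    """Collect at most `limit` matches, then stop pulling from the source."""
    return list(islice(filter_by_labels(records, wanted_labels), limit))


def fake_cursor(total: int, consumed: List[int]) -> Iterator[dict]:
    """Hypothetical stand-in for a lazy db cursor; counts records read."""
    for i in range(total):
        consumed[0] += 1
        yield {"id": i, "labels": ["deer"] if i % 10 == 0 else ["empty"]}


consumed = [0]
matches = first_matches(fake_cursor(1_000_000, consumed), {"deer"}, limit=3)
```

With this shape, only the matches for the current page are held in memory, and reading stops as soon as the page is full. It does not reduce the number of records the database has to scan and send, so a rare label could still be slow.
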

@nathanielrindlaub
Member

Related: #119

@jue-henry
Contributor Author

jue-henry commented Aug 8, 2024

@nathanielrindlaub Sorry, I will clarify in the issue itself as well, but I mean label filtering, i.e. searching via the labels.
