LTR plugin digest #26

Open
noCharger opened this issue Nov 15, 2023 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@noCharger

noCharger commented Nov 15, 2023

Workflow

[Screenshot (2023-11-15): workflow diagram; image not preserved]

Core mapping: each training example ties a grade (from the judgment list) to a set of feature values (feature name 1, feature name 2, ...) and a document identifier.
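One way to picture this mapping in code is a training row that joins a judgment grade, a document identifier, and the logged feature values. This sketch is hypothetical: the class `TrainingRow`, the query id (`qid`), and the RankLib-style text output (`<grade> qid:<qid> 1:<value> ... # <docId>`) are assumptions for illustration, not something this workflow prescribes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical training row for the core mapping: grade + features + document identifier.
// The queryId field and the RankLib-style output format are assumptions of this sketch.
final class TrainingRow {
    final int grade;                    // relevance grade from the judgment list
    final int queryId;                  // assumption: groups the rows that belong to one query
    final String docId;                 // document identifier, e.g. "7555"
    final Map<String, Double> features; // feature name -> logged value, in feature-set order

    TrainingRow(int grade, int queryId, String docId, Map<String, Double> features) {
        this.grade = grade;
        this.queryId = queryId;
        this.docId = docId;
        this.features = new LinkedHashMap<>(features); // copy, keeping the caller's iteration order
    }

    // Formats "<grade> qid:<queryId> 1:<v1> 2:<v2> ... # <docId>", numbering features from 1.
    String toRankLibLine() {
        StringJoiner line = new StringJoiner(" ");
        line.add(Integer.toString(grade));
        line.add("qid:" + queryId);
        int ordinal = 1;
        for (double value : features.values()) {
            line.add(ordinal + ":" + value);
            ordinal++;
        }
        return line + " # " + docId;
    }

    public static void main(String[] args) {
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("title_query", 0.25);       // illustrative values
        features.put("title_query_boost", 0.5);
        // Prints: 4 qid:1 1:0.25 2:0.5 # 7555
        System.out.println(new TrainingRow(4, 1, "7555", features).toRankLibLine());
    }
}
```

Keying features by name but emitting them by position keeps the row readable while still producing the ordinal format many ranking-model trainers expect.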

Sequence Diagram

[Screenshot (2023-11-15): sequence diagram; image not preserved]

Step 1: Create the LTR index

The LTR index (.ltrstore) contains metadata about features, feature sets, and models.

curl -X PUT "localhost:9200/_ltr"
{"acknowledged":true,"shards_acknowledged":true,"index":".ltrstore"}
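For callers not using curl, the same request can be built with the JDK's `java.net.http.HttpClient` (Java 11+). This minimal sketch assumes an unsecured node at `localhost:9200`, as in the command above; a cluster with TLS or authentication would need a different URI and credentials. The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: Java equivalent of `curl -X PUT "localhost:9200/_ltr"`.
final class CreateLtrStore {
    // Assumption: an unsecured local node on port 9200, as in the curl example.
    static final URI LTR_STORE = URI.create("http://localhost:9200/_ltr");

    static HttpRequest buildRequest() {
        // PUT with an empty body; on success the response names the .ltrstore index.
        return HttpRequest.newBuilder(LTR_STORE)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```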

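Step 2: Create a feature set

Features are templated OpenSearch queries; users can select and experiment with different features.

A feature set is a group of uniquely named features used together for logging and model evaluation. The example below defines three features, each with a different template language: mustache (a query template filled from params), derived_expression (an expression over other features' values), and script_feature (a Painless script that reads other features through params.feature_vector).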
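To make "templated" concrete: at query time, each `{{param}}` placeholder in a feature's template (such as `{{keywords}}` in the `title_query` feature) is filled from the `params` passed to the `sltr` query. This is a deliberately simplified, hypothetical stand-in for Mustache that handles only plain `{{name}}` tags; real Mustache rendering (sections, escaping) is richer, and the error on a missing param is a choice of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for Mustache: substitutes plain {{name}} tags only.
// No sections, partials, or JSON escaping; a missing param is treated as an error here.
final class FeatureTemplate {
    private static final Pattern TAG = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    static String render(String template, Map<String, String> params) {
        Matcher matcher = TAG.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("missing template param: " + name);
            }
            // quoteReplacement keeps '$' or '\' in the value from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String titleQuery = "{\"match\": {\"title\": \"{{keywords}}\"}}";
        // Prints: {"match": {"title": "rambo"}}
        System.out.println(render(titleQuery, Map.of("keywords", "rambo")));
    }
}
```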

POST _ltr/_featureset/more_movie_features
{
    "featureset": {
        "features": [
            {
                "name": "title_query",
                "params": [
                    "keywords"
                ],
                "template_language": "mustache",
                "template": {
                    "match": {
                        "title": "{{keywords}}"
                    }
                }
            },
            {
                "name": "title_query_boost",
                "params": [
                    "some_multiplier"
                ],
                "template_language": "derived_expression",
                "template": "title_query * some_multiplier"
            },
            {
                "name": "custom_title_query_boost",
                "params": [
                    "some_multiplier"
                ],
                "template_language": "script_feature",
                "template": {
                    "lang": "painless",
                    "source": "params.feature_vector.get('title_query') * (long)params.some_multiplier",
                    "params": {
                        "some_multiplier": "some_multiplier"
                    }
                }
            }
        ]
    }
}
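The last two features in this request are computed from `title_query`'s score rather than by querying the index themselves. Once `title_query` is known for a document, the two dependent values follow roughly as sketched here. The class and method names are hypothetical and this is not how the plugin evaluates them internally. The sketch keeps the `(long)` cast from the Painless source, so a fractional `some_multiplier` is truncated for `custom_title_query_boost`; it assumes the derived_expression variant multiplies without truncation.

```java
import java.util.Map;

// Hypothetical illustration of two features computed from another feature's value.
// Both methods assume "title_query" is present in the feature vector.
final class DependentFeatures {
    // title_query_boost (derived_expression): "title_query * some_multiplier".
    static double titleQueryBoost(Map<String, Double> featureVector, double someMultiplier) {
        return featureVector.get("title_query") * someMultiplier;
    }

    // custom_title_query_boost (script_feature): mirrors the Painless source
    // params.feature_vector.get('title_query') * (long)params.some_multiplier,
    // so the multiplier is truncated toward zero before multiplying.
    static double customTitleQueryBoost(Map<String, Double> featureVector, double someMultiplier) {
        return featureVector.get("title_query") * (long) someMultiplier;
    }

    public static void main(String[] args) {
        Map<String, Double> featureVector = Map.of("title_query", 0.5); // illustrative value
        System.out.println(titleQueryBoost(featureVector, 2.5));       // 1.25
        System.out.println(customTitleQueryBoost(featureVector, 2.5)); // 1.0 (2.5 -> 2)
    }
}
```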

Step 3: Log feature values for documents

POST tmdb/_search
{
    "query": {
        "bool": {
            "filter": [
                {
                    "terms": {
                        "_id": ["7555", "1370", "1369"]
                    }
                },
                {
                    "sltr": {
                        "_name": "logged_featureset",
                        "featureset": "more_movie_features",
                        "params": {
                            "keywords": "rambo"
                        }
                    }
                }
            ]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "log_entry1",
                "named_query": "logged_featureset"
            }
        }
    }
}
  1. Query phase: the SLTR query is rewritten into a ranker query that holds a list of disjunct queries, each rewritten from one feature.
  2. MatchedQueriesPhase: the named query (_name) labels every document that the SLTR query matched.
  3. LoggingFetchSubPhase: the ranker query uses a HitLogConsumer to log each feature (the feature name, with its score as the value) by appending a DocumentField to each SearchHit.
[Two screenshots (2023-11-29); images not preserved]
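        // LoggingFetchSubPhase: called once per hit during the fetch phase.
        public void process(HitContext hitContext) throws IOException {
            // advance() moves the ranker query's scorer to the first doc >= this hit's docId;
            // equality means the sltr query matched this hit, so its features can be logged.
            if (scorer != null && scorer.iterator().advance(hitContext.docId()) == hitContext.docId()) {
                // Point every logger (one per log spec) at the current hit.
                loggers.forEach((l) -> l.nextDoc(hitContext.hit()));
                // Scoring will trigger log collection: feature scores computed here
                // are recorded by the loggers prepared above.
                scorer.score();
            }
        }

        // HitLogConsumer: prepares this log spec's entry in the hit's _ltrlog field.
        void nextDoc(SearchHit hit) {
            // The field may already exist if another log spec has touched this hit.
            DocumentField logs = hit.getFields().get(FIELD_NAME);
            if (logs == null) {
                logs = newLogField();
                hit.setDocumentField(FIELD_NAME, logs);
            }
            // Log name (e.g. "log_entry1") -> list of {name, value} feature entries.
            Map<String, List<Map<String, Object>>> entries = logs.getValue();
            rebuild(); // start a fresh currentLog for this hit
            currentHit = hit;
            entries.put(name, currentLog);
        }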
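The three steps and the excerpt above can be condensed into a toy model: a ranker scores each feature of a matched document in order, and a log consumer records one `{name, value}` entry per feature under the log spec's name, giving the same shape as the `_ltrlog` field in the response. Everything here (`FeatureLoggingSketch`, `Feature`, `logHit`, and documents represented as plain strings) is hypothetical and independent of the plugin's classes. This sketch logs every feature unconditionally; the plugin's handling of features that do not match a document may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Toy model of feature logging; none of these types are the plugin's real classes.
final class FeatureLoggingSketch {
    // A named feature that scores one document (stand-in for a rewritten feature query).
    static final class Feature<D> {
        final String name;
        final ToDoubleFunction<D> scorer;

        Feature(String name, ToDoubleFunction<D> scorer) {
            this.name = name;
            this.scorer = scorer;
        }
    }

    // Scores every feature for one matched document and returns
    // {logName: [{name, value}, ...]}, mirroring one element of "_ltrlog".
    static <D> Map<String, List<Map<String, Object>>> logHit(
            String logName, List<Feature<D>> features, D doc) {
        List<Map<String, Object>> currentLog = new ArrayList<>();
        for (Feature<D> feature : features) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("name", feature.name);
            entry.put("value", (float) feature.scorer.applyAsDouble(doc)); // stored as a Float here
            currentLog.add(entry);
        }
        Map<String, List<Map<String, Object>>> entries = new LinkedHashMap<>();
        entries.put(logName, currentLog);
        return entries;
    }

    public static void main(String[] args) {
        List<Feature<String>> features = List.of(
                new Feature<String>("title_mentions_rambo", t -> t.toLowerCase().contains("rambo") ? 1 : 0),
                new Feature<String>("title_length", t -> t.length()));
        // Logs a single hit whose "document" is just its title.
        System.out.println(logHit("log_entry1", features, "Rambo"));
    }
}
```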

Logs in search response

"fields": {
  "_ltrlog": [
    {
      "log_entry1": [
        {
          "name": "1",
          "value": 0.25069216
        },
        {
          "name": "2",
          "value": 0.226041
        }
      ]
    }
  ]
},
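When these logs feed a training set, the per-hit entries usually have to be turned into a fixed-order feature vector. A minimal sketch, assuming the `_ltrlog` element has already been parsed into Java maps and lists (JSON parsing is omitted) and assuming a feature absent from a hit's log should count as 0.0. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: converts one hit's log entries into a vector ordered by featureNames.
final class LtrLogReader {
    static double[] toVector(List<Map<String, Object>> logEntries, List<String> featureNames) {
        Map<String, Double> byName = new HashMap<>();
        for (Map<String, Object> entry : logEntries) {
            // Each entry looks like {"name": "1", "value": 0.25069216}.
            byName.put((String) entry.get("name"), ((Number) entry.get("value")).doubleValue());
        }
        double[] vector = new double[featureNames.size()];
        for (int i = 0; i < vector.length; i++) {
            // Assumption of this sketch: a feature missing from the log is treated as 0.0.
            vector[i] = byName.getOrDefault(featureNames.get(i), 0.0);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> logEntry1 = List.of(
                Map.<String, Object>of("name", "1", "value", 0.25069216),
                Map.<String, Object>of("name", "2", "value", 0.226041));
        double[] vector = toVector(logEntry1, List.of("1", "2", "3"));
        System.out.println(java.util.Arrays.toString(vector));
    }
}
```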

Search with models

Rescore phase with an SLTR query: in the rescore phase, an sltr query that references a trained model re-ranks the top documents returned by the query phase, scoring each document from the feature values defined by the model's feature set.
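The window mechanics of rescoring can be sketched as follows. This is a simplified model, not the plugin's or OpenSearch's implementation. It assumes a "total" combination (`queryWeight * originalScore + rescoreQueryWeight * modelScore`) for documents inside the window, keeps documents outside the window in their original order after it, and uses a hypothetical linear model (a dot product of weights and feature values) in place of a trained ranker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Simplified model of window rescoring; class, method, and model names are hypothetical.
final class RescoreSketch {
    static final class Hit {
        final String id;
        final double score;      // query-phase score (or the combined score after rescoring)
        final double[] features; // feature values the model reads

        Hit(String id, double score, double[] features) {
            this.id = id;
            this.score = score;
            this.features = features;
        }
    }

    // Hypothetical linear model; assumes features.length >= weights.length.
    static double linearModel(double[] weights, double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    // Rescores the top windowSize hits and re-sorts only that window.
    static List<Hit> rescore(List<Hit> ranked, int windowSize, ToDoubleFunction<Hit> model,
                             double queryWeight, double rescoreQueryWeight) {
        int n = Math.min(windowSize, ranked.size());
        List<Hit> window = new ArrayList<>(n);
        for (Hit hit : ranked.subList(0, n)) {
            double combined = queryWeight * hit.score + rescoreQueryWeight * model.applyAsDouble(hit);
            window.add(new Hit(hit.id, combined, hit.features));
        }
        window.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Hit> result = new ArrayList<>(window);
        result.addAll(ranked.subList(n, ranked.size())); // hits beyond the window keep their order
        return result;
    }

    public static void main(String[] args) {
        List<Hit> ranked = List.of(
                new Hit("7555", 3.0, new double[] {0.0}),
                new Hit("1370", 2.0, new double[] {1.0}),
                new Hit("1369", 1.0, new double[] {0.0}));
        ToDoubleFunction<Hit> model = hit -> linearModel(new double[] {10.0}, hit.features);
        // Prints "1370 12.0", "7555 3.0", "1369 1.0" on separate lines.
        for (Hit hit : rescore(ranked, 2, model, 1.0, 1.0)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

Setting `queryWeight` to 0 in this sketch orders the window purely by model score.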

@noCharger noCharger moved this from 🆕 New to 👀 In review in Search Project Board Nov 15, 2023
@msfroh msfroh added documentation Improvements or additions to documentation and removed untriaged labels Nov 15, 2023
@msfroh

msfroh commented Nov 15, 2023

Should this be part of a plugin README? Maybe contribute to the doc website once we launch?

@macohen
Collaborator

macohen commented Nov 20, 2023

Generally, I see the "how to use it" documentation going on the doc website. The "how to build it" material and the details of how it works should go into the repo as a README (great idea). The sequence diagram should be part of the README, along with details about the code. Requests and responses should go into the docs. Also, even if we only have a self-install plugin, as long as it works we can add this to the documentation site.

@noCharger do you want to make an attempt at this separation when you get a chance? BTW, nice job on the diagram. Maybe good for a review in an upcoming public search relevance meeting.

cc: @epugh for any feedback...

Projects
Status: 👀 In review
Development

No branches or pull requests

3 participants