
AngularJS Search Service

Splainer Search is an AngularJS search library for Solr, OpenSearch and Elasticsearch, focused on relevance diagnostics, with some experimental support for other search engines, starting with Vectara. It's used in the relevancy tuning tools Quepid and Splainer, and is available for anyone to use (see license).

Splainer-search uses a JSONP wrapper to communicate with Solr. Elasticsearch, OpenSearch, and Vectara communication happens over plain HTTP and JSON via CORS. All fields are explained and highlighted if requested. A friendly interface lets you specify the arguments as a JavaScript object. See below for basic examples.

Basic usage

Solr

Splainer-search will perform the specified search against Solr, attempting to highlight and extract explain info. To request highlighting on a specific field, prefix the field name with "hl:", e.g. hl:overview.

// searcher returning id, title, body and author, with body highlighted
var searcher = searchSvc.createSearcher(
  ['id', 'title', 'hl:body', 'author'],
  'http://localhost:8983/solr/select',
  {
    'q': ['*:*'],
    'fq': ['title:Moby*', 'author:Herman']
  }
);

searcher.search()
.then(function() {
  angular.forEach(searcher.docs, function(doc) {
    console.log(doc.source().title);
    // highlights. You need to pass the id, as that's how Solr
    // organizes highlights and explains. See below for a friendlier,
    // higher-level interface with normalDocs
    console.log(doc.highlight(doc.source().id, 'title', '<b>', '</b>'));
    // explain info
    console.log(doc.explain(doc.source().id));
  });
});
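The hl: prefix in the field list above marks body for highlighting while still returning it. To make that concrete, here is a minimal sketch of splitting such a list into the fields to return and the fields to highlight. splitHighlightFields is a hypothetical helper written for illustration, not splainer-search's internal parser, and it deliberately ignores other prefixes such as id:.

```javascript
// Hypothetical helper: split a field list into plain field names and
// the subset that should be highlighted (entries prefixed with "hl:").
function splitHighlightFields(fieldList) {
  const fields = [];
  const highlightFields = [];
  for (const entry of fieldList) {
    if (entry.startsWith('hl:')) {
      // Strip the prefix; the field is both returned and highlighted.
      const name = entry.slice('hl:'.length);
      fields.push(name);
      highlightFields.push(name);
    } else {
      fields.push(entry);
    }
  }
  return { fields, highlightFields };
}

const split = splitHighlightFields(['id', 'title', 'hl:body', 'author']);
console.log(split.fields);          // [ 'id', 'title', 'body', 'author' ]
console.log(split.highlightFields); // [ 'body' ]
```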

Elasticsearch and OpenSearch

Note: for now, a single set of es*.js files supports both search engines.

Splainer-search supports both engines through the same API; pass the query DSL exactly as Elasticsearch expects it:

var searcher = searchSvc.createSearcher(
  ['id:_id', 'title', 'body', 'author'],
  'http://localhost:9200/books/_search',
  {
    'query': {
      'match': {
        'title': '#$query##'
      }
    }
  }
);
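The #$query## string in the DSL is a placeholder for the user's query text (compare the query: 'tacos' option paired with #$query## in the Explain Other examples further down). The sketch below illustrates that substitution on a plain object. substituteQuery is a hypothetical helper, not the library's implementation, which may handle escaping or other placeholders differently.

```javascript
// Hypothetical helper: deep-copy a query template, replacing every
// occurrence of the '#$query##' placeholder inside string values.
function substituteQuery(template, queryText) {
  if (typeof template === 'string') {
    // split/join replaces all occurrences without any regex escaping.
    return template.split('#$query##').join(queryText);
  }
  if (Array.isArray(template)) {
    return template.map((item) => substituteQuery(item, queryText));
  }
  if (template !== null && typeof template === 'object') {
    const out = {};
    for (const [key, value] of Object.entries(template)) {
      out[key] = substituteQuery(value, queryText);
    }
    return out;
  }
  return template; // numbers, booleans and null pass through unchanged
}

const dsl = { query: { match: { title: '#$query##' } } };
console.log(JSON.stringify(substituteQuery(dsl, 'moby dick')));
// {"query":{"match":{"title":"moby dick"}}}
```

Copying instead of mutating keeps the template reusable for the next query.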

Vectara

Splainer-search has experimental support for Vectara. You can send queries in the Vectara format but must also pass in the authorization headers as custom headers, e.g.

var searcher = searchSvc.createSearcher(
  ['id:_id', 'title', 'body', 'author'],
  'https://api.vectara.io:443/v1/query',
  {
    "query": [
      {
        "query": "#$query##",
        "numResults": 10,
        "corpusKey": [
          {
            "customerId": 123456789,
            "corpusId": 1
          }
        ]
      }
    ]
  }, 
  {
    'customHeaders': {
      "customer-id": "123456789",
      "x-api-key": "api_key"
    }
  },
  'vectara'
);

Please note that the Vectara integration currently does not support explain or other advanced Splainer-search functionality.

Custom Search API

Splainer-search has experimental support for custom APIs. You can send queries as GET or POST, and your API must respond with JSON.

The magic of the Custom Search API is that you provide some mapping JavaScript code to convert from the JSON format of your API to the native structures that splainer-search uses. Imagine your response looks like:

[
    {
        "publication_id": "12345678",
        "publish_date_int": "20230601",
        "score": 0.5590707659721375,
        "title": "INFOGRAPHIC: Automakers' transition to EVs speeds up"
    },
    {
        "publication_id": "1234567",
        "publish_date_int": "20230608",
        "score": 0.5500463247299194,
        "title": "Tesla - March 2023 (LTM): Peer Snapshot"
    }
]

Then you would define two custom mappers, where data is the parsed JSON response:

var options = { apiMethod: 'GET' };
// total number of results in the response
options.numberOfResultsMapper = function(data) {
  return data.length;
};
// convert each result into a doc that splainer-search understands
options.docsMapper = function(data) {
  let docs = [];
  for (let doc of data) {
    docs.push({
      id: doc.publication_id,
      publish_date_int: doc.publish_date_int,
      title: doc.title
    });
  }
  return docs;
};
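Because the mappers are plain functions, they can be checked in isolation before being handed to a searcher. This sketch repeats them in a compact form equivalent to the code above and runs them against the sample response; nothing here calls splainer-search itself.

```javascript
// The two mappers from above, repeated so this snippet runs on its own.
const options = { apiMethod: 'GET' };
options.numberOfResultsMapper = (data) => data.length;
options.docsMapper = (data) =>
  data.map((doc) => ({
    id: doc.publication_id,
    publish_date_int: doc.publish_date_int,
    title: doc.title,
  }));

// The sample response from above.
const sample = [
  { publication_id: '12345678', publish_date_int: '20230601', score: 0.5590707659721375,
    title: "INFOGRAPHIC: Automakers' transition to EVs speeds up" },
  { publication_id: '1234567', publish_date_int: '20230608', score: 0.5500463247299194,
    title: 'Tesla - March 2023 (LTM): Peer Snapshot' },
];

console.log(options.numberOfResultsMapper(sample)); // 2
console.log(options.docsMapper(sample)[0].id);      // 12345678
```

Note that score is dropped by docsMapper, since the mapper only copies the three listed fields.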

Pass those options in as you normally would:

var searcher = searchSvc.createSearcher(
  ['id:id', 'title', 'publish_date_int'], 'http://mycompany.com/api/search',
  'query=tesla', options, 'searchapi'
);

Paging

Paging is done by asking the original searcher for another searcher. This new searcher is already set up to fetch the next page of the current search results. Call search() on it just as you did above.

var results = [];
searcher.search()
.then(function() {
  angular.forEach(searcher.docs, function(doc) {
    results.push(doc.source().title);
  });
  // once results are returned, get a new searcher for the next
  // page of results; rerun the search later exactly as
  // it's run here
  searcher = searcher.pager();
});

// sometime later we page...
searcher.search()
.then(function() {
  // searcher.docs now holds the next page of results
});
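Conceptually, each pager() call advances the result window by one page. The sketch below shows that arithmetic for Solr-style start/rows parameters. nextPageArgs, the scalar (non-array) argument values, and the default page size of 10 are assumptions for illustration, not necessarily what pager() does internally.

```javascript
// Hypothetical helper: given the current Solr-style paging args,
// return a new args object for the following page.
function nextPageArgs(args, pageSize = 10) {
  const rows = args.rows !== undefined ? Number(args.rows) : pageSize;
  const start = args.start !== undefined ? Number(args.start) : 0;
  // Copy rather than mutate, in the spirit of pager() handing back a new searcher.
  return { ...args, start: start + rows, rows };
}

let page = { q: '*:*' };
page = nextPageArgs(page); // { q: '*:*', start: 10, rows: 10 }
page = nextPageArgs(page); // { q: '*:*', start: 20, rows: 10 }
```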

Explain Other

Let's say you have performed a search for tacos and you get a bunch of results, but the chef comes back to you and says:

Hey! My new creation "La Bomba" is not showing up, fix it!!!!

So you are puzzled as to why it is not showing up, since it's clearly marked as a taco in the db. Wouldn't it be nice if splainer-search gave you some help?

Don't worry, we've got your back :)

Solr

Assuming you already have something like this:

var options = {
  fields:       ['id', 'title', 'price'],
  url:          'http://localhost:8983/solr/select',
  args:         { 'q': ['#$query##'] },
  query:        'tacos',
  config:       {},
  searchEngine: 'solr'
};
var searcher = searchSvc.createSearcher(options.fields, options.url, options.args, options.query, options.config, options.searchEngine);

searcher.search();

You would want to create a new searcher with the same options/context, and use the explainOther() function:

var fieldSpec       = fieldSpecSvc.createFieldSpec(options.fields);
var explainSearcher = searchSvc.createSearcher(options.fields, options.url, options.args, options.query, options.config, options.searchEngine); // same options as above

// assuming that we know "La Bomba" has id 63148
explainSearcher.explainOther('id:63148', fieldSpec);

The explainOther() function returns the same promise as the search() function, so you can retrieve the results in the same way.

Elasticsearch

In ES, the explainOther() function behaves the same way, except that it does not need a fieldSpec param to be passed in.

var options = {
  fields:       ['id', 'title', 'price'],
  url:          'http://localhost:9200/tacos/_search',
  args:         {
    'query': {
      'match': {
        'title': '#$query##'
      }
    }
  },
  query:        'tacos',
  config:       {},
  searchEngine: 'es'
};
var searcher = searchSvc.createSearcher(options.fields, options.url, options.args, options.query, options.config, options.searchEngine);

searcher.search();

var explainSearcher = searchSvc.createSearcher(options.fields, options.url, options.args, options.query, options.config, options.searchEngine); // same options as above

// assuming that we know "La Bomba" has id 63148
explainSearcher.explainOther('id:63148');

The explainOther() function returns the same promise as the search() function, so you can retrieve the results in the same way.

Normalizing docs with normalDocs/fieldSpec

This library was originally written for debugging tools such as Quepid and Splainer. As such, it helps you take a user-specified list of fields and their associated roles and, once a search is done, turn the raw docs returned by the Solr searcher into something more normalized based on that config (a normalDoc).

The normalDoc provides a friendlier, more standard interface. This includes friendlier parsing of explain information as needed.

var userFieldSpec = "id:uuid, title, body, authors";
var fs = fieldSpecSvc.createFieldSpec(userFieldSpec);
var searcher = searchSvc.createSearcher(
  fs.fieldList(),
  'http://localhost:8983/solr/select',
  {
    'q': ['*:*'],
    'fq': ['title:Moby*', 'authors:Herman']
  }
);

searcher.search()
.then(function() {
  var  bestScore = 0;
  angular.forEach(searcher.docs, function(doc) {
    var normalDoc = normalDocSvc.createNormalDoc(fs, doc);
    // access unique id and title
    // (above specified to be uuid and title)
    console.log("ID is:" + normalDoc.id);
    console.log("Title is:" + normalDoc.title);

    // snippets -- best try to highlight the field
    angular.forEach(normalDoc.subSnippets, function(snippet, fieldName) {
      console.log('hopefully this is a highlight! ' + snippet);
    });

    // prettier and heavily sanitized explain info:
    // (the explain modal on Splainer shows this)
    console.log(normalDoc.explain());

    // hot matches contains the most important matches
    // this drives the horizontal graph bars in Quepid/Splainer
    var matches = normalDoc.hotMatches();

    // Give hotMatchesOutOf a maximum score (for all docs returned) and you'll
    // get the hot matches as a percentage of the whole
    if (normalDoc.score() > bestScore) {
      bestScore = normalDoc.score();
    }
    var matchesOutOf = normalDoc.hotMatchesOutOf(bestScore);

    // a link to the document in Solr is handy:
    console.log(normalDoc._url());
  });
});
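To make the "percentage of the whole" idea concrete, here is the arithmetic as a standalone sketch. It assumes each hot match is a { description, score } pair and that percentages are relative to the best score among the returned docs. The function name and data shape are hypothetical and may differ from what hotMatchesOutOf actually returns.

```javascript
// Hypothetical sketch: express each match's score as a percentage of the
// best document score, so bars for different docs share one scale.
function matchesAsPercentOf(matches, bestScore) {
  if (bestScore <= 0) {
    // Avoid dividing by zero when no doc scored.
    return matches.map((m) => ({ ...m, percentage: 0 }));
  }
  return matches.map((m) => ({
    ...m,
    percentage: (100 * m.score) / bestScore,
  }));
}

const bars = matchesAsPercentOf(
  [{ description: 'title:moby', score: 3 }, { description: 'body:whale', score: 1 }],
  4
);
// bars[0].percentage === 75, bars[1].percentage === 25
```

For the bars to be comparable across docs, bestScore should be the maximum over all returned docs, not just the docs seen so far.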

Specifying search engine version number

Most of what splainer-search does should be compatible with all versions of Solr and Elasticsearch. Sometimes, though, one of these projects introduces a breaking change, and it becomes necessary to specify the version number in use.

For example, ES deprecated the fields parameter in favor of stored_fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_50_search_changes.html#_literal_fields_literal_parameter). So it's necessary to tell splainer-search which version you are using in order to send the appropriate request.

To do so you only need to specify the version number in the config param when constructing a new searcher:

Elasticsearch

var options = {
  fields:       ['id', 'title', 'price'],
  url:          'http://localhost:9200/tacos/_search',
  args:         {
    'query': {
      'match': {
        'title': '#$query##'
      }
    }
  },
  query:        'tacos',
  config:       { version: 5.1 },
  searchEngine: 'es'
};
var searcher = searchSvc.createSearcher(options.fields, options.url, options.args, options.query, options.config, options.searchEngine);

searcher.search();

And splainer-search will take care of using the correct name in the parameters.

NB: The default behavior is that of 5.x, so if you are on that version you do not need to do anything; if you are on an earlier version, you should provide the version number.
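The version switch boils down to a comparison like the one sketched below: Elasticsearch 5.0 replaced the fields search parameter with stored_fields, and 5.x is the default. fieldsParamName is a hypothetical helper illustrating that rule, not splainer-search's code.

```javascript
// Hypothetical helper: pick the request parameter name used to return
// stored fields, based on the configured Elasticsearch version.
function fieldsParamName(config = {}) {
  // Only the major version matters here: 5.1, '5.1.2' and '5' all give 5.
  const major = parseInt(String(config.version), 10);
  // Fall back to 5.x behavior when no (parseable) version is configured.
  const effective = Number.isNaN(major) ? 5 : major;
  return effective >= 5 ? 'stored_fields' : 'fields';
}

fieldsParamName({});               // 'stored_fields'
fieldsParamName({ version: 5.1 }); // 'stored_fields'
fieldsParamName({ version: 2.4 }); // 'fields'
```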

Highlighting of results

If the individual search result field is a string then it is automatically highlighted.

However, if the selected value is an array or a JSON object, it doesn't coerce it to a string (and as a result doesn't highlight it, either).

Secondly, if any component in the selected path resolves to an array, the rest of the path is spread over the array's elements. To explain:

Data: { "variants": [ { "name": "red" }, { "name": "blue" } ] }
Path (or _field name_): "variants.name"
Result: [ "red", "blue" ]
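The path-spreading rule can be sketched as a small recursive lookup. resolvePath is a hypothetical helper written to match the behavior described above (arrays are spread over the remaining path, and a non-string result is returned as-is rather than coerced); it is not the library's code, and returning undefined for missing keys is an assumption.

```javascript
// Hypothetical sketch: follow a dotted path, spreading the remainder of
// the path over any array encountered along the way.
function resolvePath(data, path) {
  const parts = typeof path === 'string' ? path.split('.') : path;
  if (parts.length === 0 || data === undefined || data === null) {
    return data;
  }
  if (Array.isArray(data)) {
    // Apply the remaining path to every element of the array.
    return data.map((item) => resolvePath(item, parts));
  }
  const [head, ...rest] = parts;
  return resolvePath(data[head], rest);
}

const doc = { variants: [{ name: 'red' }, { name: 'blue' }] };
resolvePath(doc, 'variants.name'); // [ 'red', 'blue' ]
```

Only a result whose type is string would then be eligible for highlighting.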

Understanding query parameters

Sometimes we want to understand what queries are being sent to the search engine, and this can be opaque if we are going through an API or if parameters are being appended inside the search engine (think ParamSets in Solr or templates in ES).

Consult the searcher.parsedQueryDetails property to get a search engine specific JSON data structure.

Solr

For Solr we check whether the responseHeader.params object exists, and return it. Send echoParams=all to Solr to trigger this behavior.
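A minimal sketch of that lookup, assuming the response has already been parsed from Solr's JSON format. queryParamsFrom is a hypothetical name, and the sample response is illustrative and trimmed to the fields that matter here. With echoParams=all, Solr also echoes parameters that come from the request handler configuration, which is what makes this useful when parameters are added server-side.

```javascript
// Hypothetical sketch: return the echoed request parameters from a parsed
// Solr JSON response, or null when no params were echoed.
function queryParamsFrom(solrResponse) {
  const header = solrResponse && solrResponse.responseHeader;
  return header && header.params ? header.params : null;
}

// Illustrative response shape for a request sent with echoParams=all.
const response = {
  responseHeader: {
    status: 0,
    params: { q: 'tacos', echoParams: 'all', rows: '10' },
  },
  response: { numFound: 0, start: 0, docs: [] },
};
queryParamsFrom(response); // { q: 'tacos', echoParams: 'all', rows: '10' }
```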

Elasticsearch

There doesn't appear to be an equivalent feature.

Understanding query input parsing

Frequently we want to understand what the search engine is doing to the raw query input.
Consult the searcher.parsedQueryDetails property to get a search engine specific JSON data structure.

Solr

For Solr we iterate over all the keys in the debug section of the response, skipping those listed in keysToIgnore (['track', 'timing', 'explain', 'explainOther']). Everything else is added to the searcher.parsedQueryDetails property.
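Expressed as code, that filtering looks roughly like the sketch below. The keysToIgnore list comes from the description above; parsedQueryDetailsFrom and the trimmed sample debug section are illustrative assumptions, not the library's code.

```javascript
// Keys in Solr's debug section that are not about query parsing.
const keysToIgnore = ['track', 'timing', 'explain', 'explainOther'];

// Sketch: copy every other debug key into a new object.
function parsedQueryDetailsFrom(debug) {
  const details = {};
  for (const [key, value] of Object.entries(debug || {})) {
    if (!keysToIgnore.includes(key)) {
      details[key] = value;
    }
  }
  return details;
}

// Illustrative, trimmed debug section.
const debug = {
  rawquerystring: 'tacos',
  parsedquery: 'title:tacos',
  QParser: 'LuceneQParser',
  explain: { '63148': 'explanation text' },
  timing: { time: 1.0 },
};
parsedQueryDetailsFrom(debug);
// { rawquerystring: 'tacos', parsedquery: 'title:tacos', QParser: 'LuceneQParser' }
```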

Elasticsearch

In ES we set the profile=true property by default, and everything under the profile key of the response is added to the searcher.parsedQueryDetails property.
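A sketch of both halves, assuming a parsed JSON response. Setting "profile": true in the request body is how Elasticsearch's Profile API is enabled; the helper names and the heavily trimmed sample response are illustrative, not splainer-search's code.

```javascript
// Sketch: return a copy of an ES query body with profiling enabled.
function withProfiling(body) {
  return { ...body, profile: true };
}

// Sketch: pull the profile section out of a parsed ES response.
function profileFrom(esResponse) {
  return (esResponse && esResponse.profile) || null;
}

const body = withProfiling({ query: { match: { title: 'tacos' } } });
// body => { query: { match: { title: 'tacos' } }, profile: true }

// Illustrative, heavily trimmed response.
const esResponse = { hits: { hits: [] }, profile: { shards: [] } };
profileFrom(esResponse); // { shards: [] }
```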

Querqy Rules Library Support

Querqy is a query rewriting library. It helps you to tune your search results for specific search terms. Understanding what Querqy is doing to your queries is critical to achieving great search results.

Solr

The searcher.parsedQueryDetails property surfaces all the debugging information about what rewriting Querqy is doing to the input query. Assuming you are also requesting the details on what rules are being matched via the querqy.infoLogging=on query parameter, then you will also see that information in the searcher.parsedQueryDetails structure.
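Turning on querqy.infoLogging is a matter of adding one more Solr parameter. The sketch below adds it to an args object in the key-to-array-of-values shape used by the Solr examples above. withQuerqyInfoLogging is a hypothetical helper, and passing the parameter through args this way is an assumption based on those examples.

```javascript
// Hypothetical helper: return a copy of Solr-style args (arrays of values,
// as in the examples above) with Querqy info logging switched on.
function withQuerqyInfoLogging(args) {
  return { ...args, 'querqy.infoLogging': ['on'] };
}

const args = withQuerqyInfoLogging({ q: ['#$query##'] });
// { q: [ '#$query##' ], 'querqy.infoLogging': [ 'on' ] }
```

The resulting object could then be passed as the args argument of searchSvc.createSearcher.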

Development Notes

Splainer-search is an AngularJS project. It requires npm and grunt:

npm install -g grunt-cli

To run the tests:

npm install
npm test

Tip: add an f in front of any describe or it in your unit tests (fdescribe, fit) to run just that suite or test.

The build produces a splainer-search.js file. To build it, run:

npm run-script build

Release Process

We use np to publish splainer-search to npmjs.org.

  1. You need to update the CHANGELOG.md with your new version and the date, but you don't need to touch package.json, the np script bumps that file! Check that file in.

  2. Now install the 'np' script if you don't have it, and run it to create the release:

npm install --global np
np --no-2fa
  3. This will also open a browser window on GitHub to create a new release for the project. Use the "Generate Release Notes" button on GitHub to create the template, then paste the contents of CHANGELOG.md into the What's Changed section.

Thanks to...

Development for this library is done primarily by OpenSource Connections for the search relevance tools Splainer and Quepid.

The original author is Doug Turnbull.