Grounding API: non-verifiable and/or subjective statements return true/false rather than ConflictingInformationError #1130

Open
theodore-evans opened this issue Jan 15, 2025 · 1 comment


theodore-evans commented Jan 15, 2025

Querying the grounding API with some classic statements that lack truth values gives inconsistent or unexpected results:

$ curl https://g.jina.ai/the%20current%20king%20of%20the%20usa%20is%20bald   -H "Accept: application/json"   -H "Authorization: Bearer " | jq
{
  "code": 200,
  "status": 20000,
  "data": {
    "factuality": 0.1,
    "result": false,
    "reason": "The statement claims that the current king of the USA is bald. However, the reference indicates that the current monarch is King Edward II, but does not provide any information about his physical appearance, including whether he is bald or not. Therefore, the statement cannot be verified based on the provided reference. Additionally, as of my last knowledge update, there is no actual king of the USA, making the statement misleading. Thus, the statement is rather incorrect due to the lack of factual support regarding the king's appearance.",
    "references": [
      {
        "url": "https://king-of-america.fandom.com/wiki/List_of_American_monarchs",
        "keyQuote": "The current monarch is King Edward II, who ascended the ...",
        "isSupportive": true
      }
    ],
    "usage": {
      "tokens": 9420
    }
  }
}
$ curl https://g.jina.ai/this%20statement%20is%20false   -H "Accept: application/json"   -H "Authorization: Bearer " | jq
{
  "data": null,
  "code": 422,
  "name": "ConflictingInformationError",
  "status": 42204,
  "message": "The statement 'this statement is false' is a classic example of the Liar Paradox, which creates a self-referential contradiction. If the statement is true, then it must be false, and if it is false, then it must be true. This paradox indicates that the statement does not have a definitive truth value, as it leads to a contradiction. While some references argue that it does not constitute a well-formed statement, the prevailing view in logic is that it exemplifies a paradox rather than being simply true or false. Therefore, it is more accurate to consider it as not verifiable.",
  "readableMessage": "ConflictingInformationError: The information found during the search is contradictory, with no clear consensus, preventing a definitive fact-check."
}

The dubious source of ground truth in the first case notwithstanding, my feeling is that the grounding API could be more conservative about assigning a truth value to statements of questionable verifiability, especially when the non-verifiability of the statement is explicitly mentioned in the reasoning itself!

The same applies to subjective statements; here are a few examples (references truncated):

$ curl https://g.jina.ai/salvador%20dali%20was%20the%20greatest%20surrealist%20painter   -H "Accept: application/json"   -H "Authorization: Bearer " | jq
{
  "code": 200,
  "status": 20000,
  "data": {
    "factuality": 0.9,
    "result": true,
    "reason": "The statement that Salvador Dalí was the greatest surrealist painter is supported by multiple references that highlight his significant contributions to the surrealist movement and his status as a master of surrealism. While 'greatest' is subjective, the consensus among art historians and critics is that Dalí is one of the most influential and recognized figures in surrealism. His technical skill, flamboyant personality, and diverse artistic output further reinforce his prominence in the art world. Therefore, the statement is considered rather correct.",
...
}

$ curl https://g.jina.ai/salvador%20dali%20was%20a%20good%20person   -H "Accept: application/json"   -H "Authorization: Bearer " | jq
{
  "code": 200,
  "status": 20000,
  "data": {
    "factuality": 0.2,
    "result": false,
    "reason": "The references indicate that Salvador Dalí had controversial aspects to his character, including support for the Spanish dictator Francisco Franco and problematic behavior towards women. These actions have led to a negative perception of him among some art historians and the public, suggesting that he may not be viewed as a 'good person' by many. The movement to 'cancel' him further highlights the scrutiny of his legacy. Therefore, the statement that he was a good person is not supported by the available evidence.",
...
}

$ curl https://g.jina.ai/barack%20obama%20was%20a%20good%20person   -H "Accept: application/json"   -H "Authorization: Bearer " | jq
{
  "code": 200,
  "status": 20000,
  "data": {
    "factuality": 0.8,
    "result": true,
    "reason": "The statement that Barack Obama was a good person is supported by multiple references highlighting his calm demeanor, thoughtful approach, and commitment to service. Public opinion polls indicate a favorable view of him, with a significant percentage of people approving of his leadership. Additionally, his international popularity and high approval ratings suggest that many perceive him positively. While opinions on political figures can vary, the overall evidence leans towards a favorable assessment of Obama's character, making the statement more likely to be correct.",
...
}

While I can't fault the reasoning given, if the response of the Grounding API is simply a representation of "this is what most people would say", then it somewhat undermines trust in it as a source of truth, i.e. with respect to statements that can meaningfully be said to have a truth value.
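One conservative option would be a client-side guard that refuses to commit to a boolean verdict when the factuality score sits in a grey zone. The sketch below is purely hypothetical: the `factuality` field and the 422 code are taken from the responses above, and the endpoint usage mirrors the curl calls, but the function, thresholds, and tri-state behaviour are my own assumptions, not part of the API.

```python
import requests  # hypothetical client; mirrors the curl calls above

GROUNDING_ENDPOINT = "https://g.jina.ai/"

def conservative_ground(statement: str, token: str,
                        lo: float = 0.25, hi: float = 0.75) -> str:
    """Return 'true', 'false', or 'not verifiable'.

    Instead of trusting the boolean `result`, treat mid-range
    factuality scores as non-verifiable. The thresholds are
    illustrative assumptions, not part of the API.
    """
    resp = requests.get(
        GROUNDING_ENDPOINT + requests.utils.quote(statement),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    body = resp.json()
    if body.get("code") == 422:  # e.g. the ConflictingInformationError above
        return "not verifiable"
    factuality = body["data"]["factuality"]
    if factuality >= hi:
        return "true"
    if factuality <= lo:
        return "false"
    return "not verifiable"  # refuse to commit in the grey zone
```

With thresholds like these, the "king of the USA is bald" example (factuality 0.1) would still come back false, but borderline subjective cases would at least fail closed rather than returning a confident-looking boolean.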

florian-hoenicke (Member) commented

@theodore-evans Thanks a lot for raising this issue.
You are making some very valid points.
As you correctly state, the examples you show are opinion-based.
Each answer on its own seems fine, because the explanation makes sense.
But the results across multiple runs are no longer consistent with each other.
I think this is because, for each request, there is a large set of possible correct answers.
The set becomes even larger when the request is more ambiguous.
Therefore the likelihood increases that an alternative answer is picked.

The word 'good' could mean that the person was qualified, that the person contributed to society, that others liked that person, and so on.
On every run, the grounding picks a slightly different interpretation of that term, leading to a somewhat different outcome.

It is like with any LLM: run it multiple times and you get a different output each time.

To really solve this issue, the API would need to keep internal state so that it gives consistent answers across different requests. But this is tricky.
Maybe we could store all requests and then run a RAG application on top to reuse what was generated before, like a sophisticated cache. But this cache would become huge. A minimal sketch of the idea is below.
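For illustration only, here is the simplest possible instance of that cache idea. Everything in it is hypothetical: a real "RAG on top" version would replace the exact-match lookup with an embedding-based nearest-neighbour search over stored statements, whereas this sketch only normalizes the text.

```python
from typing import Callable, Optional

class GroundingCache:
    """Hypothetical sketch: reuse earlier verdicts so that repeated
    (or trivially re-worded) statements get consistent answers.

    A production version would need an embedding index instead of the
    exact-match dict, plus eviction, since the store would grow without
    bound, which is the "cache would become huge" problem.
    """

    def __init__(self, ground: Callable[[str], dict]):
        self._ground = ground          # the actual grounding call
        self._store: dict[str, dict] = {}

    @staticmethod
    def _normalize(statement: str) -> str:
        # Collapse case and whitespace so trivial rewordings hit the cache.
        return " ".join(statement.lower().split())

    def check(self, statement: str) -> dict:
        key = self._normalize(statement)
        cached: Optional[dict] = self._store.get(key)
        if cached is not None:
            return cached              # consistent answer on repeat queries
        result = self._ground(statement)
        self._store[key] = result
        return result
```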
Do you have an idea for how to overcome this bad user experience?
