Minor typing improvements #953

MartinoMensio · 2024-08-14T13:58:05Z

This is a small pull request fixing some type annotations that cause my IDE to show red lines:

1: deepeval.evaluate:

Argument of type "list[LLMTestCase]" cannot be assigned to parameter "test_cases" of type "List[LLMTestCase | ConversationalTestCase]" in function "evaluate"
  "list[LLMTestCase]" is incompatible with "List[LLMTestCase | ConversationalTestCase]"
    Type parameter "_T@list" is invariant, but "LLMTestCase" is not the same as "LLMTestCase | ConversationalTestCase"
    Consider switching from "list" to "Sequence" which is covariant

2: deepeval.scorer.scorer.bert_score: it returns a dict of numpy arrays

"__getitem__" method not defined on type "float"
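For the first issue, the invariance that the type checker complains about can be reproduced without deepeval. The following is a minimal sketch, assuming hypothetical stand-in classes and two hypothetical functions (`evaluate_invariant`, `evaluate_covariant`); it is not the library's actual code, only an illustration of why `Sequence` accepts a `list[LLMTestCase]` where `List` does not.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class LLMTestCase:
    # Hypothetical stand-in, not the deepeval class
    input: str


@dataclass
class ConversationalTestCase:
    # Hypothetical stand-in, not the deepeval class
    turns: List[str]


TestCase = Union[LLMTestCase, ConversationalTestCase]


def evaluate_invariant(test_cases: List[TestCase]) -> int:
    # List is invariant: a type checker rejects list[LLMTestCase] here,
    # because the callee could append a ConversationalTestCase to it.
    return len(test_cases)


def evaluate_covariant(test_cases: Sequence[TestCase]) -> int:
    # Sequence is read-only and covariant: list[LLMTestCase] is accepted.
    return len(test_cases)


cases: List[LLMTestCase] = [LLMTestCase(input="What if these shoes don't fit?")]

count_a = evaluate_invariant(cases)  # runs fine, but the type checker flags it
count_b = evaluate_covariant(cases)  # no type error reported
```

Both calls behave identically at runtime; only the static analysis differs, which matches the observation that the script runs fine.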

vercel · 2024-08-14T13:58:09Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
evals-docs	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 14, 2024 1:58pm

penguine-ip · 2024-08-14T18:24:51Z

@MartinoMensio Thanks for the PR. When are you encountering this error? I've seen it before, so I'm curious how I can reproduce it.

MartinoMensio · 2024-08-15T07:00:30Z

Hi @penguine-ip, thanks for your message.
I am attaching an example that shows both issues (the script runs fine; only the IDE's type checker reports the problems).

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, BaseMetric
from deepeval.scorer import Scorer
from deepeval.test_case import LLMTestCase


class BertMetric(BaseMetric):
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def measure(self, test_case: LLMTestCase):
        assert test_case.expected_output is not None, "Expected output is None"
        result = Scorer.bert_score(
            predictions=test_case.actual_output,
            references=test_case.expected_output,
        )
        self.score = result["bert-f1"][0]
        # line above shows: "__getitem__" method not defined on type "float"

        assert isinstance(
            self.score, (float, int)
        ), f"Score is not a float or int: {self.score} {type(self.score)}"
        self.success = self.score >= self.threshold
        return self.score

    async def a_measure(self, test_case: LLMTestCase):
        return self.measure(test_case)

    def is_successful(self):
        return self.success

    @property
    def __name__(self):
        return "BERT Metric"


test_case_data = [
    LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra costs.",
        retrieval_context=[
            "All customers are eligible for a 30 day full refund at no extra costs."
        ],
        context=["A customer is asking about the return policy for shoes."],
    ),
]

metrics = [
    AnswerRelevancyMetric(),
    BertMetric(),
]

results = evaluate(test_case_data, metrics)
# line above shows:
# Argument of type "list[LLMTestCase]" cannot be assigned to parameter "test_cases" of type "List[LLMTestCase | ConversationalTestCase]" in function "evaluate"
#   "list[LLMTestCase]" is incompatible with "List[LLMTestCase | ConversationalTestCase]"
#     Type parameter "_T@list" is invariant, but "LLMTestCase" is not the same as "LLMTestCase | ConversationalTestCase"
#     Consider switching from "list" to "Sequence" which is covariant
res = results[0]
for metric in res.metrics_data:
    print(metric.name, metric.success, metric.score, metric.reason)

I'm using VS Code with the Pylance extension enabled; I'm not sure whether other IDEs require other extensions to show the error.
The script runs fine; it's only a typing detail.
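The second issue comes from a return annotation that does not match what the function actually returns. Below is a minimal sketch under stated assumptions: `toy_score_mistyped` and `toy_score_fixed` are hypothetical functions, the `"toy-f1"` key is made up, and plain lists stand in for the numpy arrays that `bert_score` is reported to return. It does not reproduce the real scorer.

```python
from typing import Dict, List


def toy_score_mistyped(predictions: str, references: str) -> Dict[str, float]:
    # The annotation promises a float per key, but a list is returned.
    # Indexing the value then triggers:
    #   "__getitem__" method not defined on type "float"
    value = 1.0 if predictions == references else 0.0
    return {"toy-f1": [value]}  # type: ignore


def toy_score_fixed(predictions: str, references: str) -> Dict[str, List[float]]:
    # The annotation now matches the returned structure, so indexing
    # the value with [0] type-checks cleanly.
    value = 1.0 if predictions == references else 0.0
    return {"toy-f1": [value]}


same = toy_score_fixed("a refund", "a refund")["toy-f1"][0]
different = toy_score_fixed("a refund", "no refund")["toy-f1"][0]
```

As with the first issue, both functions return the same object at runtime; only the declared return type changes what the IDE reports.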

MartinoMensio added 2 commits August 14, 2024 15:43

bert scorer wrong type annotated

40a1b81

vercel bot deployed to Preview August 14, 2024 13:58 View deployment
