This repository has been archived by the owner on Mar 8, 2023. It is now read-only.

MITADS - Transcript roman numbers #100

Open
Mte90 opened this issue Aug 28, 2020 · 4 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@Mte90
Member

Mte90 commented Aug 28, 2020

The text corpus includes Roman numerals, which we need to convert to ordinary (Arabic) numbers while also avoiding false positives.
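A minimal sketch of the conversion step, assuming only canonical uppercase numerals in the range 1-3999 need to be handled. The function `roman_to_int` is hypothetical and is not the API of the existing `roman_numbers.py`; rejecting non-canonical forms such as "IIII" is one cheap way to avoid converting ordinary words.

```python
import re
from typing import Optional

# Canonical Roman numerals 1-3999; non-canonical forms such as "IIII"
# or "IC" do not match and are treated as ordinary text.
ROMAN_RE = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(token: str) -> Optional[int]:
    """Return the value of a canonical uppercase Roman numeral, else None."""
    # The pattern also matches the empty string, so reject it explicitly.
    if not token or not ROMAN_RE.fullmatch(token):
        return None
    total = 0
    for i, letter in enumerate(token):
        value = VALUES[letter]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(token) and VALUES[token[i + 1]] > value:
            total -= value
        else:
            total += value
    return total
```

For example, `roman_to_int("XIX")` returns 19, while `roman_to_int("IIII")` and `roman_to_int("xix")` return None.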

We need a way to detect Roman numerals without matching other text that happens to contain the same letters.
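A minimal sketch of one way to detect numerals while limiting false positives, assuming Italian text. It accepts only uppercase, canonical numerals that directly follow a capitalized name ("Luigi XIV") or a cue word ("secolo XIX"). The function `find_roman_numerals` and the `CUE_WORDS` list are hypothetical, not part of the existing `roman_numbers.py`, and the heuristics would need tuning against the corpus.

```python
import re
from typing import List, Tuple

# Canonical Roman numerals 1-3999, applied to whole words with fullmatch,
# so non-canonical forms such as "IIII" or "IL" are rejected.
ROMAN_RE = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")
WORD_RE = re.compile(r"\w+")

# Hypothetical cue words that often precede a numeral in Italian text.
CUE_WORDS = {"secolo", "capitolo", "atto", "articolo", "legislatura"}


def find_roman_numerals(sentence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for words that are likely Roman numerals."""
    found = []
    prev_word, prev_end = "", 0
    for match in WORD_RE.finditer(sentence):
        word = match.group()
        # Case-sensitive: Italian words like "vi", "ci", "mi" are never matched.
        if ROMAN_RE.fullmatch(word):
            # The previous word must come directly before it (only whitespace
            # in between), so "Roma. I ragazzi" does not count.
            adjacent = prev_word != "" and sentence[prev_end:match.start()].isspace()
            # A capitalized name ("Luigi XIV"), but not all-caps text ("RE DI").
            after_name = prev_word.istitle() and not prev_word.isupper()
            after_cue = prev_word.lower() in CUE_WORDS
            if adjacent and (after_name or after_cue):
                found.append((match.start(), match.end(), word))
        prev_word, prev_end = word, match.end()
    return found
```

With these rules, "Luigi XIV" and "secolo XIX" are detected, while a sentence-initial article ("Ci vediamo. I ragazzi") or an all-caps headline ("ARRIVA IL RE DI ROMA") is skipped. The trade-off is that numerals in other contexts, such as "dell'VIII", are missed unless a matching cue is added.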

Mte90 added the enhancement, good first issue, and hacktoberfest labels on Aug 28, 2020
@ilyasmg
Collaborator

ilyasmg commented Oct 1, 2020

I see there's already a roman_numbers.py script. What's the problem with it? Is it not accurate enough?

@Mte90
Member Author

Mte90 commented Oct 1, 2020

It isn't perfect: we got various false positives with it.

@eziolotta
Contributor

eziolotta commented Dec 6, 2020

Which importers produce the most sentences with Roman numerals?
For the TED importer there is a related issue: in the function maybe_normalize (ted_importer.py), the parameter roman_normalization is False, so do_roman_normalization is never called (see utils.roman_numbers).

@Mte90
Member Author

Mte90 commented Dec 6, 2020

We removed that normalization from the TED importer because it produced a lot of false positives.

nefastosaturo changed the title from "Transcript roman numbers for Mitads" to "MITADS - Transcript roman numbers" on Dec 17, 2020

4 participants