FYI: Language detector Lingua outperforming Optimaize #107

Open
pemistahl opened this issue Feb 8, 2020 · 4 comments

Comments

pemistahl commented Feb 8, 2020

Hello to everyone reading this. I hope it is okay to open this issue, as GitHub does not provide a better way to communicate. If not, feel free to delete it.

I'm the developer of a competing language detection library called Lingua that clearly outperforms Optimaize's library. If you are looking for an accurate and regularly updated language detection library for the JVM that knows how to deal with both short and long text, then please give Lingua a try.

https://github.com/pemistahl/lingua

Please be assured that it is not my intention to offend this library's owner in any way. Some of you might say that it is impudent of me to promote my own project here. However, I come here regularly to look at the newly created issues, and I notice that people do not seem to be aware that there are alternatives to Optimaize's library here on GitHub. They ask whether this project is still maintained (it is not, obviously). Instead, they could simply use a proper one such as mine. Maybe you did not find it on Google, or you searched only for Java-based projects (mine is implemented in Kotlin).

Just take a look at the comparison chart below. In it, Optimaize's library performs worst among the language detection libraries that run on the JVM. If you want to spend your time improving Optimaize's project, that is perfectly fine. But if you are simply looking for good language detection, then please choose one of the alternatives. They exist. Thank you for your attention.

[Image: language detection comparison chart]

vvmar commented Feb 8, 2020

Thanks for the post, Peter. Is a Maven build planned in the near future?

pemistahl (Author) commented

@vvmar No, I switched from Maven to Gradle because it is much more flexible. Why is Maven important to you? If your actual question is whether Lingua can be used in Maven-based projects, then yes, you can simply add my library as a dependency.

vvmar commented Feb 8, 2020

Thanks. I will try it as a Maven dependency first.

james-s-w-clark commented May 28, 2020

Did a handful of comparisons for CJK, and Lingua was much more accurate than Optimaize. I tried detection on some text in Optimaize issues to demonstrate:
