Use AlternateLookup with ReadOnlySpan<char> #3

Merged
russcam merged 1 commit into main from use-alternatelookup-span on Nov 20, 2024

Conversation

russcam
Collaborator

@russcam russcam commented Nov 20, 2024

Lingua performs many lookups into a Dictionary<string, double> to get n-gram probabilities. On net9.0, use the new .GetAlternateLookup<T>() API to look up entries with a ReadOnlySpan<char> key, so a lookup no longer needs a string allocated just to serve as the key.
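
For readers unfamiliar with the API, here is a minimal sketch of the pattern described above. It is not the code from this PR: `NgramLookupSketch`, `SumBigramProbabilities`, and the table contents are hypothetical names chosen for illustration, and the `#if NET9_0_OR_GREATER` fallback assumes the library multi-targets older frameworks. Only `Dictionary<TKey, TValue>.GetAlternateLookup<TAlternateKey>()` and `AlternateLookup.TryGetValue` are the real .NET 9 calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class NgramLookupSketch
{
    // Sums the probabilities of every bigram of `text` that is present in `table`.
    public static double SumBigramProbabilities(Dictionary<string, double> table, string text)
    {
#if NET9_0_OR_GREATER
        // Span-keyed view over the same dictionary. Requires a comparer that
        // implements IAlternateEqualityComparer<ReadOnlySpan<char>, string>,
        // which StringComparer.Ordinal does on .NET 9.
        var lookup = table.GetAlternateLookup<ReadOnlySpan<char>>();
#endif
        double sum = 0;
        for (int i = 0; i + 2 <= text.Length; i++)
        {
            double probability;
#if NET9_0_OR_GREATER
            // Slice the input without allocating a new string for the key.
            bool found = lookup.TryGetValue(text.AsSpan(i, 2), out probability);
#else
            // Older targets: each lookup allocates a substring to use as the key.
            bool found = table.TryGetValue(text.Substring(i, 2), out probability);
#endif
            if (found)
            {
                sum += probability;
            }
        }
        return sum;
    }
}
```

Obtaining the lookup once outside the loop keeps the per-n-gram work down to hashing and comparing the span. `GetAlternateLookup` throws if the dictionary's comparer does not support the alternate key type; `TryGetAlternateLookup` is the non-throwing variant.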

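Benchmarking shows a nice perf gain: mean time for the two Lingua benchmarks drops from about 77-79 us to 66-67 us (roughly 14-15% faster), and allocations drop from 136.42 KB to 115.33 KB (about 15% less).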

Before

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4391/23H2/2023Update/SunValley3)
13th Gen Intel Core i9-13900K, 1 CPU, 32 logical and 24 physical cores
.NET SDK 9.0.100
  [Host]   : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2
  ShortRun : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1
WarmupCount=3

| Method | Text | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| LinguaLowAccuracy | On n(...)own. [125] | 79.22 us | 16.115 us | 0.883 us | 1.00 | 0.01 | 7.3242 | - | - | 136.42 KB | 1.00 |
| Lingua | On n(...)own. [125] | 77.41 us | 2.903 us | 0.159 us | 0.98 | 0.01 | 7.3242 | - | - | 136.42 KB | 1.00 |
| LanguageDetection | On n(...)own. [125] | 108.66 us | 40.258 us | 2.207 us | 1.37 | 0.03 | 2.6855 | 2.3193 | 1.3428 | 260.51 KB | 1.91 |
| NTextCat | On n(...)own. [125] | 170.72 us | 7.196 us | 0.394 us | 2.16 | 0.02 | 6.1035 | 0.7324 | - | 116.13 KB | 0.85 |

After

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4391/23H2/2023Update/SunValley3)
13th Gen Intel Core i9-13900K, 1 CPU, 32 logical and 24 physical cores
.NET SDK 9.0.100
  [Host]   : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2
  ShortRun : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1
WarmupCount=3

| Method | Text | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| LinguaLowAccuracy | On n(...)own. [125] | 67.25 us | 9.423 us | 0.516 us | 1.00 | 0.01 | 6.2256 | - | - | 115.33 KB | 1.00 |
| Lingua | On n(...)own. [125] | 66.32 us | 3.677 us | 0.202 us | 0.99 | 0.01 | 6.2256 | - | - | 115.33 KB | 1.00 |
| LanguageDetection | On n(...)own. [125] | 112.16 us | 22.231 us | 1.219 us | 1.67 | 0.02 | 2.6855 | 2.3193 | 1.3428 | 260.48 KB | 2.26 |
| NTextCat | On n(...)own. [125] | 172.06 us | 23.023 us | 1.262 us | 2.56 | 0.02 | 6.1035 | 0.7324 | - | 116.13 KB | 1.01 |

@russcam russcam merged commit 938d74d into main Nov 20, 2024
1 check passed
@russcam russcam deleted the use-alternatelookup-span branch November 20, 2024 11:42