Interesting! I don't have an immediate answer for this. The index is currently segmented alphabetically by word, which means we can't match partway through a word without loading all of the data. If the problem were only words at the end, we could possibly write a second set of indexes alphabetized from the rear and match backwards, in this case looking for (e.g.) "words ending in erät". But I fear that would only kick the can down the road: it doesn't provide any pathway to match the middle word of a triple compound, and it doesn't support stemming the way searching from the start of a word does. The best solution would be to find a way to tokenize these words at indexing time, similar to the existing support for languages such as Japanese and Chinese that aren't delimited by whitespace. I haven't yet found a good library for handling this in German, and I think it would require a full dictionary, but that isn't a massive hurdle at indexing time (if such a library does exist).
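To make the indexing-time idea concrete, here is a minimal sketch of dictionary-based compound splitting. It is not Pagefind's implementation: the `split_compound` and `index_terms` functions, the tiny `LEXICON`, and the handling of the German linking "s" are all assumptions chosen for illustration, and a real solution would need a full dictionary as noted above.

```python
# Hypothetical sketch: split a German compound into known words at indexing time.
# LEXICON stands in for a full dictionary; the names here are illustrative only.
LEXICON = {"hochspannung", "netzgerät", "netz", "gerät"}


def split_compound(word, lexicon, min_len=3):
    """Return the lexicon words that make up `word`, or [word] if no split is found."""

    def segment(rest):
        # A remainder that is itself a known word needs no further splitting.
        if rest in lexicon:
            return [rest]
        # Try the longest known head first, leaving at least min_len characters.
        for end in range(len(rest) - min_len, min_len - 1, -1):
            head, tail = rest[:end], rest[end:]
            if head not in lexicon:
                continue
            candidates = [tail]
            if tail.startswith("s"):
                # Assumed handling of the linking "s" between compound parts.
                candidates.append(tail[1:])
            for candidate in candidates:
                if len(candidate) < min_len:
                    continue
                parts = segment(candidate)
                if parts is not None:
                    return [head] + parts
        return None

    lowered = word.lower()
    parts = segment(lowered)
    return parts if parts else [lowered]


def index_terms(word, lexicon=LEXICON):
    """Terms to index for `word`: the whole word plus any compound parts."""
    lowered = word.lower()
    parts = split_compound(word, lexicon)
    return [lowered] if parts == [lowered] else [lowered] + parts


print(index_terms("Hochspannungsnetzgerät"))
```

Because each part is indexed as its own term, a search that matches from the start of a word would reach "netzgerät" directly, and a middle part of a triple compound would be covered in the same way.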
-
Opened issue #738 for the problem, suggesting the following: There seems to be no easy solution to this problem, but a first step would be to add support for soft hyphen characters. Pagefind should treat the soft hyphen as a word boundary. This would enable generators of the static HTML to include these hints for Pagefind in the page.
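As a rough illustration of the soft hyphen proposal (a sketch only, not how Pagefind tokenizes today), an indexer could split each word on U+00AD and index both the joined word and its parts. The function name `soft_hyphen_terms` is hypothetical.

```python
# Hypothetical sketch: treat the soft hyphen (U+00AD) as a word-boundary hint.
SOFT_HYPHEN = "\u00ad"


def soft_hyphen_terms(word):
    """Return the word without soft hyphens, followed by its hinted parts (if any)."""
    parts = [part.lower() for part in word.split(SOFT_HYPHEN) if part]
    whole = "".join(parts)
    # Without any soft hyphen there is only the word itself to index.
    return [whole] + parts if len(parts) > 1 else [whole]


# The page generator would emit "Hochspannungs&shy;netzgerät" in the HTML.
print(soft_hyphen_terms("Hochspannungs\u00adnetzgerät"))
```

This moves the dictionary problem to whoever generates the HTML: the indexer only follows the hints it is given.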
-
Hi,
Is there a way to also find occurrences of a search term at the end of a longer word?
In German, words are often compounds of simpler words. For example, „Hochspannungsnetzgerät“ is composed of „Hochspannung“ (high voltage) and „Netzgerät“ (power supply). If I search for „Netzgerät“ with Pagefind, it currently does not find „Hochspannungsnetzgerät“, even though the term is contained in it and semantically it most certainly is a kind of „Netzgerät“, so the user would expect to find it.
Might this be impossible with Pagefind because of the way the index is organized? That would be a severe restriction for use with German pages.
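The limitation can be reproduced with a toy model of an alphabetically sorted word list. This is only an assumption about the general shape of a prefix-organized index, not Pagefind's actual data structure, and `prefix_matches` is a hypothetical helper. The second lookup shows what a copy of the list sorted by reversed words would make possible for suffix searches.

```python
from bisect import bisect_left


def prefix_matches(sorted_words, prefix):
    """Return all entries of an alphabetically sorted list that start with `prefix`."""
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later entry can match
        matches.append(word)
    return matches


words = sorted(["hochspannungsnetzgerät", "netzgerät", "netzwerk"])

# Searching from the start of the word misses the compound.
forward = prefix_matches(words, "netzgerät")

# A second list of reversed words turns "ends with" into "starts with".
reversed_words = sorted(word[::-1] for word in words)
backward = [w[::-1] for w in prefix_matches(reversed_words, "netzgerät"[::-1])]

print(forward)
print(backward)
```

The reversed list only helps when the search term sits at the very end of the word; a term in the middle of a longer compound would still be missed.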