Thanks so much for Pagefind. I'm implementing it for the first time. The site I'm implementing it on has the typical marketing-type pages, but it also has documentation pages for multiple versions of a product. I want Pagefind to exclude all but the most recent version of the product from being indexed.
Ideally, Pagefind could read robots.txt and use the allow and disallow patterns in there; that's essentially what I'm looking for. If I search for a term, I want results from the most recent version of the product, not from legacy or unsupported versions.
The only way I've found to do this is some trickery with the build process, which is less than ideal.
Pagefind's glob option configures which files are ingested, and a glob can contain multiple patterns, e.g.:

```yml
# in pagefind.yml
glob: "{pages/**/*.html,about/**/*.html,docs/latest/**/*.html}"
```
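The same option should also be usable as a flag when invoking the CLI (flag name assumed here to mirror the config key, and `public` is a placeholder output directory):

```bash
npx pagefind --site public --glob "{pages/**/*.html,about/**/*.html,docs/latest/**/*.html}"
```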
`data-pagefind-body`
If any pages on your site have this attribute, Pagefind will only index pages which have it configured. In this case, assuming the site is built via some static site generator, my go-to would be to have the `data-pagefind-body` attribute added to the template for all of the marketing pages and for the latest version of the documentation pages. I'd give the older versions of the documentation a different layout/template that didn't include this attribute, which would make Pagefind omit their content.
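As a minimal sketch (the layout file names are illustrative, not Pagefind requirements):

```html
<!-- layouts/docs-latest.html — indexed: Pagefind scopes indexing to this element -->
<body>
  <main data-pagefind-body>
    <!-- page content -->
  </main>
</body>

<!-- layouts/docs-legacy.html — skipped: once any page on the site uses
     data-pagefind-body, pages without it are excluded from the index -->
<body>
  <main>
    <!-- page content -->
  </main>
</body>
```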
For anything super custom, either the Python API or the NodeJS API can be a good avenue. These let you apply more custom logic in those programming languages when building out your index.
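For example, a rough sketch with the NodeJS API (the directory path and glob are placeholders for your own site structure):

```js
import * as pagefind from "pagefind";

// Build an index, ingest only the pages we want, then write it to disk.
const { index } = await pagefind.createIndex();

await index.addDirectory({
  path: "public",
  // Placeholder glob: marketing pages plus only the latest docs version
  glob: "{pages/**/*.html,docs/latest/**/*.html}",
});

await index.writeFiles({ outputPath: "public/pagefind" });
```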
> Ideally, Pagefind could read robots.txt and use the allow and disallow patterns in there
I do like this idea also! It would have to be opt-in, as I have (and have seen) many use cases where content is disallowed for external search engines but in scope for Pagefind. It does seem like a great setting to have available, though.
It's not something I have time to jump on at the moment, so hopefully one of the existing approaches will solve your use case for now. Let me know how you get on :)