-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Searching the documentation with filters #131
base: master
Are you sure you want to change the base?
Conversation
This way pylint will notice if you accidentally use the wrong key. I'm not sure if this is officially "pythonic". It seems like more could be done translating the json to actual data structures with named fields; then these lookup tables wouldn't be necessary! But this might be a lot of work for little gain.
use class instead of dict for structure field names
UPDATE: Fixes pushed, please pull to see them.
The script that has been used for almost a year and is running in the background for the search to function in this PR too is being ran by a SharedWorker which is not compatible with most browsers on a phone and Safari and IE on a desktop. This is not ideal, I understand, but will be improved in the future as it is not part of the scope at this time for this PR. |
#deploy |
This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's super cool :)
mathlib_data_structures.py
Outdated
@@ -0,0 +1,53 @@ | |||
class mathlibStructures: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, that would make sense to make those inherit from enum.Enum
also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try using Enum at first, yes, but there was some issue - not sure what it was since I haven't focused on that in a while. Actually none of the classes here are used at the moment, this is going to be included in another PR which focuses on organizational structure of the python script itself, so that it's easier to read and update. I should probably remove this file from this PR and will look into using Enums again for the next one where those will be needed.
} | ||
} | ||
|
||
return hasKindFilter && hasAttrFilter ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to split these 3 ternary expressions into 3-ifs or early return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ternary operator by itself is faster than if/else if no additional computation is involved during its execution, but what's more important I think is that this is more readable and easier to change or extend to more types of filters in the long run. I can return the result before getting here, of course - where the check for hasKindFilter is, for example, just return if there is no hasAttrFilter. Also return directly when hasAttrFilter gives a value to isAttrFilterIncluded, if there is no hasKindFilter. But this kind of structuring mixes two different actions: checking if the result matches any of the filters, and deciding whether this is sufficient. To be more exact, if we decide to change the logic from "you need to cover both the attribute filters and the kind filters, if both are present" to "you can cover either filter selected, as long as at least one of any of them matches your attributes or kind" we will have to change the return statement here only. Otherwise we will have to follow through the whole logic in detail and decide whether to move each return statement and how are they connected at all for the purpose of the method itself. I'd leave it like it is, to be honest
#deploy |
This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo! |
#deploy |
This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo! |
searchWorker.js
Outdated
// Adapted from the default tokenizer and | ||
// https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BZ%7D&abb=on&c=on&esc=on&g=&i= | ||
const SEPARATOR = /[._\n\r \u00A0\u1680\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+/u | ||
req.open('GET', 'searchable_data.json', false /* blocking */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is 7-8x larger than the decl.bmp
we currently load- I worry this will substantially increase load time for every search.
One other thing that would be great to have before this goes live is an easily accessible / easy-to-find page describing all the search features. Maybe an extra section on the mostly-blank index page https://leanprover-community.github.io/mathlib_docs_demo/ will do? |
I think the only blocking thing here is the size of the json file as Eric points out. Was the reason for using a bmp before compression or caching? If we zip the json file when we generate it and unzip locally, I guess browsers won't cache the unzipped version, right? For @bryangingechen 's suggestion, I think that's a good place to describe the search features and we could probably use some text from this PR. |
The reason for the |
#deploy |
This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo! |
The bmp is now 1660854 bytes, and loads in half a second on my machine - I think it was 15mB before compression before, so the file extension change definitely helps. However, the call to |
I've created polibb#4 to briefly document the search. I agree with Eric that there's still a slowdown, but it's not near 28 seconds for me. To me the new performance is acceptable, and I'm okay merging, what do others think? |
#deploy |
This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo! |
Hmm, the 404 page needs to be updated as well: |
#deploy |
This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo! |
#deploy |
This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo! |
Searching the mathlib library contents in the documentation and filtering of the resulting set of declarations.
One is now able to:
More information: