Support more index types besides ivfflat #224
Can we have some tests for embedding? Are there any blockers for testing it on CI?
I've proposed PR #225 for installing pgvector on CI. After that PR is merged, we can add test cases for embedding.
    import sentence_transformers  # type: ignore reportMissingImports

    model = sentence_transformers.SentenceTransformer(model_name)  # type: ignore reportUnknownVariableType
    embedding_dimension: int = model[1].word_embedding_dimension  # From models.Pooling
except:
    raise NotImplementedError(
        "Model '{model_name}' doesn't provide embedding dimension information"
Maybe catch ImportError here?
I think import error is for modules, not models.
Makes sense.
    )
if method is not None:
    assert method == "hnsw" or method == "ivfflat"
Since more methods may be added later, assert method in ["ivfflat", "hnsw"] may be better?
Thanks! That's easier to read and write indeed. Changed.
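For illustration, the membership check suggested above could be factored out roughly as follows. This is only a sketch: the names SUPPORTED_METHODS and validate_method are hypothetical, and it raises ValueError instead of using assert (a choice made here, not in the PR) so the check still runs under python -O.

```python
from typing import Optional

# Access methods discussed in this PR; extend the tuple when more are supported.
SUPPORTED_METHODS = ("ivfflat", "hnsw")


def validate_method(method: Optional[str]) -> Optional[str]:
    """Accept None (caller's default) or one of the supported access methods."""
    if method is not None and method not in SUPPORTED_METHODS:
        raise ValueError(
            f"Unsupported index method {method!r}; "
            f"expected one of {SUPPORTED_METHODS}"
        )
    return method
```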
LGTM
Previously, we only supported indexing embeddings using the ivfflat access method in pgvector. Recently, a new access method, hnsw, has been added to pgvector. hnsw is believed to be more performant and accurate than ivfflat.

To allow for more flexibility, we add a new parameter method to allow users to choose which access method to use when creating an index. Also, a new parameter embedding_dimension is added to support more models, since the dimension is required for pgvector to create an index.
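To make the role of the two parameters concrete, here is a rough sketch of how method and embedding_dimension could end up in the index statement. It is not the code from this PR: the function name build_index_sql, the table and column names, and the choice of the vector_l2_ops operator class are assumptions for illustration. The statement shape (an expression index that casts the column to vector(n)) follows the form shown in the pgvector README.

```python
SUPPORTED_METHODS = ("ivfflat", "hnsw")


def build_index_sql(
    table: str, column: str, embedding_dimension: int, method: str = "ivfflat"
) -> str:
    """Build a CREATE INDEX statement for a pgvector column (illustrative only)."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported index method {method!r}")
    if embedding_dimension <= 0:
        raise ValueError("embedding_dimension must be a positive integer")
    # Only plain identifiers are allowed, since the names are interpolated
    # directly into the SQL text.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"Invalid identifier {name!r}")
    # The cast pins the dimension, which pgvector needs to build the index.
    return (
        f"CREATE INDEX ON {table} USING {method} "
        f"(({column}::vector({embedding_dimension})) vector_l2_ops);"
    )


print(build_index_sql("items", "embedding", 384, method="hnsw"))
```

Switching the access method then only changes the USING clause; the dimension is required in either case.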

A new test case for embeddings is added in tests/test_embedding.py. To support set allow_system_table_mods = on;, Postgres is upgraded from 12 to 13 on CI.