Support using MongoDB Atlas as a vector database #345

Merged
merged 7 commits into from
Dec 31, 2024

Hoshino-Yumetsuki (Contributor) commented Dec 31, 2024

Summary by CodeRabbit

  • New Features

    • Added MongoDB support for vector store configuration.
    • Expanded vector store options to include MongoDB integration.
    • Implemented dynamic import of the MongoDB client and vector store registration.
  • Documentation

    • Updated configuration instructions to include MongoDB setup guidelines.
    • Added new entries for MongoDB settings in English and Chinese localization files.
  • Improvements

    • Enhanced configuration flexibility for vector store management.
    • Added support for MongoDB Atlas vector search capabilities.

Hoshino-Yumetsuki and others added 5 commits December 29, 2024 00:02
…t segmentation

- Updated the dependency from `@node-rs/jieba` to `jieba-wasm` in package.json.
- Refactored the text segmentation logic in `similarity.ts` to utilize the new `cut` function from `jieba-wasm`, enhancing compatibility and performance.
…ity.ts

- Improved the BM25 similarity calculation by introducing term frequency maps for both documents.
- Added a smoothing factor and adjusted the scoring formula to normalize against the theoretical maximum score.
- Enhanced code readability and maintainability by restructuring the logic for term frequency and IDF calculations.
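The commit message does not include the revised formula, so the following is only a sketch of the technique it describes, under several assumptions: both inputs are token lists already produced by a segmenter (for example jieba-wasm's cut), IDF is computed over a two-document corpus, the smoothing factor is the constant inside the IDF term, and the "theoretical maximum" is the limit idf * (k1 + 1) that each term's contribution approaches as its frequency grows. The names termFrequencies and bm25Similarity are hypothetical.

```typescript
// Counts how often each token occurs (a term-frequency map).
function termFrequencies(tokens: string[]): Map<string, number> {
  const tf = new Map<string, number>()
  for (const token of tokens) tf.set(token, (tf.get(token) ?? 0) + 1)
  return tf
}

// Sketch of a BM25 score between two token lists, normalized by the
// theoretical maximum: the sum of idf * (k1 + 1) over the query's unique terms.
function bm25Similarity(
  queryTokens: string[],
  docTokens: string[],
  k1 = 1.2,
  b = 0.75,
  smoothing = 0.5 // assumed smoothing constant inside the IDF term
): number {
  if (queryTokens.length === 0 || docTokens.length === 0) return 0

  const queryTf = termFrequencies(queryTokens)
  const docTf = termFrequencies(docTokens)
  const corpusSize = 2 // the "corpus" is just the two documents
  const avgLength = (queryTokens.length + docTokens.length) / corpusSize
  // Standard BM25 length normalization for the scored document.
  const lengthNorm = k1 * (1 - b + (b * docTokens.length) / avgLength)

  let score = 0
  let maxScore = 0
  for (const term of queryTf.keys()) {
    // Document frequency: the query always contains the term; the doc may too.
    const docFreq = 1 + (docTf.has(term) ? 1 : 0)
    const idf = Math.log((corpusSize - docFreq + smoothing) / (docFreq + smoothing) + 1)
    const tf = docTf.get(term) ?? 0
    score += (idf * tf * (k1 + 1)) / (tf + lengthNorm)
    // As tf grows, tf * (k1 + 1) / (tf + lengthNorm) approaches k1 + 1.
    maxScore += idf * (k1 + 1)
  }
  // Guard against a zero maximum (possible only with smoothing = 0).
  return maxScore === 0 ? 0 : score / maxScore
}
```

With the default parameters each term's contribution stays below its limit, so the normalized score falls in [0, 1) and does not reach 1 even for identical texts.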
coderabbitai bot commented Dec 31, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The pull request introduces MongoDB support for the vector store service. The changes extend the configuration interface and schema to include MongoDB-specific parameters like connection URL, database name, and collection name. A new module mongodb.ts is added to handle MongoDB vector store integration, providing functionality to dynamically import the MongoDB client, create vector stores, and manage document operations like adding, deleting, and saving embeddings. Additionally, localization files are updated to include MongoDB settings in both English and Chinese.

Changes

  • packages/vector-store-service/src/index.ts
    • Added mongodbUrl, mongodbDbName, and mongodbCollectionName to the Config interface
    • Updated the schema to include MongoDB configuration
    • Added MongoDB as a constant vector store option
  • packages/vector-store-service/src/vectorstore/mongodb.ts
    • Created a new module for MongoDB vector store integration
    • Implemented the apply function to set up the MongoDB vector store
    • Added the importMongoDB function for dynamic client import
    • Defined a wrapper store with document management functions
  • packages/vector-store-service/src/locales/en-US.schema.yml
    • Added MongoDB settings: mongodbUrl, mongodbDbName, mongodbCollectionName
  • packages/vector-store-service/src/locales/zh-CN.schema.yml
    • Added MongoDB settings: mongodbUrl, mongodbDbName, mongodbCollectionName

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant VectorStore as Vector Store Service
    participant MongoDB as MongoDB Client
    
    App->>VectorStore: Configure with MongoDB
    VectorStore->>VectorStore: Import MongoDB Client
    VectorStore->>MongoDB: Connect to Database
    MongoDB-->>VectorStore: Connection Established
    VectorStore->>VectorStore: Create Vector Store
    App->>VectorStore: Add/Delete/Save Documents
    VectorStore->>MongoDB: Perform Document Operations

Poem

🐰 MongoDB hops into our code today,
Vector stores dancing in a new array,
Embeddings stored with rabbit-like grace,
Connecting data at lightning pace!
A vector adventure begins to play 🚀


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d5cfe2 and b23e80f.

📒 Files selected for processing (2)
  • packages/vector-store-service/src/locales/en-US.schema.yml (1 hunks)
  • packages/vector-store-service/src/locales/zh-CN.schema.yml (1 hunks)

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (7)
packages/vector-store-service/src/vectorstore/lunavdb.ts (1)

67-76: Helpful extension for advanced vector store operations.

By defining the ExtendedVectorStoreInput interface, you enable custom functionalities (similaritySearchVectorWithScoreFunction() and freeFunction()) while still leveraging the common structure from ChatLunaSaveableVectorStoreInput. This design keeps the base interface flexible and open for specialized use cases or cleanup logic.

It can be beneficial to add inline comments or docstrings for each optional method to clarify their intended usage for future maintainers.

+/**
+ * An interface that extends ChatLunaSaveableVectorStoreInput
+ * to incorporate advanced vector store operations.
+ */
 interface ExtendedVectorStoreInput<T extends VectorStore>
     extends ChatLunaSaveableVectorStoreInput<T> {
   similaritySearchVectorWithScoreFunction?: (
       store: T,
       query: number[],
       k?: number,
       filter?: object
   ) => Promise<[Document, number][]>

   freeFunction?: () => Promise<void>
 }
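To illustrate how a caller might use these optional members, here is a minimal sketch with simplified stand-in types rather than the real LangChain and ChatLuna ones. searchWithHooks and releaseStore are hypothetical helpers, not part of either library: the custom search hook is preferred when present, and freeFunction is called only if the backend supplies it.

```typescript
// Simplified stand-ins for the real types (assumptions for this sketch).
interface StoredDocument {
  pageContent: string
  metadata: Record<string, unknown>
}

interface ExtendedHooks<T> {
  similaritySearchVectorWithScoreFunction?: (
    store: T,
    query: number[],
    k?: number,
    filter?: object
  ) => Promise<[StoredDocument, number][]>
  freeFunction?: () => Promise<void>
}

// Uses the backend's custom search when provided, otherwise the default search.
async function searchWithHooks<T>(
  store: T,
  hooks: ExtendedHooks<T>,
  defaultSearch: (query: number[], k: number) => Promise<[StoredDocument, number][]>,
  query: number[],
  k = 4,
  filter?: object
): Promise<[StoredDocument, number][]> {
  if (hooks.similaritySearchVectorWithScoreFunction) {
    return hooks.similaritySearchVectorWithScoreFunction(store, query, k, filter)
  }
  return defaultSearch(query, k)
}

// Calls freeFunction only when the backend supplies one.
async function releaseStore<T>(hooks: ExtendedHooks<T>): Promise<void> {
  await hooks.freeFunction?.()
}
```

Keeping both members optional lets backends without special search or cleanup needs skip them entirely.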
packages/vector-store-service/src/vectorstore/faiss.ts (1)

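143-143: Avoid type assertions by declaring the type explicitly.

Rather than asserting the options object with as ExtendedVectorStoreInput<FaissStore>, consider declaring it as a variable annotated with that type and passing it to the ChatLunaSaveableVectorStore constructor. An annotation makes the compiler verify that the object literal actually satisfies the interface, whereas an as assertion can let missing or mistyped members slip through.

-const wrapperStore = new ChatLunaSaveableVectorStore<FaissStore>(faissStore, {
-  /* ... */
-} as ExtendedVectorStoreInput<FaissStore>)
+const storeInput: ExtendedVectorStoreInput<FaissStore> = {
+  /* ... */
+}
+
+const wrapperStore = new ChatLunaSaveableVectorStore<FaissStore>(
+  faissStore,
+  storeInput
+)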
packages/vector-store-service/src/vectorstore/mongodb.ts (1)

9-9: Consider using a function-scoped logger instead of a module-level variable

Declaring logger at module scope means everything in the module shares one variable, which carries a slight risk of unintended shared state. Defining the logger within the apply function makes its lifetime and ownership explicit and avoids such scoping issues.

packages/vector-store-service/src/vectorstore/milvus.ts (2)

15-23: Extended interface introduces consistent advanced similarity search

ExtendedVectorStoreInput unifies advanced search functionality across multiple backends. This helps keep your codebase consistent and modular.


214-217: Tweak parentheses for code clarity

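The parentheses around (filter ?? '') are unnecessary: the conditional operator ?: binds more loosely than ??, so the expression parses the same without them. Consider updating for stylistic clarity: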

- const filterStr = typeof filter === 'object' ? JSON.stringify(filter) : (filter ?? '')
+ const filterStr = typeof filter === 'object' ? JSON.stringify(filter) : filter ?? ''
🧰 Tools
🪛 GitHub Check: CodeFactor

[warning] 217-217: packages/vector-store-service/src/vectorstore/milvus.ts#L217
Replace (filter·??·'') with filter·??·'' (prettier/prettier)
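A minimal sketch of the normalization the suggested line performs, wrapped in a hypothetical toFilterString helper (the name and signature are assumptions, not the Milvus module's API): object filters are serialized to JSON, string filters pass through, and a missing filter becomes an empty string.

```typescript
// Hypothetical helper mirroring the suggested line: object filters are
// serialized to JSON, strings pass through, and a missing filter becomes ''.
function toFilterString(filter?: object | string): string {
  return typeof filter === 'object' ? JSON.stringify(filter) : filter ?? ''
}
```

Note that typeof null is also 'object', so a null passed at runtime would become the string 'null'; the optional parameter type excludes null at compile time.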

packages/vector-store-service/src/index.ts (1)

85-85: Usage documentation includes MongoDB reference

Linking to the relevant documentation is helpful. Consider adding a short, direct example of how to configure MongoDB for an even smoother onboarding experience.
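Along those lines, here is a sketch of what the three MongoDB settings introduced by this PR might hold. The MongoDBSettings interface mirrors only those fields (the real Config has many more), the URL, database, and collection values are placeholders rather than the plugin's defaults, and validateMongoDBSettings is a hypothetical pre-flight check, not something the plugin provides.

```typescript
// Mirrors only the three MongoDB fields added to Config in this PR.
interface MongoDBSettings {
  mongodbUrl: string
  mongodbDbName: string
  mongodbCollectionName: string
}

// Placeholder values; replace them with your own Atlas cluster details.
const exampleSettings: MongoDBSettings = {
  mongodbUrl: 'mongodb+srv://<user>:<password>@<cluster-host>/',
  mongodbDbName: 'chatluna',
  mongodbCollectionName: 'chatluna_vectors'
}

// Hypothetical sanity check: returns a list of problems (empty when valid).
function validateMongoDBSettings(settings: MongoDBSettings): string[] {
  const problems: string[] = []
  if (!/^mongodb(\+srv)?:\/\//.test(settings.mongodbUrl)) {
    problems.push('mongodbUrl must start with mongodb:// or mongodb+srv://')
  }
  if (settings.mongodbDbName.trim() === '') problems.push('mongodbDbName must not be empty')
  if (settings.mongodbCollectionName.trim() === '') {
    problems.push('mongodbCollectionName must not be empty')
  }
  return problems
}
```

For MongoDB Atlas, the collection also needs an Atlas Vector Search index on its embedding field; that index is created in Atlas itself rather than through these settings.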

packages/vector-store-service/package.json (1)

64-64: Consider pinning the @langchain/mongodb version more strictly

@langchain/mongodb is still on 0.x, where the API is not yet considered stable. Note, however, that for 0.x versions npm already interprets ^0.1.0 as >=0.1.0 <0.2.0, which is exactly the range ~0.1.0 produces, so switching operators changes nothing; pinning an exact version is the only stricter option until the package reaches 1.0.0.

-        "@langchain/mongodb": "^0.1.0",
+        "@langchain/mongodb": "0.1.0",
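For reference, here is a simplified model of how npm resolves caret and tilde ranges for a fully specified X.Y.Z version; prerelease tags and partial versions are ignored, and the function names are illustrative. Each function returns the exclusive upper bound of its range.

```typescript
// Simplified model of npm semver caret (^) and tilde (~) ranges for a full
// "X.Y.Z" version. Prerelease tags and partial versions are not handled.
function parseVersion(version: string): [number, number, number] {
  const [major, minor, patch] = version.split('.').map(Number)
  return [major, minor, patch]
}

// ^X.Y.Z allows changes that do not modify the left-most non-zero component.
function caretUpperBound(version: string): string {
  const [major, minor, patch] = parseVersion(version)
  if (major > 0) return `${major + 1}.0.0`
  if (minor > 0) return `0.${minor + 1}.0`
  return `0.0.${patch + 1}`
}

// ~X.Y.Z allows patch-level changes only.
function tildeUpperBound(version: string): string {
  const [major, minor] = parseVersion(version)
  return `${major}.${minor + 1}.0`
}
```

For a 0.x dependency such as @langchain/mongodb, ^0.1.0 and ~0.1.0 both stop below 0.2.0; the two operators only diverge from 1.0.0 onward, where the caret admits new minor versions and the tilde does not.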
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c225597 and ffa8be4.

📒 Files selected for processing (8)
  • packages/vector-store-service/package.json (2 hunks)
  • packages/vector-store-service/src/index.ts (4 hunks)
  • packages/vector-store-service/src/vectorstore.ts (2 hunks)
  • packages/vector-store-service/src/vectorstore/faiss.ts (2 hunks)
  • packages/vector-store-service/src/vectorstore/lunavdb.ts (3 hunks)
  • packages/vector-store-service/src/vectorstore/milvus.ts (3 hunks)
  • packages/vector-store-service/src/vectorstore/mongodb.ts (1 hunks)
  • packages/vector-store-service/src/vectorstore/redis.ts (1 hunks)
🔇 Additional comments (19)
packages/vector-store-service/src/vectorstore/lunavdb.ts (3)

3-3: Looks good! Import statements provide necessary VectorStore references.

The addition of these imports unifies usage of VectorStore and SaveableVectorStore from @langchain/core/vectorstores. This aligns with best practices for modular and maintainable code.


8-11: Ensures consistent interface usage across modules.

Importing ChatLunaSaveableVectorStore and ChatLunaSaveableVectorStoreInput from koishi-plugin-chatluna/llm-core/model/base allows for consistent extension and usage of shared vector store structures. This step is essential for code consistency across different vector store implementations.


134-134: Casting to ExtendedVectorStoreInput for specialized functionalities.

Casting the wrapper store to ExtendedVectorStoreInput<LunaDBVectorStore> ensures that advanced methods are accessible if provided, while preserving flexibility for other vector store implementations. This approach is scalable if more specialized behaviors or overrides are introduced in the future.

packages/vector-store-service/src/vectorstore/faiss.ts (2)

2-5: Imports are consistent with the plugin’s architecture.

It’s good to see that you're importing ChatLunaSaveableVectorStore and ChatLunaSaveableVectorStoreInput from their dedicated module. This helps keep your vector store logic well-structured and decoupled.


13-13: Ensure version alignment and availability of 'VectorStore'.

Double-check that the version of @langchain/core in your dependencies provides the VectorStore export. In many libraries, naming can evolve, so confirm that '@langchain/core/vectorstores' is the correct path and that it's compatible with your environment.

packages/vector-store-service/src/vectorstore/mongodb.ts (3)

45-74: Delete logic looks appropriate

Your deletableFunction properly handles deletion by either clearing all documents or selectively removing those matching the provided IDs. The overall approach appears correct.
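The two deletion paths can be sketched as follows. This is not the code in mongodb.ts: DeletableCollection is a minimal structural stand-in for the driver's collection (whose deleteMany has the same name but a richer signature), and deleteDocuments, its deleteAll and ids options, and the default _id field are assumptions for illustration.

```typescript
// Minimal structural stand-in for the part of a MongoDB collection used here.
interface DeletableCollection {
  deleteMany(filter: Record<string, unknown>): Promise<{ deletedCount: number }>
}

// Hypothetical delete helper: deleteAll clears the collection; otherwise only
// documents whose id field matches one of the given ids are removed.
async function deleteDocuments(
  collection: DeletableCollection,
  options: { ids?: string[]; deleteAll?: boolean },
  idField = '_id' // assumption: the field that stores each document's id
): Promise<number> {
  if (options.deleteAll) {
    const result = await collection.deleteMany({})
    return result.deletedCount
  }
  // Nothing to do when no ids are given; avoids an accidental match-all filter.
  if (!options.ids || options.ids.length === 0) return 0
  const result = await collection.deleteMany({ [idField]: { $in: options.ids } })
  return result.deletedCount
}
```

Returning early on an empty id list matters because an empty filter passed to deleteMany would match, and delete, every document.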


83-89: Check Node.js version for crypto.randomUUID()

crypto.randomUUID() is only available in Node.js 14.17.0 and later (15.6.0 and later on the 15.x line). If you must support older Node versions, provide a suitable polyfill or alternative ID generator.
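If a fallback is needed, one option is to build a version-4 UUID from crypto.randomBytes, which Node.js has provided for much longer. This is a sketch; generateDocumentId and uuidV4FromRandomBytes are hypothetical names, and the fallback sets the version and variant bits as RFC 4122 prescribes for random UUIDs.

```typescript
import * as crypto from 'crypto'

// Builds an RFC 4122 version-4 UUID from 16 random bytes.
function uuidV4FromRandomBytes(): string {
  const bytes = crypto.randomBytes(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x40 // version nibble = 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80 // variant bits = 10xx
  const hex = bytes.toString('hex')
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`
}

// Prefers the built-in randomUUID when the running Node.js provides it.
function generateDocumentId(): string {
  return typeof crypto.randomUUID === 'function' ? crypto.randomUUID() : uuidV4FromRandomBytes()
}
```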


101-111: User-friendly error messages for missing dependencies

Logging the original error and throwing a descriptive message helps ensure developers can quickly diagnose and install the missing mongodb package.
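The same pattern can be written as a small generic helper. This is a hypothetical sketch, not the actual importMongoDB: it loads an optional package lazily with a dynamic import(), logs the original error, and rethrows an error that tells the user how to install the package (for mongodb the hint would be something like npm install mongodb).

```typescript
// Hypothetical helper: lazily load an optional dependency and replace the
// low-level "module not found" error with an actionable message.
async function importOptional<T = unknown>(moduleName: string, installHint: string): Promise<T> {
  try {
    return (await import(moduleName)) as T
  } catch (err) {
    // Keep the original error visible for debugging, then throw a clearer one.
    console.error(err)
    throw new Error(`Optional dependency "${moduleName}" is missing. Install it with: ${installHint}`)
  }
}
```

Loading the driver this way keeps it an optional dependency: users who never select the MongoDB store do not need it installed.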

packages/vector-store-service/src/vectorstore/milvus.ts (2)

5-8: Imports are consistent with overall vector store enhancements

Bringing in ChatLunaSaveableVectorStore and ChatLunaSaveableVectorStoreInput supports the extended store logic neatly.


290-290: Coherent typing with the extended vector store

Casting to ExtendedVectorStoreInput<Milvus> maintains a clear contract for advanced store operations.

packages/vector-store-service/src/vectorstore.ts (2)

9-9: MongoDB import successfully added

Introducing the mongodb vector store import keeps this file consistent with the newly created mongodb.ts.


24-24: Verifying middleware order

Including 'mongodb' in the middlewares array is straightforward, but confirm that the order of these middlewares does not cause unintended interactions with other stores. Generally, this looks good.

packages/vector-store-service/src/index.ts (3)

30-32: Additional MongoDB fields added to Config

Declaring mongodbUrl, mongodbDbName, and mongodbCollectionName is necessary and aligns with MongoDB integration.


42-43: MongoDB now an option in vectorStore schema

Extending the array of valid vector stores with 'mongodb' broadens user choice and is consistent with your new implementation.


61-67: Sensible defaults for MongoDB

Providing usable default values for mongodbUrl, mongodbDbName, and mongodbCollectionName helps users quickly get started without extensive setup.

packages/vector-store-service/src/vectorstore/redis.ts (1)

108-108: Unused parameter renamed for clarity

Renaming the function parameter to _store clarifies that it’s intentionally unused, aligning with common TypeScript/JavaScript best practices.

packages/vector-store-service/package.json (3)

74-75: LGTM! Dev dependencies are well configured

The version constraints for both koishi and mongodb are appropriate and consistent with the project's dependency management style.


Line range hint 42-54: Version pinning strategy looks good

The use of resolutions/overrides to pin @langchain/core to version 0.3.18 across all dependency trees is a good practice, especially when integrating new vector store backends like MongoDB that depend on the LangChain ecosystem.


81-92: Verify compatibility with chatluna plugin

The configuration for optional MongoDB support looks good. However, let's verify the compatibility between the MongoDB integration and the required chatluna plugin version.

✅ Verification successful

Let me gather more information to verify the compatibility.


Based on the analysis of the codebase, I can now provide the final response:

MongoDB integration is compatible with chatluna plugin

The code analysis shows that the MongoDB integration is properly implemented as a vector store provider and follows the same pattern as other vector stores (FAISS, Milvus, etc.). The implementation:

  • Correctly imports and uses the required interfaces from koishi-plugin-chatluna
  • Implements the necessary vector store registration through plugin.registerVectorStore('mongodb', ...)
  • Extends ChatLunaSaveableVectorStore with proper MongoDB-specific functionality
  • Uses the same pattern for optional dependencies as other vector stores in the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential compatibility issues between MongoDB integration and chatluna plugin

# Check if there are any MongoDB-related imports or usage in chatluna plugin
echo "Checking for MongoDB dependencies in chatluna plugin..."
rg -l "mongodb" $(fd -t f "package.json" | grep "chatluna" | grep -v "vector-store-service")

# Look for vector store related interfaces or types in chatluna that might need updates
echo "Checking for vector store interfaces..."
ast-grep --pattern 'interface $_ {
  $$$
  mongodb$_?: $_
  $$$
}'

Length of output: 559


Script:

#!/bin/bash
# Check for MongoDB-related code in vector store service
echo "Checking MongoDB implementation in vector store service..."
rg -A 5 "koishi-plugin-chatluna" packages/vector-store-service/src/

# Check package.json of chatluna plugin if available
echo "Checking chatluna plugin dependencies..."
fd -t f "package.json" | grep "chatluna" | xargs cat

# Look for vector store type definitions
echo "Checking vector store types..."
rg -A 10 "VectorStore|VectorStoreInterface" packages/vector-store-service/src/

Length of output: 37648

…tore service

- Introduced new MongoDB configuration parameters: mongodbUrl, mongodbDbName, and mongodbCollectionName.
- Updated the configuration schema to include MongoDB as a supported vector store option.
- Added documentation link for MongoDB configuration in the usage section.
Hoshino-Yumetsuki changed the title from "Support using MongoDB Atlas as a vector database; use extended interfaces to resolve editor warnings" to "Support using MongoDB Atlas as a vector database" Dec 31, 2024
dingyi222666 merged commit 3fc51ca into ChatLunaLab:v1-dev Dec 31, 2024
2 checks passed
dingyi222666 self-requested a review January 1, 2025 18:42