⚠️ WARNING: This is pre-alpha incomplete code meant to serve as a basis for people building their own vector databases on the edge.
HyphalBase is a Cloudflare Worker-based vector database that leverages Durable Objects for storing and retrieving vector embeddings. It supports essential CRUD operations (put
, get
, delete
, deleteAll
) along with a built-in cosine similarity search
for finding the most relevant vectors. This project is designed to provide a minimal, self-hosted vector storage solution on Cloudflare’s edge network.
This is the third iteration of a vector database I have developed, drawing on insights from earlier implementations to refine and optimize this version. The concept is exceptionally powerful and versatile for enabling in-context learning (ICL) when applied to a normalized higher-dimensional abstraction of any dataset.
I’m eager to collaborate and would greatly appreciate your input and assistance!
# Install dependencies and start the hyphalbase server.
$ <pnpm|npm|yarn> install && wrangler dev
# (Optional) Define a reference id to make testing easier.
$ export EXAMPLE_RECORD_ID=123e4567-e89b-12d3-a456-426614174000
# Save a vector record with EXAMPLE_RECORD_ID (api will generate an a UUIDv4 if not supplied).
$ curl -X POST http://localhost:8787 -H "Content-Type: application/json" -d '{"operation":"put","payload":{"id":"${EXAMPLE_RECORD_ID}","namespace":"exampleNamespace","vector":[0.1,0.2,0.3],"content":"Sample content"}}'
# Get the vector record.
$ curl -X POST http://localhost:8787 -H "Content-Type: application/json" -d '{"operation":"get","payload":{"id":"${EXAMPLE_RECORD_ID}"}}'
# Search the records (todo: server-side filters [namespace, ect...]).
$ curl -X POST http://localhost:8787 -H "Content-Type: application/json" -d '{"operation":"search","payload":{"vector":[0.1,0.2,0.3],"topN":5}}'
# Delete the vector record.
$ curl -X POST http://localhost:8787 -H "Content-Type: application/json" -d '{"operation":"delete","payload":{"id":"${EXAMPLE_RECORD_ID}"}}'
# Delete all the records.
$ curl -X POST http://localhost:8787 -H "Content-Type: application/json" -d '{"operation":"deleteAll"}'
- Simplicity First: Keep the codebase easy to understand and maintain.
- Edge-Driven: Store data closer to the users and reduce latency by leveraging Cloudflare’s global network.
- Scalable: Durable Objects allow for isolated and horizontally scalable data storage.
- Put: Insert or update a vector (with an auto-generated or custom UUID).
- Get: Retrieve a specific vector by its UUID.
- Delete: Remove a specific vector from the storage.
- DeleteAll: Wipe out the entire data set (use with caution).
- Cosine Similarity: Quickly compute similarity to rank vectors.
- Top-N Results: Return only the top matches based on the similarity score.
- Binary Encoding: Vectors are stored as binary blobs using
Float32Array
. - Namespace Support: Organize your vectors within logical namespaces.
- Cloudflare Workers + Durable Objects: Each Durable Object can store data for one or more namespaces, providing horizontal scalability.
- Fast Global Access: Serve embeddings across the globe with minimal latency.
- Node.js (>=14.0.0 recommended)
- Cloudflare Wrangler CLI (for local development and deployment)
- Cloudflare Account (to create and manage your Durable Objects)
- Bun (optional, used for development scripts in this repository)
-
Clone or download this repository
git clone https://github.com/seemueller-io/hyphalbase.git cd hyphalbase
-
Install dependencies
pnpm install
or
bun install
-
Configure Wrangler Make sure you have Wrangler installed and that you’ve logged in to your Cloudflare account:
pnpm install -g wrangler wrangler login
Run a local version of your Worker with Durable Objects by using Wrangler’s dev mode:
pnpm run dev
This will spin up a local server on localhost:8787
. Any request path will be routed to the HyphalObject
Durable Object instance, which handles your put
, get
, delete
, deleteAll
, and search
operations.
Deploy your Worker to Cloudflare:
pnpm run deploy
Wrangler will package and upload your code, creating or updating the Durable Object if needed. Once deployed, your Worker will be live at the assigned domain/subdomain configured in your wrangler.toml
.
This project is mainly designed for a Worker environment, so direct usage outside the Cloudflare ecosystem isn’t the primary focus. However, you can interact with the Worker endpoints using standard fetch requests. For example:
// Example: PUT request to store a vector
fetch('https://your-worker-url/someObjectName', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
operation: 'put',
payload: {
id: 'optional-custom-uuid',
namespace: 'exampleNamespace',
vector: [0.1, 0.2, 0.3],
content: 'Sample content'
}
})
})
.then(res => res.json())
.then(data => {
console.log('Stored vector ID:', data.id);
});
You may need to provide environment variables via .env
or through Wrangler’s configuration interface, particularly if you plan to integrate with external embedding services (e.g., OpenAI). For instance:
FIREWORKS_API_KEY=sk-some-api-key
(Be sure to never commit actual API keys to source control!)
Your wrangler.toml
should declare a Durable Object binding:
[durable_objects]
bindings = [
{ name = "HYPHAL_OBJECT", class_name = "HyphalObject" }
]
This binding is what allows your Worker to create and manage the HyphalObject
Durable Object instance.
This repository includes scripts to test and seed the database as well as to run local development using Bun or npm:
# Bun examples
bun run dev # Start local dev server
bun run test # Run tests
bun run build # Build the project
bun run seed # Seed example data
# NPM equivalents
pnpm run dev
pnpm run test
pnpm run build
pnpm run seed
.
├── scripts/
│ ├── seed.ts # Example script for seeding data
│ ├── test_*.js # Demo scripts for deleting/putting data
└── src/
├── hyphal_object.ts # Core Durable Object logic
└── index.ts # Worker entrypoint, routes fetch requests
package.json
: Includes script definitions for deployment, local development, and testing.wrangler.toml
: Cloudflare Workers configuration (not shown in the snippet above, but typically required).worker-configuration.d.ts
: Declares theEnv
interface for TypeScript usage with Durable Objects.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to your branch
- Open a Pull Request
- Write clear and concise TypeScript code.
- Include tests for any new features.
- Use descriptive commit messages.
- Follow existing code patterns for consistency.
- Document new methods and types in the README if needed.
Remember that Cloudflare Durable Objects have certain concurrency and storage constraints. For production-grade usage, ensure you assess your application’s scale and data throughput to avoid hitting resource limits.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. The Software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the Software or the use or other dealings in the Software.