
Generating and using ChromaDB ids

Tags: chromadb

I'm wondering how people deal with the ids in Chroma DB. I plan to store code snippets (let's say single functions or classes) in a collection and need a unique id for each. These documents are going to be generated, so the first problem is: how do I go about randomly generating an appropriate id?

I suppose I may want to update a document at some point, so I'd need the id handy. This feels like a chicken-and-egg problem. Am I supposed to store the ids in another db like Postgres? And then how would I even know which id relates to which snippet? Query Chroma DB first to find the id of the most closely related document?

samala7800 asked Oct 23 '25 16:10


1 Answer

If you are going to look up a specific entry in the vector DB by ID again later, that tells me you need the entry IDs stored somewhere else. That being the case, I'd recommend combining a regular database table with the vector DB. You can then use the auto-generated ID from the table as the ID of the matching vector entry.

You could do something like the following:

  1. Regular Database Table (Table A):

    • Insert your code snippets or documents into a regular database table. Let's call this table "CodeSnippets."
    • This table should have an auto-generated primary key (e.g., an incrementing integer or a UUID) to ensure each document has a unique ID.
  2. Retrieve Document IDs:

    • After inserting a document into "CodeSnippets," retrieve the newly generated unique ID for that document.
  3. Chroma DB Collection (Table B):

    • Add the document to a Chroma DB collection, using the ID from step 2 as the entry's ID (converted to a string, since Chroma IDs are strings). Let's call this collection "Embeddings."
    • Each entry in "Embeddings" then holds the document ID (from Table A) alongside the document's embedding.
  4. Done!

    • You now have a system where you can easily reference your documents by their unique IDs, both in your regular database and Chroma DB.

Here's a simplified example using Python, with SQLAlchemy and SQLite as the regular database:

# Step 1: Insert data into the regular database (Table A)
# The SQLAlchemy model CodeSnippet is defined below
import chromadb
from chromadb.utils import embedding_functions
from sqlalchemy import create_engine, Column, Integer, String
# declarative_base lives in sqlalchemy.orm as of SQLAlchemy 1.4
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class CodeSnippet(Base):
    __tablename__ = 'CodeSnippets'

    id = Column(Integer, primary_key=True, autoincrement=True)
    code = Column(String)
    # Add other metadata columns as needed

engine = create_engine('sqlite:///database.db')
Base.metadata.create_all(engine)

# Create a session
Session = sessionmaker(bind=engine)
session = Session()

# Insert a code snippet into Table A
new_snippet = CodeSnippet(code='print("Hello World")')
session.add(new_snippet)
session.commit()

# Step 2: Retrieve the newly generated document ID
document_id = str(new_snippet.id)

# Step 3: Add embeddings to Chroma DB (Table B)
client = chromadb.PersistentClient("./data")
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
collection = client.get_or_create_collection("code_snippets", embedding_function=sentence_transformer_ef)
# Chroma IDs are strings, which is why document_id was converted with str() above
collection.add(ids=[document_id], documents=[new_snippet.code])

# Step 4: You can now easily reference the document by its ID in both databases
# For example, you can retrieve the code snippet from Table A by its ID
# The primary key is an integer, so convert the string ID back before comparing
result = session.query(CodeSnippet).filter(CodeSnippet.id == int(document_id)).first()
print(result.code)

# Or you can retrieve the code snippet from Table B by its ID
result = collection.get(ids=[document_id])
print(result)

This approach ensures you have a clear mapping between your document data and embeddings, avoiding the chicken-and-egg problem you mentioned.
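
If you would rather use UUIDs than an incrementing integer (step 1 allows either), here is a minimal sketch using only Python's standard uuid module. It is a variation on the example above, not something the setup requires: the namespace URL and the file-path-plus-symbol-name key are hypothetical choices made for illustration. A random uuid4 has to be stored next to the snippet, while a uuid5 derived from a stable key can be recomputed whenever you need to find or update the entry.

```python
import uuid

# Hypothetical project namespace: any fixed UUID works, as long as it never changes.
SNIPPET_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/code-snippets")


def random_snippet_id() -> str:
    """Return a random ID; it must be stored alongside the snippet to be found again."""
    return str(uuid.uuid4())


def deterministic_snippet_id(file_path: str, symbol_name: str) -> str:
    """Derive the same ID every time from a stable key (assumed: file path + symbol name)."""
    return str(uuid.uuid5(SNIPPET_NAMESPACE, f"{file_path}::{symbol_name}"))


first = deterministic_snippet_id("app/utils.py", "parse_config")
again = deterministic_snippet_id("app/utils.py", "parse_config")
other = deterministic_snippet_id("app/utils.py", "load_config")

print(first == again)  # True: the same key always yields the same ID
print(first != other)  # True: a different key yields a different ID
```

Either kind of string can be passed in the ids list of collection.add. With the deterministic variant you can recompute the ID when a snippet changes and hand it to collection.update or collection.upsert, provided the key you derive it from really is stable for your snippets.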

KalebJS answered Oct 27 '25 03:10