Cannot load persisted db using Chroma / Langchain

Question

I ingested all my documents and created a collection with embeddings using Chroma. I have a local directory, db, which contains chroma-collections.parquet and chroma-embeddings.parquet. Neither file is empty. When opened, chroma-collections.parquet shows a collection name, a UUID, and null metadata.

When I load it later using LangChain, nothing is there.

from chromadb.config import Settings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# embeddings_model_name is defined elsewhere in my script
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
CHROMA_SETTINGS = Settings(
    chroma_db_impl='duckdb+parquet',
    persist_directory='db',
    anonymized_telemetry=False
)

db = Chroma(persist_directory='db', embedding_function=embeddings, client_settings=CHROMA_SETTINGS)

db.get() returns {'ids': [], 'embeddings': None, 'documents': [], 'metadatas': []}

I've tried several alternative approaches found online, for example:

import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet",
                                  persist_directory='./db'))
coll = client.get_or_create_collection("langchain", embedding_function=embeddings)
coll.count() returns 0

I'm expecting all the docs and embeddings to be available. What am I missing?

deepak walia · Accepted Answer

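Pass the same collection_name both when saving and when loading the Chroma store. If the name used at load time does not match the one used at save time, you get a different, empty collection.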

save to disk

# docs, embedding_function and query are defined earlier in your script
db2 = Chroma.from_documents(docs, embedding_function, persist_directory="./chroma_db", collection_name='v_db')
db2.persist()
docs = db2.similarity_search(query)

load from disk

db3 = Chroma(collection_name='v_db', persist_directory="./chroma_db", embedding_function=embedding_function)
docs = db3.similarity_search(query)
print(docs[0].page_content)
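
Why the name matters can be sketched without Chroma installed. The toy class below is hypothetical (ToyClient is not part of chromadb or LangChain). It only mimics get-or-create behaviour, where asking for a name that does not exist quietly returns a new, empty collection instead of raising an error. That would be consistent with the coll.count() result of 0 in the question, assuming the collection was saved under a different name than the one used when loading.

```python
# Toy stand-in for a vector store client; NOT the chromadb API.
class ToyClient:
    def __init__(self):
        self._collections = {}  # collection name -> list of stored documents

    def get_or_create_collection(self, name):
        # Return the existing collection, or silently create an empty one.
        return self._collections.setdefault(name, [])


client = ToyClient()

# "Save": documents go into the collection named 'v_db'.
client.get_or_create_collection("v_db").extend(["doc one", "doc two"])

# "Load" with the same name finds the stored documents.
same_name_count = len(client.get_or_create_collection("v_db"))

# "Load" with a different name creates a new, empty collection.
other_name_count = len(client.get_or_create_collection("langchain"))

print(same_name_count, other_name_count)
```

With a real store, comparing the collection name passed at save time with the one passed at load time is a quick first check before suspecting the persisted files.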

Per Feldvoss Olsen · Answer

It looks like the LangChain documentation was wrong: https://github.com/langchain-ai/langchain/issues/19807

You can change

from langchain_community.vectorstores import Chroma

to

from langchain_community.vectorstores.chroma import Chroma
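
If the same script has to run against installations where the import path may differ, one option is to try the candidate paths in order. This is only a sketch: import_first_available is a hypothetical helper name, not something LangChain provides, and the demonstration uses the standard library's math module so that it runs without LangChain installed.

```python
import importlib


def import_first_available(module_paths, attribute):
    """Return `attribute` from the first module in `module_paths` that provides it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError("no candidate worked: " + "; ".join(errors))


# Demonstration with the standard library: the first path does not exist,
# so the helper falls back to the second one.
sqrt = import_first_available(["no_such_module_for_demo", "math"], "sqrt")
print(sqrt(9))

# Assumed usage with the two paths from this answer (requires langchain_community):
# Chroma = import_first_available(
#     ["langchain_community.vectorstores.chroma", "langchain_community.vectorstores"],
#     "Chroma",
# )
```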

Tags: langchain, chromadb, gpt4all, privategpt

kaysuez
