I ingested all my docs and created a collection with embeddings using Chroma. I have a local directory, db, which contains chroma-collections.parquet and chroma-embeddings.parquet. Neither file is empty. Opening chroma-collections.parquet shows a collection name, a UUID, and null metadata.
When I load it later using langchain, nothing is there.
from chromadb.config import Settings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
CHROMA_SETTINGS = Settings(
    chroma_db_impl='duckdb+parquet',
    persist_directory='db',
    anonymized_telemetry=False
)
db = Chroma(persist_directory='db', embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
db.get()
returns {'ids': [], 'embeddings': None, 'documents': [], 'metadatas': []}
I've also tried several alternative approaches found online, e.g.:
import chromadb
from chromadb.config import Settings
client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet",
                                  persist_directory='./db'))
coll = client.get_or_create_collection("langchain", embedding_function=embeddings)
coll.count() returns 0
I'm expecting all the docs and embeddings to be available. What am I missing?
You need to pass the same collection_name both when saving and when loading the Chroma database.
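# Save: create the collection under an explicit name and persist it.
db2 = Chroma.from_documents(docs, embedding_function, persist_directory="./chroma_db", collection_name='v_db')
db2.persist()
docs = db2.similarity_search(query)
# Load: reuse the same collection_name and persist_directory.
db3 = Chroma(collection_name='v_db', persist_directory="./chroma_db", embedding_function=embedding_function)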
docs = db3.similarity_search(query)
print(docs[0].page_content)
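To illustrate why a mismatched name can look like "empty" data rather than an error, here is a minimal toy sketch. It assumes the cause is a collection-name mismatch, as this answer suggests. ToyClient and its methods are hypothetical stand-ins written for illustration, not Chroma's or LangChain's API; they only mimic "get or create" behaviour with a plain dict.

```python
# Toy model only: ToyClient is a hypothetical stand-in, not Chroma's API.
class ToyClient:
    """Keeps named collections in a dict, like a tiny document store."""

    def __init__(self):
        self._collections = {}

    def get_or_create_collection(self, name):
        # An unknown name does not raise; it silently yields a new, empty list.
        return self._collections.setdefault(name, [])

    def list_collection_names(self):
        return sorted(self._collections)


client = ToyClient()

# "Ingest": documents are saved under the name 'v_db'.
client.get_or_create_collection("v_db").extend(["doc one", "doc two"])

# "Reload" under a different name: no error, just an empty collection.
wrong = client.get_or_create_collection("langchain")
right = client.get_or_create_collection("v_db")

print(len(wrong), len(right))
print(client.list_collection_names())
```

If the real store behaves similarly, comparing the name you load with the name recorded in chroma-collections.parquet is a quick way to check whether you are opening the collection you ingested into.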
It looks like the LangChain documentation was wrong; see https://github.com/langchain-ai/langchain/issues/19807
You can change
from langchain_community.vectorstores import Chroma
to
from langchain_community.vectorstores.chroma import Chroma