Get all documents from ChromaDb using Python and langchain

I'm using langchain to process a large number of documents stored in a MongoDB database.

I can load all documents fine into the chromadb vector storage using langchain. Nothing fancy being done here. This is my code:

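from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# docs is the list of langchain Document objects loaded from MongoDB
embeddings = OpenAIEmbeddings()

# Embed the documents and store them in a Chroma collection under ./db
db = Chroma.from_documents(docs, embeddings, persist_directory='db')
db.persist()  # write the collection to disk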

Now, after storing the data, I want to get a list of all the documents and embeddings with their IDs.

This is so I can store them back into MongoDB.

I also want to run them through BERTopic to get the topic categories.

How do I get all documents I've just stored in the Chroma database? I want the documents, and all the metadata.

asked Dec 07 '25 07:12 by user791793


1 Answer

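Looking at the source code (https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py), you can simply call:

db.get()

This returns a Python dict (not a JSON string) with the keys ids, documents, metadatas and embeddings, each holding one entry per stored document. Note that the underlying chromadb collection does not return embeddings by default, so that key may be None. In langchain versions where get() accepts an include argument, you can request them explicitly:

db.get(include=["documents", "metadatas", "embeddings"])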
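Since the goal is to write the results back to MongoDB, here is a minimal sketch of turning the column-oriented dict from db.get() into one record per document. The helper name chroma_result_to_records, the record field names (_id, text, metadata, embedding) and the sample dict are all hypothetical and only illustrate the shape of the data. It assumes the result has an "ids" list and that any column that was not requested is None or missing.

```python
from typing import Any, Dict, List


def chroma_result_to_records(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the column-oriented dict from db.get() into one dict per document."""
    ids = result["ids"]
    n = len(ids)

    def column(name: str) -> List[Any]:
        # A column that was not requested may be None or absent; pad with None.
        values = result.get(name)
        return values if values is not None else [None] * n

    records = []
    for doc_id, text, meta, emb in zip(
        ids, column("documents"), column("metadatas"), column("embeddings")
    ):
        records.append(
            {
                "_id": doc_id,
                "text": text,
                "metadata": meta,
                # Convert to plain floats so the vector is easy to serialize.
                "embedding": [float(x) for x in emb] if emb is not None else None,
            }
        )
    return records


# Hypothetical sample in the same shape as a db.get() result.
# With a real store you would use something like:
#   result = db.get(include=["documents", "metadatas", "embeddings"])
# (assuming your langchain version supports the include argument).
sample = {
    "ids": ["a", "b"],
    "documents": ["first text", "second text"],
    "metadatas": [{"source": "mongo"}, None],
    "embeddings": None,
}
records = chroma_result_to_records(sample)
```

The resulting list can then be passed to whatever MongoDB insert call you already use, and the "text" values can be fed to BERTopic as a plain list of strings.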

answered Dec 08 '25 21:12 by carteakey


