
Can I not add metadata to documents loaded using Chroma.from_documents()?

I want to add additional metadata to the documents being embedded and loaded into Chroma, but I can't find a way to do this for documents loaded using
Chroma.from_documents(documents, embeddings)
For example, imagine I have a text file with details of a particular disease. I want to add species as metadata: a list of all the species the disease affects.

As a workaround, I loaded the documents into a chromadb collection directly, adding the required metadata, and persisted it:

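import chromadb

# Persist the collection to the local "chromaDB" directory
client = chromadb.PersistentClient(path="chromaDB")

# openai_ef is the OpenAI embedding function (defined elsewhere)
collection = client.get_or_create_collection(name="test",
                                             embedding_function=openai_ef,
                                             metadata={"hnsw:space": "cosine"})
# documents, ids and metadata are parallel lists, one entry per document
collection.add(
    documents=documents,
    ids=ids,
    metadatas=metadata
)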
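
The snippet above does not show how documents, ids and metadata were built. A minimal sketch of one way to prepare them follows, assuming one record per text file. The ids, file names and species values mirror the output shown further down; the records list and the document texts are hypothetical placeholders.

```python
# Hypothetical input records: the texts are placeholders, while the
# source/species values mirror the metadata shown in the question's output.
records = [
    {"text": "Details about flu ...", "source": "Flu.txt", "species": "XYZ"},
    {"text": "Details about the common cold ...", "source": "Common_cold.txt", "species": "ABC"},
]

# Build three parallel lists, aligned by index.
documents = [r["text"] for r in records]
ids = [f"id{i}" for i in range(len(records))]
metadata = [{"species": r["species"], "source": r["source"]} for r in records]

# Every document needs exactly one id and one metadata dict.
assert len(documents) == len(ids) == len(metadata)
```

Keeping the three lists derived from a single list of records makes it harder for a document to end up paired with the wrong metadata.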

This was the result:

collection.get(include=['embeddings','metadatas'])

Output:

{'ids': ['id0',
'id1',
'embeddings': [[-0.014580891467630863,
0.0003901976451743394,
0.00793908629566431,
-0.027648288756608963,
-0.009689063765108585,
0.010222840122878551,
-0.00946609303355217,
-0.002771923551335931,
-0.04675614833831787,
-0.02056729979813099,
0.014364678412675858,
...
{'species': 'XYZ', 'source': 'Flu.txt'},
{'species': 'ABC', 'source': 'Common_cold.txt'}],
'documents': None,
'uris': None,
'data': None}

Then I tried loading it from the persisted directory using LangChain's Chroma class:

db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)

But nothing appears to be loaded. db.get() returns this:

db.get(include=['metadatas'])

Output:

{'ids': [],
'embeddings': None,
'metadatas': [],
'documents': None,
'uris': None,
'data': None}

Please help. I need to attach metadata to the documents being loaded.

hollow_coder asked Oct 24 '25 06:10


2 Answers

I would recommend adding the metadata to the document itself before loading it. This makes it clear exactly which metadata belongs to which piece of content.

from langchain_core.documents import Document

documents = [Document(page_content="Some content", metadata={"language": "EN", "author": "Unknown"})]
db = Chroma.from_documents(documents=documents, embedding=embeddings)
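
For the disease/species case in the question, the same idea can be applied to documents that already carry a source entry in their metadata. This is only a sketch under stated assumptions: SPECIES_BY_SOURCE and add_species are hypothetical names, "PQR" is a placeholder value, and the list of species is joined into a single string on the assumption that the installed Chroma version accepts only scalar metadata values (str, int, float, bool). The small Document stand-in exists only so the sketch runs without LangChain installed.

```python
try:
    from langchain_core.documents import Document
except ImportError:
    from dataclasses import dataclass, field

    @dataclass
    class Document:  # minimal stand-in so the sketch runs without LangChain
        page_content: str
        metadata: dict = field(default_factory=dict)

# Hypothetical lookup table: source file name -> species affected.
SPECIES_BY_SOURCE = {
    "Flu.txt": ["XYZ", "PQR"],
    "Common_cold.txt": ["ABC"],
}

def add_species(docs, species_by_source):
    """Attach a 'species' entry to each document's metadata, keyed by its 'source'."""
    for doc in docs:
        species = species_by_source.get(doc.metadata.get("source"), [])
        # Assumption: metadata values must be scalars, so the list is joined into one string.
        doc.metadata["species"] = ", ".join(species)
    return docs

docs = [
    Document(page_content="Details about flu ...", metadata={"source": "Flu.txt"}),
    Document(page_content="Details about the common cold ...", metadata={"source": "Common_cold.txt"}),
]
docs = add_species(docs, SPECIES_BY_SOURCE)
# The enriched documents can then be passed on, e.g. Chroma.from_documents(docs, embeddings).
```

If the Chroma version in use does accept list values in metadata, the join can be dropped and the list stored directly.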
Pieler answered Oct 26 '25 10:10


Found the answer myself.

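I hadn't specified the collection name while loading. When no collection_name is given, LangChain's Chroma class uses its default collection, named "langchain", which was empty here.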

Instead of doing this,

db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)

do this:

db = Chroma(persist_directory="chromaDB", embedding_function=embeddings, collection_name="your_collection_name")

In my case, the collection name is 'test'.
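
To check which collection actually holds the data, the item counts can be compared with chromadb directly. This is a rough sketch: collection_counts is a hypothetical helper, the "chromaDB" path and "test" name come from the question, and the exact behaviour of get_or_create_collection may differ between chromadb versions.

```python
# Name LangChain's Chroma class falls back to when collection_name is omitted.
LANGCHAIN_DEFAULT_COLLECTION = "langchain"

def collection_counts(path, names):
    """Return {collection name: number of stored items} for a persisted Chroma directory."""
    import chromadb  # imported lazily so the helper can be defined without chromadb installed

    client = chromadb.PersistentClient(path=path)
    # get_or_create_collection creates a missing collection instead of failing,
    # which is why opening the wrong name yields an empty result rather than an error.
    return {name: client.get_or_create_collection(name=name).count() for name in names}

# Example call (requires chromadb and the persisted directory from the question):
# collection_counts("chromaDB", [LANGCHAIN_DEFAULT_COLLECTION, "test"])
```

With the setup from the question, one would expect the "test" collection to report the stored documents and the default collection to report none, which matches the empty db.get() output.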

hollow_coder answered Oct 26 '25 10:10


