I want to add extra metadata to the documents that are embedded and loaded into Chroma.
I can't find a way to attach metadata to documents loaded with
Chroma.from_documents(documents, embeddings)
For example, suppose I have a text file describing a particular disease. I want to add species as metadata: a list of all the species the disease affects.
As a workaround, I loaded the documents into a chromadb collection directly, attached the required metadata, and persisted it:
client = chromadb.PersistentClient(path="chromaDB")
collection = client.get_or_create_collection(
    name="test",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},
)
collection.add(
    documents=documents,
    ids=ids,
    metadatas=metadata,
)
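For reference, here is a minimal sketch of how the three parallel lists passed to collection.add() could be built. The helper name build_records, the placeholder texts, and the second species value are hypothetical. The species list is joined into one string on the assumption that Chroma metadata values must be scalars (str, int, float, or bool) rather than lists.

```python
def build_records(entries):
    """Build parallel (documents, ids, metadatas) lists for collection.add().

    entries: iterable of (source_filename, text, species_list) tuples.
    """
    documents, ids, metadatas = [], [], []
    for i, (source, text, species) in enumerate(entries):
        documents.append(text)
        ids.append(f"id{i}")
        # Assumption: metadata values must be scalars, so the list of
        # species is flattened into a single comma-separated string.
        metadatas.append({"species": ", ".join(species), "source": source})
    return documents, ids, metadatas


# Placeholder data, not taken from real files.
entries = [
    ("Flu.txt", "Placeholder text about flu.", ["XYZ"]),
    ("Common_cold.txt", "Placeholder text about the common cold.", ["ABC", "XYZ"]),
]
documents, ids, metadatas = build_records(entries)

# The lists can then be passed on unchanged:
# collection.add(documents=documents, ids=ids, metadatas=metadatas)
```

Keeping the three lists in one loop guarantees that each id and metadata dict lines up with its document.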
This was the result:
collection.get(include=['embeddings','metadatas'])
Output:
{'ids': ['id0',
'id1'],
'embeddings': [[-0.014580891467630863,
0.0003901976451743394,
0.00793908629566431,
-0.027648288756608963,
-0.009689063765108585,
0.010222840122878551,
-0.00946609303355217,
-0.002771923551335931,
-0.04675614833831787,
-0.02056729979813099,
0.014364678412675858,
...
{'species': 'XYZ', 'source': 'Flu.txt'},
{'species': 'ABC', 'source': 'Common_cold.txt'}],
'documents': None,
'uris': None,
'data': None}
Then I tried loading it from the directory persisted on disk, using LangChain's Chroma class:
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)
But nothing appears to be loaded. db.get() returns this:
db.get(include=['metadatas'])
Output:
{'ids': [],
'embeddings': None,
'metadatas': [],
'documents': None,
'uris': None,
'data': None}
Please help. I need the metadata to be loaded along with the documents.
I would recommend adding the metadata to each document itself before loading it. This makes it clear exactly which metadata belongs to which piece of content.
from langchain_core.documents import Document
documents = [Document(page_content="Some content", metadata={"language": "EN", "author": "Unknown"})]
# embeddings is the same embedding object used in the question
db = Chroma.from_documents(documents=documents, embedding=embeddings)
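I found the answer myself.
I hadn't specified the collection name when loading. Without it, LangChain's Chroma wrapper opens its default collection (named "langchain") rather than the "test" collection I had populated, so the store looked empty.
Instead of this: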
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)
Do this:
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings, collection_name = 'your_collection_name')
In my case, the collection name is 'test'.
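If you are not sure which collections exist in the persisted directory, a small check like the following sketch can catch a wrong name before the store is opened. The helpers collection_names and check_collection are hypothetical. The getattr fallback is there on the assumption that, depending on the chromadb version, client.list_collections() may return either collection objects with a name attribute or plain name strings.

```python
def collection_names(collections):
    """Normalize the result of client.list_collections() to plain names.

    Assumption: items are either objects with a .name attribute
    or plain strings, depending on the chromadb version.
    """
    return [getattr(c, "name", c) for c in collections]


def check_collection(collections, wanted):
    """Return `wanted` if it exists, otherwise raise with the known names."""
    names = collection_names(collections)
    if wanted not in names:
        raise ValueError(f"Collection {wanted!r} not found; available: {names}")
    return wanted


# Intended usage (requires chromadb and LangChain, so shown as comments):
# import chromadb
# client = chromadb.PersistentClient(path="chromaDB")
# name = check_collection(client.list_collections(), "test")
# db = Chroma(persist_directory="chromaDB",
#             embedding_function=embeddings,
#             collection_name=name)
```

Failing early with the list of available names is easier to diagnose than an empty db.get() result.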