OpenAI's embedding models are used across all LlamaIndex examples, even though they appear to be more expensive and worse performing than T5 and sentence-transformers models (see the comparison below).
How do I use all-roberta-large-v1 as the embedding model, in combination with OpenAI's GPT-3 as the "response builder"? I'm not even sure whether I can use one model for creating/retrieving embeddings and another model to generate the response based on the retrieved context.
Following is an example of what I'm looking for:
documents = SimpleDirectoryReader('data').load_data()
# Use Roberta or any other open-source model to generate embeddings
index = ???????.from_documents(documents)
# Use GPT3 here
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

Source
You can set it up in a service_context, using either a local model or something from HuggingFace:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
You can then either pass this service_context, or set it globally:
from llama_index import set_global_service_context
set_global_service_context(service_context)
Here's how to do it with a fully open-source stack (imports restored; load_llm, DATA_DIR, and save_or_load_index are helpers defined in the full write-up linked below):
from llama_index import ServiceContext, set_global_tokenizer
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import ChatMessage, ChatResponse
from llama_index.response.schema import Response
from transformers import AutoTokenizer

# Load embedding model
def load_embedding_model() -> HuggingFaceEmbedding:
    return HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")

def run_inference(
    use_rag: bool, messages: list[ChatMessage]
) -> ChatResponse | Response:
    llm = load_llm()
    embedding_model = load_embedding_model()
    if not use_rag:
        return llm.chat(messages=messages)
    # Count tokens with the same tokenizer the local LLM uses
    set_global_tokenizer(
        AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2").encode
    )
    service_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embedding_model,
        system_prompt="You are a bot that answers questions about podcast transcripts",
    )
    index_dir = DATA_DIR / "indices"
    index = save_or_load_index(index_dir=index_dir, service_context=service_context)
    query_engine = index.as_query_engine()
    return query_engine.query(messages[1].content)
Full write-up