Use LlamaIndex with different embeddings model

OpenAI's embedding models are used across all LlamaIndex examples, even though they appear to be both the most expensive and the worst-performing compared to T5 and sentence-transformers models (see the comparison below).

How do I use all-roberta-large-v1 as the embedding model, in combination with OpenAI's GPT-3 as the "response builder"? I'm not even sure whether I can use one model to create and retrieve embeddings and a different model to generate the response from the retrieved context.

Example

Following is an example of what I'm looking for:

from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()

# Use Roberta or any other open-source model to generate embeddings
index = ???????.from_documents(documents)

# Use GPT3 here
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")

print(response)

Model Comparison

[Embedding model comparison chart omitted; the original post links to an external source.]
Asked Mar 08 '26 by Jay

2 Answers

You can set the embedding model in a service_context, using either a local model or one from Hugging Face:

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

embed_model = LangchainEmbedding(
  HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

You can then either pass this service_context, or set it globally:

from llama_index import set_global_service_context

set_global_service_context(service_context)
Answered Mar 11 '26 by Greg Funtusov


Here's how to do it with an open-source stack:

from llama_index import ServiceContext, set_global_tokenizer
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import ChatMessage, ChatResponse
from llama_index.response.schema import Response
from transformers import AutoTokenizer

# load_llm, save_or_load_index, and DATA_DIR are defined in the full
# write-up linked below.

# Load embedding model
def load_embedding_model() -> HuggingFaceEmbedding:
    return HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")

def run_inference(
    use_rag: bool, messages: list[ChatMessage]
) -> ChatResponse | Response:
    llm = load_llm()
    embedding_model = load_embedding_model()
    if not use_rag:
        return llm.chat(messages=messages)

    set_global_tokenizer(
        AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2").encode
    )
    service_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embedding_model,
        system_prompt="You are a bot that answers questions about podcast transcripts",
    )
    index_dir = DATA_DIR / "indices"

    index = save_or_load_index(index_dir=index_dir, service_context=service_context)
    query_engine = index.as_query_engine()
    return query_engine.query(messages[1].content)

Full write-up
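The helpers referenced above (load_llm, save_or_load_index, DATA_DIR) live in the linked write-up. As a hypothetical sketch of what they might look like (names, model path, and directory layout are assumptions, not taken from the answer):

```python
# Hypothetical sketches of the helpers used by run_inference above.
# The llama_index imports are deferred into the function bodies so the
# module can be loaded without the library installed.
from pathlib import Path

DATA_DIR = Path("data")  # assumed project data directory


def load_llm():
    # Assumption: a local Mistral GGUF file served via llama-cpp-python;
    # any LlamaIndex-compatible LLM would work here.
    from llama_index.llms import LlamaCPP

    return LlamaCPP(
        model_path=str(DATA_DIR / "mistral-7b-instruct-v0.2.Q4_K_M.gguf")
    )


def save_or_load_index(index_dir, service_context):
    # Build the index on the first run, then reload it from disk afterwards.
    from llama_index import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if index_dir.exists():
        storage = StorageContext.from_defaults(persist_dir=str(index_dir))
        return load_index_from_storage(storage, service_context=service_context)

    documents = SimpleDirectoryReader(str(DATA_DIR / "transcripts")).load_data()
    index = VectorStoreIndex.from_documents(
        documents, service_context=service_context
    )
    index.storage_context.persist(persist_dir=str(index_dir))
    return index
```

Persisting the index this way avoids re-embedding the transcripts on every run, which matters when the embedding model runs locally.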

Answered Mar 11 '26 by cs_stackX

