I need to return a streaming response from a LlamaIndex chat engine through my FastAPI endpoint. Below is the code I've written so far:
```python
@bot_router.post("/bot/pdf_convo")
async def pdf_convo(query: QuestionInput):
    chat_engine = cache["chat_engine"]
    user_question = query.content
    streaming_response = chat_engine.stream_chat(user_question)
    for token in streaming_response.response_gen:
        print(token, end="")
```
I'd appreciate any guidance on how to properly implement the streaming response with LlamaIndex. Thank you!
FastAPI - StreamingResponse
To use FastAPI's StreamingResponse class, you need to create an async generator (or a normal generator/iterator) and pass it to the StreamingResponse constructor. In your case, wrap streaming_response.response_gen in a generator function and return the resulting StreamingResponse from the endpoint instead of printing the tokens. For example, this is how I would do it:
```python
from fastapi.responses import StreamingResponse


async def response_streamer(response):
    # response_gen is a synchronous generator; yield each token as it arrives
    for token in response:
        yield f"{token}"


@bot_router.post("/bot/pdf_convo")
async def pdf_convo(query: QuestionInput):
    chat_engine = cache["chat_engine"]
    user_question = query.content
    streaming_response = chat_engine.stream_chat(user_question)
    return StreamingResponse(response_streamer(streaming_response.response_gen))
```
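You can also pass a media_type argument (e.g. media_type="text/plain") to StreamingResponse if clients need an explicit content type. To check that tokens actually arrive incrementally rather than all at once, here is a minimal client-side sketch using the requests library. The URL assumes the app runs locally on port 8000, and the "content" field matches the QuestionInput usage in your endpoint; adjust both to your setup.

```python
import requests

# Hypothetical local URL; adjust host/port to wherever the FastAPI app runs.
url = "http://localhost:8000/bot/pdf_convo"

# The endpoint above reads query.content, so send a "content" field.
with requests.post(url, json={"content": "Summarize the PDF"}, stream=True) as resp:
    resp.raise_for_status()
    # iter_content with chunk_size=None yields data as it arrives;
    # decode_unicode=True converts the bytes to text.
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```

If the tokens print one by one instead of appearing in a single burst, the streaming response is working end to end.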