I need to return a streaming response from a LlamaIndex chat engine through my FastAPI endpoint. Below is the code I've written so far:
```python
@bot_router.post("/bot/pdf_convo")
async def pdf_convo(query: QuestionInput):
    chat_engine = cache["chat_engine"]
    user_question = query.content
    streaming_response = chat_engine.stream_chat(user_question)
    for token in streaming_response.response_gen:
        print(token, end="")
```
I'd appreciate any guidance on how to properly implement the streaming response with LlamaIndex. Thank you!
FastAPI - StreamingResponse
To use FastAPI's StreamingResponse class, you need to create an async generator (or a normal generator/iterator) and pass it to the StreamingResponse constructor. In your case, wrap streaming_response.response_gen in a generator function and return the resulting StreamingResponse from the endpoint instead of printing the tokens. For example, this is how I would do it:
```python
from fastapi.responses import StreamingResponse


async def response_streamer(response):
    # response_gen is a synchronous generator; yield each token as it arrives
    for token in response:
        yield f"{token}"


@bot_router.post("/bot/pdf_convo")
async def pdf_convo(query: QuestionInput):
    chat_engine = cache["chat_engine"]
    user_question = query.content
    streaming_response = chat_engine.stream_chat(user_question)
    return StreamingResponse(response_streamer(streaming_response.response_gen))
```
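You can also pass a media_type argument (e.g. media_type="text/plain") to StreamingResponse if clients need an explicit content type. To check that tokens actually arrive incrementally rather than all at once, here is a minimal client-side sketch using the requests library. The URL assumes the app runs locally on port 8000, and the "content" field matches the QuestionInput usage in your endpoint; adjust both to your setup.

```python
import requests

# Hypothetical local URL; adjust host/port to wherever the FastAPI app runs.
url = "http://localhost:8000/bot/pdf_convo"

# The endpoint above reads query.content, so send a "content" field.
with requests.post(url, json={"content": "Summarize the PDF"}, stream=True) as resp:
    resp.raise_for_status()
    # iter_content with chunk_size=None yields data as it arrives;
    # decode_unicode=True converts the bytes to text.
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```

If the tokens print one by one instead of appearing in a single burst, the streaming response is working end to end.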