We have a use case for ChatGPT in summarizing long pieces of text (speech-to-text conversations that can run over an hour).
However, we find that the 4k token limit tends to truncate the input text to roughly half of the conversation.
Processing the text in parts does not seem to retain the history of previous parts.
What options do we have for submitting a longer request which is over 4k tokens?
The closest answer to your question is embeddings.
You can find an overview of what they are here.
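To make the idea concrete, here is a minimal sketch of what creating an embedding looks like. It assumes the pre-1.0 `openai` Python package (the version used in the Cookbook example at the time); the API key and input string are placeholders.

```python
# Minimal sketch: turn a piece of text into an embedding vector.
# Assumes the pre-1.0 "openai" Python package.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Embedding.create(
    input="A short chunk of the transcript...",
    engine="text-embedding-ada-002",
)
embedding = response["data"][0]["embedding"]  # a list of floats
print(len(embedding))  # 1536 dimensions for text-embedding-ada-002
```

Because embeddings are just vectors, you can compare chunks of your transcript against a question by similarity and only send the most relevant chunks to the model, which is how the Cookbook example stays under the token limit.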
I recommend reviewing the code on the OpenAI Cookbook GitHub page that uses a Web Crawl Q&A example to explain embeddings.
I used the code from Step 5 onwards and altered the file location so it points to my file containing the long piece of text.
From:

```python
# Open the file and read the text
with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
    text = f.read()
```

to:

```python
# Open the file and read the text
with open("/my_location/long_text_file.txt", "r", encoding="UTF-8") as f:
    text = f.read()
```
I then modified the questions at Step 13 to ask what I needed to know about the text; a simplified sketch of that retrieval step follows below.
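For reference, the later steps of that notebook boil down to: embed each chunk of the text, embed your question, rank the chunks by similarity, and send only the most relevant chunks to the model as context. Below is a simplified sketch of that flow, not the Cookbook's exact code; the chunk contents, API key, question text, number of chunks kept, and the `gpt-3.5-turbo` model choice are all placeholders or assumptions.

```python
# Simplified sketch of retrieval-based Q&A over a long transcript.
# Assumes the pre-1.0 "openai" package and numpy; proper token-aware
# chunking is omitted for brevity.
import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def embed(text):
    """Return the embedding vector for a piece of text."""
    resp = openai.Embedding.create(input=text, engine="text-embedding-ada-002")
    return np.array(resp["data"][0]["embedding"])

# chunks: the long transcript split into pieces that each fit the token limit
chunks = [
    "...first part of the conversation...",
    "...second part of the conversation...",
]
chunk_embeddings = [embed(c) for c in chunks]

question = "What were the main action items agreed in this conversation?"
q_embedding = embed(question)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank chunks by similarity to the question and keep only the best ones
ranked = sorted(
    zip(chunks, chunk_embeddings),
    key=lambda pair: cosine(q_embedding, pair[1]),
    reverse=True,
)
context = "\n\n".join(chunk for chunk, _ in ranked[:2])

# Ask the question with only the most relevant context included
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer based on the context provided."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(completion["choices"][0]["message"]["content"])
```

For summarization rather than Q&A, the same pattern works if you phrase the question as a summarization request per chunk and then summarize the combined answers.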