I have deployed Llama2 in the VertexAI Model Garden and am able to use the API endpoint without issues.
But when I started engineering my prompts, a problem came up. I am providing a lot of context to the model, and its answer always starts with a repetition of the input. Since the input is so long, most of the actual answer gets cut off.
So I want to increase the maximum number of tokens that the model generates.
The rough idea is that the user asks a question. I first extract from a vector store a piece of text that should contain the answer, and then I want the model to summarize that text so that it answers the question.
Here is my code:
from langchain.chains import LLMChain
from langchain.llms.vertexai import VertexAIModelGarden
from langchain.prompts import PromptTemplate
llm = VertexAIModelGarden(
    project=...,
    endpoint_id=...,
    location=...,
)
prompt_template = """<s>[INST] <<SYS>>You are a helpful, respectful and honest assistant. If you don't know the answer
    to a question, don't share false information.
    Use the context below to answer the provided questions: 
    Context: {context}
    <</SYS>>
    Question: {question}[/INST]
    """
PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain = LLMChain(llm=llm, prompt=PROMPT)
# results is a list of Documents, so basically a bunch of text
inputs = [{"context": result.page_content, "question": question} for result in results]
final_result = chain.apply(inputs)
So it boils down to whether I can pass VertexAIModelGarden a parameter like max_token_length or max_new_token.
Nothing in the official documentation stood out to me in that regard: https://api.python.langchain.com/en/latest/llms/langchain.llms.vertexai.VertexAIModelGarden.html?highlight=vertexai#langchain.llms.vertexai.VertexAIModelGarden
Thanks!
I'm not sure if I'm too late to answer, but I ran into this issue too, and after spending hours going through the source code myself I hope this helps others. Basically, you need to list the arguments you want to use in allowed_model_args when you create the model, and then pass them when you call it. For example, something like this:
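from langchain.llms.vertexai import VertexAIModelGarden
prompt = 'your prompt here'
llm = VertexAIModelGarden(
    project=...,
    endpoint_id=...,
    # Arguments must be listed here to be forwarded to the endpoint.
    allowed_model_args=["temperature", "max_tokens"]
)
# Pass the generation arguments at call time.
llm(prompt, max_tokens=4000, temperature=0.0)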
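To make it clearer why the allow-list matters, here is a rough standalone sketch of the filtering as I understood it from reading the source. It is not the library's actual code: build_instances is a hypothetical helper name, and the "prompt" payload key is an assumption on my part. The point is only that a keyword argument which is not in the allow-list never makes it into the request, which would explain why setting max_tokens without allowed_model_args appears to do nothing.

```python
from typing import Any, Dict, List, Optional


def build_instances(
    prompts: List[str],
    allowed_model_args: Optional[List[str]] = None,
    prompt_arg: str = "prompt",  # assumed payload key, for illustration only
    **kwargs: Any,
) -> List[Dict[str, Any]]:
    """Build one request payload per prompt, keeping only allow-listed kwargs."""
    allowed = set(allowed_model_args or [])
    instances = []
    for prompt in prompts:
        # Drop every keyword argument that is not explicitly allowed.
        instance = {k: v for k, v in kwargs.items() if k in allowed}
        instance[prompt_arg] = prompt
        instances.append(instance)
    return instances


# With an allow-list, max_tokens and temperature survive; top_k is dropped.
with_allowlist = build_instances(
    ["your prompt here"],
    allowed_model_args=["temperature", "max_tokens"],
    max_tokens=4000,
    temperature=0.0,
    top_k=5,
)

# Without an allow-list, max_tokens is silently ignored.
without_allowlist = build_instances(["your prompt here"], max_tokens=4000)

print(with_allowlist)
print(without_allowlist)
```

If your endpoint expects a different argument name (for example max_new_tokens instead of max_tokens), the same idea should apply: the name has to appear in allowed_model_args and be passed at call time.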
Hope this helps!