
Why is GPT-4 giving different answers with same prompt & temperature=0?

Tags:

gpt-4

This is my code for calling the GPT-4 model:

messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": req},
]

response = openai.ChatCompletion.create(
    engine="******-gpt-4-32k",
    messages=messages,
    temperature=0,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

answer = response["choices"][0]["message"]["content"]

Keeping system_msg and req constant, with temperature=0, I still get different answers. For instance, the last time I ran this 10 times, I got 3 distinct answers. They are similar in concept but differ in wording.

Why is this happening?

Kavya Bhandari asked Sep 02 '25 13:09

2 Answers

This blog post by Sherman Chann argues that:

Non-determinism in GPT-4 is caused by Sparse MoE [mixture of experts].

Note that it’s now possible to set a seed parameter. From platform.openai.com/docs (mirror):

Reproducible outputs

Beta

Chat Completions are non-deterministic by default (which means model outputs may differ from request to request). That being said, we offer some control towards deterministic outputs by giving you access to the seed parameter and the system_fingerprint response field.

To receive (mostly) deterministic outputs across API calls, you can:

  • Set the seed parameter to any integer of your choice and use the same value across requests you'd like deterministic outputs for.

  • Ensure all other parameters (like prompt or temperature) are the exact same across requests.

Sometimes, determinism may be impacted due to necessary changes OpenAI makes to model configurations on our end. To help you keep track of these changes, we expose the system_fingerprint field. If this value is different, you may see different outputs due to changes we've made on our systems.
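Putting that together with the question's code, a minimal sketch of the same request with a fixed seed might look like this (assuming the pre-1.0 `openai` package the question uses; the API call itself is left commented out since it needs a live key and deployment):

```python
# Same parameters as in the question, plus a fixed seed.
request_kwargs = dict(
    engine="******-gpt-4-32k",  # deployment name from the question
    temperature=0,
    top_p=1,
    seed=42,  # any integer; reuse the same value across requests
)

# response = openai.ChatCompletion.create(messages=messages, **request_kwargs)
# fingerprint = response.get("system_fingerprint")
# If `fingerprint` differs between two calls, OpenAI changed the backend
# configuration, and outputs may differ even with the same seed.
```

Note that the docs promise only *mostly* deterministic outputs: the seed constrains sampling, but backend changes (surfaced via `system_fingerprint`) can still alter results.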

Franck Dernoncourt answered Sep 05 '25 15:09


Found a possible explanation here: https://community.openai.com/t/observing-discrepancy-in-completions-with-temperature-0/73380

TL;DR: discrepancies can occur due to floating-point operations when two tokens have very close probabilities. Even a single-token difference affects the whole subsequent chain and leads to divergent generations.
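The effect can be sketched with a toy greedy-decoding step (a hypothetical illustration, not GPT-4's actual internals): when two tokens have nearly identical logits, a tiny floating-point perturbation, e.g. from a different reduction order on the GPU, can flip which one is on top, and every later token is then conditioned on a different prefix.

```python
# Two runs produce almost identical logits for the same position,
# but a ~1e-6 floating-point wobble swaps the top candidate.
logits_run1 = {"cat": 10.000001, "dog": 10.000000}
logits_run2 = {"cat": 10.000000, "dog": 10.000001}

# temperature=0 means greedy decoding: always take the argmax.
greedy1 = max(logits_run1, key=logits_run1.get)  # "cat"
greedy2 = max(logits_run2, key=logits_run2.get)  # "dog"
```

Once the two runs emit different tokens here, the generations diverge from this point onward, which matches seeing answers that are similar in concept but worded differently.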

Kavya Bhandari answered Sep 05 '25 17:09


