Ah, got it. So GPT is non-deterministic, but we partially handle that with a caching layer in our AI. Basically, if you make an ai() call and we see that the page state is identical to a previous invocation of that exact AI prompt, we will not consult the AI and will instead return the cached result. We did this mainly to reduce costs and speed up the 2nd through nth runs of the same test, but it also makes the AI a bit more deterministic.
There are some new features in GPT-4-Turbo that will let us handle determinism better, and we will be exploring that once GPT-4-Turbo is stable.
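For anyone curious what a cache like the one described above might look like, here is a minimal sketch. It assumes the page state can be serialized to a string (for example a DOM snapshot) and that `call_model` is a hypothetical stand-in for whatever actually queries the LLM. Neither is the product's real API, and how the real system decides that two page states are "identical" isn't specified here. The idea is simply that the prompt and page state together form the cache key, so the model is consulted only on a miss, and identical inputs return the same stored answer.

```python
import hashlib
import json
from typing import Callable, Dict


class CachedAI:
    """Sketch of a response cache keyed on (prompt, page state).

    Assumption: `call_model` is a hypothetical placeholder for the real
    LLM call; page state is assumed to be serialized to a string.
    """

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[str, str] = {}

    @staticmethod
    def _key(prompt: str, page_state: str) -> str:
        # Hash prompt and page state together, so a change to either one
        # yields a different key and therefore a fresh model call.
        payload = json.dumps([prompt, page_state])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ai(self, prompt: str, page_state: str) -> str:
        key = self._key(prompt, page_state)
        if key not in self._cache:
            # Cache miss: consult the (non-deterministic) model once.
            self._cache[key] = self._call_model(prompt, page_state)
        # Identical (prompt, page state) pairs now return the stored answer.
        return self._cache[key]


# Usage with a fake model that returns a different answer on every call.
calls = []


def fake_model(prompt: str, page_state: str) -> str:
    calls.append(prompt)
    return f"answer #{len(calls)}"


client = CachedAI(fake_model)
first = client.ai("click the login button", "<html>v1</html>")
second = client.ai("click the login button", "<html>v1</html>")  # cache hit
third = client.ai("click the login button", "<html>v2</html>")   # page changed
print(first, second, third, len(calls))
```

The second call reuses the first answer even though the fake model would have produced a new one, which is where the extra determinism comes from; the third call misses because the page state changed.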
That makes a lot of sense, thank you for the explanation. I will have to explore this the next time I am building page tests. I have considered doing it myself, but I'm much happier using a relatively inexpensive product than maintaining a creaky home-built version.