Discussion about this post

Felix

Nice write-up! I did a deep dive on the car wash test with 53 leading models plus a 10k human control group: https://opper.ai/blog/car-wash-test

Jeff Hemming

Thank you for sharing this! I ran the test on the tools that I use. They all failed, AND I'm using paid services. That led me to switch to Opus 4.5.

Question: what about Copilot? Curious, was it not included because it uses GPT-5 as its core?
