Player Of Games
Evaluation of Large Language Models in Cooperative Language Games
Samuel Knoche
Independent
Abstract
This report investigates the potential of cooperative language games as a tool for evaluating large language models (LLMs). Specifically, it examines how well LLMs can play both roles in the game Codenames. The “spymaster” must give hints that guide a teammate to the correct “target” words. The “guesser” must identify those target words from the given hint. We evaluate how different LLMs perform in self-play and how well they cooperate with a human teammate. The report concludes with some promising results and suggestions for further investigation.
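The two roles described above can be illustrated with a minimal sketch of a single Codenames round. The sketch assumes several things that the report does not specify. The `Spymaster` and `Guesser` interfaces, the `play_round` helper and the `RoundResult` fields are all hypothetical names. The scoring rule, which counts correct guesses up to the first wrong one as in the tabletop game's turn rule, is also an assumption. In practice either callable could wrap an LLM prompt or a human player's input. Here they are simple stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical role interfaces (not taken from the report):
# a spymaster sees the board and the secret targets and returns (hint word, count);
# a guesser sees the board, hint and count and returns guesses in order of confidence.
Spymaster = Callable[[List[str], List[str]], Tuple[str, int]]
Guesser = Callable[[List[str], str, int], List[str]]


@dataclass
class RoundResult:
    hint: str
    guesses: List[str]
    correct: int        # target words found before the first wrong guess
    total_targets: int


def play_round(board: List[str], targets: List[str],
               spymaster: Spymaster, guesser: Guesser) -> RoundResult:
    """Play one hint/guess exchange and score it (assumed scoring rule)."""
    hint, count = spymaster(board, targets)
    guesses = guesser(board, hint, count)
    target_set = set(targets)
    correct = 0
    for word in guesses[:count]:  # the guesser may make at most `count` guesses
        if word in target_set:
            correct += 1
        else:
            break                 # a wrong guess ends the turn, as in Codenames
    return RoundResult(hint, guesses, correct, len(targets))


# Toy stubs standing in for an LLM or human player (hypothetical).
board = ["apple", "banana", "car", "river", "moon"]
targets = ["apple", "banana"]
spymaster_stub: Spymaster = lambda b, t: ("fruit", 2)
guesser_stub: Guesser = lambda b, hint, n: ["banana", "car"]

result = play_round(board, targets, spymaster_stub, guesser_stub)
print(result.correct, "/", result.total_targets)  # 1 / 2
```

Because the roles are plain callables, self-play and human-AI play differ only in which functions are passed in. One role can be swapped for a human while the other stays an LLM.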
Keywords: Scalable oversight, benchmarks, ML safety
Status: Released
Category: Other
Author: SamuelKnoche