arXiv
Language Models
Your chatbot might ace the test but flub the follow-up conversation
It's like someone who can recite facts perfectly but seems slightly baffled when you ask a natural next question — the model solves the problem but doesn't truly track what came before.
This means current language models might be expert performers without genuine conversational awareness, which matters if we want AI that actually *understands* dialogue flow, not just pattern-matches responses.
Bug reported: No