1st Update: Which Large Language Models are best for regulatory work?

In February 2025, The Regulatory Institute published the article “Which Large Language Models are best for regulatory work?”. Here is the first update. In a recent test aimed at drafting amendments to a bill, Mistral’s Le Chat, one of the two Large Language Models rated highly so far, joined the group of underperformers alongside OpenAI’s various (Chat)GPT models and Google’s Gemini. For the first time, we tested the Large Language Model Manus. Manus performed slightly better than the other highly rated Large Language Model, Anthropic’s Claude Sonnet 3.7. Based on the new tests and past results, only one Large Language Model can be recommended for all drafting tasks: Claude Sonnet 3.7. Nevertheless, it is worthwhile to use some other Large Language Models in parallel and compare results. The recommendations in our previous article “Which Large Language Models are best for regulatory work?” remain valid, but they need to be updated to include Manus as a top performer and nuanced with regard to Mistral’s Le Chat.

A few days after our last test, Anthropic released Claude Sonnet 4. It is said to have even better reasoning capabilities, and initial results seem to confirm this. With Claude Sonnet 4, Anthropic could hold an even greater lead over all Large Language Models except Manus than it did with Claude Sonnet 3.7.
Finally, we can confirm the previously observed trend of Large Language Models producing more “hallucinations”.
