March 4, 2024

Taylor Daily Press


Artificial intelligence with built-in fact checker makes mathematical discoveries


AI company DeepMind is using AI chatbots to solve mathematical problems. It has developed a method that stops the language model from giving meaningless answers.

Google DeepMind claims to have made the first mathematical discovery using an AI chatbot. The company built a fact checker that filters out the chatbot's useless output, leaving only reliable solutions to problems in mathematics and computer science.


DeepMind has previously built successful systems that predict the weather and the structure of proteins. Those AI models are built specifically for their task and trained on accurate, relevant data.

In contrast, large language models, such as GPT-4 and Google's Gemini, are trained on massive amounts of public data. As a result, they have a wide range of skills. However, this approach also makes them vulnerable to “hallucinations,” sometimes causing them to produce incorrect statements with apparent conviction.

Take ChatGPT running GPT-3.5, for example. If you ask, “What is the name of King Willem-Alexander’s granddaughter?”, it answers: “His eldest daughter, Princess Amalia, is often referred to as Princess Ariane.” This is a hallucinated answer: Amalia is the king's daughter, not his granddaughter, and she has no children.

A common remedy is to add a layer on top of the AI that checks the accuracy of its output before passing it to the user. That is difficult, given the wide range of topics chatbots can be asked about.


AI researcher Alhussein Fawzi of DeepMind and his colleagues have now created a system called FunSearch. It is built on Google's PaLM 2 language model, to which they added a fact-checking layer they call the evaluator. The system was designed specifically to write computer code that solves problems in mathematics and computer science. According to DeepMind, this makes the task manageable, because new ideas and solutions in these fields can be verified quickly.

The underlying AI may still hallucinate and produce inaccurate or misleading results. But the evaluator filters out incorrect answers, leaving only reliable and useful ideas.

“We think that 90% of what a chatbot produces is probably unusable,” says Fawzi. Even so, chatbots are still very useful. “If I get a possible solution, I can easily tell you whether it is the right one. But coming up with a solution yourself is very difficult,” he says. DeepMind claims that FunSearch can generate new scientific knowledge and ideas, something language models had not done before.

Mathematical problems

To start, FunSearch is given a very simple problem and solution as input. The language model then generates new candidate solutions, the evaluator checks them for accuracy, and the results are stored in a database. The best verified solutions are fed back to the language model as input, together with a prompt asking it to improve on them. DeepMind says the system generates millions of potential solutions, which eventually converge on an effective result. That result is sometimes better than the best known solution.
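The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DeepMind's code. The language model is replaced by a hypothetical `propose` function that makes random, often wrong, edits. The "program" is reduced to a set of numbers from 0 to 29 that must contain no three-term arithmetic progression, a one-dimensional relative of the cap set problem. The names `evaluate`, `propose` and `search` are made up for this example.

```python
import random
from itertools import combinations

N = 30  # hypothetical toy problem: choose numbers from 0..N-1


def evaluate(candidate):
    """Evaluator: return a score (set size) for a valid candidate, or None if invalid.

    Valid means every number is in range and no three numbers a < b < c satisfy a + c == 2 * b.
    """
    members = set(candidate)
    if any(x < 0 or x >= N for x in members):
        return None
    for a, c in combinations(sorted(members), 2):
        # The midpoint of a < c lies strictly between them, so the three numbers are distinct.
        if (a + c) % 2 == 0 and (a + c) // 2 in members:
            return None
    return len(members)


def propose(parent, rng):
    """Stand-in for the language model: a random edit that is often invalid."""
    child = set(parent)
    if child and rng.random() < 0.3:
        child.discard(rng.choice(sorted(child)))
    for _ in range(rng.randint(1, 2)):
        child.add(rng.randrange(N))
    return child


def search(iterations=5000, keep=10, seed=0):
    """Generate, verify, store, and feed the best stored solutions back to the generator."""
    rng = random.Random(seed)
    database = [(1, frozenset({0}))]  # trivial starting solution with its score
    rejected = 0
    for _ in range(iterations):
        _, parent = rng.choice(database)  # pick one of the best verified solutions
        child = frozenset(propose(parent, rng))
        score = evaluate(child)
        if score is None:
            rejected += 1  # the evaluator discards invalid output
            continue
        database.append((score, child))
        # Keep only the top-scoring distinct solutions.
        database = sorted(set(database), key=lambda e: e[0], reverse=True)[:keep]
    return database[0], rejected
```

Calling `search()` returns the best set found with its score, plus a count of the proposals the evaluator threw away. Echoing Fawzi's point, a large share of the generator's output is rejected, and only verified improvements are fed back.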

The model does not solve mathematical problems directly. Instead, it writes computer programs that find the solutions. For example, Fawzi and his colleagues challenged FunSearch with the cap set problem, which involves finding patterns of dots in which no three dots lie on a straight line. The problem quickly becomes harder, and requires ever more computation, as the number of points grows. FunSearch found a set of 512 points in eight dimensions, larger than any found before.
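In the cap set problem, the "dots" are points in an n-dimensional grid where each coordinate is 0, 1 or 2. Three distinct points lie on a line exactly when their coordinates sum to 0 modulo 3 in every position. The sketch below shows the kind of quick check an evaluator can run, plus a naive greedy construction as a baseline. The function names are hypothetical. The greedy scan is not FunSearch's method; it only illustrates why candidates are cheap to verify but hard to improve.

```python
from itertools import combinations, product

Point = tuple[int, ...]


def third_point(x: Point, y: Point) -> Point:
    """The unique point z on the line through distinct x and y: x + y + z = 0 (mod 3)."""
    return tuple((-a - b) % 3 for a, b in zip(x, y))


def is_cap_set(points) -> bool:
    """Evaluator: True if no three distinct points lie on a line."""
    s = set(points)
    return all(third_point(x, y) not in s for x, y in combinations(s, 2))


def greedy_cap_set(n: int) -> list[Point]:
    """Naive baseline: scan all 3**n points in lexicographic order and keep a point
    only if it does not complete a line with two points already chosen."""
    chosen: list[Point] = []
    members: set[Point] = set()
    for p in product(range(3), repeat=n):
        if all(third_point(p, q) not in members for q in chosen):
            chosen.append(p)
            members.add(p)
    return chosen
```

`greedy_cap_set(2)` returns four points, the maximum for two dimensions. In higher dimensions a scan like this has no reason to find the largest sets, which is where searching over programs can pay off.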

Researchers also applied FunSearch to the bin packing problem, in which the goal is to fit objects of different sizes into boxes as efficiently as possible. FunSearch found better solutions than the algorithms currently in common use, a finding with direct applications for transportation and logistics companies. According to DeepMind, FunSearch could lead to improvements on many problems in mathematics and computer science.
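For bin packing, as far as we understand DeepMind's description, the programs FunSearch evolved were heuristics that decide which bin an incoming item goes into. The sketch below assumes that shape. A hypothetical `pack` function places each item in the feasible bin that scores highest under a pluggable `priority` function. Two classic hand-written rules, first fit and best fit, are shown as examples of what such a function could be. They are baselines, not FunSearch's discovered heuristics.

```python
from typing import Callable, List

# (item size, remaining capacity of a bin) -> score; higher means "prefer this bin"
Priority = Callable[[int, int], float]


def pack(items: List[int], capacity: int, priority: Priority) -> List[List[int]]:
    """Online bin packing: put each item in the feasible bin with the highest priority,
    or open a new bin if the item fits nowhere."""
    bins: List[List[int]] = []
    remaining: List[int] = []
    for item in items:
        feasible = [i for i, r in enumerate(remaining) if r >= item]
        if feasible:
            # max() returns the first index among equal scores.
            best = max(feasible, key=lambda i: priority(item, remaining[i]))
            bins[best].append(item)
            remaining[best] -= item
        else:
            bins.append([item])
            remaining.append(capacity - item)
    return bins


def first_fit(item: int, remaining: int) -> float:
    """All feasible bins score equally, so the lowest-index bin wins."""
    return 0.0


def best_fit(item: int, remaining: int) -> float:
    """Prefer the bin that would have the least space left after adding the item."""
    return -(remaining - item)
```

With items `[4, 8, 1, 4, 2, 1]` and bins of capacity 10, first fit yields `[[4, 1, 4, 1], [8, 2]]` and best fit yields `[[4, 4, 2], [8, 1, 1]]`. A search system like FunSearch would try to replace `priority` with a function that uses fewer bins on realistic streams of items.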


Computer scientist Mark Lee of the University of Birmingham, UK, says the next breakthroughs in AI will come not from scaling up language models but from adding layers that ensure accuracy, as DeepMind did with FunSearch.

“The strength of a language model is its ability to imagine things, but hallucinations are a problem,” Lee says. “This research tackles that problem: it keeps the system in check.”

According to Lee, we should not criticize AI for producing inaccurate or useless results. That is no different from the way mathematicians and scientists work: they brainstorm and test ideas, follow up on the best ones and discard the worst.
