AI company DeepMind is using AI chatbots to solve mathematical problems. The company has developed a method that prevents the language model from giving meaningless answers.
Google DeepMind claims to have made the first mathematical discovery using an AI-powered chatbot. The company built a fact checker that filters out the chatbot's useless output, leaving only reliable solutions to mathematical or computing problems.
DeepMind has previously built successful systems that predict the weather or the structure of proteins. Those AI models were created specifically for the task at hand and trained on accurate, relevant data.
In contrast, large language models, such as GPT-4 and Google's Gemini, are trained on massive amounts of public data. As a result, they have a wide range of skills. However, this approach also makes them vulnerable to “hallucinations,” sometimes causing them to produce incorrect statements with apparent conviction.
Take the chatbot ChatGPT-3.5, for example. If you ask it “What is the name of King Willem-Alexander’s granddaughter?”, it answers: “His eldest daughter, Princess Amalia, is often referred to as Princess Ariane.” This answer is a hallucination, because Amalia has no children.
A common solution to this problem is to add a layer on top of the AI that verifies the accuracy of the output before passing it to the user. This is a difficult task, given the wide range of topics that chatbots can be asked about.
AI researcher Alhussein Fawzi of DeepMind and his colleagues have now created a language model called FunSearch. It is based on Google's PaLM 2 model, with an added fact-checking layer they call the evaluator. The model was built specifically to write computer code that solves problems in mathematics and computer science. According to DeepMind, this makes fact checking manageable, because new ideas and solutions in these fields can be verified quickly.
The underlying AI may still hallucinate and provide inaccurate or misleading results. But the evaluator filters out incorrect answers, leaving only reliable and useful concepts.
“We think that 90% of what a chatbot produces is probably unusable,” says Fawzi. Even so, chatbots are still very useful. “If I get a possible solution, I can easily tell you whether it is the right solution. But coming up with a solution yourself is very difficult,” says Fawzi. DeepMind claims that FunSearch can generate new scientific knowledge and ideas, something that language models could not do before.
To start, FunSearch is given a problem and a very simple solution as input. It then builds a database of new solutions, which the evaluator checks for accuracy. The best reliable solutions are fed back to the language model as input, along with a prompt to improve on them. DeepMind says the system generates millions of potential solutions, which eventually converge on an effective outcome. This result is sometimes better than the best known solution.
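The generate, check and feed-back loop described above can be illustrated with a minimal Python sketch. It is not DeepMind's implementation: the toy task, the `evaluate`, `propose` and `search` functions, and every parameter are hypothetical, and a seeded random perturbation stands in for the language model.

```python
import random


def evaluate(candidate):
    """Hypothetical evaluator: return a score, or None for an unusable candidate.

    Toy task (an assumption for illustration): find an integer in [0, 100]
    that maximises -(x - 42)^2.
    """
    if not isinstance(candidate, int) or not 0 <= candidate <= 100:
        return None  # filtered out, as the evaluator would do
    return -(candidate - 42) ** 2


def propose(parents, rng):
    """Stand-in for the language model: perturb one of the best solutions.

    The perturbation is deliberately noisy, so many proposals are invalid.
    """
    base = rng.choice(parents)
    return base + rng.randint(-30, 30)


def search(seed_solution, rounds=200, keep=5, seed=0):
    """Repeatedly propose, evaluate, and keep only the best valid candidates."""
    rng = random.Random(seed)
    pool = [(evaluate(seed_solution), seed_solution)]  # (score, candidate)
    for _ in range(rounds):
        parents = [candidate for _, candidate in pool]
        candidate = propose(parents, rng)
        score = evaluate(candidate)
        if score is None:
            continue  # discard output that fails the check
        pool.append((score, candidate))
        # Keep the top few distinct solutions as input for the next round.
        pool = sorted(set(pool), reverse=True)[:keep]
    return pool[0]


best_score, best = search(seed_solution=0)
print(best, best_score)
```

The point of the sketch is the division of labour: the proposer may produce nonsense, but only candidates that pass `evaluate` ever enter the pool, so the best score can never get worse from one round to the next.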
The model does not solve mathematical problems directly. Instead, it writes computer programs that find the solutions. For example, Fawzi and his colleagues challenged FunSearch to find solutions to the cap set problem. The model had to find patterns of points in which no three points lie on a straight line. The problem becomes increasingly difficult, and requires more and more computation, as the number of points increases. The AI found a solution with 512 points in eight dimensions, which is larger than any known before.
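Why such a solution is quick to verify can be shown with a short Python sketch. It uses the standard formulation of the cap set problem, which the article does not spell out: points are vectors with coordinates 0, 1 or 2, and three distinct points lie on a line exactly when they sum to zero modulo 3 in every coordinate. The function names are hypothetical, and the greedy construction is only a simple baseline, not the program FunSearch wrote.

```python
from itertools import combinations, product


def third_point(a, b):
    """The unique point that completes a line through distinct points a and b."""
    return tuple((-x - y) % 3 for x, y in zip(a, b))


def is_cap_set(points):
    """Return True if no three distinct points lie on a line (mod 3)."""
    pts = set(map(tuple, points))
    return all(third_point(a, b) not in pts for a, b in combinations(pts, 2))


def greedy_cap_set(n):
    """Baseline: scan all 3**n points and keep each one that creates no line."""
    cap = []
    chosen = set()
    for p in product(range(3), repeat=n):
        if all(third_point(p, q) not in chosen for q in cap):
            cap.append(p)
            chosen.add(p)
    return cap


print(is_cap_set([(0, 0), (1, 1), (2, 2)]))  # these three points form a line
print(len(greedy_cap_set(2)))
```

Checking a proposed set only requires looking at every pair of points, whereas finding a large set means searching an enormous space of candidates. That asymmetry is what makes the problem suitable for an automatic evaluator.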
The researchers also applied FunSearch to the bin packing problem, where the goal is to place objects of different sizes into boxes efficiently. FunSearch found better solutions than the algorithms commonly used today. This finding has direct applications for transportation and logistics companies. According to DeepMind, FunSearch can lead to improvements in many mathematical and computational problems.
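For readers unfamiliar with bin packing, the sketch below shows two classic heuristics, first fit and best fit, as examples of the kind of commonly used algorithm the article refers to. They are baselines for illustration only, not the heuristic FunSearch discovered, and the item sizes and capacity are made up.

```python
def first_fit(items, capacity):
    """Place each item in the first open bin with room; return bins used."""
    free = []  # remaining space in each open bin
    for size in items:
        if size > capacity:
            raise ValueError("item larger than bin capacity")
        for i, space in enumerate(free):
            if size <= space:
                free[i] -= size
                break
        else:
            free.append(capacity - size)  # no bin had room: open a new one
    return len(free)


def best_fit(items, capacity):
    """Place each item in the fullest bin that still has room; return bins used."""
    free = []
    for size in items:
        if size > capacity:
            raise ValueError("item larger than bin capacity")
        fitting = [i for i, space in enumerate(free) if size <= space]
        if fitting:
            tightest = min(fitting, key=lambda i: free[i])
            free[tightest] -= size
        else:
            free.append(capacity - size)
    return len(free)


items = [5, 7, 3, 5]  # hypothetical item sizes
print(first_fit(items, capacity=10))  # 3
print(best_fit(items, capacity=10))   # 2
```

On this small input the two rules already give different results, which illustrates why the choice of placement rule matters and why a search over such rules can pay off.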
Computer scientist Mark Lee of the University of Birmingham in the UK says the next breakthroughs in AI will come not from scaling up language models, but from adding layers that ensure accuracy, as DeepMind has done with FunSearch.
“The power of the language model is the ability to imagine things, but hallucinations are a problem,” Lee says. “This research overcomes this problem: it keeps the system in check.”
According to Lee, we should not criticize AI for its inaccurate or useless results. It is no different from the way mathematicians and scientists work: they brainstorm and test ideas, follow the best ones and discard the worst.