A new way to make neural networks think systematically could allow artificial intelligence (AI) to be trained with less data. Models like ChatGPT have made remarkable progress in recent years, but they need enormous amounts of data for training, while people can learn from far fewer examples. Brenden Lake and Marco Baroni reported a way forward last week in the journal Nature. If deployed, it could let AI generalize better and, possibly, learn faster.
People naturally think in general ways, which you can see, for example, in how we learn mathematics. Children learn the numbers and the + sign, and once a teacher explains that, say, 1+2=3, they can work out for themselves that 2+1=3. This is called “compositional thinking.”
Compositionality is a property of language as well as of mathematics. It means that the meaning of a sentence or a sum depends on the meaning of its parts and on their structure. Compositional thinking is nothing more than exploiting that structure: it lets you understand the meaning of new combinations of words you already know.
A little training
Anyone who understands how the + sign works and knows the numbers can, in principle, add any numbers together. Neural networks cannot do this out of the box. If a network that is not yet familiar with the + sign has learned that 1+2=3, it may still conclude that 2+1=2, because it does not automatically grasp that the + sign always works the same way.
With the new method the researchers describe in Nature, programmers can now teach neural networks this structure, so that a little training suffices to avoid these kinds of errors. One of the researchers, computational linguist Marco Baroni, explains by phone why they developed the method: “Training large models like ChatGPT requires a lot of energy, and we also don’t want AI to be developed only by large companies like Google or Meta. With less training data, that becomes easier.”
Neural networks are algorithms that, given enough example combinations of inputs and outputs, can develop a method for estimating the output that should go with a new input. The method developed by Lake and Baroni can train the type of neural network used for language processing, a family that includes ChatGPT.
The researchers use a technique called meta-learning, in which the AI is trained on many different tasks, in this case compositional tasks, one at a time. The idea of meta-learning has been around since the 1990s, but according to Baroni it is only in recent years that neural networks have become powerful enough to learn compositionality this way.
In such a compositional task, the network is shown a number of example sentences in an artificial language together with their correct translations. For example, “fax” means red circle, “dup” means blue circle, and “fax kiki dup” means a red circle and then a blue circle. The network is then shown a new sentence in the artificial language, for example “dup kiki fax”, and must translate it correctly: first a blue circle and then a red circle.
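To make the example concrete, here is a minimal Python sketch of how this toy language could be interpreted. The lexicon and the left-to-right reading of “kiki” are taken from the article’s single example; the study itself used many artificial languages with different rules, and all names here are illustrative.

```python
# Toy interpreter for the artificial language in the article's example.
# Assumption: "kiki" simply joins its two arguments in left-to-right order,
# as in the example "fax kiki dup" -> red circle, then blue circle.

LEXICON = {"fax": "red circle", "dup": "blue circle"}

def interpret(sentence: str) -> list[str]:
    """Translate a sentence such as 'fax kiki dup' into a sequence of shapes."""
    outputs = []
    for word in sentence.split():
        if word == "kiki":
            continue  # 'kiki' sequences its arguments; it names no shape itself
        outputs.append(LEXICON[word])
    return outputs

print(interpret("fax kiki dup"))  # ['red circle', 'blue circle']
print(interpret("dup kiki fax"))  # ['blue circle', 'red circle']
```

A network that has only seen “fax kiki dup” but translates “dup kiki fax” correctly is exploiting exactly this compositional structure.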
The network is trained to produce the best possible translation of each new sentence across many artificial languages with different grammatical rules. Once trained, the model can perform compositional tasks much as humans do. The researchers also showed that the trained network outperformed untrained networks on a standard test of systematic generalization.
Jelle Zuidema, assistant professor of artificial intelligence at the University of Amsterdam, notes that the networks with which Lake and Baroni run the generalization test are very small compared to today’s large models. “Their model has about a million parameters, whereas ChatGPT, for example, has billions. That’s a thousand times smaller.” Perhaps a larger model could do better than the untrained networks Lake and Baroni used.
An important question
ChatGPT can do so much that it sometimes seems as if the compositionality problem has already been solved. Zuidema: “It’s really amazing how creatively ChatGPT handles new combinations of words. But we also know that ChatGPT was trained on a huge amount of data, and it’s not clear exactly how the model knows what to answer. It has probably simply seen so much that many of those combinations are not actually new to it at all.”
That’s why, according to Zuidema, it is interesting that the researchers can get smaller models, with less training data, to solve compositional tasks: “People sometimes brush this question aside a bit, but these ChatGPT-style models are very expensive to run, and there’s a real need to train smaller models smarter.”