AI for Teachers, An Open Textbook: Edition 1

AI Speak: Natural Language Processing

Natural language processing has been an active research topic for the past 50 years. This research has led to the development of many of the tools we use every day.

More recently, chatbots, home assistants and automatic translation tools have been making a huge impact in all areas.

For a long time, research and industry were stalled by the intrinsic complexity of language. At the end of the 20th century, grammars for a language, written by experts, could have up to 50,000 rules. These expert systems showed that technology could make a difference, but robust solutions were too complex to develop.

Speech recognition, meanwhile, needed to be able to take acoustic data and transform it into text. Given the variety of speakers one may encounter, a very hard task indeed!

Researchers understood that things would be easier if we had a model of the intended language: if we knew what the words of the language were and how sentences were formed, it would be easier to pick, from a set of candidates, the right sentence to match a given utterance, or to produce a valid translation from a set of possible sequences of words.

Another crucial topic has been that of semantics. Most of the work we can do to solve linguistic questions is shallow: the algorithms will produce an answer based on some local syntactic rules. If at the end the text means nothing, so be it. A similar thing may happen when we read a text written by some pupils: we can correct the mistakes without really understanding what the text is about! A real challenge is to associate meaning with text and, when possible, with uttered sentences.

In 2008, a surprising result arrived1: a single language model could be learnt from a large amount of data and used for a variety of linguistic tasks. In fact, that single model performed better than models trained for the specific tasks.

The model was a deep neural network. Nowhere near as deep as the models used today! But deep enough to convince research and industry that machine learning, and more specifically deep learning, was going to be the answer to many questions in NLP.

Since then, natural language processing has largely ceased to follow a model-driven approach and has nearly always been based on a data-driven approach.

Traditionally, the main language tasks can be decomposed into two families: those involving building models and those involving decoding.

Building models

In order to transcribe, answer questions, generate dialogues or translate, you need to be able to tell whether “Je parle Français” is indeed a sentence in French or not. And since grammar is not always followed accurately in spoken language, the answer needs to be probabilistic: a sentence is more or less French. This allows the system to produce different candidate sentences (as the transcription of a sound, the translation of a sentence, …), with the probability serving as a score. We can take the highest-ranking sentence or combine the score with other sources of information (we may also be interested in what the sentence is about). Language models do this: the probabilities are built by machine learning algorithms. And of course, the more data there is, the better. For some languages there is a lot of data to build language models from. For others this is not the case: these are under-resourced languages.
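To make this concrete, here is a toy sketch in Python (not any real system): a tiny bigram language model estimated from a three-sentence corpus. The corpus and the smoothing are invented for illustration; real language models are trained on vastly more data, but the principle is the same: a sentence gets a probability.

from collections import Counter

corpus = [
    "je parle français",
    "je parle anglais",
    "tu parles français",
]

# Count unigrams (words) and bigrams (pairs of consecutive words).
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(words[:-1])
    bigrams.update(zip(words[:-1], words[1:]))

def sentence_probability(sentence):
    """P(sentence) as a product of P(word | previous word), with add-one smoothing."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        # Unseen pairs still get a small, non-zero probability.
        p *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams) + 1)
    return p

print(sentence_probability("je parle français"))   # higher: looks like French
print(sentence_probability("français je parle"))   # lower: same words, wrong order

The second sentence uses exactly the same words, yet receives a much lower score: what the model has captured is which sequences of words are plausible in the language.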

For the case of translation, we need not two but three models: a language model for each language and a third model for the translations, informing us of what the better translations of fragments of language can be. These are difficult to produce when data is scarce. While models for common language pairs are easier to build, this is not the case for languages that are not frequently spoken together (say Portuguese and Slovene). A typical way out here is to use a pivot language (typically English) and translate via this pivot language: from Portuguese to English and then from English to Slovene. This obviously leads to inferior results, as errors accumulate.
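As a sketch of the pivot idea, the following Python fragment composes two hypothetical translation functions, translate_pt_en and translate_en_sl; both are placeholders standing in for real translation models.

def translate_pt_en(text: str) -> str:
    # Placeholder: in practice this would call a Portuguese-to-English model.
    lookup = {"bom dia": "good morning"}
    return lookup.get(text, text)

def translate_en_sl(text: str) -> str:
    # Placeholder: in practice this would call an English-to-Slovene model.
    lookup = {"good morning": "dobro jutro"}
    return lookup.get(text, text)

def translate_pt_sl(text: str) -> str:
    """Portuguese -> Slovene via the English pivot.
    Any error made in the first step is passed on, unexamined, to the second."""
    return translate_en_sl(translate_pt_en(text))

print(translate_pt_sl("bom dia"))  # -> "dobro jutro"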

Decoding

Decoding is the process by which an algorithm takes the input sequence (which can be signal or text) and, by consulting the models, makes a decision, which will often be an output text. There are some algorithmic considerations here: in many cases transcription and translation have to happen in real time, and reducing the lag is a key issue. So there is room for a lot of artificial intelligence.
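The sketch below illustrates the idea with made-up numbers: a few candidate transcriptions, each with an invented acoustic score (how well it matches the sound), are re-ranked by combining that score with a language model score. Both the scores and the language_model function are hypothetical; real decoders search over far larger candidate spaces, typically with beam search, rather than a handful of sentences.

def language_model(sentence: str) -> float:
    """Stand-in for a real language model: returns a log-probability-like score."""
    toy_scores = {
        "je parle français": -2.1,
        "je parle frein sais": -9.7,
        "gel parle français": -8.3,
    }
    return toy_scores.get(sentence, -20.0)

# Candidate transcriptions produced from the audio, with acoustic log-scores.
# All numbers are invented for illustration.
candidates = {
    "je parle frein sais": -4.2,   # matches the sound slightly better...
    "je parle français": -5.0,
    "gel parle français": -4.8,
}

def decode(candidates, lm_weight=1.0):
    # Combine the two sources of information; the weight is a design choice.
    return max(candidates, key=lambda s: candidates[s] + lm_weight * language_model(s))

print(decode(candidates))  # the language model pushes the decision towards real French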

End-to-end

Nowadays, the approach of building these components separately and combining them later has been replaced by end-to-end approaches, in which the system transcribes/translates/interprets the input through a single model. Currently, such models are trained as deep neural networks which can be huge: the largest GPT-3 model is reported to comprise some 175 billion parameters!

Let’s try to get the intuition: suppose we have some data. This raw data can be encoded in some way. But the encoding can be very redundant, and perhaps even expensive. Let us now build a particular machine called an auto-encoder (see diagram below). This machine is able to take a text, compress it into a small vector (this is the encoder), and then uncompress the vector (the decoder part) to reconstruct a text which is somehow close to the original text. The idea is that this mechanism makes the intermediate vector very meaningful, with two desirable properties: it is reasonably small, and it “contains” the information of the initial text.
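Here is a minimal sketch of an auto-encoder, assuming PyTorch is available and using random vectors as a stand-in for encoded texts. It is meant only to show the encoder/decoder shape and the reconstruction objective, not the large sequence models used in practice.

import torch
from torch import nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=8):
        super().__init__()
        # Encoder: squeeze the input down to a small vector.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 32), nn.ReLU(),
                                     nn.Linear(32, hidden_dim))
        # Decoder: try to rebuild the original input from that small vector.
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, 32), nn.ReLU(),
                                     nn.Linear(32, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

data = torch.randn(256, 100)  # random vectors standing in for encoded texts

# Training: the only objective is to reconstruct the input, so the small
# intermediate vector is forced to "contain" the useful information.
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(data), data)
    loss.backward()
    optimizer.step()

print(loss.item())  # the reconstruction error should have decreased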

The future

An example of end-to-end processing we will be seeing soon will be able to perform the following task: it will hear you speak your language, transcribe your speech, translate it into a language you don’t know, train a speech synthesis system on your voice, and have your own voice speak the corresponding sentence in the new language. Here are two examples produced by researchers at the Universidad Politecnica de Valencia, in Spain, in which the speaker's own voice model is used to do the dubbing.

Some consequences for education

The steady progress of natural language processing is remarkable. Where we would laugh at the clumsy translations proposed by AI just 10 years ago, it is becoming increasingly difficult to find gross errors today. Speech recognition and character recognition techniques are also improving fast.
The semantic challenges are still there, and answering questions that require a deep understanding of a text still does not work well. But things are going in the right direction. Which means that the teacher should expect some of the following statements to be true soon, if they are not already out there! In these examples it is clear that the AI will be far from perfect, and the expert will detect that while the language is correct, the flow of ideas isn't. But let's face it: during the course of education, how long does it take for our pupils and students to reach that level?
------------------------------------------------------------------------------------------------------

1 Collobert, Ronan, and Jason Weston. “A unified architecture for natural language processing: Deep neural networks with multitask learning.” Proceedings of the 25th International Conference on Machine Learning, 2008. http://machinelearning.org/archive/icml2008/papers/391.pdf. Note: this reference is given for historical reasons, but it is difficult to read!
