AI for Teachers, An Open Textbook: Edition 1

AI Speak: Machine Learning

An algorithm is a fixed sequence of instructions for carrying out a task. It breaks down the task into easy, confusion-free steps: like a well-written recipe.

Programming languages are languages that a computer can follow and execute. They act as a bridge between what we understand and what a machine can execute: ultimately, switches that go on and off. For a computer, images, videos and instructions are all 1s (switch is on) and 0s (switch is off).

When written in a programming language, an algorithm becomes a program. Applications are programs written for an end user. 

Conventional programs take in data and follow fixed instructions to produce an output. Many early AI programs were conventional. Since the instructions cannot adapt to the data, these programs were not very good at tasks like predicting from incomplete information or natural language processing (NLP).
A search engine is powered by both conventional and machine learning (ML) algorithms. As opposed to conventional programs, ML algorithms analyse data for patterns and use these patterns, or rules, to make future decisions or predictions. That is, based on data (good and bad examples), they find their own recipe.
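
The contrast can be sketched in a few lines of Python. This toy example (exam scores and a pass threshold, both invented here purely for illustration) compares a rule written by the programmer with a rule derived from labeled examples:

```python
# A "conventional" program: the rule is written by the programmer.
def conventional_is_pass(score):
    return score >= 50  # fixed threshold, chosen in advance

# A "learning" program: the threshold is found from labeled examples,
# here as the midpoint between the highest fail and the lowest pass.
examples = [(35, False), (42, False), (58, True), (71, True)]

fails = [s for s, passed in examples if not passed]
passes = [s for s, passed in examples if passed]
learned_threshold = (max(fails) + min(passes)) / 2

def learned_is_pass(score):
    return score >= learned_threshold

print(learned_threshold)    # 50.0
print(learned_is_pass(64))  # True
```

The learned rule is only as good as its examples: draw different examples and the threshold shifts.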

These algorithms are well suited for situations with a lot of complexity and missing data. They can also monitor their performance and use this feedback to become better with use.

This is not very different from humans, especially babies learning skills outside the conventional educational system. Babies observe, repeat, learn, test their learning and improve. Where necessary, they improvise.

But the similarity between machines and humans is shallow. "Learning" from a human perspective is far more nuanced and complex than "learning" for a machine.

A Classification Problem

One common task an ML application performs is classification: is this a photo of a dog or a cat? Is this student struggling, or have they passed the exam? There are two or more groups, and the application has to classify new data into one of these groups.

Let us take the example of a pack of playing cards divided into two piles, Group A and Group B, following some pattern. We need to classify a new card, the ace of diamonds, as belonging to Group A or Group B.
 
First, we need to understand how the groups are split; we need examples. Let us draw four cards from Group A and four from Group B. These 8 example cases form our training set: data which helps us see the pattern, "training" us to see the result.
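
In code, such a training set is simply labeled data. The specific cards below, and the assumption that the piles are split by colour, are illustrative guesses; the text does not specify the actual cards:

```python
# Eight labeled example cards: our training set.
# The cards and the colour-based A/B split are assumptions for illustration.
training_set = [
    ("7 of spades",     "black", "A"),
    ("king of clubs",   "black", "A"),
    ("2 of spades",     "black", "A"),
    ("jack of clubs",   "black", "A"),
    ("9 of hearts",     "red",   "B"),
    ("queen of hearts", "red",   "B"),
    ("4 of diamonds",   "red",   "B"),
    ("10 of diamonds",  "red",   "B"),
]

# The new, unlabeled card we want to classify.
new_card = ("ace of diamonds", "red")
```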

As soon as we are shown the arrangement to the right, most of us would guess that the ace of diamonds belongs to Group B. We do not need instructions; the human brain is a pattern-finding marvel. How would a machine do this?

ML algorithms are built on powerful statistical theories. Different algorithms are based on different mathematical equations that have to be chosen carefully to fit the task at hand. It is the job of the programmer to choose the data, analyse which features of the data are relevant to the particular problem, and choose the correct ML algorithm.

The Importance of Data

The card draw above could have gone wrong in a number of ways. Please refer to the image. Case 1 has too few cards: no guess would be possible. Case 2 has more cards, but all of the same suit, so there is no way to know where diamonds would go. If the groups were not of the same size, case 3 could very well suggest that number cards are in Group A and picture cards in Group B.

Usually machine learning problems are more open-ended and involve data sets much bigger than a pack of cards. Training sets have to be chosen with the help of statistical analysis, or else errors creep in. Good data selection is crucial to a good ML application, more so than for other types of programs. Machine learning needs a large amount of relevant data. At an absolute minimum, a basic machine learning model should contain ten times as many data points as the total number of features.1 That said, ML is also particularly equipped to handle noisy, messy and contradictory data.
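
The ten-times rule of thumb mentioned above is easy to express as a quick calculation; it is a sketch of a guideline, not a hard law:

```python
# Rule of thumb: at least ten data points per feature.
def minimum_examples(n_features, factor=10):
    return n_features * factor

# A model with 3 features would need at least 30 examples;
# one with 20 features, at least 200.
print(minimum_examples(3))   # 30
print(minimum_examples(20))  # 200
```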

Feature Extraction

When shown the Group A and Group B examples above, the first thing you might have noticed could be the colour of the cards, then the number or letter, and the suit. For an algorithm, all these features have to be entered explicitly. It cannot know automatically what is important to the problem.

While selecting the features of interest, programmers have to ask themselves many questions. How many features are too few to be useful? How many are too many? Which features are relevant for the task? What is the relationship between the chosen features: is one feature dependent on another? With the chosen features, is it possible for the output to be accurate?
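
As a sketch, feature extraction for the card example might look like this in Python. Which features to extract, and how to encode them, are choices the programmer makes; the ones below are assumptions for illustration:

```python
# Turn a card into explicit features an algorithm can work with.
def extract_features(rank, suit):
    return {
        "is_red": int(suit in ("hearts", "diamonds")),
        "is_picture": int(rank in ("jack", "queen", "king")),
        "suit": suit,
        "rank": rank,
    }

print(extract_features("ace", "diamonds"))
# {'is_red': 1, 'is_picture': 0, 'suit': 'diamonds', 'rank': 'ace'}
```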

The Process

When creating the application, the programmer takes data, extracts features from it, chooses an appropriate machine learning algorithm (the mathematical function which defines the process), and trains it using labeled data (in the case where the output is known, like Group A or Group B) so that the machine understands the pattern behind the problem.
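
One very simple way to picture the training step is a perceptron-style update rule: the machine guesses, and its weights are nudged whenever a guess is wrong. The data and encoding below (features is_red and is_picture, label 1 meaning Group B) are invented for illustration:

```python
# Labeled examples: ([is_red, is_picture], label), label 1 = Group B.
examples = [
    ([0, 0], 0), ([0, 1], 0),  # black cards -> Group A
    ([1, 0], 1), ([1, 1], 1),  # red cards   -> Group B
]

weights = [0.0, 0.0]
bias = 0.0

for _ in range(10):  # a few passes over the data
    for features, label in examples:
        score = bias + sum(w * x for w, x in zip(weights, features))
        predicted = 1 if score > 0 else 0
        error = label - predicted  # 0 when the guess is right
        weights = [w + error * x for w, x in zip(weights, features)]
        bias += error

print(weights, bias)  # [1.0, 0.0] 0.0 - colour alone decides the group
```

After training, the weight on is_red dominates while is_picture stays at zero: the machine has, in effect, discovered that colour is the pattern behind the split.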

For a machine, understanding takes the form of a set of numbers, called weights, that it assigns to each feature. With the correct assignment of weights, it can calculate the probability of a new card being in Group A or Group B. Typically, during the training stage, the programmer helps the machine by manually changing some values; this is called tuning the application.
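
A sketch of how weights turn features into a probability: here a logistic (sigmoid) function squashes a weighted sum into a value between 0 and 1. The weight values are invented for illustration, as if training had already found them:

```python
import math

def probability_group_b(features, weights, bias):
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))  # squash the score into (0, 1)

# Invented weights: suppose training settled on a large weight for
# "is_red", because colour alone separates the two groups.
weights = {"is_red": 4.0, "is_picture": 0.0}
bias = -2.0

ace_of_diamonds = {"is_red": 1, "is_picture": 0}
p = probability_group_b(ace_of_diamonds, weights, bias)
print(round(p, 3))  # 0.881 - a high probability of Group B
```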

Once this is done, the program has to be tested before being put to use. For this, labeled data that was not used for training is given to the program; this is called the test data. The machine's performance in predicting the output is then gauged. Once judged satisfactory, the program can be put to use: it is ready to take new data and make a decision or prediction about it.
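
The test stage can be sketched as feeding held-out labeled data to the trained model and measuring accuracy. The tiny rule-based "model" and the test data below are illustrative assumptions:

```python
# The rule the machine learned during training (assumed here).
def model(card_colour):
    return "B" if card_colour == "red" else "A"

# Held-out labeled data the model never saw during training.
test_data = [
    ("red", "B"), ("black", "A"), ("red", "B"), ("black", "A"),
]

correct = sum(model(colour) == label for colour, label in test_data)
accuracy = correct / len(test_data)
print(accuracy)  # 1.0 on this toy test set
```

Real test sets are far larger, and accuracy well below 1.0 is common; the point is that performance is measured on data the model has not seen.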
Real-time performance is then continuously monitored and improved (feature weights are adjusted to get better output). Often, real-time performance gives different results than when the ML application is tested with already available data. Since experimenting with real users is expensive, effortful and often risky, algorithms are usually tested using historic user data, which may not capture their impact on user behavior. This is why it is important to do a comprehensive evaluation of machine learning applications once they are in use.

Feel like doing some hands-on machine learning? Try this activity.

------------------------------------------------------------------------------------------------------
1 Theobald, O. Machine Learning For Absolute Beginners: A Plain English Introduction (Second Edition). Scatterplot Press, p. 24.
Konstan, J. and Terveen, L. Human-centered recommender systems: Origins, advances, challenges, and opportunities. AI Magazine, 42(3), 31-42, 2021.
