Looking at recent crime-investigation TV series, it seems natural to use facial recognition technology to identify criminals caught on CCTV. It feels like only yesterday that we were stunned by the match between AlphaGo and Lee Se-dol, yet artificial intelligence already seems to be part of our everyday lives.
1. History of Artificial Intelligence
The possibility of artificial intelligence, which had been studied sporadically since the late 1940s, got its name at a conference held in 1956 at Dartmouth College in the United States by scientists such as Marvin Minsky, John McCarthy, Claude Shannon, and Nathaniel Rochester. However, the concept of artificial intelligence they had in mind was quite abstract and broad: a complex computer with characteristics similar to human intelligence.
Arthur Samuel, a former IBM engineer, created a function that calculates the winning probability based on where each piece is placed and selects the move with the highest probability, in order to minimize memory usage while developing a checkers program. He had the program memorize all the positions generated in the process, and in 1959 he named this approach machine learning.

In 1957, Frank Rosenblatt of the Cornell Aeronautical Laboratory developed the Perceptron, an early form of artificial neural network, building on Arthur Samuel's attempt. It was intended to run on an early computer called the IBM 704 to perform image recognition, but the recognition rate was low and errors occurred too frequently, so it failed to deliver the expected performance. As a result, skepticism about machine learning spread, and related research stagnated.
In 2013, Google acquired DNNresearch, a start-up founded in 2012 by Professor Geoffrey Hinton and his students at the University of Toronto. DNNresearch was established to develop so-called deep learning algorithms, which learn from data using layered input/output structures similar to neurons in the brain. With the new learning approach, Hinton's team alleviated the overfitting problem that had limited earlier artificial neural network technology, and the timely arrival of high-performance GPUs reduced the computation time required. Machine learning thus gained public attention again.
2. Statistical Analysis Models
Artificial intelligence came to be actively used to predict time-series data thanks to the big data accumulated by Internet search engines, mobile carriers, and credit card companies, since it could handle the huge number of operations needed for big data analysis. Time-series prediction models for artificial intelligence are based on traditional statistical analysis methods such as the moving average, exponential smoothing, and regression analysis.
The moving average is a technique for analyzing time-series patterns by averaging a predetermined number of preceding values among the time-series data values. Since it analyzes only the patterns of past data values without considering the effects of other variables, it is easy to calculate, but its performance as a predictive model is limited. When an arithmetic mean is used to calculate the average, it is called a simple moving average; when weights are applied, it is called a weighted moving average.
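As a rough illustration, both variants can be computed with pandas; the window length of 3, the weights, and the observation values below are arbitrary assumptions for the sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical daily observations (illustrative values only).
values = pd.Series([12, 15, 14, 18, 21, 19, 23, 25])

# Simple moving average: arithmetic mean over a 3-point window.
sma = values.rolling(window=3).mean()

# Weighted moving average: more recent points get larger weights (1, 2, 3).
weights = [1, 2, 3]
wma = values.rolling(window=3).apply(
    lambda w: np.average(w, weights=weights), raw=True
)

print(sma.tail(3))
print(wma.tail(3))
```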
Exponential smoothing is similar to the weighted moving average, but differs in that it uses the entire time series and gives exponentially greater weight to the most recent values. It is as easy to calculate as a moving average and has an advantage in predicting irregular time-series patterns. Simple exponential smoothing is used if the data have neither trend nor seasonality, double exponential smoothing if there is a trend, and triple exponential smoothing if there are both trend and seasonality.
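A minimal sketch of simple exponential smoothing, assuming an arbitrary smoothing factor alpha and illustrative observations; double and triple smoothing add trend and seasonality terms on top of this recursion.

```python
def simple_exponential_smoothing(series, alpha=0.3):
    """Return one-step-ahead forecasts; alpha weights the most recent value."""
    forecast = [series[0]]  # seed with the first observation
    for value in series[1:]:
        # New forecast = alpha * latest actual + (1 - alpha) * previous forecast
        forecast.append(alpha * value + (1 - alpha) * forecast[-1])
    return forecast

observations = [12, 15, 14, 18, 21, 19, 23, 25]  # illustrative values
print(simple_exponential_smoothing(observations))
```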
Unlike the moving average and exponential smoothing, regression analysis is a technique that can take into account the influence of independent variables on a dependent variable. The word “regression” means returning to a previous state. It started in the late 19th century with Francis Galton’s hypothesis that there is a linear relationship between the heights of parents and children; he found that height tends to return toward the overall average. The approach was established as a mathematical theory as functional relationships were derived through empirical tests. The case with one independent variable is called simple regression, and the case with multiple independent variables is called multiple regression.
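A brief sketch of the distinction between simple and multiple regression, assuming hypothetical parent/child height data and scikit-learn's LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: predict a child's height from parents' heights (cm).
X_simple = np.array([[170], [165], [180], [175], [160]])       # one independent variable
X_multi = np.array([[170, 162], [165, 158], [180, 170],
                    [175, 165], [160, 155]])                   # two independent variables
y = np.array([172, 166, 178, 174, 163])

simple_model = LinearRegression().fit(X_simple, y)   # simple regression
multi_model = LinearRegression().fit(X_multi, y)     # multiple regression

print(simple_model.coef_, simple_model.intercept_)
print(multi_model.predict([[172, 160]]))             # prediction for new parents
```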

When the independent variables can be identified to some extent, regression analysis has shown the best predictive performance, so it has been the most widely used. However, there are many variations depending on the assumptions and variables, and their performance differs greatly. It therefore became necessary to evaluate the models, using the residual, which is the difference between the values predicted by the model and the actual values. The residuals are tested for how closely they follow a normal distribution, and the model is judged to work if the p-value (which ranges from 0 to 1) is less than 0.05 and the R-squared (which also ranges from 0 to 1) is close to 1.
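For instance, a fit with statsmodels exposes the R-squared, p-values, and residuals described above; the data here are synthetic and only for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)                  # one hypothetical independent variable
y = 2.5 * x + 3 + rng.normal(0, 1, 50)      # dependent variable with noise

X = sm.add_constant(x)                      # add the intercept term
model = sm.OLS(y, X).fit()

print(model.rsquared)   # closer to 1 means a better fit
print(model.pvalues)    # coefficients with p < 0.05 are considered significant
print(model.resid[:5])  # residuals: actual values minus fitted values
```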
3. AI Forecasting Models
Time-series prediction models for artificial intelligence are still developing at a rapid pace, and most of them are based on regression analysis. Time-series prediction tasks performed through machine learning are largely divided into classification, which determines the categories to which new data values belong; clustering, which groups data values with similar characteristics; regression, which derives new, unknown data values themselves; and dimensionality reduction, which speeds up the calculations. In other words, some tasks produce a category as their outcome, while others produce a number. The models that perform these tasks can be broadly categorized as Legacy, Classic, and Topical.
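A compact sketch of the four task types using scikit-learn estimators on synthetic data; the specific estimators chosen here are just convenient examples, not the only options.

```python
from sklearn.datasets import make_regression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: 100 samples, 5 features, a numeric target.
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

LinearRegression().fit(X, y)                      # regression: outcome is a number
LogisticRegression().fit(X, y > y.mean())         # classification: outcome is a category
KMeans(n_clusters=3, n_init=10).fit(X)            # clustering: group similar samples
X_reduced = PCA(n_components=2).fit_transform(X)  # dimensionality reduction: speed up later steps
```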
Legacy models were developed earlier and are still the basis for subsequent models, but they are not used much now because their operations are time-consuming. Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) fall under this category.
Classic models are widely used and validated for performance, and are currently the most commonly used. They are often applied to both classification and regression tasks; the following models fall under this category (a short sketch using a few of them follows the list):
- Multi-Layer Perceptron (MLP) extends the perceptron with one or more hidden layers and can predict multiple-point results from a single data value or a single-point result from multiple data values.
- Autoregressive Integrated Moving Average (ARIMA) is a model in which the predicted values correspond to a linear relationship between historical data values and prediction errors, combining the principles of regression and moving averages.
- Bayesian Neural Network (BNN) is a model that determines probability distribution functions from past data values, compares the output of the function with the actual value, and improves the function. For example, a BNN can express the probabilistic relationship between diseases and symptoms, so that the probability of the presence of various diseases can be calculated from the given symptoms.
- Radial Basis Function Neural Network (RBFNN) is a model that estimates functions for nonlinear data values and is useful when there are few variables. It is widely used to predict daily traffic on websites.
- Generalized Regression Neural Network (GRNN) is emerging as a new alternative for nonlinear data prediction, as it improves on RBFNN to accommodate a wider range of variables.
- K-Nearest Neighbors (KNN) regression is a model that has strengths in classifying data values affected by multiple variables. Its drawback is that determining the neighborhood of adjacent data takes considerable time, because the K nearest data points in a space of similar properties must be learned.
- Classification and Regression Trees (CART) is a model that finds rules for predicting result values from variables whose values are unknown. Its weakness is in capturing changes inherent in the attributes and trends of the data values.
- Support Vector Machine (SVM) builds a model that categorizes new data values based on the given data values. The model places data values in a coordinate space, finds the boundary with the largest margin between them, and separates the categories. It has recently been gaining attention for nonlinear data classification.
- Gaussian Processes (GP) is a model that improves on BNN; being based on probability distributions, it can explicitly obtain the probability distribution of possible result values without a preset probability distribution function.
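As referenced above, here is a rough sketch that fits a few of these classic models (KNN regression, SVM, and CART) on lag features built from a synthetic series; the series, the number of lags, and the train/test split are all assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Hypothetical series; lag features turn forecasting into a regression task.
series = np.sin(np.arange(60) / 5) + np.arange(60) * 0.05

def make_lags(s, n_lags=3):
    """Each row holds n_lags consecutive values; the target is the next value."""
    X = np.column_stack([s[i:len(s) - n_lags + i] for i in range(n_lags)])
    y = s[n_lags:]
    return X, y

X, y = make_lags(series)
X_train, X_test, y_train, y_test = X[:-5], X[-5:], y[:-5], y[-5:]

for model in (KNeighborsRegressor(n_neighbors=3), SVR(), DecisionTreeRegressor()):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.predict(X_test).round(2))
```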
Topical models were developed relatively recently and are often used in conjunction with other models because they have specialized functions. Convolutional Neural Network (CNN), Attention Mechanism, Transformer Neural Network, Light Gradient Boosting Machine (LightGBM), Decision Trees, XGBoost, and AdaBoost fall under this category.
As there are various artificial intelligence time-series prediction models, there are also various ways to evaluate their performance, of which two seem to be used most frequently. A common measure that expresses the error rate as a percentage is the Mean Absolute Percentage Error (MAPE), which calculates the error rate by dividing the absolute difference between the actual value and the predicted value by the actual value. However, when the actual value is close to 0, the error rate approaches infinity. To compensate for this, the Symmetric Mean Absolute Percentage Error (sMAPE) divides the absolute difference between the actual and predicted values by the average of their absolute values.
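Both measures are straightforward to implement; a sketch with illustrative values follows (scaling by 100 to express the result as a percentage is a common convention, not something prescribed here).

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error: error relative to the actual value."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

def smape(actual, predicted):
    """Symmetric MAPE: error relative to the average magnitude of both values."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted) / ((np.abs(actual) + np.abs(predicted)) / 2)) * 100

actual = [100, 102, 98, 105]      # illustrative values
predicted = [97, 104, 99, 101]
print(mape(actual, predicted), smape(actual, predicted))
```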
There has already been considerable progress in building the infrastructure to develop and utilize artificial intelligence data prediction models. First of all, Python is by far the most commonly used programming language for such development and is available to anyone for free. There are also vast libraries to draw on when developing machine learning algorithms in Python. For example, scikit-learn, which is likewise free for anyone to use, systematically categorizes its tools by the desired functionality and provides APIs and modules for each category.
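A minimal sketch of scikit-learn's uniform fit/predict workflow, assuming a reasonably recent version (mean_absolute_percentage_error requires scikit-learn 0.24 or later); the synthetic dataset is only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for any tabular time-series features.
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)   # every estimator exposes fit()
predictions = model.predict(X_test)                # ...and predict()
print(mean_absolute_percentage_error(y_test, predictions))
```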
However, the component holding back further development is the contaminated real-world data found everywhere. For artificial intelligence time-series prediction models to be properly utilized, there must be a large amount of past data in a consistent form, so their use is still limited to fields and periods accompanied by so-called big data. Of course, just as we cannot rewrite the history of the past, we cannot go back to the past and create data. Nevertheless, if human experience and artificial intelligence algorithms are properly combined, it does not seem impossible to reconstruct past data retrospectively.