Overview of Machine Learning: Theoretical Basis and Development of Machine Learning

This article draws mainly on "Discussion on Machine Learning" by Wang Wei, a researcher at the Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, and discusses the description of machine learning, its theoretical basis, its development history, and the current state of research.

0 Preface

In the early 1990s, the then US Vice President proposed an important plan, the National Information Infrastructure (NII). The technical content of this plan covers four aspects:

(1) Information can be easily obtained regardless of time and region.

(2) Information can be used effectively regardless of time and region.

(3) The hardware and software resources can be effectively utilized regardless of time and region.

(4) Ensure information security.

This article focuses on the problem of "using information effectively". Its essence is how to build models from, or discover useful knowledge in, massive data according to users' specific needs. For computer science, this is machine learning.


Researchers in computer science, especially in artificial intelligence, generally accept Simon's characterization of learning: "If a system can improve its performance by executing some process, that is learning." This is a fairly broad description whose key word is "system", which covers computing systems, control systems, and human systems; learning in these different systems clearly belongs to different scientific fields. Even within computing systems, different goals give rise to different branches: "machine learning", which summarizes a world model of a particular problem from limited observations; "data analysis", which analyzes the various relationships implied in observed data; and "data mining", which mines useful knowledge from observational data. Since the common goal of the methods developed in these branches is to move from a large amount of disordered information to simple and orderly knowledge, they can all be understood as "processes" in Simon's sense, that is, as learning.

1 Machine learning description

This article restricts its discussion to methods that "summarize a world model of a particular problem from limited observations" and that "analyze the various relationships implied in observational data from limited observations", and collectively refers to them as machine learning.

We describe machine learning as follows:

Let W be the finite or infinite set of all possible observations of a given world. Because our ability to observe is limited, we can only obtain a finite subset Q ⊆ W, called the sample set. Machine learning builds a model of the world from this sample set and tries to make it as faithful to the world as possible.

This description implies three issues that need to be addressed:

(1) Consistency: assume that the sample set Q has the same properties as the world W. For example, if the learning process is based on statistical principles, the independent and identically distributed (i.i.d.) assumption is one class of consistency condition.

(2) Division: embed the sample set in an n-dimensional space and find a decision boundary (an equivalence relation) defined in that space, so that the different kinds of objects determined by the problem fall into disjoint regions.

(3) Generalization: generalization ability is an indicator of how faithful the model is to the world. From a finite sample set, compute a model that makes this indicator as large as possible (equivalently, makes the generalization error as small as possible).

These problems impose quite stringent conditions on the observed data. First, data must be collected in accordance with the consistency hypothesis to form the sample set required by a machine learning algorithm. Second, a space must be found in which to represent the problem. Finally, the generalization index of the model must satisfy the consistency assumption and be able to guide algorithm design. These conditions limit the scope of application of machine learning.
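The following is a minimal sketch, not from the original article, of the three issues above on a hypothetical two-class problem: a finite sample set Q is drawn i.i.d. from a "world" W (consistency), a linear decision boundary is fitted in the sample space (division), and fidelity to the world is estimated on held-out observations (generalization). The data, model, and parameter values are illustrative assumptions.

```python
# Illustrative sketch of consistency, division, and generalization on toy data.
import numpy as np

rng = np.random.default_rng(0)

def draw_from_world(n):
    """Draw n i.i.d. observations from a hypothetical two-class world W."""
    X0 = rng.normal(loc=[-1.0, -1.0], scale=0.7, size=(n // 2, 2))
    X1 = rng.normal(loc=[+1.0, +1.0], scale=0.7, size=(n - n // 2, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * (n // 2) + [1] * (n - n // 2))
    return X, y

# Consistency: the sample set Q is assumed to share the properties of W (i.i.d.).
X_train, y_train = draw_from_world(200)   # the finite sample set Q
X_test, y_test = draw_from_world(1000)    # fresh observations standing in for W

# Division: fit a linear decision boundary (least squares) in the 2-D sample space.
A = np.hstack([X_train, np.ones((len(X_train), 1))])   # add a bias column
w, *_ = np.linalg.lstsq(A, 2 * y_train - 1, rcond=None)

# Generalization: how faithful is the model to the world, measured on unseen data?
pred = (np.hstack([X_test, np.ones((len(X_test), 1))]) @ w) > 0
print("estimated generalization accuracy:", (pred == (y_test == 1)).mean())
```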

2 History of machine learning

2.1 Machine Learning and Artificial Intelligence

Machine learning is the core of artificial intelligence research. Its applications have spread to the various branches of artificial intelligence, such as expert systems, automated reasoning, natural language understanding, pattern recognition, computer vision, and intelligent robotics.

Artificial intelligence involves issues such as consciousness, the self, and the mind (including the unconscious mind). It is a widely accepted view that the only intelligence we understand is human intelligence. But our understanding of our own intelligence is very limited, and we know little about the elements necessary to constitute human intelligence, so it is difficult to define what "artificially" manufactured "intelligence" is. The study of artificial intelligence therefore often involves the study of human intelligence itself. Intelligence in animals or in other man-made systems is also generally considered a research topic related to artificial intelligence. (The original article includes a figure showing the development of artificial intelligence.)

Machine learning is an inevitable outcome of the development of artificial intelligence research to a certain stage. From the 1950s to the early 1970s, artificial intelligence research was in its "reasoning period": it was believed that endowing a machine with logical reasoning ability would make it intelligent. Representative work of this stage includes A. Newell and H. Simon's "Logic Theorist" program and the subsequent "General Problem Solver" program, which achieved exciting results at the time. For example, by 1952 the "Logic Theorist" program had proved 38 of the theorems in Russell and Whitehead's masterpiece Principia Mathematica, and by 1963 it had proved all 52, with its proof of Theorem 2.85 even more elegant than Russell and Whitehead's own. A. Newell and H. Simon received the 1975 Turing Award for this line of work.

However, as research progressed, people gradually realized that logical reasoning ability alone was far from sufficient for artificial intelligence. E. A. Feigenbaum and others argued that to make a machine intelligent it must be given knowledge. Under their advocacy, artificial intelligence entered the "knowledge period" in the mid-1970s. During this period a large number of expert systems appeared and made great contributions in many fields, and E. A. Feigenbaum, the father of "knowledge engineering", received the Turing Award in 1994. But expert systems face the "knowledge engineering bottleneck": it is quite difficult for people to summarize their knowledge and then teach it to a computer. Some scholars therefore asked: how much better it would be if machines could learn knowledge by themselves!

In fact, Turing's 1950 article on the Turing test already mentioned the possibility of machine learning, and research related to machine learning began in the 1950s, focusing mainly on connectionist learning based on neural networks; representative work includes F. Rosenblatt's Perceptron and B. Widrow's Adaline. In the 1960s and 1970s a variety of learning techniques began to develop, such as statistical learning techniques based on decision theory and reinforcement learning techniques; representative work includes A. L. Samuel's checkers program and N. J. Nilsson's "learning machines", and some important results of the statistical learning theory that became prominent more than twenty years later were also obtained during this period. Symbolic learning techniques based on logic or graph-structure representations also began to appear at this time; representative work includes P. Winston's "structural learning system", the logic-based inductive learning systems of R. S. Michalski et al., and the "concept learning system" of E. B. Hunt et al.

In the summer of 1980, the first machine learning workshop was held at Carnegie Mellon University in the United States; in the same year, the journal Policy Analysis and Information Systems published three consecutive special issues on machine learning. In 1983, Tioga Press published Machine Learning: An Artificial Intelligence Approach, edited by R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, which collected 16 articles by 20 scholars, summarized the machine learning research of the time, and had a great influence. The journal Machine Learning was founded in 1986, and in 1989 Artificial Intelligence published a special issue on machine learning presenting some of the most active research work of the time; this content later appeared in Machine Learning: Paradigms and Methods (J. G. Carbonell, ed., MIT Press, 1990).

In general, the 1980s was the period in which machine learning became an independent subject area and began to develop rapidly, with a flourishing of machine learning techniques. R. S. Michalski et al. divided machine learning research into "learning from examples", "learning in problem solving and planning", "learning by observation and discovery", and "learning from instruction"; E. A. Feigenbaum, in the famous Handbook of Artificial Intelligence, divided machine learning techniques into four categories: "rote learning", "learning from instruction", "learning by analogy", and "learning by induction".

2.2 The theoretical basis of machine learning

One of the scientific foundations of machine learning is neuroscience. The three findings that have had the greatest impact on the progress of machine learning are the following:

(1) James's discovery that neurons are interconnected.

(2) McCulloch and Pitts' finding that neurons work through "excitation" and "inhibition".

(3) Hebb's learning rule (changes in the strength of the connections between neurons).

Among these, the discovery of McCulloch and Pitts has had a tremendous impact on modern information science. For machine learning, this achievement provides the basic neuron model of modern machine learning; together with the Hebb learning rule, which guides changes in the weights between connected neurons, it forms the basis of most popular machine learning algorithms.
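As a minimal illustration, not from the original article, the Hebb rule strengthens the connection between two units in proportion to their joint activity, Δw = η·x·y. The input patterns, initial weights, and learning rate below are assumed values.

```python
# Illustrative Hebbian updates on a single linear unit: co-active connections strengthen.
import numpy as np

eta = 0.1                    # learning rate (an assumed value)
w = np.full(3, 0.1)          # small initial weights from 3 inputs to one output unit

# Hypothetical binary input patterns; inputs 0 and 2 tend to be active together.
patterns = np.array([[1, 0, 1],
                     [1, 1, 0],
                     [1, 0, 1],
                     [1, 0, 1]], dtype=float)

for x in patterns:
    y = w @ x                # linear output activity of the unit
    w += eta * x * y         # Hebb rule: delta_w = eta * x * y

print("learned weights:", w)  # weights on the frequently co-active inputs grow fastest
```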

In 1954, Barlow and Hebb proposed different hypotheses when studying the learning of visual perception. Barlow advocated a single-cell theory, assuming that inputs from earlier stages converge on a single cell with specific response characteristics, and that this single neural cell represents the visual object; this view implies that nerve cells may have rather complex structure. Hebb, by contrast, argued that a visual object is represented by an interconnected assembly (ensemble) of nerve cells. In neuroscience, both hypotheses have found biological support, and the debate has never been settled biologically. This biological reality leaves room for imagination for computer scientists, and since it corresponds to two different, complementary research routes in machine learning, these two hypotheses have important implications for machine learning research.

Dividing machine learning research according to these two hypotheses, its development history can be summarized as follows: the Perceptron, BP networks, SVMs, and related methods form one category; spline theory, k-nearest neighbors, Madaline, symbolic machine learning, ensemble machine learning, and manifold machine learning form another.

Based on the McCulloch and Pitts model, Rosenblatt proposed the perceptron algorithm in 1957, the first machine learning algorithm of major academic significance. The bumpy course of this line of thought is a true portrait of the development of machine learning research. The main contributions of the perceptron algorithm are these: first, it borrows the simplest McCulloch and Pitts model as the neural cell model; then, following Hebb's idea of cell assemblies, it clusters multiple such model neurons according to specific rules into a neural network, and translates the task into the following machine learning problem: compute a hyperplane that separates the points of the different categories in the space into different regions. Based on optimization theory, Rosenblatt showed that if the sample set is linearly separable, the algorithm is guaranteed to converge. The question that followed was how to deal with problems that are not linearly separable.
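The sketch below is an illustrative implementation of the perceptron update rule (not Rosenblatt's original code): it searches for a hyperplane w·x + b = 0 separating two classes, and for the linearly separable toy data assumed here it converges in a few passes.

```python
# Minimal perceptron: adjust the hyperplane whenever a sample is misclassified.
import numpy as np

def perceptron(X, y, epochs=100, lr=1.0):
    """X: (n, d) samples; y: labels in {-1, +1}. Returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi        # move the hyperplane toward the sample
                b += lr * yi
                errors += 1
        if errors == 0:                  # converged: all samples correctly divided
            break
    return w, b

# A linearly separable toy sample set (hypothetical data).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([+1, +1, -1, -1])
w, b = perceptron(X, y)
print("hyperplane:", w, b, "predictions:", np.sign(X @ w + b))
```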

In 1969, Minsky and Papert published Perceptrons, a book with far-reaching implications for machine learning research. It is now generally known that, because of the XOR problem discussed in this book, research on perceptrons was stifled for a long time. However, the basic ideas this work put forward for machine learning research remain correct today. Its core ideas are two:

(1) Algorithm capability: An algorithm that can only solve linear problems is not enough. An algorithm that can solve nonlinear problems is needed.

(2) Computational complexity: algorithms that can only solve toy-world problems are of little value; algorithms that can solve real-world problems are needed.

In 1986, the BP (error back-propagation) algorithm of Rumelhart et al. solved the XOR problem. The perceptron research direction, which had lain dormant for nearly twenty years, was accepted again, and people have since refocused on it. This is an important contribution of Rumelhart et al.
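The following is a minimal sketch, not the original BP formulation, of a small network trained by back-propagation on the XOR problem, which a single-layer perceptron cannot solve. The hidden-layer size, learning rate, and iteration count are assumed values chosen so the toy example typically converges.

```python
# Tiny multilayer network trained with back-propagation on XOR.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units (an assumed size), one output unit.
W1 = rng.normal(scale=1.0, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=1.0, size=(4, 1)); b2 = np.zeros(1)
lr = 0.5

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: gradients of the squared error, propagated layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]
```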

Another important line of research in the 1960s came from Widrow. In 1960, Widrow proposed the Madaline model. Algorithmically, its way of handling linearly inseparable problems is essentially to abandon the requirement that the decision interface dividing the sample set be continuous and smooth, and to replace a single separating plane with a piecewise one. From a modern point of view, the main difference between this work and the perceptron lies in the neuroscience hypothesis adopted: it follows Barlow's idea that nerve cells may have more complex structure, taking a linear model (e.g., a perceptron) rather than the simple McCulloch and Pitts model as the neural cell model, and then, following Hebb's cell-assembly hypothesis, clustering these local models into a representation of the problem world, thereby solving linearly inseparable problems. However, this research never became as famous as the perceptron, for two reasons: first, although Madaline can solve linearly inseparable problems, its solutions may be trivial; second, Widrow did not give its theoretical foundation, which is in fact far more complicated than that of the perceptron. It was not until 1990, when Schapire proved the "weak learnability theorem" on the basis of Valiant's "probably approximately correct" (PAC) theory, that this line of work really attracted attention.

It is interesting to compare further the neuroscience implications of the two different routes in machine learning. For machine learning, the most significant difference is the assumed neural cell model: the perceptron takes the simplest McCulloch and Pitts model as the neural cell model, while Madaline takes a local model of the problem world as the neural cell model; both methods then cluster these units following Hebb's idea. Thus, for machine learning research, the two neuroscience hypotheses are complementary. There is still a difference between them: the former emphasizes the wholeness of the model, consistent with Barlow's "single-cell theory of representing objects", so we call it the Barlow route; the latter emphasizes that representing the world requires an assembly of multiple nerve cells, consistent with Hebb's "multi-cell theory of representing objects", so we call it the Hebb route. Given the fundamental computational difference between global models and local models, it remains useful to distinguish machine learning based on the Barlow assumption from machine learning based on the Hebb assumption.

At the end of this section, it may be interesting to compare Carbonell's 1989 vision of the coming decade of machine learning with Dietterich's vision ten years later, to illustrate how machine learning research changes as the problems it faces change (Table 1).
