
S

 

sequence prediction tasks

    Feedforward networks are fine for classifying objects, but their units (as distinct from their weights) have no memory of previous inputs. Consequently they are unable to cope with sequence prediction tasks - tasks like predicting, given a sequence of sunspot activity counts, what the sunspot activity for the next time period will be, or financial prediction tasks (e.g. given share prices for the last n days, and presumably other economic data, predict tomorrow's share price).
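
    As a purely illustrative sketch (not from the original glossary), such a task can be framed by sliding a window of the n most recent values over the series and treating the next value as the target. The function name, window length and data below are made up.

        # Minimal sketch: turn a time series into (inputs, target) pairs for
        # sequence prediction, assuming a window of n previous values is used
        # to predict the next value. Names and data are illustrative only.
        def make_windows(series, n):
            pairs = []
            for i in range(len(series) - n):
                inputs = series[i:i + n]      # the last n observed values
                target = series[i + n]        # the value to be predicted
                pairs.append((inputs, target))
            return pairs

        sunspots = [58, 63, 60, 71, 69, 75, 80]   # made-up activity counts
        print(make_windows(sunspots, 3))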

 

    Simple recurrent nets can tackle tasks like this, because they do have a kind of memory for recording information derived from activation values from previous time steps.

 

    This article is included for general interest - sequence prediction tasks are not part of the syllabus of COMP9414 Artificial Intelligence.

 

sigmoidal nonlinearity

    Another name for the logistic function and certain related functions (such as tanh(x)). Sigmoidal functions are a type of squashing function. They are so called because sigma is the Greek letter "s", and the logistic function looks somewhat like a sloping letter "s" when graphed.
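
    A minimal numerical sketch, just to show the shape of these functions: the logistic function squashes any real input into the range (0, 1), while the related tanh function squashes into (-1, 1). The code is illustrative only.

        import math

        # The logistic (sigmoid) function: 1 / (1 + e^(-x)).
        def logistic(x):
            return 1.0 / (1.0 + math.exp(-x))

        for x in (-5.0, 0.0, 5.0):
            print(x, logistic(x), math.tanh(x))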

 

simple recurrent network

    A simple recurrent network is like a feedforward network with an input layer, an output layer, and a single hidden layer, except that there is a further group of units called state units or context units. There is one state unit for each hidden unit. The activation function of the state unit is as follows: the activation of a state unit in time step n is the same as that of the corresponding hidden unit in time step n–1. That is, the state unit activations are copies of the hidden unit activations from the previous time step. Each state unit is also connected to each hidden unit by a trainable weight - the direction of this connection is from the state unit to the hidden unit.
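
    The copy-then-feed-back behaviour of the state units can be sketched as a single forward step; this is a minimal illustration assuming logistic activations, and the weight matrix names (W_ih, W_sh, W_ho) are hypothetical.

        import numpy as np

        def logistic(x):
            return 1.0 / (1.0 + np.exp(-x))

        # One forward step of a simple recurrent (Elman-style) network.
        # W_ih: input-to-hidden weights, W_sh: state-to-hidden weights,
        # W_ho: hidden-to-output weights; 'state' holds the hidden unit
        # activations copied from the previous time step.
        def srn_step(x, state, W_ih, W_sh, W_ho):
            hidden = logistic(W_ih @ x + W_sh @ state)   # hidden units see input + state units
            output = logistic(W_ho @ hidden)
            return output, hidden        # hidden becomes the next step's state (a copy)

        rng = np.random.default_rng(0)
        n_in, n_hid, n_out = 3, 4, 1
        W_ih = rng.normal(size=(n_hid, n_in))
        W_sh = rng.normal(size=(n_hid, n_hid))
        W_ho = rng.normal(size=(n_out, n_hid))

        state = np.zeros(n_hid)          # state units start at zero
        for x in [np.array([0.1, 0.2, 0.3]), np.array([0.2, 0.3, 0.5])]:
            y, state = srn_step(x, state, W_ih, W_sh, W_ho)
            print(y)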

 

    [SRN diagram]

 

    Simple recurrent networks can learn sequence prediction tasks.

 

    See also recurrent networks.

 

    This article is included for general interest - simple recurrent networks are not part of the syllabus of COMP9414 Artificial Intelligence.

 

splitting criterion in ID3

    The point of the ID3 algorithm is to decide the best attribute, out of those not already used, on which to split the training instances that are classified to a particular branch node.

 

    The algorithm, in outline, is as follows:

 

        if all the instances belong to a single class, there is nothing to do (except create a leaf node labelled with the name of that class).

        otherwise, for each attribute that has not already been used, calculate the information gain that would be obtained by using that attribute on the particular set of instances classified to this branch node.

        use the attribute with the greatest information gain. 

 

    This leaves the question of how to calculate the information gain associated with using a particular attribute A. Suppose that there are k classes C1, C2, ..., Ck, and that of the N instances classified to this node, I1 belong to class C1, I2 belong to class C2, ..., and Ik belong to class Ck.

    Let p1 = I1/N, p2 = I2/N, ..., and pk = Ik/N.

    The initial entropy E at this node is:

 

        –p1 log2(p1) – p2 log2(p2) – ... – pk log2(pk). 

 

    Now split the instances on each value of the chosen attribute A. Suppose that there are r attribute values for A, namely a1, a2, ..., ar.

    For a particular value aj, say, suppose that there are Jj,1 instances in class C1, Jj,2 instances in class C2, ..., and Jj,k instances in class Ck, for a total of Jj instances having attribute value aj.

    Let qj,1 = Jj,1/Jj, qj,2 = Jj,2/Jj, ..., and qj,k = Jj,k/Jj.

    The entropy Ej associated with this attribute value aj at this position is:

 

        –qj,1 log2(qj,1) – qj,2 log2(qj,2) – ... – qj,k log2(qj,k). 

 

    Now compute:

 

        E – ((J1/N).E1 + (J2/N).E2 + ... + (Jr/N).Er). 

 

    This is the information gain for attribute A.

    Note that Jj/N is the estimated probability that an instance classified to this node will have value aj for attribute A. Thus we are weighting the entropy estimates Ej by the estimated probability that an instance has the associated attribute value.
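
    The calculation above translates directly into code. The sketch below takes the per-class counts at a node, plus the class counts broken down by attribute value, and returns the information gain; the function and variable names are illustrative, not from the lecture notes.

        from math import log2

        def entropy(counts):
            # counts: number of instances in each class at a node;
            # entropy is -sum(p * log2(p)) over the non-zero proportions.
            total = sum(counts)
            return -sum((c / total) * log2(c / total) for c in counts if c > 0)

        def information_gain(class_counts, counts_by_value):
            # class_counts: [I1, ..., Ik] for the instances at this node.
            # counts_by_value: for each value aj of attribute A, the list
            # [Jj,1, ..., Jj,k] of class counts among instances with that value.
            N = sum(class_counts)
            E = entropy(class_counts)
            weighted = sum((sum(j) / N) * entropy(j) for j in counts_by_value)
            return E - weighted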

 

    In terms of the example used in the lecture notes (see also the calculations in the lecture notes), k = 2 since there are just two classes, positive and negative. I1 = 4 and I2 = 3, and N = 7, so p1 = 4/7 and p2 = 3/7, and E = –p1 log2(p1) – p2 log2(p2) = –(4/7)×log2(4/7) – (3/7)×log2(3/7). The first attribute A considered is size, and the first value of size considered, large, corresponds to a1, so J1,1 = 2 = J1,2, and J1 = 4. Thus q1,1 = J1,1/J1 = 2/4 = ½, and q1,2 = J1,2/J1 = 2/4 = ½, and E1 = –q1,1 log2(q1,1) – q1,2 log2(q1,2) = –½×log2(½) – ½×log2(½) = 1.

    Similarly E2 = 1 and J2 = 2 (size = small), and E3 = 0 and J3 = 1 (size = medium), so the final information gain,

 

        E – ((J1/N).E1 + (J2/N).E2 + ... + (Jr/N).Er)

        = E – ((4/7)×E1 + (2/7)×E2 + (1/7)×E3) 

 

    which turned out to be about 0.13 in the example.
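
    Plugging these counts into the information_gain sketch given earlier reproduces the figure of about 0.13. (Whether the single medium-sized instance is positive or negative does not matter, since E3 is zero either way; the counts below assume it is positive.)

        # size = large: 2 positive, 2 negative; small: 1 and 1; medium: 1 and 0
        gain = information_gain([4, 3], [[2, 2], [1, 1], [1, 0]])
        print(round(gain, 2))   # about 0.13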

 

squashing function

    One of several functions, including sigmoid and step functions (see threshold), used to transform the total net input to a neuron in order to obtain the final output of the neuron.

 

stopping criterion in backprop

    Possible stopping criteria in error backpropagation learning include:

 

        total error of the network falling below some predetermined level;

        a certain number of epochs having been completed; 

 

    Combinations of the two (e.g. whichever of the two occurs first) and other stopping conditions are possible. See the reference by Haykin (Neural Networks: A Comprehensive Foundation, p. 153) for more details.
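
    As a purely illustrative sketch of the "whichever occurs first" combination: stop when the total error falls below a threshold, or after a maximum number of epochs, whichever comes first. The "training" step here is a stand-in that just shrinks a fake error value, to show the control flow only.

        MAX_EPOCHS = 1000
        ERROR_THRESHOLD = 0.01

        error = 1.0                 # stand-in for the network's total error
        for epoch in range(MAX_EPOCHS):
            error *= 0.95           # stand-in for one epoch of backprop training
            if error < ERROR_THRESHOLD:
                break               # error criterion met before the epoch limit

        print(epoch + 1, error)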

 

supervised learning

    Supervised learning is a kind of machine learning in which the learning algorithm is provided with a set of inputs along with the corresponding correct outputs. Learning involves the algorithm comparing its current actual output with the correct or target outputs, so that it knows what its error is, and modifying things accordingly.
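
    A minimal illustration of this compare-and-adjust cycle, using a single linear unit trained with a delta-rule style update; the data, learning rate and weights are made up.

        # Supervised learning in miniature: a single linear unit is shown
        # inputs with known target outputs, compares its actual output with
        # the target, and adjusts its weights in proportion to the error.
        inputs  = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
        targets = [1.0, 0.0, 1.0]

        w = [0.0, 0.0]
        bias = 0.0
        rate = 0.1

        for _ in range(100):
            for (x1, x2), t in zip(inputs, targets):
                y = w[0] * x1 + w[1] * x2 + bias    # current actual output
                error = t - y                       # compare with target output
                w[0] += rate * error * x1           # modify weights accordingly
                w[1] += rate * error * x2
                bias += rate * error

        print(w, bias)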

 

    Contrast unsupervised learning.

 

symbolic learning algorithms

    Symbolic learning algorithms learn concepts by constructing a symbolic expression (such as a decision tree, as in ID3, or a set of rules, as in Aq) that describes a class (or classes) of objects. Many such systems work with representations equivalent to first-order logic.

 

    Such learning algorithms have the advantage that their internal representations and their final output can be inspected and relatively easily understood by humans.

 

    See also function approximation algorithms.


