F

 

see attribute.
feedforward net
A kind of neural network in which the nodes can be numbered, in such a way that each node has weighted connections only to nodes with higher numbers. Such nets can be trained using the error backpropagation learning algorithm.

In practice, the nodes of most feedforward nets are partitioned into layers - that is, sets of nodes, and the layers may be numbered in such a way that the nodes in each layer are connected only to nodes in the next layer - that is, the layer with the next higher number. Commonly successive layers are totally interconnected - each node in the earlier layer is connected to every node in the next layer.

The first layer has no input connections, so consists of input units and is termed the input layer (yellow nodes in the diagram below).

The last layer has no output connections, so consists of output units and is termed the output layer (maroon nodes in the diagram below).

The layers in between the input and output layers are termed hidden layers, and consist of hidden units (light blue nodes and brown nodes in the diagram below).

When the net is operating, the activations of non-input neurons are computed using each neuron's activation function.

 


Feedforward network. All connections (arrows) are in one direction; there are no cycles of activation flow (cyclic subgraphs). Each colour identifies a different layer in the network. The layers 1 and 2 are fully interconnected, and so are layers 3 and 4. Layers 2 and 3 are only partly interconnected.
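The node-numbering definition above suggests one simple way to evaluate such a net: visit the nodes in increasing order, so that every input to a node is already known when the node is reached. The following is a minimal Python sketch of that idea, not a definitive implementation. The logistic activation function, the `weights[(i, j)]` dictionary layout and the example weights are all assumptions made for illustration, and bias terms are omitted.

```python
import math

def logistic(x):
    """Logistic (sigmoid) activation; an assumed choice for illustration."""
    return 1.0 / (1.0 + math.exp(-x))

def run_feedforward(num_nodes, weights, inputs):
    """Compute activations for nodes 0..num_nodes-1.

    weights: dict mapping (i, j), with i < j, to the weight of connection i -> j.
    inputs:  dict mapping input-node numbers to their fixed activations.
    """
    # Every connection must point from a lower to a higher node number.
    assert all(i < j for (i, j) in weights), "not a feedforward net"
    activation = {}
    for j in range(num_nodes):
        if j in inputs:
            activation[j] = inputs[j]      # input units are simply set
        else:
            # Weighted sum over connections arriving at node j.
            net = sum(w * activation[i]
                      for (i, k), w in weights.items() if k == j)
            activation[j] = logistic(net)
    return activation

# Hypothetical net: input nodes 0 and 1, hidden nodes 2 and 3, output node 4.
weights = {(0, 2): 0.5, (1, 2): -0.5, (0, 3): 1.0, (1, 3): 1.0,
           (2, 4): 2.0, (3, 4): -1.0}
acts = run_feedforward(5, weights, {0: 1.0, 1: 0.0})
```

Because the loop runs in node-number order, no node is ever evaluated before the nodes that feed it, which is exactly what the absence of cycles guarantees.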

firing
  1. In a biological neural network: neurons in a biological neural network fire when and if they receive enough stimulus via their (input) synapses. This means that an electrical impulse is propagated along the neuron's axon and transmitted to other neurons via the output synaptic connections of the neuron. The firing rate of a neuron is the frequency with which it fires (cf. activation in an artificial neural network).

     

  2. In an expert system: when a rule in the expert system is used, it is said to fire.
function approximation algorithms
include connectionist and statistical techniques of machine learning. The idea is that machine learning means learning, from a number of examples (also called instances or training patterns), to compute a function. The arguments of the function are variables corresponding to the input part of a training pattern, its outputs are variables corresponding to the output part, and it maps the input part of each training pattern to its output part. The hope is that the function will interpolate / generalize from the training patterns, so that it will produce reasonable outputs when given other inputs.
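As a small illustration of the idea, the sketch below fits a straight line to a handful of training patterns by least squares (a statistical technique) and then evaluates the fitted function on an input that was not among the training patterns. The data values and the function name `fit_line` are made up for this example.

```python
def fit_line(patterns):
    """Least-squares fit of y = a*x + b to (input, output) training patterns."""
    n = len(patterns)
    mean_x = sum(x for x, _ in patterns) / n
    mean_y = sum(y for _, y in patterns) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in patterns)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in patterns)
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Return the learned function, which maps an input to a predicted output.
    return lambda x: a * x + b

# Made-up training patterns: (input part, output part).
training_patterns = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]
f = fit_line(training_patterns)

# The input 1.5 is not in the training patterns; the function interpolates.
prediction = f(1.5)
```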

See also symbolic learning algorithms.

forward pass in backpropagation
In the forward pass in backpropagation, each training pattern is presented to the input units of the network. The hidden unit activations are computed from the inputs and input-to-hidden unit weights, and then (in the case of a 3-layer network, with only a single layer of hidden units) the outputs are computed using the hidden layer activations and the current hidden-to-output weights. Certain statistics are kept from this computation, and used in the backward pass. The target outputs from each training pattern are compared with the actual activation levels of the output units - the difference between the two is termed the error. Training may be pattern-by-pattern or epoch-by-epoch. With pattern-by-pattern training, the pattern error is provided directly to the backward pass. With epoch-by-epoch training, the pattern errors are summed across all training patterns, and the total error is provided to the backward pass.
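The forward pass for a 3-layer network can be sketched as follows. This is a rough illustration under stated assumptions: logistic activations, no bias terms, and made-up weights and training patterns. The names `forward_pass` and `pattern_error` are hypothetical. The epoch error here sums the signed differences, as the text describes; many implementations accumulate squared errors or gradients instead.

```python
import math

def logistic(x):
    """Logistic activation function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward_pass(pattern_inputs, w_input_hidden, w_hidden_output):
    """Return (hidden activations, output activations) for one pattern.

    w_input_hidden[h][i] is the weight from input i to hidden unit h;
    w_hidden_output[o][h] is the weight from hidden unit h to output o.
    """
    hidden = [logistic(sum(w * x for w, x in zip(row, pattern_inputs)))
              for row in w_input_hidden]
    outputs = [logistic(sum(w * h for w, h in zip(row, hidden)))
               for row in w_hidden_output]
    return hidden, outputs

def pattern_error(targets, outputs):
    """Difference between target and actual activation, per output unit."""
    return [t - o for t, o in zip(targets, outputs)]

# Hypothetical weights and training patterns, for illustration only.
w_ih = [[0.5, -0.4], [0.3, 0.8]]
w_ho = [[1.0, -1.0]]
patterns = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

# Epoch-by-epoch: sum the pattern errors over all training patterns.
epoch_error = [0.0] * len(w_ho)
for inputs, targets in patterns:
    hidden, outputs = forward_pass(inputs, w_ih, w_ho)
    err = pattern_error(targets, outputs)  # pattern-by-pattern would use err here
    epoch_error = [e + d for e, d in zip(epoch_error, err)]
```

The hidden activations returned by `forward_pass` are the kind of statistic that would be kept for the backward pass.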

G

 

generalization in backprop
Learning in backprop seems to operate by first of all getting a rough set of weights which fit the training patterns in a general sort of way, and then working progressively towards a set of weights that fit the training patterns exactly. If learning goes too far down this path, one may reach a set of weights that fits the idiosyncrasies of the particular set of patterns very well, but does not interpolate (i.e. generalize) well.

Moreover, with large complex sets of training patterns, it is likely that some errors may occur, either in the inputs or in the outputs. In that case, and again particularly in the later parts of the learning process, it is likely that backprop will be contorting the weights so as to fit precisely around training patterns that are actually erroneous! This phenomenon is known as over-fitting.

This problem can to some extent be avoided by stopping learning early. How does one tell when to stop? One method is to partition the training patterns into two sets (assuming that there are enough of them). The larger part of the training patterns, say 80% of them, chosen at random, form the training set, and the remaining 20% are referred to as the test set. Every now and again during training, one measures the performance of the current set of weights on the test set. One normally finds that the error on the training set drops monotonically (that's what a gradient descent algorithm is supposed to do, after all). However, error on the test set (which will be larger, per pattern, than the error on the training set) will fall at first, then start to rise as the algorithm begins to overtrain. Best generalization performance is gained by stopping the algorithm at the point where error on the test set starts to rise.
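The stopping procedure just described can be sketched in Python as below. This is a simplified illustration, not a recommended recipe: `train_step` and `test_error` are hypothetical callbacks that the caller would supply, and the error curve in the demonstration is simulated with made-up numbers. In practice the held-out error is noisy, so one would usually also save the weights from the best check and tolerate a few non-improving checks before stopping.

```python
import random

def split_patterns(patterns, train_fraction=0.8, seed=0):
    """Randomly partition patterns into a training set and a held-out test set."""
    shuffled = list(patterns)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_with_early_stopping(train_step, test_error, max_epochs=1000,
                              check_every=1):
    """Call train_step() repeatedly; stop once test_error() stops falling.

    Returns (epoch at which the lowest test error was seen, that error).
    """
    best_epoch, best_error = 0, test_error()
    for epoch in range(1, max_epochs + 1):
        train_step()
        if epoch % check_every == 0:
            err = test_error()
            if err >= best_error:   # error on the test set has started to rise
                break
            best_epoch, best_error = epoch, err
    return best_epoch, best_error

# Simulated test-set error per epoch (made-up numbers): falls, then rises.
curve = [0.9, 0.6, 0.4, 0.35, 0.37, 0.5]
state = {"epoch": 0}

def step():
    state["epoch"] += 1             # stands in for one epoch of training

def err():
    return curve[state["epoch"]]

best_epoch, best_error = train_with_early_stopping(step, err, max_epochs=5)
```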

generalized delta rule
An improvement on the delta rule in error backpropagation learning. If the learning rate (often denoted by η) is small, the backprop algorithm proceeds slowly, but accurately follows the path of steepest descent on the error surface. If η is too large, the algorithm may "bounce off the canyon walls of the error surface" - i.e. not work well. This can be largely avoided by modifying the delta rule to include a momentum term:

 

Δw_ji(n) = α Δw_ji(n−1) + η δ_j(n) y_i(n)

in the notation of Haykin's text (Neural networks - a comprehensive foundation). The constant α is termed the momentum constant and can be adjusted to achieve the best effect. The second summand corresponds to the standard delta rule, while the first summand says "add α × the previous change to this weight."

This new rule is called the generalized delta rule. The effect is that if the basic delta rule would be consistently pushing a weight in the same direction, then it gradually gathers "momentum" in that direction.
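To see the momentum effect numerically, the sketch below applies the update rule to a single weight while the basic delta rule pushes in the same direction at every step. The values of η and α and the constant signal δ_j = y_i = 1 are assumptions chosen only for illustration.

```python
def momentum_update(prev_delta_w, eta, alpha, delta_j, y_i):
    """Generalized delta rule: dw_ji(n) = alpha * dw_ji(n-1) + eta * delta_j(n) * y_i(n)."""
    return alpha * prev_delta_w + eta * delta_j * y_i

eta, alpha = 0.1, 0.9   # learning rate and momentum constant (assumed values)
delta_w = 0.0           # no previous weight change before the first step
history = []
for n in range(50):
    # Made-up constant signal: the basic delta rule alone would give eta * 1 * 1.
    delta_w = momentum_update(delta_w, eta, alpha, delta_j=1.0, y_i=1.0)
    history.append(delta_w)
```

With a constant push, the weight change grows step by step from η towards η / (1 − α), the sum of the geometric series, so with these values the step size approaches ten times that of the basic delta rule.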

gradient descent
Understanding this term depends to some extent on the error surface metaphor.

When an artificial neural network learning algorithm causes the weights of the net to change, it will do so in such a way that the current point on the error surface will descend into a valley of the error surface, in a direction that corresponds to the steepest (downhill) gradient or slope at the current point on the error surface. For this reason, backprop is said to be a gradient descent method, and to perform gradient descent in weight space.
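The descent can be illustrated on a toy error surface over two weights. The surface below is a made-up bowl shape, not the error surface of any real network, and the starting point and learning rate are arbitrary assumptions.

```python
def error(w1, w2):
    """A made-up bowl-shaped error surface with its minimum at (1, -2)."""
    return (w1 - 1.0) ** 2 + 2.0 * (w2 + 2.0) ** 2

def gradient(w1, w2):
    """Partial derivatives of error() with respect to w1 and w2."""
    return 2.0 * (w1 - 1.0), 4.0 * (w2 + 2.0)

w1, w2 = 4.0, 3.0       # arbitrary starting point in weight space
eta = 0.1               # learning rate (assumed value)
errors = [error(w1, w2)]
for _ in range(100):
    g1, g2 = gradient(w1, w2)
    w1 -= eta * g1      # step against the gradient, i.e. downhill
    w2 -= eta * g2
    errors.append(error(w1, w2))
```

Each step moves the point (w1, w2) in the direction of steepest descent, so the recorded errors shrink as the point slides towards the bottom of the valley. On a surface with several valleys, the same procedure could settle in a valley that is not the deepest one.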

See also local minimum.


