Python Deep Learning - Quick Guide

Python Deep Learning - Introduction

Deep learning (also called deep structured learning or hierarchical learning) is part of the family of machine learning methods, which are themselves a subset of the broader field of Artificial Intelligence.

Deep learning is a class of machine learning algorithms that use several layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output of the previous layer as its input.
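To make the idea of stacked layers concrete, here is a minimal NumPy sketch (an illustration, not code from this guide) in which each layer applies a weighted sum followed by a tanh nonlinearity to the previous layer's output. The layer sizes, the random weights and the choice of tanh are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs -> 8 -> 6 -> 3 outputs
layer_sizes = [4, 8, 6, 3]

# One (weights, bias) pair per layer, initialised randomly
layers = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x):
    """Pass x through every layer; each layer consumes the previous output."""
    activations = [x]
    for W, b in layers:
        x = np.tanh(x @ W + b)  # nonlinear processing unit
        activations.append(x)
    return activations

acts = forward(rng.normal(size=(5, 4)))  # a batch of 5 samples
for a in acts:
    print(a.shape)
```

Only the forward pass is shown; training would then adjust the weights with some form of gradient descent.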

Deep neural networks, deep belief networks and recurrent neural networks have been applied to fields such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation and bioinformatics, where they have produced results comparable to, and in some cases better than, those of human experts.

Deep learning algorithms and networks:

  • are based on learning multiple levels of features or representations of the data, often without supervision. Higher-level features are derived from lower-level features to form a hierarchical representation.

  • use some form of gradient descent for training.

Python Deep Learning - Environment

In this chapter, we set up the environment for Python deep learning. The following software is needed to build deep learning algorithms:

  • Python 2.7+
  • SciPy with NumPy
  • Matplotlib
  • Theano
  • Keras
  • TensorFlow

It is strongly recommended that Python, NumPy, SciPy and Matplotlib be installed through the Anaconda distribution, which comes with all of those packages.

We then need to make sure that each package is installed properly.

Open your command line program and start Python by typing the following command:

```
$ python
Python 3.6.3 |Anaconda custom (32-bit)| (default, Oct 13 2017, 14:21:34)
[GCC 7.2.0] on linux
```

Next, we can import the required libraries and print their versions. For NumPy:

```python
import numpy
print(numpy.__version__)
```

Output

```
1.14.2
```

Installation of Theano, TensorFlow and Keras

Before installing Theano, TensorFlow and Keras, we need to confirm that pip is installed. pip is the Python package installer; it is included with Anaconda, whose own package manager is conda.

To confirm that pip is installed, type the following at the command line:

```
$ pip --version
```

Once pip is confirmed, we can install Theano, TensorFlow and Keras by executing the following commands:

```
$ pip install theano
$ pip install tensorflow
$ pip install keras
```

Confirm the installation of Theano by executing the following command:

```
$ python -c "import theano; print(theano.__version__)"
```

Output

```
1.0.1
```

Confirm the installation of TensorFlow by executing the following command:

```
$ python -c "import tensorflow; print(tensorflow.__version__)"
```

Output

```
1.7.0
```

Confirm the installation of Keras by executing the following command:

```
$ python -c "import keras; print(keras.__version__)"
```

Output

```
Using TensorFlow backend.
2.1.5
```

Python Deep Learning - Basic Machine Learning

Artificial Intelligence (AI) is any code, algorithm or technique that enables a computer to mimic human cognitive behaviour or intelligence. Machine Learning (ML) is a subset of AI that uses statistical methods to enable machines to learn and improve with experience. Deep Learning is a subset of Machine Learning that makes the computation of multi-layer neural networks feasible. Machine Learning is seen as shallow learning, while Deep Learning is seen as hierarchical learning with abstraction.

Machine learning deals with a wide range of concepts, listed below:

  • supervised learning
  • unsupervised learning
  • reinforcement learning
  • linear regression
  • cost functions
  • overfitting
  • underfitting
  • hyper-parameters, etc.

In supervised learning, we learn to predict values from labelled data. One ML technique that helps here is classification, where the target values are discrete, for example cats and dogs. Another useful technique is regression, where the target values are continuous; for example, stock market data can be analysed using regression.
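As a minimal illustration of regression on continuous targets, the sketch below fits a straight line with NumPy's `np.polyfit`. The synthetic data (a noisy line y = 2x + 1) is a hypothetical stand-in for real continuous data such as stock prices.

```python
import numpy as np

# Hypothetical continuous targets: y = 2x + 1 plus a little noise
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

# Least-squares fit of a straight line (a degree-1 polynomial)
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 2), round(intercept, 2))

# Predict a continuous value for a new, unseen input
print(slope * 12 + intercept)
```

A classifier would instead map each input to one of a fixed set of discrete labels.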

In unsupervised learning, we make inferences from input data that is not labelled or structured. If we have a million medical records and need to make sense of them (find the underlying structure, spot outliers or detect anomalies), we use a clustering technique to divide the data into broad clusters.
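The guide does not name a specific clustering algorithm; k-means is one common choice, sketched below in NumPy on two hypothetical blobs of unlabelled points. The function name `kmeans` and its parameters are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical unlabelled data: two well-separated blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels))
```

No labels are given to the algorithm; the two groups emerge from the structure of the data alone.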

Data sets are divided into training sets, validation sets, test sets and so on.
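One simple way to split a data set, assumed here rather than taken from the guide, is to shuffle the indices once and cut them into disjoint training, validation and test portions. The 70/15/15 proportions and the helper name `train_val_test_split` are hypothetical choices.

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut the data into three disjoint sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

X = np.arange(200).reshape(100, 2)   # hypothetical 100 samples, 2 features
y = np.arange(100)                   # one label per sample
train, val, test = train_val_test_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))
```

The model is fitted on the training set, tuned on the validation set, and evaluated once on the test set.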

A breakthrough in 2012 brought the concept of deep learning into prominence: an algorithm successfully classified about 1 million images into 1000 categories using 2 GPUs and large-scale data (Big Data).

Relating Deep Learning and Traditional Machine Learning

One of the major challenges in traditional machine learning models is a process called feature extraction. The programmer has to be specific and tell the computer which features to look out for. These features help the algorithm make decisions.

Feeding raw data into the algorithm rarely works, so feature extraction is a critical part of the traditional machine learning workflow.

This places a huge responsibility on the programmer, and the algorithm's effectiveness relies heavily on how inventive the programmer is. For complex problems such as object recognition or handwriting recognition, this is a huge issue.

Deep learning, with its ability to learn multiple layers of representation, is one of the few methods that has helped us with automatic feature extraction. The lower layers can be assumed to perform automatic feature extraction, requiring little or no guidance from the programmer.

Artificial Neural Networks

The artificial neural network, or just neural network for short, is not a new idea. It has been around for about 80 years.

It was not until 2011 that deep neural networks became popular, thanks to new techniques, the availability of huge datasets and powerful computers.

A neural network mimics a neuron, which has dendrites, a nucleus, an axon and a terminal axon.

[Figure: Terminal Axon]

For a network, we need at least two neurons. These neurons transfer information via synapses between the dendrites of one neuron and the terminal axon of another.

[Figure: Neurons Transfer Information]

A possible model of an artificial neuron looks like this:

[Figure: Probable Model of an artificial neuron]

A neural network looks like this:

[Figure: Neural Network]

The circles are neurons, or nodes, which apply their functions to the data; the lines (edges) connecting them are the weights along which information is passed.

Each column is a layer. The first layer is the input layer and the last is the output layer; all the layers in between are hidden layers.

If you have one or a few hidden layers, you have a shallow neural network. If you have many hidden layers, you have a deep neural network.

In this model, you take the input data, weight it, and pass it through a function in the neuron called the threshold function or activation function.

In its simplest form, the neuron sums its weighted inputs and compares the sum with a certain threshold value. If the sum exceeds the threshold, the neuron fires a signal and outputs 1; otherwise nothing is fired and it outputs 0. That output is then weighted and passed along to the next neuron, where the same sort of function is run.

Instead of a hard threshold, we can use a sigmoid (S-shaped) function as the activation function, which gives a smooth output between 0 and 1.
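The threshold neuron described above, and its sigmoid variant, can be sketched in plain Python as follows. The weights, bias and threshold of 0 are made-up values, chosen only to show one input that fires the neuron and one that does not.

```python
import math

def weighted_sum(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def step_neuron(inputs, weights, bias, threshold=0.0):
    """Fires (outputs 1) if the weighted sum exceeds the threshold, else 0."""
    return 1 if weighted_sum(inputs, weights, bias) > threshold else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    """Smooth version: the output lies strictly between 0 and 1."""
    return sigmoid(weighted_sum(inputs, weights, bias))

# Hypothetical weights and bias for a 2-input neuron
w, b = [0.6, 0.4], -0.5
print(step_neuron([1, 1], w, b))     # weighted sum is 0.5, above the threshold
print(step_neuron([0, 1], w, b))     # weighted sum is about -0.1, below it
print(sigmoid_neuron([1, 1], w, b))
```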

The weights start out random, and each input into a node/neuron has its own weight.

In a typical "feed forward" network, the most basic type of neural network, information passes straight through the network you created, and you compare the output with the output you hoped to get for your sample data.

From here, you adjust the weights so that your output gets closer to the desired output.

Sending data straight through a neural network like this is called a feed forward pass, and a network that works this way is called a feed forward neural network.

Our data goes from the input, through the layers in order, to the output.

When we go backwards through the network and adjust the weights to minimize the loss (cost), this is called back propagation.
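The feed forward and back propagation steps can be sketched with NumPy on the XOR problem. This is a minimal illustration under assumed choices (a 2-4-1 network, sigmoid activations, mean squared error, full-batch gradient descent with learning rate 1.0), not a production training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a classic problem that a single neuron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 2 inputs -> 4 hidden -> 1 output
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0
losses = []

for _ in range(5000):
    # Feed forward
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(np.mean((out - y) ** 2))

    # Back propagation: apply the chain rule from the output backwards
    d_out = 2 * (out - y) / len(X) * out * (1 - out)  # dLoss / d(output pre-activation)
    d_h = (d_out @ W2.T) * h * (1 - h)                # dLoss / d(hidden pre-activation)

    # Gradient descent step on every weight and bias
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(out.round(2).ravel())
```

How close the outputs get to [0, 1, 1, 0] depends on the random initialisation, but the loss falls as the weights are adjusted.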

This is an optimization problem. In real practice, a neural network has hundreds of thousands, millions or even more variables to optimize.

The first solution was to use stochastic gradient descent as the optimization method. Today there are other options, such as AdaGrad and the Adam optimizer. Either way, this is a massive computation, which is why neural networks were mostly left on the shelf for over half a century. Only very recently have our machines had the power and architecture to even consider doing these operations, and have properly sized datasets become available to match.
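To show what such an optimizer does with a gradient, the sketch below minimizes a toy one-parameter cost, (w - 3)**2, with plain gradient descent and with the standard Adam update rule. Real training uses stochastic mini-batch gradients over very many weights; the single deterministic parameter, the learning rates and the step counts here are simplifying assumptions.

```python
import math

def grad(w):
    """Gradient of the toy cost (w - 3)**2, whose minimum is at w = 3."""
    return 2 * (w - 3)

def gradient_descent(w=0.0, lr=0.1, steps=200):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w=0.0, lr=0.1, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(gradient_descent(), adam())
```

Adam adapts the step size per parameter from the history of gradients, which is one reason it is a popular default.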

For simple classification tasks, a neural network performs about as well as other simple algorithms such as k-nearest neighbours. The real utility of neural networks shows when we have much larger datasets and much more complex problems; in both cases they outperform other machine learning models.

Deep Neural Networks

A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output layers. Like shallow ANNs, DNNs can model complex non-linear relationships.

The main purpose of a neural network is to receive a set of inputs, perform progressively more complex calculations on them, and produce an output that solves a real-world problem such as classification. Here we restrict ourselves to feed forward neural networks.

In a deep network, we have an input, an output, and a sequential flow of data between them.

[Figure: Deep Network]

Neural networks are widely used in supervised learning and reinforcement learning problems. These networks are based on a set of layers connected to each other.

In deep learning, the number of hidden layers, most of them non-linear, can be large; some networks have about 1000 layers.

On complex tasks, DL models often produce much better results than traditional ML models.

We mostly use gradient descent to optimize the network and minimize the loss function.

We can use ImageNet, a repository of millions of digital images, to classify a dataset into categories such as cats and dogs. Besides static images, DL nets are increasingly used for dynamic images, time series and text analysis.

Training on data sets is an important part of building deep learning models, and backpropagation is the main algorithm for training them.

DL deals with training large neural networks with complex input-output transformations.

One example of DL is mapping a photo to the names of the people in it, as social networks do; describing a picture with a phrase is another recent application of DL.

[Figure: DL Mapping]

Neural networks are functions that transform inputs such as x1, x2, x3, … into outputs such as z1, z2, z3, … through two intermediate operations (shallow networks) or several (deep networks); these operations are also called layers.

The weights and biases change from layer to layer. ‘w’ and ‘v’ are the weights, or synapses, of the layers of the neural network.

The best use case for deep learning is the supervised learning problem. Here, we have a large set of data inputs with a desired set of outputs.

[Figure: Backpropagation Algorithm]

Here we apply the back propagation algorithm to bring the predicted outputs closer to the correct outputs.

One of the most basic deep learning datasets is MNIST, a dataset of handwritten digits.

We can use Keras to train a deep convolutional neural network that classifies the images of handwritten digits in this dataset.

The firing, or activation, of a neural net classifier produces a score. For example, to classify patients as sick or healthy, we consider parameters such as height, weight, body temperature and blood pressure.

A high score means the patient is sick and a low score means the patient is healthy.

Each node in the output and hidden layers has its own classifier. The input layer takes the inputs and passes its scores on to the next hidden layer for further activation, and this continues until the output is reached.

This progression from input to output, from left to right in the forward direction, is called forward propagation.

The credit assignment path (CAP) in a neural network is the series of transformations from the input to the output. CAPs describe probable causal connections between the input and the output.

The CAP depth of a feed forward neural network is the number of hidden layers plus one, because the output layer is included. For recurrent neural networks, where a signal may propagate through a layer several times, the CAP depth is potentially unlimited.
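The CAP depth rule for feed forward nets is a one-line calculation. The sketch below also applies the commonly used convention that a network counts as deep when its CAP depth is greater than two; the helper names `cap_depth` and `is_deep` are hypothetical.

```python
def cap_depth(num_hidden_layers):
    """CAP depth of a feed forward net: hidden layers plus the output layer."""
    return num_hidden_layers + 1

def is_deep(num_hidden_layers):
    """Common convention: deep learning means a CAP depth greater than two."""
    return cap_depth(num_hidden_layers) > 2

for hidden in (0, 1, 2, 5):
    print(hidden, cap_depth(hidden), is_deep(hidden))
```

A single hidden layer therefore gives a CAP depth of 2 (shallow), while two or more hidden layers make the network deep under this convention.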

Deep Nets and Shallow Nets

There is no clear depth threshold that separates shallow learning from deep learning, but it is generally agreed that deep learning, which involves multiple non-linear layers, requires a CAP depth greater than two.

The basic node in a neural net is a perceptron, which mimics a neuron in a biological neural network. Layers of perceptrons form a multi-layer perceptron (MLP). Each set of inputs is modified by a set of weights and biases; each edge has its own weight and each node has its own bias.

The prediction accuracy of a neural net depends on its weights and biases.

The process of improving the accuracy of a neural network is called training. The output of a forward propagation pass is compared with the value that is known to be correct.

The cost function, or loss function, measures the difference between the generated output and the actual output.
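The guide does not fix a particular cost function. Two common choices, mean squared error and binary cross-entropy, are sketched below with NumPy; the example predictions are invented to show that outputs closer to the targets give a lower cost.

```python
import numpy as np

def mse(predicted, actual):
    """Mean squared error: the average squared gap between output and target."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.mean((predicted - actual) ** 2)

def binary_cross_entropy(predicted, actual, eps=1e-12):
    """A common alternative for 0/1 targets; predictions are probabilities."""
    p = np.clip(np.asarray(predicted, float), eps, 1 - eps)
    a = np.asarray(actual, float)
    return -np.mean(a * np.log(p) + (1 - a) * np.log(1 - p))

actual = [1, 0, 1, 1]
good = [0.9, 0.1, 0.8, 0.95]   # hypothetical outputs close to the targets
bad = [0.4, 0.6, 0.5, 0.3]     # hypothetical outputs far from the targets
print(mse(good, actual), mse(bad, actual))
print(binary_cross_entropy(good, actual), binary_cross_entropy(bad, actual))
```

Training tries to change the weights and biases so that numbers like these go down.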

The goal of training is to make the cost as small as possible across millions of training examples. To do this, the network tweaks its weights and biases until its predictions match the correct outputs.

Once trained well, a neural net has the potential to make accurate predictions consistently.

When patterns get complex and you want your computer to recognise them, you need neural networks. In such complex pattern scenarios, neural networks tend to outperform competing algorithms.

GPUs can now train them faster than ever before, and deep neural networks are already revolutionizing the field of AI.

Computers have proved to be good at performing repetitive calculations and following detailed instructions, but not so good at recognising complex patterns.

For recognising simple patterns, a support vector machine (SVM) or a logistic regression classifier can do the job well, but as the complexity of the patterns increases, deep neural networks become the way to go.

Therefore, for complex patterns such as a human face, shallow neural networks fail, and we have to use deep neural networks with more layers. Deep nets do their job by breaking complex patterns down into simpler ones. For a human face, for example, a deep net would use edges to detect parts such as lips, nose, eyes and ears, and then recombine these to form a human face.

Prediction accuracy has improved so much that, at a recent Google pattern recognition challenge, a deep net beat a human.

The idea of a web of layered perceptrons has been around for some time; in this respect, deep nets mimic the human brain. One downside, however, is that they take a long time to train, which is a hardware constraint.

Recent high-performance GPUs, however, can train such deep nets in under a week, whereas fast CPUs might have taken weeks or even months to do the same.

Choosing a Deep Net

How do we choose a deep net? We have to decide whether we are building a classifier or trying to find patterns in the data, and whether we are going to use unsupervised learning. To extract patterns from a set of unlabelled data, we use a Restricted Boltzmann Machine or an autoencoder.

Consider the following points when choosing a deep net:

  • For text processing, sentiment analysis, parsing and named entity recognition, we use a recurrent net or a recursive neural tensor network (RNTN).

  • For any language model that operates at the character level, we use a recurrent net.

  • For image recognition, we use a deep belief network (DBN) or a convolutional network.

  • For object recognition, we use an RNTN or a convolutional network.

  • For speech recognition, we use a recurrent net.

In general, deep belief networks and multilayer perceptrons with rectified linear units (ReLU) are both good choices for classification.

For time series analysis, a recurrent net is usually recommended.

Neural nets have been around for more than 50 years, but only recently have they risen to prominence. The reason is that they are hard to train: when we train them with back propagation, we run into the problem of vanishing or exploding gradients. When that happens, training takes longer and accuracy suffers. While training on a data set, we constantly calculate the cost function, which is the difference between the predicted output and the actual output for a set of labelled training data. The cost function is then minimized by adjusting the weights and biases until the lowest value is obtained. The training process uses the gradient, which is the rate at which the cost changes with respect to a change in the weight or bias values.
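A simplified way to see why gradients vanish: back propagation multiplies one local derivative per layer, and the derivative of the sigmoid is at most 0.25. The sketch below ignores the weights (effectively assuming they all equal 1), which is a simplifying assumption; large weights can instead make the product grow, which is the exploding case.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)   # never larger than 0.25, reached at z = 0

# Back propagation multiplies one such local derivative per layer.
# Even in this best case (z = 0 at every layer), the product shrinks fast.
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_grad(0.0) ** depth)
```

After only 10 sigmoid layers the gradient reaching the first layer is below one millionth of its original size, so those early weights barely change.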

Restricted Boltzmann Networks or Autoencoders - RBNs

In 2006, a breakthrough was achieved in tackling the vanishing gradient problem. Geoff Hinton devised a novel strategy built around the Restricted Boltzmann Machine (RBM), a shallow two-layer net.

The first layer is the visible layer and the second is the hidden layer. Each node in the visible layer is connected to every node in the hidden layer. The network is called restricted because no two nodes within the same layer are allowed to share a connection.

Autoencoders are networks that encode input data as vectors. They create a hidden, or compressed, representation of the raw data. These vectors are useful for dimensionality reduction: they compress the raw data into a smaller number of essential dimensions. Autoencoders are paired with decoders, which reconstruct the input data from its hidden representation.
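A minimal autoencoder can be sketched in NumPy as an encoder matrix that compresses 6 inputs into a 2-number code and a decoder matrix that reconstructs the inputs from that code, trained by gradient descent on the reconstruction error. The linear layers, the synthetic data (which truly lies in 2 dimensions) and the learning rate are assumptions made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6-dimensional vectors that really live in 2 dimensions
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 6))
X = (X - X.mean(axis=0)) / X.std()

n_in, n_code = 6, 2
W_enc = rng.normal(scale=0.1, size=(n_in, n_code))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(n_code, n_in))   # decoder weights
lr = 0.05

def loss(X):
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

start = loss(X)
for _ in range(3000):
    code = X @ W_enc                  # encode: compressed 2-number representation
    recon = code @ W_dec              # decode: reconstruct the 6 inputs
    err = 2 * (recon - X) / X.size    # gradient of the mean squared error
    grad_dec = code.T @ err
    grad_enc = X.T @ (err @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(f"reconstruction error: {start:.4f} -> {loss(X):.4f}")
```

Practical autoencoders usually add nonlinear activations and more layers, but the encode, decode and compare-with-input structure is the same.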

An RBM is the mathematical equivalent of a two-way translator. A forward pass takes inputs and translates them into a set of numbers that encodes them. A backward pass takes this set of numbers and translates it back into reconstructed inputs. A well-trained net performs this backward pass with a high degree of accuracy.

In both steps, the weights and biases play a critical role: they help the RBM decode the interrelationships between the inputs and decide which inputs are essential for detecting patterns. Through forward and backward passes, the RBM is trained to reconstruct the input with different weights and biases until the input and the reconstruction are as close as possible. An interesting aspect of RBMs is that the data need not be labelled. This turns out to be very important for real-world data sets such as photos, videos, voices and sensor data, which tend to be unlabelled. Instead of relying on humans to label data manually, an RBM automatically sorts through the data; by properly adjusting its weights and biases, it is able to extract important features and reconstruct the input. RBMs belong to the family of feature extractor neural nets, which are designed to recognize inherent patterns in data. These are also called autoencoders because they have to encode their own structure.

[Figure: RBM Structure]
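The forward and backward passes of an RBM can be sketched with contrastive divergence (CD-1), the usual training method for RBMs, although the guide does not name it. This is a minimal NumPy sketch under assumed settings: binary units, a mean-field (probability) reconstruction instead of a sampled one, learning rate 0.1, and a toy data set built from two patterns. No labels are used.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Minimal binary RBM trained with one step of contrastive divergence."""

    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def forward(self, v):
        """Forward pass: visible units -> hidden unit probabilities."""
        return sigmoid(v @ self.W + self.b_h)

    def backward(self, h):
        """Backward pass: hidden units -> reconstructed visible probabilities."""
        return sigmoid(h @ self.W.T + self.b_v)

    def train_step(self, v0, lr=0.1):
        p_h0 = self.forward(v0)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden states
        v1 = self.backward(h0)                              # reconstruction
        p_h1 = self.forward(v1)
        n = len(v0)
        # Move towards the data statistics and away from the reconstruction's
        self.W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (p_h0 - p_h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)                      # reconstruction error

# Hypothetical unlabelled binary data built from two repeating patterns
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = patterns[rng.integers(0, 2, size=100)]

rbm = RBM(n_visible=6, n_hidden=2)
errors = [rbm.train_step(data) for _ in range(1000)]
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```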

Deep Belief Networks - DBNs

Deep belief networks (DBNs) are formed by combining RBMs and introducing a clever training method. The result is a new model that finally addresses the vanishing gradient problem. Geoff Hinton introduced Deep Belief Nets, built from RBMs, as an alternative to back propagation.

A DBN is similar in structure to an MLP (multi-layer perceptron), but very different when it comes to training. It is the training that enables DBNs to outperform their shallow counterparts.

A DBN can be visualized as a stack of RBMs, where the hidden layer of one RBM is the visible layer of the RBM above it. The first RBM is trained to reconstruct its input as accurately as possible.

The hidden layer of the first RBM is taken as the visible layer of the second RBM, and the second RBM is trained using the outputs of the first. This process is repeated until every layer in the network is trained.
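The greedy, layer-by-layer procedure above can be sketched as follows. The compact `train_rbm` helper (CD-1 training, an assumption as in a typical RBM implementation), the layer sizes and the random data are illustrative; the later supervised fine-tuning with labels is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=500, lr=0.1):
    """Train one binary RBM with CD-1; return (weights, hidden biases)."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        v1 = sigmoid(h0 @ W.T + b_v)
        p_h1 = sigmoid(v1 @ W + b_h)
        W += lr * (data.T @ p_h0 - v1.T @ p_h1) / len(data)
        b_v += lr * (data - v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h

def train_dbn(data, layer_sizes):
    """Greedy layer-wise training: each RBM's hidden layer feeds the next RBM."""
    layers, layer_input = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(layer_input, n_hidden)
        layers.append((W, b_h))
        # Hidden probabilities become the visible data of the next RBM
        layer_input = sigmoid(layer_input @ W + b_h)
    return layers

# Hypothetical unlabelled binary data with 8 features
data = (rng.random((100, 8)) < 0.5).astype(float)
dbn = train_dbn(data, layer_sizes=[6, 4, 2])
print([W.shape for W, _ in dbn])
```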

In a DBN, each RBM learns the entire input. A DBN works globally, fine-tuning the entire input in succession as the model slowly improves, like a camera lens slowly bringing a picture into focus. A stack of RBMs outperforms a single RBM, just as a multi-layer perceptron (MLP) outperforms a single perceptron.

At this stage, the RBMs have detected inherent patterns in the data, but without any names or labels. To finish training the DBN, we have to introduce labels for the patterns and fine-tune the net with supervised learning.

We need only a very small set of labelled samples so that the features and patterns can be associated with a name. This small labelled set is used for training, and it can be very small compared with the original data set.

The weights and biases are altered slightly, resulting in a small change in the net's perception of the patterns and often a small increase in total accuracy.

Training can also be completed in a reasonable amount of time by using GPUs, giving very accurate results compared with shallow nets, and this approach also offers a solution to the vanishing gradient problem.

Generative Adversarial Networks - GANs

Generative adversarial networks are deep neural nets made up of two nets pitted against each other, hence the name “adversarial”.

GANs were introduced in a paper published by researchers at the University of Montreal in 2014. Facebook's AI expert Yann LeCun, referring to GANs, called adversarial training “the most interesting idea in the last 10 years in ML.”

GANs' potential is huge, as the networks can learn to mimic any distribution of data. GANs can be taught to create parallel worlds strikingly similar to our own in any domain: images, music, speech, prose. In a way they are robot artists, and their output is quite impressive.

In a GAN, one neural network, known as the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity.

Let us say we are trying to generate handwritten numerals like those found in the MNIST dataset, which is taken from the real world. When shown an instance from the true MNIST dataset, the discriminator's job is to recognize it as authentic.

Now consider the following steps of a GAN:

  • The generator network takes random numbers as input and returns an image.

  • This generated image is fed to the discriminator network, along with a stream of images taken from the actual dataset.

  • The discriminator takes in both real and fake images and returns probabilities, numbers between 0 and 1, where 1 represents a prediction of authenticity and 0 represents fake.

  • So you have a double feedback loop:

    • The discriminator is in a feedback loop with the ground truth of the images, which we know.

    • The generator is in a feedback loop with the discriminator.
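The double feedback loop is usually expressed as two losses computed from the discriminator's probabilities. The sketch below uses binary cross-entropy, one common formulation (the generator loss shown is the widely used "non-saturating" variant); the discriminator outputs are made-up numbers, not results of a trained model.

```python
import numpy as np

def bce(p, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """Loop 1: ground truth says real images are 1 and generated images are 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Loop 2: the generator does well when the discriminator outputs 1 for its images."""
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator outputs (probability that each image is authentic)
d_real = np.array([0.9, 0.8, 0.95])   # on images from the real dataset
d_fake = np.array([0.1, 0.3, 0.2])    # on images produced by the generator

print(discriminator_loss(d_real, d_fake))  # relatively low: the discriminator is winning
print(generator_loss(d_fake))              # relatively high: the generator must improve
```

In training, the discriminator's weights are updated to lower the first loss and the generator's weights to lower the second, alternating between the two.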

Recurrent Neural Networks - RNNs

RNNs are neural networks in which data can flow in cycles, not only forward. These networks are used for applications such as language modelling and Natural Language Processing (NLP).

The basic concept underlying RNNs is to use sequential information. A normal neural network assumes that all inputs and outputs are independent of each other. But if we want to predict the next word in a sentence, we have to know which words came before it.

RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. RNNs can thus be said to have a “memory” that captures information about what has been calculated so far. In theory, RNNs can use information in very long sequences, but in practice they can look back only a few steps.

[Figure: Recurrent Neural Networks]
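The recurrence can be sketched in NumPy as a single cell applied to every element of a sequence, with the hidden state carrying information forward. This is a plain (vanilla) RNN with tanh, not an LSTM, and the sizes and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3-dimensional inputs, 5-dimensional hidden state ("memory")
n_in, n_hidden = 3, 5
W_xh = rng.normal(scale=0.5, size=(n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # hidden -> hidden (the recurrence)
b_h = np.zeros(n_hidden)

def rnn_forward(sequence):
    """Apply the same cell to every element; each step also sees the previous state."""
    h = np.zeros(n_hidden)
    states = []
    for x_t in sequence:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.array(states)

seq = rng.normal(size=(4, n_in))   # a sequence of 4 time steps
states = rnn_forward(seq)
print(states.shape)

# Changing only the first element still changes the final state: the net "remembers"
seq2 = seq.copy()
seq2[0] += 1.0
print(np.abs(rnn_forward(seq2)[-1] - states[-1]).max() > 0)
```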

Long short-term memory networks (LSTMs) are the most commonly used RNNs.

Together with convolutional neural networks, RNNs have been used as part of a model that generates descriptions for unlabelled images, and this works remarkably well.

Convolutional Deep Neural Networks - CNNs

If we increase the number of layers in a neural network to make it deeper, we increase the complexity of the network and can model more complicated functions. However, the number of weights and biases grows very quickly, and learning such difficult problems can become infeasible for ordinary fully connected neural networks. Convolutional neural networks are one solution to this problem.

CNNs are used extensively in computer vision and have also been applied to acoustic modelling for automatic speech recognition.

The idea behind convolutional neural networks is a “moving filter” that passes over the image. This moving filter, or convolution, is applied to a neighbourhood of nodes, which may be pixels, for example. In the example here, the filter multiplies each node value by 0.5.
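Below is a minimal NumPy sketch of the moving filter. It assumes a single-channel image, a 3 × 3 filter whose weights are all 0.5 (echoing the 0.5 × node value example), a stride of 1 and no padding. Like most deep learning libraries, it computes a cross-correlation, which means the filter is not flipped.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]    # neighbourhood of pixels
            out[i, j] = np.sum(patch * kernel)   # weighted sum of the patch
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.full((3, 3), 0.5)   # every pixel contributes 0.5 x its value
print(conv2d(image, kernel))
```

For this 4 × 4 image, the four outputs are 22.5, 27.0, 40.5 and 45.0. The same nine weights are reused at every position, which is why convolutional layers need far fewer parameters than fully connected ones.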

Noted researcher Yann LeCun pioneered convolutional neural networks. Facebook uses these nets in its facial recognition software. CNNs have become the go-to solution for machine vision projects. A convolutional network has many layers. In the ImageNet challenge, a machine was able to beat a human at object recognition in 2015.

In a nutshell, convolutional neural networks (CNNs) are multi-layer neural networks that assume the input data to be images. They sometimes have 17 or more layers.

[Figure: Convolutional Neural Networks]

CNNs drastically reduce the number of parameters that need to be tuned, because each filter's weights are shared across all positions in the image. As a result, CNNs handle the high dimensionality of raw images efficiently.
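A back-of-the-envelope comparison makes the reduction concrete. It assumes a 224 × 224 RGB input, a fully connected layer with 1,000 units, and a convolutional layer with 64 filters of size 3 × 3. These sizes are illustrative assumptions and do not come from the text.

```python
def dense_params(n_inputs, n_units):
    # one weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    # each filter is shared across every position in the image
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

n_inputs = 224 * 224 * 3
print(dense_params(n_inputs, 1000))   # 150529000
print(conv_params(3, 3, 3, 64))       # 1792
```

The convolutional layer's parameter count does not depend on the image size at all. Only the filter size and the channel counts matter.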

Python Deep Learning - Fundamentals

In this chapter, we will look into the fundamentals of Python Deep Learning.

Deep learning models/algorithms

Let us now learn about the different deep learning models/algorithms.

Some of the popular models within deep learning are as follows −

  • Convolutional neural networks
  • Recurrent neural networks
  • Deep belief networks
  • Generative adversarial networks
  • Autoencoders, and so on

The inputs and outputs are represented as vectors or tensors. For example, the input to a neural network may be an image, with the RGB values of the individual pixels represented as vectors.

The layers of neurons that lie between the input layer and the output layer are called hidden layers. This is where most of the work happens when the neural net tries to solve problems. Taking a closer look at the hidden layers can reveal a lot about the features the network has learned to extract from the data.

Different neural network architectures are formed by choosing which neurons connect to which neurons in the next layer.

Pseudocode for calculating output

Following is the pseudocode for calculating the output of a forward-propagating neural network −

  • # node[] := array of topologically sorted nodes
  • # An edge from a to b means a is to the left of b
  • # If the Neural Network has R inputs and S outputs,
  • # then the first R nodes are input nodes and the last S nodes are output nodes.
  • # incoming[x] := nodes with an edge into node x
  • # weights[x] := weights of the incoming edges to x

For each neuron x, from left to right −

  • if x <= R: do nothing # it's an input node
  • inputs[x] = [output[i] for i in incoming[x]]
  • weighted_sum = dot_product(weights[x], inputs[x])
  • output[x] = Activation_function(weighted_sum)
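The pseudocode above can be turned into runnable Python. This version is a sketch under a few assumptions. Nodes are numbered 1..N in topological order, incoming and weights are dictionaries, and sigmoid is used as the activation function only as an example. The tiny network at the bottom is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(n_nodes, R, S, incoming, weights, x, activation=sigmoid):
    """Evaluate a feed-forward network stored as a DAG.

    Nodes are numbered 1..n_nodes in topological order; nodes 1..R are
    inputs and the last S nodes are outputs. incoming[n] lists the nodes
    feeding node n, and weights[n] holds the matching edge weights.
    """
    output = {}
    for node in range(1, R + 1):            # input nodes just hold x
        output[node] = x[node - 1]
    for node in range(R + 1, n_nodes + 1):  # left to right
        inputs = [output[i] for i in incoming[node]]
        weighted_sum = sum(w * v for w, v in zip(weights[node], inputs))
        output[node] = activation(weighted_sum)
    return [output[n] for n in range(n_nodes - S + 1, n_nodes + 1)]

# Hypothetical network: inputs 1, 2 -> hidden node 3 -> output node 4
incoming = {3: [1, 2], 4: [3]}
weights = {3: [0.5, -0.5], 4: [2.0]}
result = forward(4, R=2, S=1, incoming=incoming, weights=weights, x=[1.0, 1.0])
print(result)  # about 0.731, i.e. sigmoid(1)
```

Here the hidden node's weighted sum is 0.5 − 0.5 = 0, so its output is sigmoid(0) = 0.5. The output node then computes sigmoid(2.0 × 0.5) = sigmoid(1).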

Training a Neural Network

We will now learn how to train a neural network. We will also learn about the back-propagation algorithm and the backward pass in Python Deep Learning.

We have to find the optimal values of the weights of a neural network to get the desired output. To train a neural network, we use the iterative gradient descent method. We start with a random initialization of the weights. After that, we make predictions on some subset of the data with the forward-propagation process, compute the corresponding cost function C, and update each weight w by subtracting an amount proportional to dC/dw, i.e., the derivative of the cost function with respect to that weight. The proportionality constant is known as the learning rate.
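The update rule w ← w − learning_rate × dC/dw can be shown on a toy problem. This minimal sketch assumes a one-weight cost C(w) = (w − 3)², whose derivative 2(w − 3) is known in closed form. The cost, the starting point and the learning rate of 0.1 are illustrative choices. Note the minus sign: each step moves the weight against the gradient.

```python
def cost(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # dC/dw

w = 0.0                 # initial weight (normally chosen at random)
learning_rate = 0.1     # the proportionality constant
for step in range(100):
    w -= learning_rate * grad(w)

print(round(w, 4), round(cost(w), 8))  # w converges to 3.0, the minimum of C
```

With this learning rate, the distance to the minimum shrinks by a factor of 0.8 on every step. A much larger rate (above 1.0 for this cost) would make the iterates diverge instead.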

The gradients can be calculated efficiently using the back-propagation algorithm. The key observation of backward propagation, or backprop, is that because of the chain rule of differentiation, the gradient at each neuron in the network can be calculated from the gradients at the neurons it has outgoing edges to. Hence, we calculate the gradients backwards: first the gradients of the output layer, then the top-most hidden layer, then the preceding hidden layer, and so on, ending at the input layer.

The back-propagation algorithm is mostly implemented using the idea of a computational graph. Each neuron is expanded into many nodes, and each node performs a simple mathematical operation such as addition or multiplication. The computational graph has no weights on its edges. Instead, the weights are assigned to nodes, so each weight becomes a node of its own. The back-propagation algorithm is then run on the computational graph. Once the calculation is complete, only the gradients of the weight nodes are needed for the update, and the rest of the gradients can be discarded.

Gradient Descent Optimization Technique

One commonly used optimization function that adjusts weights according to the error they caused is called the “gradient descent.”

Gradient is another name for slope. On an x-y graph, slope represents how two variables are related to each other: the rise over the run, the change in distance over the change in time, and so on. In this case, the slope relates the network’s error to a single weight; it tells us how the error changes as the weight is varied.

To put it more precisely, we want to find which weight produces the least error. We want to find the weight that correctly represents the signals contained in the input data, and translates them to a correct classification.

As a neural network learns, it slowly adjusts many weights so that they map signal to meaning correctly. The relationship between the network error E and each of those weights is a derivative, dE/dw, which measures how much a slight change in a weight changes the error.

Each weight is just one factor in a deep network that involves many transforms. The signal of the weight passes through activations and sums over several layers, so we use the chain rule of calculus to work back through the network activations and outputs. This leads us to the weight in question and its relationship to the overall error.

The two variables, error and weight, are mediated by a third variable, the activation, through which the weight is passed. To calculate how a change in weight affects the error, we first calculate how a change in activation affects the error and how a change in weight affects the activation. We then multiply the two.
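The chain rule can be checked numerically for a single weight. In this setup, one weight feeds a sigmoid activation and the error is squared. The input x, the target t and the weight value are hypothetical. The code computes dE/dw = dE/da · da/dw analytically, and a finite-difference estimate confirms it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, t = 1.5, 1.0       # hypothetical input and target
w = 0.4               # the weight in question

def error(w):
    a = sigmoid(w * x)           # the activation mediates between w and E
    return 0.5 * (a - t) ** 2

a = sigmoid(w * x)
dE_da = a - t                    # how the error changes with the activation
da_dw = a * (1 - a) * x          # how the activation changes with the weight
dE_dw = dE_da * da_dw            # chain rule

eps = 1e-6
numeric = (error(w + eps) - error(w - eps)) / (2 * eps)
print(dE_dw, numeric)            # the two estimates agree closely
```

The gradient comes out negative because the activation is below the target. Gradient descent would therefore increase w.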

The basic idea in deep learning is nothing more than that: adjusting a model’s weights in response to the error it produces, until you cannot reduce the error any more.

The deep net trains slowly if the gradient value is small and quickly if the value is large. Any inaccuracies in training lead to inaccurate outputs. The process of training the nets from the output back to the input is called back propagation, or backprop. Forward propagation starts with the input and works forward, while backprop does the opposite and calculates the gradient from right to left.

Each time we calculate a gradient, we use all the previous gradients up to that point.

Let us start at a node in the output layer. The edge uses the gradient at that node. As we go back into the hidden layers, the calculation gets more complex. The product of two numbers between 0 and 1 is a smaller number, so the gradient value keeps getting smaller as we move backwards. As a result, backprop takes a long time to train the early layers, and accuracy suffers.
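A small numeric illustration shows why the gradient shrinks. The derivative of the sigmoid is at most 0.25, so a chain of such local derivatives decays geometrically with depth. For simplicity, this sketch ignores the weights, which is an assumption. It also uses the best case, in which every pre-activation is 0.

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)      # peaks at 0.25 when z == 0

grad = 1.0
for layer in range(1, 11):
    grad *= sigmoid_grad(0.0)  # best case for every layer
    print(layer, grad)
```

After ten layers, the factor is 0.25¹⁰ ≈ 9.5 × 10⁻⁷. This vanishing-gradient effect is one reason why activations such as ReLU are often preferred in deep networks.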

Challenges in Deep Learning Algorithms

Both shallow and deep neural networks face certain challenges, such as overfitting and computation time. DNNs are prone to overfitting because their added layers of abstraction allow them to model rare dependencies in the training data.

Regularization methods such as dropout, early stopping, data augmentation, and transfer learning are applied during training to combat overfitting. Dropout regularization randomly omits units from the hidden layers during training, which helps avoid rare dependencies. Training a DNN involves several parameters, such as its size (the number of layers and the number of units per layer), the learning rate, and the initial weights. Finding optimal parameters is not always practical because of the high cost in time and computational resources. Several tricks, such as batching, can speed up computation. The large processing power of GPUs has significantly helped the training process, because the required matrix and vector computations run efficiently on GPUs.

Dropout

Dropout is a popular regularization technique for neural networks. Deep neural networks are particularly prone to overfitting.

Let us now see what dropout is and how it works.

In the words of Geoffrey Hinton, one of the pioneers of Deep Learning, ‘If you have a deep neural net and it's not overfitting, you should probably be using a bigger one and using dropout’.

Dropout is a technique where during each iteration of gradient descent, we drop a set of randomly selected nodes. This means that we ignore some nodes randomly as if they do not exist.

Each neuron is kept with probability q and dropped with probability 1 - q. The value of q may be different for each layer in the neural network. A keep probability of q = 0.5 for the hidden layers, with no dropout (q = 1) for the input layer, works well on a wide range of tasks.

During evaluation and prediction, no dropout is used. Instead, the output of each neuron is multiplied by q so that the input to the next layer has the same expected value as during training.

The idea behind Dropout is as follows − In a neural network without dropout regularization, neurons develop co-dependency amongst each other that leads to overfitting.

Implementation trick

Dropout is implemented in libraries such as TensorFlow and PyTorch by setting the output of the randomly selected neurons to 0. That is, although the neuron exists, its output is overwritten with 0.
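Here is a NumPy sketch of dropout as described in this section, using a keep probability q. During training, a random mask sets each output to 0 with probability 1 − q. At evaluation time, every unit is kept and its output is multiplied by q. TensorFlow and PyTorch instead use "inverted dropout": they divide the kept outputs by q during training, so no rescaling is needed at evaluation. Both approaches preserve the expected value.

```python
import numpy as np

def dropout(activations, q, training, rng):
    """Dropout with keep probability q.

    Training: each unit is kept with probability q; otherwise its output
    is overwritten with 0. Evaluation: no unit is dropped, and outputs are
    scaled by q so the next layer sees the same expected value.
    """
    if training:
        mask = rng.random(activations.shape) < q
        return activations * mask
    return activations * q

rng = np.random.default_rng(42)
h = np.ones(100_000)                 # hypothetical hidden-layer outputs
train_out = dropout(h, q=0.5, training=True, rng=rng)
eval_out = dropout(h, q=0.5, training=False, rng=rng)
print(train_out.mean(), eval_out.mean())  # both about 0.5
```

A new mask is drawn on every call, so each gradient-descent iteration trains a different random sub-network.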

Early Stopping

We train neural networks using an iterative algorithm called gradient descent.

The idea behind early stopping is intuitive: we stop training when the error starts to increase. Here, by error, we mean the error measured on validation data, which is the part of the training data held out for tuning hyperparameters. In this case, the hyperparameter is the stopping criterion.
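The following framework-free sketch implements early stopping with a "patience" of a few epochs. The train_one_epoch and validation_error callables are hypothetical placeholders, and the validation curve is synthetic, made up for illustration. Training stops once the validation error has not improved for `patience` consecutive epochs, and the best epoch is reported.

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=100, patience=3):
    """Stop when validation error hasn't improved for `patience` epochs.

    Returns (best_epoch, best_error).
    """
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        err = validation_error(epoch)
        if err < best_error:
            best_error, best_epoch = err, epoch
            epochs_without_improvement = 0
            # a real implementation would save a copy of the weights here
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_error

# Synthetic validation curve: falls, then rises as the model overfits
curve = [0.9, 0.7, 0.5, 0.4, 0.35, 0.37, 0.40, 0.45, 0.5, 0.6]
best = train_with_early_stopping(lambda e: None, lambda e: curve[e],
                                 max_epochs=len(curve), patience=3)
print(best)  # (4, 0.35)
```

A patience greater than 1 prevents training from stopping on a single noisy uptick in the validation error.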

Data Augmentation

Data augmentation is the process of increasing the amount of data we have by taking existing data and applying transformations to it. The exact transformations used depend on the task we intend to achieve. Moreover, which transformations help the neural net depends on its architecture.

For instance, in many computer vision tasks such as object classification, an effective data augmentation technique is to add new data points that are cropped or translated versions of the original data.

When a computer accepts an image as input, it takes in an array of pixel values. Suppose the whole image is shifted left by 15 pixels. Almost every pixel value changes, yet the object in the image, and therefore its label, stays the same. By applying many different shifts in different directions, we obtain an augmented dataset many times the size of the original dataset.
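The shift can be implemented with array slicing in NumPy. This sketch assumes a grayscale image stored as a 2-D array. The vacated columns are filled with zeros; that is an assumption, and wrapping or edge padding are also common choices. The random "image" is only a stand-in.

```python
import numpy as np

def shift_horizontal(image, dx):
    """Shift a 2-D image right by dx pixels (left if dx < 0).

    Pixels shifted out of the frame are discarded; the vacated columns
    are filled with zeros.
    """
    shifted = np.zeros_like(image)
    if dx > 0:
        shifted[:, dx:] = image[:, :-dx]
    elif dx < 0:
        shifted[:, :dx] = image[:, -dx:]
    else:
        shifted[:] = image
    return shifted

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28)).astype(np.uint8)  # stand-in image

# The original plus several shifted copies, all sharing the same label
shifts = [-15, -5, 5, 15]
augmented = [image] + [shift_horizontal(image, dx) for dx in shifts]
print(len(augmented))  # 5
```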

Transfer Learning

The process of taking a pre-trained model and “fine-tuning” it with our own dataset is called transfer learning. There are several ways to do this. A few ways are described below −

  • We take a model pre-trained on a large dataset. Then, we remove the last layer of the network and replace it with a new layer with random weights.

  • We then freeze the weights of all the other layers and train the network normally. Here, freezing a layer means that its weights are not changed during gradient descent or optimization.

The idea behind this is that the pre-trained model acts as a feature extractor, and only the last layer is trained on the current task.
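The idea can be sketched in NumPy, with heavy simplifications. A fixed random matrix stands in for the frozen pre-trained layers; a real case would load weights trained on a large dataset. The new last layer is a logistic-regression output trained by gradient descent, and the toy data is synthetic. Only the new layer's weights are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen, pre-trained layers (hypothetical: in practice
# these weights would come from a model trained on a large dataset).
W_frozen = rng.normal(size=(10, 32))

def extract_features(X):
    return np.maximum(0.0, X @ W_frozen)   # frozen layer + ReLU

# New last layer with small random weights: the only trainable part.
w_head = rng.normal(scale=0.01, size=32)
b_head = 0.0

# Synthetic binary task for illustration
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(float)

W_before = W_frozen.copy()
feats = extract_features(X)   # computed once, since the body never changes
lr = 0.01
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
    grad_w = feats.T @ (p - y) / len(y)   # logistic-loss gradient
    grad_b = np.mean(p - y)
    w_head -= lr * grad_w                 # only the new layer is updated
    b_head -= lr * grad_b

accuracy = np.mean((p > 0.5) == y)
print("training accuracy:", accuracy)
```

The features never change, so they can be computed once and reused. This is one practical benefit of freezing the earlier layers.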

Computational Graphs

Backpropagation is implemented in deep learning frameworks such as TensorFlow, Torch, and Theano by using computational graphs. More significantly, viewing backpropagation on computational graphs unifies several different algorithms and their variations, such as backpropagation through time and backpropagation with shared weights. Once everything is converted into a computational graph, they are all the same algorithm: just back propagation on computational graphs.

What is a Computational Graph

A computational graph is defined as a directed graph where the nodes correspond to mathematical operations. Computational graphs are a way of expressing and evaluating a mathematical expression.

For example, here is a simple mathematical equation −

$$p = x + y$$

We can draw a computational graph of the above equation as follows.

[Figure: computational graph for p = x + y]

The above computational graph has an addition node (the node with the "+" sign) with two input variables, x and y, and one output, p.

Let us take another example, slightly more complex. We have the following equation.

$$g = (x + y) \ast z$$

The above equation is represented by the following computational graph.

[Figure: computational graph for g = (x + y) * z]

Computational Graphs and Backpropagation

Computational graphs and backpropagation are both core concepts in deep learning for training neural networks.

Forward Pass

The forward pass is the procedure for evaluating the value of the mathematical expression represented by a computational graph. Doing a forward pass means passing the values of the variables forward, from the left (inputs) to the right, where the output is.

Let us consider an example by assigning a value to each of the inputs. Suppose the inputs are given the following values.

$$x = 1, \quad y = 3, \quad z = -3$$

By giving these values to the inputs, we can perform forward pass and get the following values for the outputs on each node.

First, we use the value of x = 1 and y = 3, to get p = 4.

[Figure: Forward Pass]

Then we use p = 4 and z = -3 to get g = -12. We go from left to right, forwards.

[Figure: Forward Pass Equation]

Objectives of Backward Pass

In the backward pass, our intention is to compute the gradient of the final output with respect to each input. These gradients are essential for training the neural network using gradient descent.

For example, we desire the following gradients.

Desired gradients

$$\frac{\partial g}{\partial x}, \quad \frac{\partial g}{\partial y}, \quad \frac{\partial g}{\partial z}$$

Backward pass (backpropagation)

We start the backward pass by finding the derivative of the final output with respect to the final output (itself!). This is the derivative of the identity, so its value is equal to one.

$$\frac{\partial g}{\partial g} = 1$$

Our computational graph now looks as shown below −

[Figure: Backward Pass]

Next, we will do the backward pass through the "*" operation. We will calculate the gradients at p and z. Since g = p*z, we know that −

$$\frac{\partial g}{\partial z} = p$$

$$\frac{\partial g}{\partial p} = z$$

We already know the values of z and p from the forward pass. Hence, we get −

$$\frac{\partial g}{\partial z} = p = 4$$

and

$$\frac{\partial g}{\partial p} = z = -3$$

We want to calculate the gradients at x and y −

$$\frac{\partial g}{\partial x}, \quad \frac{\partial g}{\partial y}$$

However, we want to do this efficiently (although x and g are only two hops away in this graph, imagine them being really far from each other). To calculate these values efficiently, we will use the chain rule of differentiation. From chain rule, we have −

$$\frac{\partial g}{\partial x} = \frac{\partial g}{\partial p} \ast \frac{\partial p}{\partial x}$$

$$\frac{\partial g}{\partial y} = \frac{\partial g}{\partial p} \ast \frac{\partial p}{\partial y}$$

We already know that dg/dp = -3. The derivatives dp/dx and dp/dy are easy to find, since p depends directly on x and y. We have −

$$p = x + y \;\Rightarrow\; \frac{\partial p}{\partial x} = 1, \quad \frac{\partial p}{\partial y} = 1$$

Hence, we get −

$$\frac{\partial g}{\partial x} = \frac{\partial g}{\partial p} \ast \frac{\partial p}{\partial x} = (-3) \cdot 1 = -3$$

Similarly, for the input y −

$$\frac{\partial g}{\partial y} = \frac{\partial g}{\partial p} \ast \frac{\partial p}{\partial y} = (-3) \cdot 1 = -3$$

The main reason for working backwards is that, to calculate the gradient at x, we used only already computed values and dp/dx (the derivative of a node's output with respect to that same node's input). We used local information to compute a global value.
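The same forward and backward passes take only a few lines of Python, using the example's values x = 1, y = 3, z = −3. In this sketch, each local derivative is hard-coded from the rules derived above. An add node passes its upstream gradient through unchanged, and a multiply node multiplies the upstream gradient by its other input.

```python
def forward_backward(x, y, z):
    # Forward pass: left to right
    p = x + y                  # add node
    g = p * z                  # multiply node

    # Backward pass: right to left
    dg_dg = 1.0                # derivative of the output w.r.t. itself
    dg_dz = p * dg_dg          # multiply node: gradient is the other input
    dg_dp = z * dg_dg
    dg_dx = dg_dp * 1.0        # chain rule with dp/dx = 1
    dg_dy = dg_dp * 1.0        # chain rule with dp/dy = 1
    return g, dg_dx, dg_dy, dg_dz

print(forward_backward(1, 3, -3))  # (-12, -3.0, -3.0, 4.0)
```

Each gradient is built from the one immediately to its right, which is why the backward pass only ever needs local information.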

Steps for training a neural network

Follow these steps to train a neural network −

  • For each data point x in the dataset, we do a forward pass with x as input and calculate the cost c as output.

  • We do a backward pass starting at c and calculate gradients for all nodes in the graph. This includes the nodes that represent the neural network weights.

  • We then update the weights by doing W = W - learning_rate * gradients.

  • We repeat this process until the stopping criterion is met.
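Here is a compact sketch of these four steps for a single logistic neuron trained on a toy OR dataset. The data, the learning rate and the stopping threshold are illustrative assumptions, and the gradients are derived by hand rather than by a framework. For simplicity, each step processes the whole four-point dataset rather than one point at a time.

```python
import numpy as np

# Toy dataset: logical OR (hypothetical example data)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=2)   # random initial weights
b = 0.0
learning_rate = 0.5

for epoch in range(10_000):
    # Forward pass: predictions and cost c (binary cross-entropy)
    pred = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    cost = -np.mean(t * np.log(pred) + (1 - t) * np.log(1 - pred))
    if cost < 0.05:                 # stopping criterion
        break
    # Backward pass: gradients of the cost w.r.t. W and b
    d_logits = (pred - t) / len(t)
    grad_W = X.T @ d_logits
    grad_b = d_logits.sum()
    # Update: W = W - learning_rate * gradient
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print(epoch, round(cost, 4), (pred > 0.5).astype(int))
```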

Python Deep Learning - Applications

Deep learning has produced good results in many applications, such as computer vision, language translation, image captioning, audio transcription, molecular biology, speech recognition, natural language processing, self-driving cars, brain tumour detection, real-time speech translation, music composition, automatic game playing and so on.

Deep learning is the next big leap after machine learning, with a more advanced implementation. It is heading towards becoming an industry standard and promises to be a game changer when dealing with raw, unstructured data.

Deep learning is currently one of the best solution providers for a wide range of real-world problems. Developers are building AI programs that learn from examples to solve complicated tasks instead of following previously given rules. With deep learning being used by many data scientists, deeper neural networks are delivering ever more accurate results.

The idea is to develop deeper neural networks by increasing the number of layers in each network, so that the machine learns more about the data and becomes as accurate as possible. Developers can use deep learning techniques to implement complex machine learning tasks and to train AI networks to high levels of perceptual recognition.

Deep learning is especially popular in computer vision. One task it performs there is image classification, where each input image is classified as cat, dog, etc., or more generally assigned the class or label that best describes it. As humans, we learn to do this very early in our lives, and we have the skills to quickly recognize patterns, generalize from prior knowledge, and adapt to different image environments.

Libraries and Frameworks

In this chapter, we will relate deep learning to the different libraries and frameworks.

Deep learning and Theano

If we want to start coding a deep neural network, it helps to have an idea of how different frameworks such as Theano, TensorFlow, Keras, and PyTorch work.

Theano is a Python library that provides a set of functions for building deep nets that train quickly on our machine.

Theano was developed at the University of Montreal, Canada, under the leadership of Yoshua Bengio, a deep net pioneer.

Theano lets us define and evaluate mathematical expressions involving vectors and matrices, which are rectangular arrays of numbers.

Technically speaking, both neural nets and input data can be represented as matrices, and all standard net operations can be redefined as matrix operations. This matters because computers can carry out matrix operations very quickly.
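The following example shows that a fully connected layer applied to a whole batch of inputs is a single matrix product. NumPy stands in for Theano in this sketch, and the shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(64, 784))       # 64 inputs with 784 features each
W = rng.normal(size=(784, 128)) * 0.01   # layer weights
b = np.zeros(128)                        # layer biases

# One matrix multiply computes the layer for the whole batch at once
hidden = np.maximum(0.0, batch @ W + b)  # ReLU activation

# Equivalent (much slower) loop over the examples, one at a time
looped = np.stack([np.maximum(0.0, x @ W + b) for x in batch])
print(hidden.shape, np.allclose(hidden, looped))  # (64, 128) True
```

The batched form hands the work to optimized linear-algebra routines, and GPUs can parallelize those routines further.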

We can process multiple matrix values in parallel and if we build a neural net with this underlying structure, we can use a single machine with a GPU to train enormous nets in a reasonable time window.

However, if we use Theano, we have to build the deep net from the ground up. The library does not provide complete functionality for creating a specific type of deep net.

Instead, we have to code every aspect of the deep net like the model, the layers, the activation, the training method and any special methods to stop overfitting.

The good news, however, is that Theano lets us build our implementation on top of vectorized functions, which gives us a highly optimized solution.

There are many other libraries that extend the functionality of Theano. Keras, for example, can use Theano as its backend.

Deep Learning with TensorFlow

Google's TensorFlow is a Python library. It is a great choice for building commercial-grade deep learning applications.

TensorFlow grew out of another library, DistBelief V2, which was part of the Google Brain project. It aims to extend the portability of machine learning so that research models can be applied to commercial-grade applications.

Much like Theano, TensorFlow is based on computational graphs. A node represents persistent data or a math operation, and edges represent the flow of data between nodes in the form of multidimensional arrays, or tensors; hence the name TensorFlow.

The output from an operation or a set of operations is fed as input into the next.

Even though TensorFlow was designed for neural networks, it works well for other problems whose computation can be modelled as a data flow graph.

TensorFlow also borrows several features from Theano, such as common subexpression elimination, automatic differentiation, and shared and symbolic variables.

Different types of deep nets can be built using TensorFlow, such as convolutional nets, autoencoders, RNTNs, RNNs, RBMs, DBMs/MLPs, and so on.

However, TensorFlow has no built-in support for hyperparameter configuration. For this functionality, we can use Keras.

Deep Learning and Keras

Keras is a powerful easy-to-use Python library for developing and evaluating deep learning models.

It has a minimalist design that lets us build a net layer by layer, train it, and run it.

It wraps the efficient numerical computation libraries Theano and TensorFlow and allows us to define and train neural network models in a few short lines of code.

It is a high-level neural network API that helps make deep learning and artificial intelligence widely accessible. It runs on top of a number of lower-level libraries, including TensorFlow and Theano. Keras code is portable: we can implement a neural network in Keras using Theano or TensorFlow as the backend without any changes to the code.

Python Deep Learning - Implementations

In this implementation of deep learning, our objective is to predict customer attrition, or churn, for a certain bank: that is, which customers are likely to leave the bank's service. The dataset used is relatively small and contains 10,000 rows with 14 columns. We are using the Anaconda distribution and frameworks such as Theano, TensorFlow and Keras. Keras is built on top of TensorFlow and Theano, which function as its backends.

```shell
# Artificial Neural Network
# Installing Theano
pip install --upgrade theano
# Installing TensorFlow
pip install --upgrade tensorflow
# Installing Keras
pip install --upgrade keras
```

Step 1: Data preprocessing

```python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
```

Step 2

We create a matrix of the dataset's features and a vector for the target variable, which is column 14, labelled “Exited”.

The data initially looks as shown below −

```python
X = dataset.iloc[:, 3:13].values  # feature columns
Y = dataset.iloc[:, 13].values    # target column "Exited"
X
```

Output

[Figure: Step Output]

Step 3

```python
Y
```

Output

```
array([1, 0, 1, ..., 1, 1, 0], dtype=int64)
```

Step 4

We make the analysis simpler by encoding the string variables. We use scikit-learn's ‘LabelEncoder’ class to automatically encode the different labels in a column as values between 0 and n_classes-1.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Encode column 1 (country) as integers
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
# Encode column 2 (gender) as integers
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
X
```

Output

[Figure: Step4 Output]

In the above output, country names are replaced by 0, 1 and 2, while male and female are replaced by 0 and 1.
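A small check shows how ‘LabelEncoder’ assigns its codes. The example uses toy values for the country and gender columns. The country names France, Germany and Spain are assumed here to be the ones in this dataset, and the sample rows are made up. LabelEncoder sorts the distinct classes alphabetically before numbering them.

```python
from sklearn.preprocessing import LabelEncoder

countries = ["France", "Spain", "Germany", "France"]   # toy rows
genders = ["Female", "Male", "Male", "Female"]

country_encoder = LabelEncoder()
gender_encoder = LabelEncoder()
print(country_encoder.fit_transform(countries))   # [0 2 1 0]
print(country_encoder.classes_.tolist())          # ['France', 'Germany', 'Spain']
print(gender_encoder.fit_transform(genders))      # [0 1 1 0]
```

Because the codes only reflect alphabetical order, they impose an artificial ranking on the countries. This is why the next step one-hot encodes the country column.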

Step 5

Labelling Encoded Data

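We use another scikit-learn class, OneHotEncoder, on column 1 (the label-encoded country) to create one dummy variable per country. We then drop one of the three dummy columns to avoid the dummy variable trap, since the dropped column is fully determined by the other two.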

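  # OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22,
  # so ColumnTransformer is used to one-hot encode column 1 (Geography) only
  from sklearn.compose import ColumnTransformer
  ct = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder = 'passthrough', sparse_threshold = 0)
  X = ct.fit_transform(X).astype(float)
  X = X[:, 1:]  # drop one dummy column to avoid the dummy variable trap
  X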

Now, the first 2 columns represent the country and the 4th column represents the gender.

Output

[Image: Step 5 output]
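What the one-hot step does to the country codes can be sketched in NumPy: each code becomes a row of the identity matrix, so three countries give three dummy columns, and dropping the first leaves two columns (France is then represented as [0, 0]). This is an illustrative sketch, not the OneHotEncoder source; the codes are made-up examples (0 = France, 1 = Germany, 2 = Spain).

```python
import numpy as np

# Label-encoded Geography values (0 = France, 1 = Germany, 2 = Spain), made-up examples
geo_codes = np.array([0, 2, 1, 0, 2])
n_classes = 3

# One-hot encoding: row i of the identity matrix for code i
dummies = np.eye(n_classes)[geo_codes]

# Drop the first dummy column to avoid the dummy variable trap:
# the France column equals 1 - Germany - Spain, so it carries no extra information
dummies_reduced = dummies[:, 1:]

print(dummies)
print(dummies_reduced)
```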

We divide the data into a training part and a test part: we train the model on the training data and then check its accuracy on the test data, which the model has not seen, to evaluate how well it performs on new customers.

Step 6

We use scikit-learn's train_test_split function to split the data into a training set and a test set, with a train-to-test ratio of 80:20.

  # Splitting the dataset into the Training set and the Test set
  from sklearn.model_selection import train_test_split
  X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2)
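The 80:20 split can be illustrated with a random permutation of row indices. The sketch below is a simplified stand-in for train_test_split (which also shuffles by default); the helper name split_indices and the fixed seed are assumptions made for the example. With the 10000 rows of this dataset it yields 8000 training rows and 2000 test rows, which matches the 2000 test customers counted in the confusion matrix of Step 14.

```python
import numpy as np

def split_indices(n_rows, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle row indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(10000, test_size=0.2)
print(len(train_idx), len(test_idx))  # 8000 2000
```

Every row lands in exactly one of the two parts, so no test customer is seen during training.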

Some variables, such as the balance, take values in the thousands, while others take values in the tens or ones. We scale the features to comparable ranges so that no variable dominates the others merely because of its units.

Step 7

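In this code, we fit a StandardScaler on the training data and transform it, rescaling each feature to zero mean and unit variance. We then use the same fitted scaler, with the training means and standard deviations, to transform the test data, so no information from the test set is used in preprocessing.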

  # Feature Scaling
  from sklearn.preprocessing import StandardScaler
  sc = StandardScaler()
  X_train = sc.fit_transform(X_train)
  X_test = sc.transform(X_test)

Output

[Image: Step 7 output]
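StandardScaler computes each feature's mean and standard deviation on the data it is fitted on and maps a value x to (x - mean) / std. The NumPy sketch below mirrors that on small made-up arrays: the statistics come from the training set only and are reused for the test set, so the scaled training columns have mean 0 and standard deviation 1.

```python
import numpy as np

# Made-up example: two features on very different scales (e.g. age and balance)
X_train = np.array([[30.0, 50000.0],
                    [40.0, 0.0],
                    [50.0, 100000.0],
                    [60.0, 150000.0]])
X_test = np.array([[45.0, 75000.0]])

# "Fit": statistics are computed on the training data only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)   # population standard deviation (ddof=0)

# "Transform": the same statistics are applied to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
print(X_test_scaled)
```

Here the test row happens to equal the training means, so it maps to [0, 0]; in general the scaled test columns are not exactly standardized.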

The data is now properly scaled, which completes the data preprocessing. Next, we build the model.

Step 8

We import the required modules here: the Sequential class to initialize the neural network and the Dense class to add the layers.

  # Importing the Keras libraries and packages
  import keras
  from keras.models import Sequential
  from keras.layers import Dense

Step 9

We name the model classifier, as our aim is to classify customer churn, and initialize it with the Sequential class.

  # Initializing Neural Network
  classifier = Sequential()

Step 10

We add the layers one by one using the Dense class. The code below passes several arguments to it.

The first parameter, units (called output_dim in Keras 1), is the number of nodes in the layer. kernel_initializer (init in Keras 1) sets how the layer's weights are initialized before training with stochastic gradient descent. In a neural network we assign weights to the inputs of each node; at initialization the weights should be close to zero, so the 'uniform' initializer draws them at random from a uniform distribution. activation sets the activation function: ReLU ('relu') for the hidden layers and the sigmoid for the output layer, which outputs a probability. The input_dim parameter is needed only for the first layer, because the model does not otherwise know the number of input variables; here there are 11. From the second layer on, the model infers the number of inputs from the previous layer.

Execute the following line of code to add the input layer and the first hidden layer:

  classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
                       activation = 'relu', input_dim = 11))

Execute the following line of code to add the second hidden layer:

  classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
                       activation = 'relu'))

Execute the following line of code to add the output layer:

  classifier.add(Dense(units = 1, kernel_initializer = 'uniform',
                       activation = 'sigmoid'))
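Each Dense layer computes activation(inputs @ W + b). As a rough NumPy sketch of the untrained network built above (11 inputs, two hidden layers of 6 ReLU units, 1 sigmoid output), the code below draws weights uniformly between -0.05 and 0.05 and sets biases to zero; that range and the helper names are assumptions for illustration, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_params(n_in, n_out, limit=0.05):
    """Hypothetical helper: small uniform random weights and zero biases."""
    W = rng.uniform(-limit, limit, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same layer sizes as the classifier: 11 -> 6 -> 6 -> 1
W1, b1 = dense_params(11, 6)
W2, b2 = dense_params(6, 6)
W3, b3 = dense_params(6, 1)

def forward(X):
    h1 = relu(X @ W1 + b1)          # first hidden layer
    h2 = relu(h1 @ W2 + b2)         # second hidden layer
    return sigmoid(h2 @ W3 + b3)    # output layer: probability of leaving

# Trainable parameters per layer: weights plus biases (72 + 42 + 7)
n_params = sum(W.size + b.size for W, b in [(W1, b1), (W2, b2), (W3, b3)])
print(n_params)  # 121

X_batch = rng.normal(size=(5, 11))  # 5 made-up, already-scaled customers
probs = forward(X_batch)
print(probs.shape)  # (5, 1)
print(probs.ravel())
```

With weights this close to zero, the untrained network outputs probabilities near 0.5 for every customer; training moves them apart.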

Step 11

Compiling the ANN

So far we have added several layers to the classifier. We now compile it with the compile method. The arguments passed at this stage (optimizer, loss and metrics) complete the definition of how the network is trained, so we need to choose them carefully.

Here is a brief explanation of the arguments.

The first argument, optimizer, is the algorithm used to find the optimal set of weights. Here we use the Adam optimizer, one of several variants of stochastic gradient descent (SGD). The optimizer minimizes a loss function, so the second argument is loss. If the dependent variable is binary, we use the logarithmic loss called 'binary_crossentropy'; if it has more than two categories, we use 'categorical_crossentropy'. Finally, metrics = ['accuracy'] asks Keras to report accuracy during training and evaluation; the metric is only monitored, while the weights are updated to minimize the loss.

  # Compiling Neural Network
  classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
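For a binary target, 'binary_crossentropy' is the average of -[y log(p) + (1 - y) log(1 - p)] over the observations, where y is the true label and p the predicted probability. A small NumPy sketch of the formula follows; the clipping constant eps is an assumption added to avoid log(0).

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean log loss for binary labels; eps keeps log() away from 0."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Confident and correct predictions give a small loss ...
print(binary_crossentropy([1, 0], [0.9, 0.1]))   # about 0.105
# ... confident but wrong predictions give a large one
print(binary_crossentropy([1, 0], [0.1, 0.9]))   # about 2.303
```

The loss punishes confident mistakes heavily, which is what pushes the sigmoid output toward the correct class during training.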

Step 12

This step involves several pieces of code.

Fitting the ANN to the Training Set

We now train the model on the training data with the fit method, which optimizes the weights by updating them repeatedly. The batch size is the number of observations processed before each weight update, and an epoch is one complete pass over the training set; the epochs argument sets how many passes are made. The values of the batch size and the number of epochs are chosen by trial and error.

  classifier.fit(X_train, y_train, batch_size = 10, epochs = 50)
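With batch_size = 10 and epochs = 50, the number of weight updates follows from the size of the training set. Assuming the 8000 training rows produced by the 80:20 split, the simple mini-batch loop below (a sketch of the bookkeeping only; no model is trained, and Keras additionally shuffles the rows each epoch) gives 800 updates per epoch and 40000 in total.

```python
import math

def iterate_minibatches(n_samples, batch_size):
    """Yield (start, stop) index pairs covering the data in order, one per batch."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_train, batch_size, epochs = 8000, 10, 50

updates = 0
for epoch in range(epochs):          # one epoch = one full pass over the training set
    for start, stop in iterate_minibatches(n_train, batch_size):
        updates += 1                 # the weights would be updated once per batch here

print(updates // epochs, updates)                 # 800 40000
print(math.ceil(n_train / batch_size) * epochs)   # same total, computed directly
```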

Making predictions and evaluating the model

  # Predicting the Test set results
  y_pred = classifier.predict(X_test)
  y_pred = (y_pred > 0.5)

Predicting a single new observation

  # Predicting a single new observation
  """Our goal is to predict if the customer with the following data will leave the bank:
  Geography: Spain
  Credit Score: 500
  Gender: Female
  Age: 40
  Tenure: 3
  Balance: 50000
  Number of Products: 2
  Has Credit Card: Yes
  Is Active Member: Yes
  Estimated Salary: 40000"""
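To feed this customer to the model, the description has to be turned into the same 11 encoded columns as X: the two country dummy columns that remain after dropping one (Germany, Spain), then credit score, gender, age, tenure, balance, number of products, has credit card, is active member and estimated salary. The helper below is a hypothetical sketch that assumes the encodings produced in Steps 4 and 5 (countries and genders coded in alphabetical order, France as the dropped dummy column).

```python
def encode_customer(geography, credit_score, gender, age, tenure, balance,
                    num_products, has_credit_card, is_active_member, estimated_salary):
    """Hypothetical helper: build one row in the column order of the encoded X."""
    # Dummy columns that remain after dropping France: [Germany, Spain]
    geo_dummies = {"France": [0, 0], "Germany": [1, 0], "Spain": [0, 1]}[geography]
    gender_code = {"Female": 0, "Male": 1}[gender]
    return geo_dummies + [credit_score, gender_code, age, tenure, balance,
                          num_products, int(has_credit_card),
                          int(is_active_member), estimated_salary]

row = encode_customer("Spain", 500, "Female", 40, 3, 50000, 2, True, True, 40000)
print(row)  # [0, 1, 500, 0, 40, 3, 50000, 2, 1, 1, 40000]
```

The row would then be wrapped as np.array([row]) and passed through sc.transform before calling classifier.predict, so that it is scaled exactly like the training data.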

Step 13

Predicting the test set result

The model outputs the probability that a customer will leave the bank. We convert each probability into a binary prediction: True (1) if it is greater than 0.5, otherwise False (0).

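  # Predicting the single new observation described above; the 11 values follow the
  # encoded column order: Germany, Spain, credit score, gender (Female = 0), age,
  # tenure, balance, number of products, has credit card, is active member, salary
  new_prediction = classifier.predict(sc.transform(
      np.array([[0.0, 1, 500, 0, 40, 3, 50000, 2, 1, 1, 40000]])))
  new_prediction = (new_prediction > 0.5)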

Step 14

This is the last step, where we evaluate the model's performance. We already have the true outcomes for the test set (y_test), so we can build a confusion matrix to check the accuracy of the model.

Making the Confusion Matrix

  from sklearn.metrics import confusion_matrix
  cm = confusion_matrix(y_test, y_pred)
  print(cm)

Output

  loss: 0.3384 acc: 0.8605
  [[1541   54]
   [ 230  175]]

From the confusion matrix, the accuracy of our model can be calculated as:

  Accuracy = (1541 + 175) / 2000 = 0.858

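We achieved 85.8% accuracy on the test set. For comparison, 1595 of the 2000 test customers (1541 + 54) did not leave, so always predicting that a customer stays would already reach 79.75%; the model also misses 230 of the 405 customers who actually left.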
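The matrix printed in this step follows scikit-learn's convention: rows are the true classes and columns the predicted classes, so the diagonal holds the correct predictions. The NumPy sketch below (the function names are illustrative, not part of scikit-learn) thresholds probabilities at 0.5, builds a 2 x 2 confusion matrix from a small made-up example and recomputes the accuracy of the reported matrix.

```python
import numpy as np

def to_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities to 0/1 labels."""
    return (np.asarray(probabilities) > threshold).astype(int).ravel()

def confusion_matrix_2x2(y_true, y_pred):
    """Rows: true class (0, 1); columns: predicted class (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Fraction of correct predictions: the diagonal over the total."""
    return np.trace(cm) / cm.sum()

# Small made-up example
y_true = np.array([0, 0, 1, 1, 0])
y_pred = to_labels([[0.2], [0.7], [0.9], [0.4], [0.1]])
print(confusion_matrix_2x2(y_true, y_pred))

# Accuracy of the matrix reported in this step
cm = np.array([[1541, 54], [230, 175]])
print(accuracy(cm))  # 0.858
```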

The Forward Propagation Algorithm

In this section, we will learn how to write code to do forward propagation (prediction) for a simple neural network:

[Figure: Forward Propagation Algorithm]

Each data point is a customer. The first input is how many accounts they have, and the second input is how many children they have. The model will predict how many transactions the user makes in the next year.

The input data is pre-loaded as input_data, and the weights are in a dictionary called weights. The array of weights for the first node in the hidden layer is weights['node_0'], and the array for the second node in the hidden layer is weights['node_1'].

The weights feeding into the output node are available in weights['output'].

The Rectified Linear Activation Function

An "activation function" is a function that works at each node. It transforms the node's input into some output.

The rectified linear activation function (ReLU) is widely used in high-performance networks. It takes a single number as input and returns 0 if the input is negative, or the input itself if it is positive.

Here are some examples:

  • relu(4) = 4
  • relu(-2) = 0
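A ReLU written with Python's built-in max() works on one number at a time, as in the examples above. To apply it to a whole NumPy array at once, np.maximum compares element-wise; both versions are shown in this short sketch (relu_array is an illustrative name).

```python
import numpy as np

def relu(x):
    """ReLU for a single number: 0 for negative inputs, the input itself otherwise."""
    return max(x, 0)

def relu_array(x):
    """Element-wise ReLU for NumPy arrays."""
    return np.maximum(x, 0)

print(relu(4), relu(-2))                  # 4 0
print(relu_array(np.array([4, -2, 0])))   # [4 0 0]
```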

We fill in the definition of the relu() function:

  • We use the max() function to calculate the output value of relu().
  • We apply the relu() function to node_0_input to calculate node_0_output.
  • We apply the relu() function to node_1_input to calculate node_1_output.

  import numpy as np
  input_data = np.array([-1, 2])
  weights = {
      'node_0': np.array([3, 3]),
      'node_1': np.array([1, 5]),
      'output': np.array([2, -1])
  }
  # For comparison: the same network with tanh as the hidden-layer activation
  node_0_input = (input_data * weights['node_0']).sum()
  node_0_output = np.tanh(node_0_input)
  node_1_input = (input_data * weights['node_1']).sum()
  node_1_output = np.tanh(node_1_input)
  hidden_layer_output = np.array([node_0_output, node_1_output])
  output = (hidden_layer_output * weights['output']).sum()
  print(round(output, 4))
  def relu(x):
      '''Define your relu activation function here'''
      # Calculate the value for the output of the relu function: output
      output = max(x, 0)
      # Return the value just calculated
      return output
  # Calculate node 0 value: node_0_output
  node_0_input = (input_data * weights['node_0']).sum()
  node_0_output = relu(node_0_input)
  # Calculate node 1 value: node_1_output
  node_1_input = (input_data * weights['node_1']).sum()
  node_1_output = relu(node_1_input)
  # Put node values into array: hidden_layer_outputs
  hidden_layer_outputs = np.array([node_0_output, node_1_output])
  # Calculate model output (do not apply relu)
  model_output = (hidden_layer_outputs * weights['output']).sum()
  print(model_output)  # Print model output

Output

  0.9901
  -3

Applying the Network to Many Observations/Rows of Data

In this section, we define a function called predict_with_network(). It generates predictions for multiple observations (rows of data), passed in as input_data, using the network above: the same weights and the same relu() function.

Let us define a function called predict_with_network() that accepts two arguments - input_data_row and weights - and returns a prediction from the network as the output.

We calculate the input and output values for each node, storing them as: node_0_input, node_0_output, node_1_input, and node_1_output.

To calculate the input value of a node, we multiply the relevant arrays together and compute their sum.

To calculate the output value of a node, we apply the relu() function to the input value of the node. We then use a for loop to iterate over input_data.

We call predict_with_network() on each row of input_data (input_data_row) and append each prediction to results.

  # Define predict_with_network(); relu() comes from the previous listing
  def predict_with_network(input_data_row, weights):
      # Calculate node 0 value
      node_0_input = (input_data_row * weights['node_0']).sum()
      node_0_output = relu(node_0_input)
      # Calculate node 1 value
      node_1_input = (input_data_row * weights['node_1']).sum()
      node_1_output = relu(node_1_input)
      # Put node values into array: hidden_layer_outputs
      hidden_layer_outputs = np.array([node_0_output, node_1_output])
      # Calculate model output
      input_to_final_layer = (hidden_layer_outputs * weights['output']).sum()
      model_output = relu(input_to_final_layer)
      # Return model output
      return model_output
  # Hypothetical observations, one row per customer: [accounts, children];
  # weights is the dictionary defined in the previous listing
  input_data = np.array([[-1, -1], [2, 2]])
  # Create empty list to store prediction results
  results = []
  for input_data_row in input_data:
      # Append prediction to results (int() gives plain Python integers)
      results.append(int(predict_with_network(input_data_row, weights)))
  print(results)  # Print results

Output

  [0, 12]

Here the relu function passes positive node inputs through unchanged and sets negative ones to 0; for example, relu(12) = 12 and relu(-6) = 0.
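Instead of looping over the rows, the whole batch can be processed with matrix products: stacking weights['node_0'] and weights['node_1'] as the rows of a 2 x 2 matrix gives all hidden-node inputs at once. The sketch below assumes the weights of the network above and uses the rows [-1, -1] and [2, 2] as hypothetical observations; it reproduces the output [0, 12] shown above.

```python
import numpy as np

# Weights of the network above
weights = {
    'node_0': np.array([3, 3]),
    'node_1': np.array([1, 5]),
    'output': np.array([2, -1]),
}

def predict_batch(X, weights):
    """Forward pass for many observations at once (one row per observation)."""
    W_hidden = np.vstack([weights['node_0'], weights['node_1']])  # shape (2, 2)
    hidden = np.maximum(X @ W_hidden.T, 0)                        # ReLU on hidden nodes
    return np.maximum(hidden @ weights['output'], 0)              # ReLU on the output, as in the loop

X = np.array([[-1, -1],
              [2, 2]])   # hypothetical observations: [accounts, children]
print(predict_batch(X, weights))  # [ 0 12]
```

The matrix form does the same arithmetic as the loop but lets NumPy process all rows in one call, which is how deep learning libraries evaluate batches.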

Deep Multi-Layer Neural Networks

Here we are writing code to do forward propagation for a neural network with two hidden layers. Each hidden layer has two nodes. The input data has been preloaded as input_data. The nodes in the first hidden layer are called node_0_0 and node_0_1.

Their weights are pre-loaded as weights['node_0_0'] and weights['node_0_1'] respectively.

The nodes in the second hidden layer are called node_1_0 and node_1_1. Their weights are pre-loaded as weights['node_1_0'] and weights['node_1_1'] respectively.

We then create a model output from the hidden nodes using weights pre-loaded as weights['output'].

[Figure: Deep Multi Layer]

We calculate node_0_0_input using its weights, weights['node_0_0'], and the given input_data. We then apply the relu() function to get node_0_0_output.

We do the same as above for node_0_1_input to get node_0_1_output.

We calculate node_1_0_input using its weights, weights['node_1_0'], and the outputs from the first hidden layer, hidden_0_outputs. We then apply the relu() function to get node_1_0_output.

We do the same as above for node_1_1_input to get node_1_1_output.

We calculate model_output using weights['output'] and the outputs of the second hidden layer, the hidden_1_outputs array. We do not apply the relu() function to this output.

[Figure: Multi Hidden Layer]

  import numpy as np
  input_data = np.array([3, 5])
  weights = {
      'node_0_0': np.array([2, 4]),
      'node_0_1': np.array([4, -5]),
      'node_1_0': np.array([-1, 1]),
      'node_1_1': np.array([2, 2]),
      'output': np.array([2, 7])
  }
  def relu(x):
      '''ReLU activation, as defined in the previous section'''
      return max(x, 0)
  def predict_with_network(input_data):
      # Calculate node 0 in the first hidden layer
      node_0_0_input = (input_data * weights['node_0_0']).sum()
      node_0_0_output = relu(node_0_0_input)
      # Calculate node 1 in the first hidden layer
      node_0_1_input = (input_data * weights['node_0_1']).sum()
      node_0_1_output = relu(node_0_1_input)
      # Put node values into array: hidden_0_outputs
      hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
      # Calculate node 0 in the second hidden layer
      node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
      node_1_0_output = relu(node_1_0_input)
      # Calculate node 1 in the second hidden layer
      node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
      node_1_1_output = relu(node_1_1_input)
      # Put node values into array: hidden_1_outputs
      hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])
      # Calculate model output: model_output (no relu on the output)
      model_output = (hidden_1_outputs * weights['output']).sum()
      # Return model_output
      return model_output
  output = predict_with_network(input_data)
  print(output)

Output

  364
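The same computation generalizes to any number of hidden layers if each layer's weights are stacked into a matrix (one row per node) and the layers are processed in a loop. A sketch under that assumption, using the weights above (the layer list and the function name are illustrative):

```python
import numpy as np

def forward(input_data, hidden_layers, output_weights):
    """Apply each hidden layer with ReLU, then a linear output node."""
    activations = input_data
    for W in hidden_layers:                       # W has one row of weights per node
        activations = np.maximum(W @ activations, 0)
    return output_weights @ activations           # no ReLU on the output, as above

hidden_layers = [
    np.array([[2, 4], [4, -5]]),    # first hidden layer: node_0_0, node_0_1
    np.array([[-1, 1], [2, 2]]),    # second hidden layer: node_1_0, node_1_1
]
output_weights = np.array([2, 7])

print(forward(np.array([3, 5]), hidden_layers, output_weights))  # 364
```

Tracing it by hand: the first layer gives relu([26, -13]) = [26, 0], the second gives relu([-26, 52]) = [0, 52], and the output is 2 * 0 + 7 * 52 = 364. Adding a layer only means appending another matrix to hidden_layers.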

Translated from: https://www.tutorialspoint.com/python_deep_learning/python_deep_learning_quick_guide.htm

This article was contributed by a user; please credit the source (wpsshop blog) when reposting.