
DL-1-week2: Basics of Neural Network Programming


2 Basics of Neural Network Programming


Date:2018.3.4


2.1 Binary Classification

Example:

  • input: a cat image
    The image is stored in the computer as three 64×64 matrices corresponding to the red, green, and blue channels. These pixel values are unrolled into a single feature vector x for this image. The dimension of the input features is
    $n_x = 64 \times 64 \times 3 = 12288$
    (see the short flattening sketch below).
  • output: 1 (cat) vs 0 (not cat)
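
A minimal NumPy sketch of unrolling an RGB image into the feature vector x, assuming the image is given as a 64×64×3 array (the variable names are illustrative, not from the course):

import numpy as np

# hypothetical 64x64 RGB image: three colour channels stacked in one array
image = np.random.rand(64, 64, 3)

# unroll all pixel values into a single column vector of shape (n_x, 1)
x = image.reshape(-1, 1)

n_x = x.shape[0]
print(n_x)   # 64 * 64 * 3 = 12288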

2.2 Logistic Regression

Given $x$, want $\hat{y} = P(y = 1 \mid x)$, where $x \in \mathbb{R}^{n_x}$.
Parameters: $\omega \in \mathbb{R}^{n_x}$, $b \in \mathbb{R}$.
Output: first consider linear regression, $\hat{y} = \omega^T x + b$, but this does not keep $\hat{y}$ in $[0,1]$. In logistic regression the output is instead the sigmoid function applied to this quantity:

$\hat{y} = \sigma(\omega^T x + b)$

The sigmoid function is:
$\sigma(z) = \dfrac{1}{1 + e^{-z}}$

It is easy to see that:
$0 < \sigma(z) < 1$

If we let $x_0 = 1$, then $x \in \mathbb{R}^{n_x + 1}$ and $\hat{y} = \sigma(\theta^T x)$.
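
A minimal NumPy sketch of the sigmoid function and the logistic regression prediction (the parameter values are illustrative):

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical parameters and a single input example
w = np.random.randn(12288, 1)
b = 0.0
x = np.random.rand(12288, 1)

y_hat = sigmoid(np.dot(w.T, x) + b)   # prediction, a value in (0, 1)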


Date:2018.3.5


2.3 Logistic Regression cost function

Loss (error) function:
Usually one would use the squared loss:

$\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$

but in logistic regression the loss function is:

$\mathcal{L}(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$

If $y = 1$: $\mathcal{L}(\hat{y}, y) = -\log \hat{y}$. So you want $\log \hat{y}$ to be as big as possible, i.e. $\hat{y}$ large. But $\hat{y}$ is given by the sigmoid function, so $\hat{y}$ should be close to 1.

If $y = 0$: $\mathcal{L}(\hat{y}, y) = -\log(1 - \hat{y})$, which pushes $\hat{y}$ to be as small as possible.

So, if $y$ is 0 we try to make $\hat{y}$ small, and if $y$ is 1 we try to make $\hat{y}$ large.

Cost function:
The loss function measures how well you are doing on a single training example, whereas the cost function measures how well you are doing on the entire training set:

$J(\omega, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})$
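
A minimal NumPy sketch of the cost computation, assuming A holds the predictions and Y the labels, both of shape (1, m) (names are illustrative):

import numpy as np

def cost(A, Y):
    # cross-entropy loss averaged over the m training examples
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m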

2.4 Gradient Descent

Want to find $\omega, b$ that minimize $J(\omega, b)$.
Repeat:

$\omega := \omega - \alpha \dfrac{\partial J(\omega, b)}{\partial \omega}$

$b := b - \alpha \dfrac{\partial J(\omega, b)}{\partial b}$

where $\alpha$ is the learning rate.
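
A minimal sketch of a single update step, assuming the gradients dw and db have already been computed (the values are illustrative):

import numpy as np

alpha = 0.01                      # learning rate
w, b = np.zeros((2, 1)), 0.0      # hypothetical parameters
dw = np.array([[0.1], [-0.2]])    # hypothetical gradient dJ/dw
db = 0.05                         # hypothetical gradient dJ/db

# one gradient-descent update
w = w - alpha * dw
b = b - alpha * db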

2.5 Derivatives

calculus and derivatives
skip

2.6 More derivatives examples

The slope of the function
skip

2.7 Computation Graph

Given a function:

$J(a, b, c) = 3(a + b \times c)$

The steps to compute it are:

$u = bc$
$v = a + u$
$J = 3v$

There is a forward pass to compute $J$ and a backward pass (backpropagation) to compute the derivatives.

2.8 Derivatives with a Computation Graph

Apply the chain rule backwards through the graph: $\frac{dJ}{dv} = 3$, so $\frac{dJ}{da} = \frac{dJ}{dv}\frac{dv}{da} = 3$, $\frac{dJ}{du} = 3$, $\frac{dJ}{db} = \frac{dJ}{du}\frac{du}{db} = 3c$, and $\frac{dJ}{dc} = 3b$.
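
A small sketch of the forward and backward pass for this computation graph, with the chain-rule derivatives written out by hand (the concrete values are illustrative):

# forward pass for J(a, b, c) = 3(a + b*c)
a, b, c = 5.0, 3.0, 2.0
u = b * c          # u = 6
v = a + u          # v = 11
J = 3 * v          # J = 33

# backward pass (chain rule)
dJ_dv = 3.0
dJ_da = dJ_dv * 1.0    # dv/da = 1 -> 3
dJ_du = dJ_dv * 1.0    # dv/du = 1 -> 3
dJ_db = dJ_du * c      # du/db = c -> 6
dJ_dc = dJ_du * b      # du/dc = b -> 9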


Date:2018.3.6


2.9 Logistic Regression Gradient descent

The model is:

$z = \omega^T x + b$
$\hat{y} = a = \sigma(z)$
$\mathcal{L}(a, y) = -\left( y \log(a) + (1 - y) \log(1 - a) \right)$

Suppose we have only two features, $x_1$ and $x_2$. The steps from the parameters $\omega_1, \omega_2, b$ to the loss are:

z = w1*x1 + w2*x2 + b
a = sigmoid(z)      # a = y_hat
L(a, y)

$d\omega_1 = \dfrac{\partial \mathcal{L}}{\partial a}\dfrac{\partial a}{\partial z}\dfrac{\partial z}{\partial \omega_1} = \left( -\dfrac{y}{a} + \dfrac{1 - y}{1 - a} \right) \times a(1 - a) \times x_1 = (a - y)\,x_1$
$d\omega_2 = (a - y)\,x_2$
$db = a - y$

The update steps:

w1 := w1 - alpha * dw1
w2 := w2 - alpha * dw2
b  := b  - alpha * db
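
A minimal sketch of one gradient step on a single training example with two features (parameter and data values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical parameters and one example (x1, x2) with label y
w1, w2, b = 0.0, 0.0, 0.0
x1, x2, y = 1.5, -0.5, 1.0
alpha = 0.1

# forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# backward pass, using the derivatives derived above
dz = a - y
dw1, dw2, db = dz * x1, dz * x2, dz

# update
w1 -= alpha * dw1
w2 -= alpha * dw2
b  -= alpha * db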

2.10 Gradient descent on m examples

The cost function:

$J(\omega, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$
$a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(\omega^T x^{(i)} + b)$

So the overall gradient is the average of the per-example gradients:
$\dfrac{\partial J(\omega, b)}{\partial \omega_j} = \dfrac{1}{m} \sum_{i=1}^{m} d\omega_j^{(i)}$


This is only a single iteration of gradient descent; in practice you need many iterations. Moreover, if the data has many features, the naive implementation needs three for-loops: one over the gradient-descent iterations, one over all training examples, and one over all features, which makes the algorithm very slow. Vectorization can accelerate it.
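
A minimal sketch of one such iteration with explicit for-loops over the m examples and the n_x features (the data here is random and illustrative; this is exactly the structure vectorization removes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 2, 4
X = np.random.randn(n_x, m)            # hypothetical training set, one column per example
Y = np.random.randint(0, 2, (1, m))    # hypothetical labels
w, b, alpha = np.zeros((n_x, 1)), 0.0, 0.1

J, dw, db = 0.0, np.zeros((n_x, 1)), 0.0
for i in range(m):                     # loop over the m training examples
    z = np.dot(w[:, 0], X[:, i]) + b
    a = sigmoid(z)
    J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
    dz = a - Y[0, i]
    for j in range(n_x):               # loop over the features
        dw[j, 0] += X[j, i] * dz
    db += dz
J, dw, db = J / m, dw / m, db / m

# one gradient-descent update
w = w - alpha * dw
b = b - alpha * db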

2.11 Vectorization

Vectorization is the art of getting rid of explicit for loops in your code.
Non-vectorized:

z = 0
for i in range(n_x):    # explicit loop over the n_x features
    z += w[i] * x[i]
z += b

Vectorized:

z = np.dot(w,x)+b
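
A small sketch comparing the two versions on a large random vector, in the spirit of the timing demo in the course (the exact numbers depend on your machine):

import time
import numpy as np

n_x = 1000000
w = np.random.rand(n_x)
x = np.random.rand(n_x)

tic = time.time()
z = np.dot(w, x)                 # vectorized inner product
print("vectorized:", time.time() - tic, "s")

tic = time.time()
z = 0.0
for i in range(n_x):             # explicit for-loop
    z += w[i] * x[i]
print("for loop:  ", time.time() - tic, "s")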

2.12 More vectorization examples

Whenever possible, avoid explicit for-loops.
Given a vector

$v = [v_1, v_2, \ldots, v_n]^T$

you want to apply the exponential operation to every element of the vector.
Non-vectorized:

import math
import numpy as np

u = np.zeros((n, 1))
for i in range(n):
    u[i] = math.exp(v[i])

Vectorization:

import numpy as np
u = np.exp(v)

In the logistic regression derivatives, the same idea removes the inner loop over features: instead of accumulating dw1, dw2, ... one by one, keep a vector dw of shape (n_x, 1) and accumulate dw += x^(i) * dz^(i) inside the loop over the examples.

2.13&2.14 Vectorizing Logistic Regression

$Z = \omega^T X + b$
$A = \sigma(Z)$

where $X = [x^{(1)}, x^{(2)}, \ldots, x^{(m)}]$ stacks the training examples as columns and $Z = [z^{(1)}, z^{(2)}, \ldots, z^{(m)}]$ is a row vector (in NumPy the scalar $b$ is broadcast across the $m$ entries).
Let $dZ = [dz^{(1)}, dz^{(2)}, \ldots, dz^{(m)}]$, $A = [a^{(1)}, a^{(2)}, \ldots, a^{(m)}]$, $Y = [y^{(1)}, y^{(2)}, \ldots, y^{(m)}]$.
So,
$dZ = A - Y$
$db = \frac{1}{m} \sum_{i=1}^{m} dz^{(i)}$
$dw = \frac{1}{m} X \, dZ^T$
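
A minimal NumPy sketch of one fully vectorized iteration of logistic regression gradient descent (the data, shapes, and hyperparameters are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 3, 5
X = np.random.randn(n_x, m)            # (n_x, m): one column per training example
Y = np.random.randint(0, 2, (1, m))    # (1, m) labels
w, b, alpha = np.zeros((n_x, 1)), 0.0, 0.1

# forward pass for all m examples at once
Z = np.dot(w.T, X) + b                 # shape (1, m)
A = sigmoid(Z)

# backward pass
dZ = A - Y                             # shape (1, m)
dw = np.dot(X, dZ.T) / m               # shape (n_x, 1)
db = np.sum(dZ) / m

# gradient-descent update
w = w - alpha * dw
b = b - alpha * db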


Date:2018.3.7


2.15 Broadcasting in Python

import numpy as np
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
cal = A.sum(axis = 0)                      # sum down the columns, shape (3,)
percentage = 100*A/cal.reshape(1,3)        # reshape to a 1x3 row, then broadcast over the rows
print(percentage)

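A short sketch of the general broadcasting behaviour this relies on (the matrices are illustrative; the expected outputs are noted in the comments):

import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.]])            # shape (2, 3)

row = np.array([[10., 20., 30.]])       # shape (1, 3): copied down over the 2 rows
col = np.array([[100.], [200.]])        # shape (2, 1): copied across the 3 columns

print(M + row)    # [[11. 22. 33.] [14. 25. 36.]]
print(M + col)    # [[101. 102. 103.] [204. 205. 206.]]
print(M + 1.0)    # a scalar is broadcast to every element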

2.16 A note on python/numpy vectors

Usually use a = np.random.randn(5,1) rather than a = np.random.randn(5):

a = np.random.randn(5) #a.shape = (5,),rank 1 array
a = np.random.randn(5,1)#a.shape = (5,1),5 by 1 column vector
a = np.random.randn(1,5)#a.shape = (1,5),1 by 5 row vector

You can use assert(a.shape == (5,1)) to make sure you have a column vector rather than a rank-1 array, and a.reshape(5,1) to convert a rank-1 array into a column vector.
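
A small sketch of why rank-1 arrays are error-prone, following the course's np.dot example (expected shapes noted in the comments):

import numpy as np

a = np.random.randn(5)         # rank-1 array, shape (5,)
print(a.T.shape)               # (5,)  -- transposing changes nothing
print(np.dot(a, a.T))          # a single number (inner product), not a matrix

b = np.random.randn(5, 1)      # proper column vector, shape (5, 1)
print(np.dot(b, b.T).shape)    # (5, 5) outer product, as expected

assert b.shape == (5, 1)       # catches shape bugs early
c = a.reshape(5, 1)            # convert the rank-1 array into a column vector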
