

http://blog.csdn.net/pipisorry/article/details/43115525

Machine Learning - Andrew NG courses study notes

Linear regression with one variable

Model representation

Example:

This is a regression problem (one kind of supervised learning), and specifically univariate linear regression (linear regression with one variable).
Notation (terminology):
m = Number of training examples
x’s = “input” variable  /  features
y’s = “output” variable  /  “target” variable

e.g. (x, y) denotes a single training example, while (x⁽ⁱ⁾, y⁽ⁱ⁾) denotes the i-th training example.

Model representation

h stands for the hypothesis; h maps x's to y's (in other words, it is the function from x to y that we want to learn).
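For one-variable linear regression the hypothesis is a straight line, with theta zero (intercept) and theta one (slope) as the parameters:

$$ h_\theta(x) = \theta_0 + \theta_1 x $$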



Cost function

In the previous example, h is set to the expression shown in the figure below; what we need to do is figure out how to go about choosing these parameter values, theta zero and theta one.


We try to minimize the squared difference between the output of the hypothesis and the actual price of the house.

The mathematical definition of the cost function:


Define this function as follows (the J function is one kind of cost function).
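Written out, the squared-error cost function is (m is the number of training examples and (x⁽ⁱ⁾, y⁽ⁱ⁾) is the i-th training example):

$$ J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$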

Why do we minimize one over 2m?

We are going to minimize one over 2m times the sum of squared errors. Putting the constant one half in front just makes some of the math a little easier.
Why do we take the squares of the errors?

It turns out that the squared error cost function is a reasonable choice and works well for most regression problems. There are other cost functions that would also work pretty well, but the squared error cost function is probably the most commonly used one for regression problems.
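As a small illustration (not from the original notes; the NumPy arrays below are made-up toy data), computing J for a given theta0 and theta1 can look like this:

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)                           # number of training examples
    predictions = theta0 + theta1 * x    # h_theta(x) for every example
    errors = predictions - y
    return np.sum(errors ** 2) / (2 * m)

# Toy data (hypothetical house sizes vs. prices, just for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 3.5, 4.5])

print(compute_cost(x, y, theta0=0.5, theta1=1.0))
```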


Cost function intuition I

    

{For simplicity, set theta0 = 0, i.e. the hypothesis h passes through the origin.}

Each value of theta one corresponds to a different hypothesis, i.e. a different straight-line fit on the left. And for each value of theta one, we can then derive a different value of J(theta one).

For example, theta one = 1 corresponds to this straight line through the data. For each value of theta one we wind up with a different value of J(theta one).
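A quick sketch of this sweep, assuming the lecture's toy dataset in which the points lie exactly on the line y = x, so the minimum is at theta1 = 1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # toy dataset on the line y = x
y = np.array([1.0, 2.0, 3.0])
m = len(y)

for theta1 in [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0]:
    J = np.sum((theta1 * x - y) ** 2) / (2 * m)   # theta0 = 0
    print(f"theta1 = {theta1:4.1f}  ->  J(theta1) = {J:.3f}")
```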


Cost function intuition II
Here we again keep both parameters, theta zero and theta one.

The cost function shown as a 3D surface plot (how the h(x) line and the cost function J change for different values of theta0 and theta1).

    

The cost function shown as a contour plot/figure.
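A minimal sketch of generating such a contour plot with matplotlib (the data values are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 3.5, 4.5])
m = len(y)

# Evaluate J(theta0, theta1) on a grid of parameter values.
theta0_vals = np.linspace(-2, 4, 100)
theta1_vals = np.linspace(-1, 3, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
J = np.zeros_like(T0)
for i in range(m):
    J += (T0 + T1 * x[i] - y[i]) ** 2
J /= 2 * m

plt.contour(T0, T1, J, levels=30)   # contour lines of the cost surface
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```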


Gradient descent (for minimizing the cost function J; it can minimize other functions as well, not just the cost function J for linear regression).

Problem:

Solution:

The gradient descent algorithm:
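Repeat until convergence, simultaneously updating j = 0 and j = 1 (alpha is the learning rate):

$$ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) $$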



Notation:

A := B means we will set A to be equal to the value of B; it is a computer operation, where you overwrite the value of A with the value of B. A := A + 1 means take A and increase its value by one.
A = B is a truth assertion: it asserts that the value of A equals the value of B, i.e. it just makes a claim that the values of A and B are the same. I won't ever write A = A + 1, because that is simply false.

Note: compute the update term for both theta0 and theta1 first, and then simultaneously update theta0 and theta1 at the same time.

The difference between the left and right versions: on the right, if you've already updated theta0, then you would be using the new value of theta0 to compute the derivative term, and so this gives you a different value of temp1 than the left-hand side, because you've now plugged the new value of theta0 into this equation.

If you implement the non-simultaneous update, it will probably still work, but the algorithm on the right is not what people refer to as gradient descent; it is some other algorithm with different properties, and for various reasons it can behave in slightly stranger ways.
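A short Python sketch of the two versions; the derivative expressions below are the standard partial derivatives of the squared-error cost, and the data is the same made-up toy data as above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy data, made up for illustration
y = np.array([2.0, 2.5, 3.5, 4.5])
m = len(y)
alpha = 0.1
theta0, theta1 = 0.0, 0.0

def dJ_dtheta0(t0, t1):
    return np.sum(t0 + t1 * x - y) / m          # partial derivative w.r.t. theta0

def dJ_dtheta1(t0, t1):
    return np.sum((t0 + t1 * x - y) * x) / m    # partial derivative w.r.t. theta1

# Correct: simultaneous update -- both temps use the OLD theta0 and theta1.
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect (non-simultaneous): theta0 is overwritten first, so the second
# derivative is evaluated at the NEW theta0 -- a different algorithm.
# theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
# theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
```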


Gradient descent intuition

Here we assume theta0 = 0.

The algorithm changes theta1 a little bit at each step.

Suppose you initialize theta one at a local minimum. It turns out that at a local optimum the derivative is equal to zero, so the update leaves theta one unchanged.
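In that case the update rule does nothing:

$$ \theta_1 := \theta_1 - \alpha \cdot 0 = \theta_1 $$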


The effect of the choice of alpha on the cost function



Why the step size automatically shrinks in gradient descent, even with a fixed alpha:

The derivative here will be even smaller than it was at the green point.

As gradient descent runs, you will automatically take smaller and smaller steps, until eventually you are taking very small steps; so there is actually no need to decrease alpha over time.
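A minimal sketch (theta0 fixed at 0, toy data on the line y = x) showing that with a constant alpha the step |alpha · derivative| shrinks by itself as theta1 approaches the minimum:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])   # data on the line y = x, so the optimum is theta1 = 1
m = len(y)
alpha = 0.1
theta1 = 3.0                    # start far from the minimum

for step in range(8):
    grad = np.sum((theta1 * x - y) * x) / m   # dJ/dtheta1 with theta0 = 0
    theta1 -= alpha * grad
    print(f"step {step}: theta1 = {theta1:.4f}, step size = {abs(alpha * grad):.4f}")
```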






from: http://blog.csdn.net/pipisorry/article/details/43115525

