Understanding least-squared cost function
I've been using some of the loss functions in machine learning, like Least Squared Error, Logistic Loss, and Softmax Loss, for quite some time, but I had never dug deep into them to really understand how they were derived and the motivation behind them. Recently I took a look at Stanford's CS 229 materials and found it mind-blowing to finally have some understanding of those common loss functions, so I want to share some of that knowledge here.

First, let's take a look at Least Squared Error. The least-squared cost function is usually used in regression problems and is defined by the formula

$$L = \frac{1}{2}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2$$

where $h$ is our hypothesis function of the input $x$ that estimates the output $y$, and the sum runs over the training examples. The least-squared cost function measures how close our estimates are to the true outputs: the higher $L$ is, the further we are from estimating $y$ correctly, so we are trying to minimize this cost function. But why is the least-squared cost function a natural cost function to use in this case? Let's first assume t...
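To make the definition a bit more concrete, here is a minimal sketch of how this cost could be computed with NumPy, assuming a linear hypothesis $h(x) = \theta^T x$; the function name, data, and parameter values below are purely illustrative, not part of the CS 229 materials:

```python
import numpy as np

def least_squared_cost(theta, X, y):
    """L = 1/2 * sum_i (h(x^(i)) - y^(i))^2 for a linear hypothesis h(x) = theta^T x."""
    predictions = X @ theta             # h(x^(i)) for every training example
    residuals = predictions - y         # how far each estimate is from the true output
    return 0.5 * np.sum(residuals ** 2)

# Tiny illustrative example: three training examples, each with a bias term and one feature.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])

print(least_squared_cost(np.array([1.0, 2.0]), X, y))  # fits the data exactly -> 0.0
print(least_squared_cost(np.array([0.0, 1.0]), X, y))  # worse fit -> larger cost (25.0)
```

Minimizing $L$ over the parameters $\theta$, for example with gradient descent or the normal equations covered in the CS 229 notes, gives the usual linear-regression fit.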