Machine Learning Part II

So far we have worked with many machine learning models and training algorithms while treating them as unfathomable black boxes: we optimized a regression system and worked with image classifiers, but we built these systems without understanding what is inside them or how they work. Now we need to go deeper, so that we can grasp how these models work and understand the details of their implementation. A deep understanding of these details will help you choose the right model and the best training algorithm, and it will also help with debugging and error analysis. In this chapter we'll work with polynomial regression, a more complex model that can fit nonlinear data sets. In addition, we'll look at several regularization techniques that reduce the risk of overfitting the training data.
Linear Regression

As an example, take l_S = θ0 + θ1 × GDP_per_cap. This is a simple linear model of a single input feature, GDP_per_cap, and (θ0, θ1) are the parameters of the model. In general, a linear model makes a prediction by computing a weighted sum of the input features plus a constant "bias" term, as in the following equation:

ŷ = θ0 + θ1·x1 + θ2·x2 + ... + θn·xn

• ŷ is the predicted value.
• n is the number of features.
• xi is the value of the i-th feature.
• θj is the j-th model parameter (θ0 is the bias term).

We can also write the equation in vectorized form:

ŷ = θ^T · x

The value of θ that minimizes the cost can be computed directly with the Normal Equation:

θ̂ = (X^T · X)^(-1) · X^T · y

• θ̂ is the value of θ that minimizes the cost function.
• y is the vector of target values, containing y(1) to y(m).

Let's write some code to practice:

import numpy as np

V1_x = 2 * np.random.rand(100, 1)
V2_y = 4 + 3 * V1_x + np.random.randn(100, 1)

This code uses the function y = 4 + 3x + Gaussian noise to generate our data. After that, we'll compute θ using the Normal Equation. It's time to use the inv() function from NumPy's linear algebra module (np.linalg) to compute the inverse of a matrix, and the dot() method for matrix multiplication:

Value1 = np.c_[np.ones((100, 1)), V1_x]
myTheta = np.linalg.inv(Value1.T.dot(Value1)).dot(Value1.T).dot(V2_y)

>>> myTheta
array([[num],
       [num]])

Now let's make our predictions:

>>> V1_new = np.array([[0], [2]])
>>> V1_new_2 = np.c_[np.ones((2, 1)), V1_new]
>>> V2_predict = V1_new_2.dot(myTheta)
>>> V2_predict
array([[4.219424],
       [9.74422282]])

Finally, we can plot the training data and the model's predictions to check the fit.

Computational Complexity

With the Normal Equation, we compute the inverse of X^T · X, which is an n × n matrix (n is the number of features). The computational complexity of this inversion is roughly O(n^2.5) to O(n^3.2), depending on the implementation. In other words, if you double the number of features, the computation time grows by a factor of about 2^2.5 to 2^3.2. The good news is that the equation is linear with respect to the number of training instances, so it can handle large training sets efficiently, provided they fit in memory. Once the model is trained, predictions are fast: their complexity is linear in the number of instances and the number of features, thanks to the linear model. It's time to go deeper into other methods of training a linear regression model, better suited for cases with a large number of features or too many training instances to fit in memory.

Gradient Descent

Gradient descent is a general optimization algorithm capable of finding optimal solutions to a wide range of problems. The idea is to tweak the parameters iteratively in order to minimize the cost function. The algorithm measures the gradient of the error with respect to the parameter vector θ and moves in the direction of the descending gradient. When the gradient reaches zero, you have reached a minimum. Keep in mind that the step size is very important for this algorithm: if it is very small (meaning the learning rate is low), the algorithm will need many iterations to converge, and that will take a long time.

Batch Gradient Descent

To implement this algorithm, you first have to compute the gradient of the cost function with respect to each parameter θj. In other words, you need to know how much the cost function changes when θj changes a little; this is called a partial derivative. We can compute the partial derivative with the following equation:

∂MSE(θ)/∂θj = (2/m) Σ_{i=1..m} (θ^T · x^(i) − y^(i)) · x_j^(i)

Instead of computing these partial derivatives one by one, we can use the following equation to compute the gradient vector, which contains all of them at once:

∇θ MSE(θ) = (2/m) X^T · (X · θ − y)
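To make the gradient vector equation concrete, here is a small illustrative sketch (an addition, not part of the original listing) that evaluates it once. It assumes the Value1 and V2_y arrays generated above, and theta_test is just an arbitrary candidate value introduced for the example:

theta_test = np.random.randn(2, 1)   # an arbitrary candidate value of theta
# gradient vector of the MSE cost: (2/m) * X^T * (X.theta - y), with m = number of instances
gradient = 2 / len(Value1) * Value1.T.dot(Value1.dot(theta_test) - V2_y)

Each entry of gradient is the partial derivative of the MSE with respect to the corresponding parameter. Batch gradient descent, implemented next, simply repeats this computation and moves θ a small step against the gradient.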
Let's implement the algorithm:

Lr = 0.1                          # learning rate
Num_it = 1000                     # number of iterations
L = 100                           # number of training instances
myTheta = np.random.randn(2, 1)   # random initialization

for it in range(Num_it):
    gr = 2 / L * Value1.T.dot(Value1.dot(myTheta) - V2_y)
    myTheta = myTheta - Lr * gr

Stochastic Gradient Descent

You'll find a problem when using batch gradient descent: it needs the whole training set to compute the gradients at every step, and that hurts performance (speed) on large training sets. Stochastic gradient descent, by contrast, randomly picks one instance from the training set at each step and computes the gradients based only on that single instance. This makes the algorithm much faster than batch gradient descent, since it doesn't need the whole set at every step. On the other hand, because of the randomness of this method, its progress is much more irregular than that of the batch algorithm. Let's implement the algorithm:

Nums = 50          # number of epochs
L1, L2 = 5, 50     # learning schedule hyperparameters

def lr_sc(s):
    return L1 / (s + L2)

myTheta = np.random.randn(2, 1)   # random initialization

for Num in range(Nums):
    for l in range(L):
        myIndex = np.random.randint(L)
        V1_xi = Value1[myIndex:myIndex + 1]
        V2_yi = V2_y[myIndex:myIndex + 1]
        gr = 2 * V1_xi.T.dot(V1_xi.dot(myTheta) - V2_yi)
        Lr = lr_sc(Num * L + l)
        myTheta = myTheta - Lr * gr

>>> myTheta
array([[num],
       [num]])

Mini-Batch Gradient Descent

Because you already know the batch and the stochastic algorithms, this kind of algorithm is easy to understand and work with. As you know, those two algorithms compute the gradients based either on the whole training set (batch) or on a single instance (stochastic). Mini-batch gradient descent instead computes the gradients on small random subsets of instances called mini-batches, as the sketch below illustrates.
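Here is a minimal sketch of mini-batch gradient descent, reusing Value1, V2_y, L, and the lr_sc() learning schedule defined above. The batch size of 20 and the variable names ending in _sh are illustrative assumptions, not part of the original listing:

batch_size = 20                   # illustrative mini-batch size
myTheta = np.random.randn(2, 1)   # random initialization
s = 0                             # step counter for the learning schedule

for ep in range(50):              # 50 epochs, as in the stochastic example
    shuffled = np.random.permutation(L)   # shuffle the training set each epoch
    Value1_sh = Value1[shuffled]
    V2_y_sh = V2_y[shuffled]
    for start in range(0, L, batch_size):
        s += 1
        V1_xi = Value1_sh[start:start + batch_size]   # one mini-batch of inputs
        V2_yi = V2_y_sh[start:start + batch_size]     # the matching targets
        # gradient computed on the mini-batch only
        gr = 2 / batch_size * V1_xi.T.dot(V1_xi.dot(myTheta) - V2_yi)
        Lr = lr_sc(s)
        myTheta = myTheta - Lr * gr

Each update is computed from 20 instances instead of 1 or 100, so the parameter updates are less noisy than with stochastic gradient descent, while each step remains much cheaper than a full pass over the training set.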
