If you'd like to make sure that your model works well and can generalize to new
cases, you can try it out on new cases by putting the model into production and
monitoring how it performs. This approach works, but if your model is
inadequate, your users will complain.
You should divide your data into two sets, one set for training and the second
one for testing, so that you can train your model using the first one and test it
using the second. The generalization error is the error rate you measure when
evaluating your model on the test set. This value tells you whether your model is
good enough to work properly on cases it has never seen.
If the error rate is low, the model is good and will perform properly. In contrast,
if the rate is high, your model will perform badly on new cases. My advice to
you is to use 80% of the data for training and 20% for testing, which makes it
very simple to test or evaluate a model, as the sketch below shows.
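As a minimal sketch of this idea, here is how you could perform an 80/20 split with Scikit-Learn's train_test_split helper; the X and y arrays below are made-up placeholders for your own features and labels:

from sklearn.model_selection import train_test_split
import numpy as np

# Hypothetical data: 100 instances with 5 features each, plus a label per instance.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# test_size=0.2 reserves 20% of the instances for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 5) (20, 5)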
In this chapter, we have covered many concepts of machine learning. The
following chapters will be very practical, and you'll write code, but you should
answer the following questions just to make sure you're on the right track.
1. Define machine learning.
2. Describe the four types of machine-learning systems.
3. What is the difference between supervised and unsupervised learning?
4. Name some common unsupervised tasks.
5. Why are testing and validation important?
6. In one sentence, describe what online learning is.
7. What is the difference between batch learning and online learning?
8. Which type of machine learning system should you use to make a
robot learn how to walk?
In this chapter, you'll go deeper into classification systems and work with the
MNIST dataset. This is a set of 70,000 images of digits handwritten by high
school students and US Census Bureau employees. You'll find that each image is
labeled with the digit it represents. This dataset plays the same role for machine
learning that the "Hello, world" example plays for traditional programming, so
every beginner in machine learning should start with this project to learn about
classification algorithms. Scikit-Learn provides helper functions to download
popular datasets, including MNIST. Let's take a look at the code:
>>> from sklearn.datasets import fetch_mldata
>>> mn = fetch_mldata('MNIST original')
>>> mn
{'COL_NAMES': ['label', 'data'],
 'DESCR': 'mldata.org dataset: mnist-original',
 'data': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
 'target': array([ 0., 0., 0., ..., 9., 9., 9.])}
• The DESCR key describes the dataset.
• The data key contains an array with one row per instance and one
column per feature.
• The target key contains an array with the labels.
Let's take a closer look at these arrays:
>>> X, y = mn["data"], mn["target"]
>>> X.shape
(70000, 784)
>>> y.shape
(70000,)
• 70,000 here means that there are 70,000 images, and every image has 784
features, because each image is 28 × 28 pixels and each feature represents one
pixel's intensity.
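To make the 784-feature layout concrete, here is a small sketch (assuming the X array loaded above; the index 36000 is an arbitrary choice) that reshapes one instance back into a 28 × 28 grid and displays it with Matplotlib:

import matplotlib.pyplot as plt

# Take one instance (one row of 784 features) and reshape it into a 28 x 28 image.
some_digit = X[36000]
some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
plt.show()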
Measures of Performance
If you want to evaluate a classifier, this will be more difficult than evaluating a
regressor, so let's explain how to evaluate a classifier.
In this example, we'll use cross-validation to evaluate our model.
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

# x_tr, y_tr_6, and sgd_clf are assumed from earlier: the training images,
# the binary "is it a 6?" labels, and an SGDClassifier instance.
sf = StratifiedKFold(n_splits=2, shuffle=True, random_state=40)
for train_index, test_index in sf.split(x_tr, y_tr_6):
    cl = clone(sgd_clf)  # train a fresh copy of the classifier on each fold
    x_tr_fd = x_tr[train_index]
    y_tr_fd = y_tr_6[train_index]
    x_tes_fd = x_tr[test_index]
    y_tes_fd = y_tr_6[test_index]
    cl.fit(x_tr_fd, y_tr_fd)
    y_p = cl.predict(x_tes_fd)
    n_correct = sum(y_p == y_tes_fd)  # count the correct predictions
    print(n_correct / len(y_p))      # print the ratio of correct predictions
• We use the StratifiedKFold class to perform stratified sampling, which produces
folds that contain a representative ratio of each class. At every iteration, the code
creates a clone of the classifier, trains it on the training folds, and makes
predictions on the test fold. Finally, it counts the number of correct predictions
and prints their ratio.
• Now we'll use the cross_val_score function to evaluate the SGDClassifier by
k-fold cross-validation. K-fold cross-validation will divide the training set
into 3 folds, then make predictions and evaluate them on each fold.
from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, x_tr, y_tr_6, cv=3, scoring="accuracy")
You'll get the ratio of accuracy of correct predictions on all folds.
Tree Classifiers
The next image illustrates the general goal of ensemble methods, which is
simply to merge different classifiers into one meta-classifier that has better
generalization performance than each individual classifier alone.
As an example, assume that you collected predictions from many experts.
Ensemble methods would allow us to merge the predictions of these experts to
get a prediction that is more accurate and robust than the prediction of any
individual expert. As you will see later in this part, there are many different
methods to create an ensemble of classifiers. In this part, we will introduce a
basic idea of how ensembles work and why they are typically recognized for
yielding good generalization performance.
In this part, we will work with the most popular ensemble method, which uses
the majority voting principle. Majority voting simply means that we choose the
label that has been predicted by the majority of classifiers, that is, the label that
received more than 50 percent of the votes. Strictly speaking, the term majority
vote refers to binary class settings only. However, it is not hard to generalize the
majority voting principle to multi-class settings, which is called plurality voting;
there, we choose the class label that received the most votes. The following
diagram illustrates the concept of majority and plurality voting for an ensemble
of 10 classifiers, where each unique symbol (triangle, square, and circle)
represents a unique class label:
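As a small illustrative sketch (not from the original text), plurality voting over predicted class labels amounts to taking their statistical mode, which can be computed with NumPy:

import numpy as np

# Hypothetical predictions from an ensemble of 10 classifiers for one instance,
# with class labels 0, 1, and 2.
predictions = np.array([0, 0, 1, 2, 0, 1, 0, 2, 0, 1])

# np.bincount tallies the votes per label; np.argmax picks the winner.
vote_counts = np.bincount(predictions)
print(vote_counts)             # [5 3 2]
print(np.argmax(vote_counts))  # 0, the label with the most votes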
Using the training set, we start by training m different classifiers
($C_1, C_2, \ldots, C_m$). Depending on the method, the ensemble can be built
from different classification algorithms; for example, decision trees, support
vector machines, logistic regression classifiers, and so on. Alternatively, you can
use the same base classification algorithm and fit it to different subsets of the
training set. An example of this method is the random forest algorithm, which
merges many decision tree classifiers using majority voting.
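For instance (a minimal sketch on made-up toy data, not from the original text), Scikit-Learn's RandomForestClassifier builds exactly this kind of ensemble of decision trees:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Toy data for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# 100 decision trees, each fit on a bootstrap sample of the training set;
# their predictions are combined by (soft) voting.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)
print(forest.predict(X[:5]))  # predicted labels for the first five instances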
To predict a class label via simple majority or plurality voting, we combine the
predicted class labels of each individual classifier $C_j$ and select the class
label $\hat{y}$ that received the most votes:

$$\hat{y} = \operatorname{mode}\{C_1(\mathbf{x}), C_2(\mathbf{x}), \ldots, C_m(\mathbf{x})\}$$
For example, in a binary classification task where $\text{class}_1 = -1$ and
$\text{class}_2 = +1$, we can write the majority vote prediction as follows:

$$\hat{y} = \operatorname{sign}\left[\sum_{j=1}^{m} C_j(\mathbf{x})\right] = \begin{cases} 1 & \text{if } \sum_{j} C_j(\mathbf{x}) \geq 0 \\ -1 & \text{otherwise} \end{cases}$$
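As a quick sketch of this binary rule (again with made-up votes), the sign of the summed predictions recovers the majority class:

import numpy as np

# Hypothetical predictions from three classifiers, encoded as -1 or +1.
binary_votes = np.array([1, -1, 1])
print(np.sign(np.sum(binary_votes)))  # 1, the majority class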
To illustrate why ensemble methods can work better than individual classifiers
alone, let's apply some simple concepts of combinatorics. For the following
example, we make the assumption that all n base classifiers for a binary
classification task have an equal error rate, ε. Additionally, we assume that the
classifiers are independent and that their error rates are not correlated. Under
these assumptions, we can simply express the error probability of an ensemble
of base classifiers as the probability mass function of a binomial distribution:

$$P(y \geq k) = \sum_{i=k}^{n} \binom{n}{i}\, \varepsilon^{i} (1 - \varepsilon)^{n-i} = \varepsilon_{\text{ensemble}}$$

Here $k = \lceil n/2 \rceil$ is the smallest number of wrong base classifiers that
makes the majority vote wrong.
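As a hedged sketch of this calculation (the ensemble_error helper and its example parameters are illustrative, not from the original text), here is how you could compute the ensemble error rate in Python:

from math import ceil, comb

def ensemble_error(n_classifier, error):
    # The ensemble is wrong when at least ceil(n/2) of the n independent
    # base classifiers are wrong; sum the binomial probabilities of those cases.
    k_start = ceil(n_classifier / 2)
    return sum(comb(n_classifier, k) * error**k * (1 - error)**(n_classifier - k)
               for k in range(k_start, n_classifier + 1))

# With 11 base classifiers that each err 25% of the time, the ensemble
# error rate drops well below the individual rate.
print(ensemble_error(n_classifier=11, error=0.25))  # about 0.034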