Week 7 Wednesday#
Announcements#
Swap Thursday discussion and Friday lecture
HW6 due Wednesday
HW7 has been posted
Plan#
Multi-class Classification
import pandas as pd
import numpy as np
import altair as alt
import seaborn as sns
df = sns.load_dataset("penguins").dropna(axis=0)
Recap of Monday#
On Monday we used Logistic Regression with the flipper length and bill length columns in the penguins dataset to predict if a penguin is in the Chinstrap species.
df = sns.load_dataset("penguins").dropna(axis=0).copy()
df["isChinstrap"] = (df["species"] == "Chinstrap")
cols = ["flipper_length_mm", "bill_length_mm"]
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(df[cols], df["isChinstrap"])
df["pred"] = clf.predict(df[cols])
Multi-class Classification#
Model#
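Let \(\tilde{\mathbf{x}}\in\mathbb{R}^{p+1}\) denote the augmented row vector for one sample (the \(p\) features together with a constant 1 for the intercept). We approximate the probability that the sample belongs to each of the \(K\) classes as
\[ P\big(y = k \mid \mathbf{x}\big) = \frac{\exp(\tilde{\mathbf{x}} \mathbf{w}_k)}{\sum_{j=1}^{K} \exp(\tilde{\mathbf{x}} \mathbf{w}_j)}, \qquad k = 1, 2, \dots, K, \]
where we have \(K\) sets of parameters, \(\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_K\), and the sum in the denominator normalizes the results so that they form a probability distribution.
\(\mathbf{W}\) is a \((p+1)\times K\) matrix containing all \(K\) sets of parameters, obtained by placing \(\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_K\) side by side as columns, where \(\mathbf{w}_k = (w_{k0}, \dots, w_{kp})^{\top}\in \mathbb{R}^{p+1}\).
If \(\tilde{X}\) is the matrix whose rows are the augmented samples, then the product \(\tilde{X}\mathbf{W}\) is well defined and is useful in vectorized code.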
Another expression: introduce the hidden variable \(\mathbf{z} = (z_{1},\dots,z_{K})\) and define
\[ \mathbf{z} = \tilde{\mathbf{x}} \mathbf{W}, \]
or, written element-wise,
\[ z_{k} = \tilde{\mathbf{x}} \mathbf{w}_{k}, \quad k = 1,2,\dots,K. \]
Then the predicted probability distribution can be written as
\[ P\big(y = k \mid \mathbf{x}\big) = \sigma_k(\mathbf{z}), \quad k = 1,2,\dots,K, \]
where \(\sigma(\mathbf{z})\) is called the soft-max function, which is defined as
\[ \sigma_k(\mathbf{z}) = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}, \quad k = 1,2,\dots,K. \]
This is a valid probability distribution over \(K\) classes, because each component is positive and the components sum to one.
This can be viewed as the simplest (degenerate) example of a neural network, a topic we will learn in later lectures. That is why some people call multi-class logistic regression (also known as soft-max logistic regression) a one-layer neural network.
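To make the model concrete, here is a minimal NumPy sketch of the forward pass: augment the features, multiply by a weight matrix, and apply the soft-max row by row. The sizes, the random data, and the names `softmax`, `X_aug` and `probs` are assumptions made for illustration. Subtracting the row maximum is a common numerical safeguard that does not change the result.

```python
import numpy as np

def softmax(Z):
    """Row-wise soft-max of an (N, K) array of scores."""
    # Subtracting each row's maximum leaves the result unchanged but avoids overflow.
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, p, K = 5, 2, 3                          # hypothetical sizes
X = rng.normal(size=(N, p))                # made-up features
X_aug = np.hstack([np.ones((N, 1)), X])    # augmented samples, one per row
W = rng.normal(size=(p + 1, K))            # one column of weights per class

Z = X_aug @ W        # hidden variables, shape (N, K)
probs = softmax(Z)   # predicted probabilities, shape (N, K)
```

Each row of `probs` is positive and sums to one, which is the check described above.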
Loss function#
Define the following indicator function:
\[ 1_{\{y = k\}} = 1_{\{k\}}(y) = \delta_{yk} = \begin{cases} 1 & \text{when } y = k, \\[5pt] 0 & \text{otherwise}. \end{cases} \]
The loss function is again the cross entropy (which again can be derived from MLE):
\[ L(\mathbf{W}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} 1_{\{y^{(i)} = k\}} \ln P\big(y^{(i)} = k \mid \mathbf{x}^{(i)}\big). \]
Notice that for each term in the summation over \(N\) (i.e. for a fixed sample \(i\)), only one term in the sum over the \(K\) classes is non-zero, because of the indicator function.
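The remark about the indicator function can be checked numerically. The sketch below computes the cross entropy in two ways: by picking out the probability of the true class for each sample, and by multiplying with an explicit one-hot (indicator) matrix. The probabilities are made up, the labels are assumed to be the integers 0, ..., K-1, and averaging over the samples is an assumption of this sketch.

```python
import numpy as np

def cross_entropy(probs, y):
    """Mean cross entropy; probs is (N, K) and y holds integer labels 0..K-1."""
    N = probs.shape[0]
    # The indicator keeps only the probability assigned to the true class.
    picked = probs[np.arange(N), y]
    return -np.mean(np.log(picked))

probs = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.80, 0.10],
                  [0.25, 0.25, 0.50]])   # made-up predicted probabilities
y = np.array([0, 1, 2])                  # made-up true labels

loss = cross_entropy(probs, y)

# Same value using an explicit indicator (one-hot) matrix and a sum over the K classes.
one_hot = np.eye(3)[y]
loss_one_hot = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
```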
Gradient descent#
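After careful calculation, the gradient of \(L\) with respect to the whole \(k\)-th set of weights is:
\[ \frac{\partial L}{\partial \mathbf{w}_k} = \frac{1}{N}\sum_{i=1}^{N} \Big(\sigma_k\big(\mathbf{z}^{(i)}\big) - 1_{\{y^{(i)} = k\}}\Big)\, \tilde{\mathbf{x}}^{(i)}. \]
When writing the code, it is helpful to treat this as a column vector and to stack all \(K\) gradients together as the columns of a new matrix \(\mathbf{dW}\in\mathbb{R}^{(p+1)\times K}\). This makes the update of the matrix \(\mathbf{W}\) very convenient in gradient descent.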
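A minimal sketch of the stacked gradient and the resulting update, assuming the mean cross-entropy loss with one-hot labels. The data, the learning rate, the number of iterations and the helper name `loss_and_grad` are all made up for illustration. This is plain gradient descent, not the solver scikit-learn uses.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # numerical safeguard
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def loss_and_grad(W, X_aug, Y):
    """Mean cross entropy and the stacked gradient dW for one-hot labels Y."""
    N = X_aug.shape[0]
    P = softmax(X_aug @ W)                          # (N, K) predicted probabilities
    loss = -np.mean(np.sum(Y * np.log(P), axis=1))
    dW = X_aug.T @ (P - Y) / N                      # (p+1, K); column k is the gradient for w_k
    return loss, dW

rng = np.random.default_rng(1)
N, p, K = 60, 2, 3
y = rng.integers(0, K, size=N)                      # made-up integer labels 0..K-1
X = rng.normal(size=(N, p)) + y[:, None]            # made-up features shifted by class
X_aug = np.hstack([np.ones((N, 1)), X])
Y = np.eye(K)[y]                                    # one-hot (indicator) matrix

W = np.zeros((p + 1, K))
learning_rate = 0.1                                 # hypothetical step size
losses = []
for _ in range(200):
    loss, dW = loss_and_grad(W, X_aug, Y)
    losses.append(loss)
    W -= learning_rate * dW                         # update all K weight vectors at once
```

Because `dW` has the same shape as `W`, the whole update is a single array operation.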
Prediction#
The class with the largest estimated probability is taken as the sample’s predicted label:
\[ \hat{y} = \operatorname{arg}\max_{j} P\big(y = j| \mathbf{x}\big). \]
Drop the “pred” column from df (we will make a new one below) using the drop method and a suitable axis keyword argument.
We are using axis=1 because it is the column labels that are changing (one of the column labels is being removed).
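As a small aside, the following sketch on a made-up DataFrame (the name `toy` and its contents are hypothetical) shows the effect of the axis argument. It also shows that drop returns a new DataFrame rather than changing the original, which is why the result is assigned back to df in the cell that follows.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "pred": [True, False]})  # made-up data

by_axis = toy.drop("pred", axis=1)       # axis=1: remove a column label
by_keyword = toy.drop(columns="pred")    # equivalent spelling with the columns keyword
no_first_row = toy.drop(0, axis=0)       # axis=0: remove a row label instead
```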
df = df.drop("pred", axis = 1)
df
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | isChinstrap |
|---|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male | False |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female | False |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female | False |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female | False |
| 5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | Male | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 338 | Gentoo | Biscoe | 47.2 | 13.7 | 214.0 | 4925.0 | Female | False |
| 340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | Female | False |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | Male | False |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | Female | False |
| 343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | Male | False |

333 rows × 8 columns
Fit a new logistic regression classifier, using the same input features, but this time using the “species” column as our target.
Even though we now have three output classes, the procedure is the same one we have been using all along.
clf = LogisticRegression()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[cols], df["species"], test_size=0.1, random_state=42)
clf.fit(X_train, y_train)
/shared-libs/python3.9/py/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
LogisticRegression()
This warning indicates that the algorithm did not converge to a solution within the number of iterations specified by the max_iter parameter.
The most straightforward approach is to increase the max_iter parameter in your logistic regression model. This gives the algorithm more iterations to converge to a solution. However, be aware that setting this number too high might lead to longer training times.
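The warning text also suggests scaling the data. A hedged sketch of that alternative is shown below on made-up data whose two features are on roughly the same scale as flipper length and bill length. The cluster centers, labels and variable names are assumptions, and whether scaling removes the warning depends on the data. StandardScaler and make_pipeline are standard scikit-learn tools, but this is not the approach taken in the rest of this lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up clusters with large, unequal feature scales (not the penguins data).
centers = np.array([[190.0, 39.0], [196.0, 49.0], [217.0, 47.0]])
labels = np.repeat(["A", "B", "C"], 50)
X_demo = np.vstack([c + rng.normal(scale=[6.0, 2.5], size=(50, 2)) for c in centers])

# Standardize each feature first, then fit logistic regression with the default max_iter.
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression())
scaled_clf.fit(X_demo, labels)
train_accuracy = scaled_clf.score(X_demo, labels)
```

Putting the scaler inside the pipeline means the same scaling is applied automatically whenever predict or score is called.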
help(clf)
Help on LogisticRegression in module sklearn.linear_model._logistic object:
class LogisticRegression(sklearn.linear_model._base.LinearClassifierMixin, sklearn.linear_model._base.SparseCoefMixin, sklearn.base.BaseEstimator)
| LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)
|
| Logistic Regression (aka logit, MaxEnt) classifier.
|
| In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
| scheme if the 'multi_class' option is set to 'ovr', and uses the
| cross-entropy loss if the 'multi_class' option is set to 'multinomial'.
| (Currently the 'multinomial' option is supported only by the 'lbfgs',
| 'sag', 'saga' and 'newton-cg' solvers.)
|
| This class implements regularized logistic regression using the
| 'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note
| that regularization is applied by default**. It can handle both dense
| and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit
| floats for optimal performance; any other input format will be converted
| (and copied).
|
| The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
| with primal formulation, or no regularization. The 'liblinear' solver
| supports both L1 and L2 regularization, with a dual formulation only for
| the L2 penalty. The Elastic-Net regularization is only supported by the
| 'saga' solver.
|
| Read more in the :ref:`User Guide <logistic_regression>`.
|
| Parameters
| ----------
| penalty : {'l1', 'l2', 'elasticnet', 'none'}, default='l2'
| Specify the norm of the penalty:
|
| - `'none'`: no penalty is added;
| - `'l2'`: add a L2 penalty term and it is the default choice;
| - `'l1'`: add a L1 penalty term;
| - `'elasticnet'`: both L1 and L2 penalty terms are added.
|
| .. warning::
| Some penalties may not work with some solvers. See the parameter
| `solver` below, to know the compatibility between the penalty and
| solver.
|
| .. versionadded:: 0.19
| l1 penalty with SAGA solver (allowing 'multinomial' + L1)
|
| dual : bool, default=False
| Dual or primal formulation. Dual formulation is only implemented for
| l2 penalty with liblinear solver. Prefer dual=False when
| n_samples > n_features.
|
| tol : float, default=1e-4
| Tolerance for stopping criteria.
|
| C : float, default=1.0
| Inverse of regularization strength; must be a positive float.
| Like in support vector machines, smaller values specify stronger
| regularization.
|
| fit_intercept : bool, default=True
| Specifies if a constant (a.k.a. bias or intercept) should be
| added to the decision function.
|
| intercept_scaling : float, default=1
| Useful only when the solver 'liblinear' is used
| and self.fit_intercept is set to True. In this case, x becomes
| [x, self.intercept_scaling],
| i.e. a "synthetic" feature with constant value equal to
| intercept_scaling is appended to the instance vector.
| The intercept becomes ``intercept_scaling * synthetic_feature_weight``.
|
| Note! the synthetic feature weight is subject to l1/l2 regularization
| as all other features.
| To lessen the effect of regularization on synthetic feature weight
| (and therefore on the intercept) intercept_scaling has to be increased.
|
| class_weight : dict or 'balanced', default=None
| Weights associated with classes in the form ``{class_label: weight}``.
| If not given, all classes are supposed to have weight one.
|
| The "balanced" mode uses the values of y to automatically adjust
| weights inversely proportional to class frequencies in the input data
| as ``n_samples / (n_classes * np.bincount(y))``.
|
| Note that these weights will be multiplied with sample_weight (passed
| through the fit method) if sample_weight is specified.
|
| .. versionadded:: 0.17
| *class_weight='balanced'*
|
| random_state : int, RandomState instance, default=None
| Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
| data. See :term:`Glossary <random_state>` for details.
|
| solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs'
|
| Algorithm to use in the optimization problem. Default is 'lbfgs'.
| To choose a solver, you might want to consider the following aspects:
|
| - For small datasets, 'liblinear' is a good choice, whereas 'sag'
| and 'saga' are faster for large ones;
| - For multiclass problems, only 'newton-cg', 'sag', 'saga' and
| 'lbfgs' handle multinomial loss;
| - 'liblinear' is limited to one-versus-rest schemes.
|
| .. warning::
| The choice of the algorithm depends on the penalty chosen:
| Supported penalties by solver:
|
| - 'newton-cg' - ['l2', 'none']
| - 'lbfgs' - ['l2', 'none']
| - 'liblinear' - ['l1', 'l2']
| - 'sag' - ['l2', 'none']
| - 'saga' - ['elasticnet', 'l1', 'l2', 'none']
|
| .. note::
| 'sag' and 'saga' fast convergence is only guaranteed on
| features with approximately the same scale. You can
| preprocess the data with a scaler from :mod:`sklearn.preprocessing`.
|
| .. seealso::
| Refer to the User Guide for more information regarding
| :class:`LogisticRegression` and more specifically the
| :ref:`Table <Logistic_regression>`
| summarizing solver/penalty supports.
|
| .. versionadded:: 0.17
| Stochastic Average Gradient descent solver.
| .. versionadded:: 0.19
| SAGA solver.
| .. versionchanged:: 0.22
| The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
|
| max_iter : int, default=100
| Maximum number of iterations taken for the solvers to converge.
|
| multi_class : {'auto', 'ovr', 'multinomial'}, default='auto'
| If the option chosen is 'ovr', then a binary problem is fit for each
| label. For 'multinomial' the loss minimised is the multinomial loss fit
| across the entire probability distribution, *even when the data is
| binary*. 'multinomial' is unavailable when solver='liblinear'.
| 'auto' selects 'ovr' if the data is binary, or if solver='liblinear',
| and otherwise selects 'multinomial'.
|
| .. versionadded:: 0.18
| Stochastic Average Gradient descent solver for 'multinomial' case.
| .. versionchanged:: 0.22
| Default changed from 'ovr' to 'auto' in 0.22.
|
| verbose : int, default=0
| For the liblinear and lbfgs solvers set verbose to any positive
| number for verbosity.
|
| warm_start : bool, default=False
| When set to True, reuse the solution of the previous call to fit as
| initialization, otherwise, just erase the previous solution.
| Useless for liblinear solver. See :term:`the Glossary <warm_start>`.
|
| .. versionadded:: 0.17
| *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.
|
| n_jobs : int, default=None
| Number of CPU cores used when parallelizing over classes if
| multi_class='ovr'". This parameter is ignored when the ``solver`` is
| set to 'liblinear' regardless of whether 'multi_class' is specified or
| not. ``None`` means 1 unless in a :obj:`joblib.parallel_backend`
| context. ``-1`` means using all processors.
| See :term:`Glossary <n_jobs>` for more details.
|
| l1_ratio : float, default=None
| The Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only
| used if ``penalty='elasticnet'``. Setting ``l1_ratio=0`` is equivalent
| to using ``penalty='l2'``, while setting ``l1_ratio=1`` is equivalent
| to using ``penalty='l1'``. For ``0 < l1_ratio <1``, the penalty is a
| combination of L1 and L2.
|
| Attributes
| ----------
|
| classes_ : ndarray of shape (n_classes, )
| A list of class labels known to the classifier.
|
| coef_ : ndarray of shape (1, n_features) or (n_classes, n_features)
| Coefficient of the features in the decision function.
|
| `coef_` is of shape (1, n_features) when the given problem is binary.
| In particular, when `multi_class='multinomial'`, `coef_` corresponds
| to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).
|
| intercept_ : ndarray of shape (1,) or (n_classes,)
| Intercept (a.k.a. bias) added to the decision function.
|
| If `fit_intercept` is set to False, the intercept is set to zero.
| `intercept_` is of shape (1,) when the given problem is binary.
| In particular, when `multi_class='multinomial'`, `intercept_`
| corresponds to outcome 1 (True) and `-intercept_` corresponds to
| outcome 0 (False).
|
| n_features_in_ : int
| Number of features seen during :term:`fit`.
|
| .. versionadded:: 0.24
|
| feature_names_in_ : ndarray of shape (`n_features_in_`,)
| Names of features seen during :term:`fit`. Defined only when `X`
| has feature names that are all strings.
|
| .. versionadded:: 1.0
|
| n_iter_ : ndarray of shape (n_classes,) or (1, )
| Actual number of iterations for all classes. If binary or multinomial,
| it returns only 1 element. For liblinear solver, only the maximum
| number of iteration across all classes is given.
|
| .. versionchanged:: 0.20
|
| In SciPy <= 1.0.0 the number of lbfgs iterations may exceed
| ``max_iter``. ``n_iter_`` will now report at most ``max_iter``.
|
| See Also
| --------
| SGDClassifier : Incrementally trained logistic regression (when given
| the parameter ``loss="log"``).
| LogisticRegressionCV : Logistic regression with built-in cross validation.
|
| Notes
| -----
| The underlying C implementation uses a random number generator to
| select features when fitting the model. It is thus not uncommon,
| to have slightly different results for the same input data. If
| that happens, try with a smaller tol parameter.
|
| Predict output may not match that of standalone liblinear in certain
| cases. See :ref:`differences from liblinear <liblinear_differences>`
| in the narrative documentation.
|
| References
| ----------
|
| L-BFGS-B -- Software for Large-scale Bound-constrained Optimization
| Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales.
| http://users.iems.northwestern.edu/~nocedal/lbfgsb.html
|
| LIBLINEAR -- A Library for Large Linear Classification
| https://www.csie.ntu.edu.tw/~cjlin/liblinear/
|
| SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach
| Minimizing Finite Sums with the Stochastic Average Gradient
| https://hal.inria.fr/hal-00860051/document
|
| SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014).
| :arxiv:`"SAGA: A Fast Incremental Gradient Method With Support
| for Non-Strongly Convex Composite Objectives" <1407.0202>`
|
| Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent
| methods for logistic regression and maximum entropy models.
| Machine Learning 85(1-2):41-75.
| https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf
|
| Examples
| --------
| >>> from sklearn.datasets import load_iris
| >>> from sklearn.linear_model import LogisticRegression
| >>> X, y = load_iris(return_X_y=True)
| >>> clf = LogisticRegression(random_state=0).fit(X, y)
| >>> clf.predict(X[:2, :])
| array([0, 0])
| >>> clf.predict_proba(X[:2, :])
| array([[9.8...e-01, 1.8...e-02, 1.4...e-08],
| [9.7...e-01, 2.8...e-02, ...e-08]])
| >>> clf.score(X, y)
| 0.97...
|
| Method resolution order:
| LogisticRegression
| sklearn.linear_model._base.LinearClassifierMixin
| sklearn.base.ClassifierMixin
| sklearn.linear_model._base.SparseCoefMixin
| sklearn.base.BaseEstimator
| builtins.object
|
| Methods defined here:
|
| __init__(self, penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)
| Initialize self. See help(type(self)) for accurate signature.
|
| fit(self, X, y, sample_weight=None)
| Fit the model according to the given training data.
|
| Parameters
| ----------
| X : {array-like, sparse matrix} of shape (n_samples, n_features)
| Training vector, where `n_samples` is the number of samples and
| `n_features` is the number of features.
|
| y : array-like of shape (n_samples,)
| Target vector relative to X.
|
| sample_weight : array-like of shape (n_samples,) default=None
| Array of weights that are assigned to individual samples.
| If not provided, then each sample is given unit weight.
|
| .. versionadded:: 0.17
| *sample_weight* support to LogisticRegression.
|
| Returns
| -------
| self
| Fitted estimator.
|
| Notes
| -----
| The SAGA solver supports both float64 and float32 bit arrays.
|
| predict_log_proba(self, X)
| Predict logarithm of probability estimates.
|
| The returned estimates for all classes are ordered by the
| label of classes.
|
| Parameters
| ----------
| X : array-like of shape (n_samples, n_features)
| Vector to be scored, where `n_samples` is the number of samples and
| `n_features` is the number of features.
|
| Returns
| -------
| T : array-like of shape (n_samples, n_classes)
| Returns the log-probability of the sample for each class in the
| model, where classes are ordered as they are in ``self.classes_``.
|
| predict_proba(self, X)
| Probability estimates.
|
| The returned estimates for all classes are ordered by the
| label of classes.
|
| For a multi_class problem, if multi_class is set to be "multinomial"
| the softmax function is used to find the predicted probability of
| each class.
| Else use a one-vs-rest approach, i.e calculate the probability
| of each class assuming it to be positive using the logistic function.
| and normalize these values across all the classes.
|
| Parameters
| ----------
| X : array-like of shape (n_samples, n_features)
| Vector to be scored, where `n_samples` is the number of samples and
| `n_features` is the number of features.
|
| Returns
| -------
| T : array-like of shape (n_samples, n_classes)
| Returns the probability of the sample for each class in the model,
| where classes are ordered as they are in ``self.classes_``.
|
| ----------------------------------------------------------------------
| Methods inherited from sklearn.linear_model._base.LinearClassifierMixin:
|
| decision_function(self, X)
| Predict confidence scores for samples.
|
| The confidence score for a sample is proportional to the signed
| distance of that sample to the hyperplane.
|
| Parameters
| ----------
| X : {array-like, sparse matrix} of shape (n_samples, n_features)
| The data matrix for which we want to get the confidence scores.
|
| Returns
| -------
| scores : ndarray of shape (n_samples,) or (n_samples, n_classes)
| Confidence scores per `(n_samples, n_classes)` combination. In the
| binary case, confidence score for `self.classes_[1]` where >0 means
| this class would be predicted.
|
| predict(self, X)
| Predict class labels for samples in X.
|
| Parameters
| ----------
| X : {array-like, sparse matrix} of shape (n_samples, n_features)
| The data matrix for which we want to get the predictions.
|
| Returns
| -------
| y_pred : ndarray of shape (n_samples,)
| Vector containing the class labels for each sample.
|
| ----------------------------------------------------------------------
| Methods inherited from sklearn.base.ClassifierMixin:
|
| score(self, X, y, sample_weight=None)
| Return the mean accuracy on the given test data and labels.
|
| In multi-label classification, this is the subset accuracy
| which is a harsh metric since you require for each sample that
| each label set be correctly predicted.
|
| Parameters
| ----------
| X : array-like of shape (n_samples, n_features)
| Test samples.
|
| y : array-like of shape (n_samples,) or (n_samples, n_outputs)
| True labels for `X`.
|
| sample_weight : array-like of shape (n_samples,), default=None
| Sample weights.
|
| Returns
| -------
| score : float
| Mean accuracy of ``self.predict(X)`` wrt. `y`.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from sklearn.base.ClassifierMixin:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Methods inherited from sklearn.linear_model._base.SparseCoefMixin:
|
| densify(self)
| Convert coefficient matrix to dense array format.
|
| Converts the ``coef_`` member (back) to a numpy.ndarray. This is the
| default format of ``coef_`` and is required for fitting, so calling
| this method is only required on models that have previously been
| sparsified; otherwise, it is a no-op.
|
| Returns
| -------
| self
| Fitted estimator.
|
| sparsify(self)
| Convert coefficient matrix to sparse format.
|
| Converts the ``coef_`` member to a scipy.sparse matrix, which for
| L1-regularized models can be much more memory- and storage-efficient
| than the usual numpy.ndarray representation.
|
| The ``intercept_`` member is not converted.
|
| Returns
| -------
| self
| Fitted estimator.
|
| Notes
| -----
| For non-sparse models, i.e. when there are not many zeros in ``coef_``,
| this may actually *increase* memory usage, so use this method with
| care. A rule of thumb is that the number of zero elements, which can
| be computed with ``(coef_ == 0).sum()``, must be more than 50% for this
| to provide significant benefits.
|
| After calling this method, further fitting with the partial_fit
| method (if any) will not work until you call densify.
|
| ----------------------------------------------------------------------
| Methods inherited from sklearn.base.BaseEstimator:
|
| __getstate__(self)
|
| __repr__(self, N_CHAR_MAX=700)
| Return repr(self).
|
| __setstate__(self, state)
|
| get_params(self, deep=True)
| Get parameters for this estimator.
|
| Parameters
| ----------
| deep : bool, default=True
| If True, will return the parameters for this estimator and
| contained subobjects that are estimators.
|
| Returns
| -------
| params : dict
| Parameter names mapped to their values.
|
| set_params(self, **params)
| Set the parameters of this estimator.
|
| The method works on simple estimators as well as on nested objects
| (such as :class:`~sklearn.pipeline.Pipeline`). The latter have
| parameters of the form ``<component>__<parameter>`` so that it's
| possible to update each component of a nested object.
|
| Parameters
| ----------
| **params : dict
| Estimator parameters.
|
| Returns
| -------
| self : estimator instance
| Estimator instance.
clf = LogisticRegression(max_iter=400)
clf.fit(X_train, y_train)
LogisticRegression(max_iter=400)
clf.score(X_test, y_test)
0.9705882352941176
Check the coef_ attribute. How does it relate to the coef_ attribute we found above, where we were only considering Chinstrap penguins?
clf.coef_
array([[-0.10784061, -0.57168209],
[-0.2536906 , 0.67608474],
[ 0.36153121, -0.10440267]])
Remember how the order of False and True above was important. Here the order of the penguin species is also important.
clf.classes_
array(['Adelie', 'Chinstrap', 'Gentoo'], dtype=object)
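To see how the rows of coef_ line up with classes_, the sketch below fits a classifier on made-up data (the species names are borrowed only as labels, and `demo_clf`, `X_demo`, `y_demo` are hypothetical names). It then rebuilds the predictions by hand: row k of coef_ and entry k of intercept_ give the linear score for classes_[k], and the predicted class is the one with the largest score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])     # made-up cluster centers
y_demo = np.repeat(["Adelie", "Chinstrap", "Gentoo"], 40)     # names used only as labels
X_demo = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])

demo_clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# One linear score per class and per sample, shape (120, 3).
scores = X_demo @ demo_clf.coef_.T + demo_clf.intercept_
# Column k of scores belongs to classes_[k], so argmax indexes into classes_.
by_hand = demo_clf.classes_[scores.argmax(axis=1)]
```

The array `scores` agrees with `demo_clf.decision_function(X_demo)`, and `by_hand` agrees with `demo_clf.predict(X_demo)`.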
Here are the predicted probabilities for that “incorrect” row 122 from above. (We know the middle number corresponds to Chinstrap, by looking at the order of the values in clf.classes_.)
clf.predict_proba(df.loc[[122],cols])
array([[9.14508321e-01, 8.54914364e-02, 2.42247434e-07]])
df.loc[122,:]
species Adelie
island Torgersen
bill_length_mm 40.2
bill_depth_mm 17.0
flipper_length_mm 176.0
body_mass_g 3450.0
sex Female
isChinstrap False
Name: 122, dtype: object
Here is what it looks like if we evaluate the predict_proba method on a 4-row sub-DataFrame. Notice how the output also has four rows, and three columns (one column for each target class).
clf.predict_proba(df.loc[121:124,cols])
array([[9.97326453e-01, 1.66458507e-04, 2.50708838e-03],
[9.14508321e-01, 8.54914364e-02, 2.42247434e-07],
[9.07428478e-01, 8.54943022e-03, 8.40220913e-02],
[9.99954759e-01, 4.24429522e-05, 2.79784347e-06]])
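As a quick check on this output, the sketch below copies the four rows printed above into a NumPy array, confirms that each row sums to one (up to rounding in the printed digits), and reads off the class with the largest probability using the order from clf.classes_. The variable names are made up.

```python
import numpy as np

classes = np.array(["Adelie", "Chinstrap", "Gentoo"])   # order shown by clf.classes_
probs = np.array([
    [9.97326453e-01, 1.66458507e-04, 2.50708838e-03],
    [9.14508321e-01, 8.54914364e-02, 2.42247434e-07],
    [9.07428478e-01, 8.54943022e-03, 8.40220913e-02],
    [9.99954759e-01, 4.24429522e-05, 2.79784347e-06],
])

row_sums = probs.sum(axis=1)                 # each row is a probability distribution
top_class = classes[probs.argmax(axis=1)]    # class with the largest probability per row
```

All four rows give Adelie as the most likely species.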
Add a column “pred” to df containing the predicted values.
df['pred'] = clf.predict(df[cols])
Notice how the “pred” column at the far right contains penguin species strings. So the predict method can output strings, not just numbers or Boolean values.
df
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | isChinstrap | pred |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male | False | Adelie |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female | False | Adelie |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female | False | Adelie |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female | False | Adelie |
| 5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | Male | False | Adelie |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 338 | Gentoo | Biscoe | 47.2 | 13.7 | 214.0 | 4925.0 | Female | False | Gentoo |
| 340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | Female | False | Gentoo |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | Male | False | Gentoo |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | Female | False | Gentoo |
| 343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | Male | False | Gentoo |

333 rows × 9 columns
Make an Altair scatter plot showing the predicted values.
The portion where the model switches from one prediction to another prediction is called the “decision boundary”. It’s hard to recognize the decision boundary in this picture, because there is so much empty space.
alt.Chart(df).mark_circle().encode(
x = alt.X(cols[0], scale = alt.Scale(zero=False)),
y = alt.Y(cols[1], scale = alt.Scale(zero=False)),
color = 'pred:N'
)
It is a little hard to see from the above picture how the predictions are made. It turns out there are a few straight line segments: on one side of each line segment one prediction is made, and on the other side a different prediction is made. We will make a fake dataset in which these “decision boundaries” are clearer.
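The reason the boundaries are straight is that each class gets a linear score and the prediction is the class with the largest score, so two classes tie exactly where the difference of their linear scores is zero, which is the equation of a line. The sketch below illustrates this with hypothetical coefficients and intercepts (not the fitted values from clf).

```python
import numpy as np

# Hypothetical coefficients and intercepts for three classes and two features.
coef = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [-1.0, -1.0]])
intercept = np.array([0.0, 0.0, 0.0])

def scores(point):
    """Linear score of each class at a single point (u, v)."""
    return coef @ np.asarray(point) + intercept

def predicted_class(point):
    """Index of the class with the largest linear score."""
    return int(np.argmax(scores(point)))

# Classes 0 and 1 tie where (coef[0] - coef[1]) . x + (intercept[0] - intercept[1]) = 0,
# which for these numbers is the line u = v.
on_line = scores((2.0, 2.0))
side_a = predicted_class((3.0, 2.0))   # a point on one side of the line
side_b = predicted_class((2.0, 3.0))   # a point on the other side
```

At the point (2, 2) the scores of classes 0 and 1 are equal, while the two nearby points are assigned to class 0 and class 1 respectively.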
Using np.linspace, make a NumPy array of 70 equally spaced x-coordinates and 70 equally spaced y-coordinates. Name these NumPy arrays x and y.
Notice how the ranges here are chosen to match the approximate ranges of the flipper length and the bill length.
x = np.linspace(170, 235, 70)
y = np.linspace(30, 60, 70)
x.shape
(70,)
Make a DataFrame df_art (for “artificial”) containing all the possible pairs of coordinates from x and y. (We chose 70 above so df_art will have 4900 rows, which is a good length for Altair.)
To get these pairs, we can use itertools.product.
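To see the idea on a smaller scale first, here is a sketch with only 3 x-values and 2 y-values, so that all 6 pairs fit on the screen. The names `x_small`, `y_small` and `df_small` are made up for this illustration, and the column names are simply the two feature names used throughout.

```python
from itertools import product

import numpy as np
import pandas as pd

x_small = np.linspace(170, 235, 3)   # 3 equally spaced values from 170 to 235
y_small = np.linspace(30, 60, 2)     # 2 equally spaced values from 30 to 60

# Every x-value is paired with every y-value: 3 * 2 = 6 pairs.
pairs = list(product(x_small, y_small))

# Each pair becomes one row of the DataFrame.
df_small = pd.DataFrame(pairs, columns=["flipper_length_mm", "bill_length_mm"])
```

The same pattern with 70 values in each direction gives 70 * 70 = 4900 rows.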
from itertools import product
product(x, y)
<itertools.product at 0x7f51428f6ec0>
# If we convert it to a list, it's more clear.
list(product(x, y))
[(170.0, 30.0),
(170.0, 30.434782608695652),
(170.0, 30.869565217391305),
(170.0, 31.304347826086957),
(170.0, 31.73913043478261),
(170.0, 32.17391304347826),
(170.0, 32.608695652173914),
(170.0, 33.04347826086956),
(170.0, 33.47826086956522),
(170.0, 33.91304347826087),
(170.0, 34.34782608695652),
(170.0, 34.78260869565217),
(170.0, 35.21739130434783),
(170.0, 35.65217391304348),
(170.0, 36.08695652173913),
(170.0, 36.52173913043478),
(170.0, 36.95652173913044),
(170.0, 37.391304347826086),
(170.0, 37.82608695652174),
(170.0, 38.26086956521739),
(170.0, 38.69565217391305),
(170.0, 39.130434782608695),
(170.0, 39.565217391304344),
(170.0, 40.0),
(170.0, 40.434782608695656),
(170.0, 40.869565217391305),
(170.0, 41.30434782608695),
(170.0, 41.73913043478261),
(170.0, 42.17391304347826),
(170.0, 42.608695652173914),
(170.0, 43.04347826086956),
(170.0, 43.47826086956522),
(170.0, 43.91304347826087),
(170.0, 44.34782608695652),
(170.0, 44.78260869565217),
(170.0, 45.21739130434783),
(170.0, 45.65217391304348),
(170.0, 46.086956521739125),
(170.0, 46.52173913043478),
(170.0, 46.95652173913044),
(170.0, 47.391304347826086),
(170.0, 47.826086956521735),
(170.0, 48.26086956521739),
(170.0, 48.69565217391305),
(170.0, 49.130434782608695),
(170.0, 49.565217391304344),
(170.0, 50.0),
(170.0, 50.434782608695656),
(170.0, 50.869565217391305),
(170.0, 51.30434782608695),
(170.0, 51.73913043478261),
(170.0, 52.173913043478265),
(170.0, 52.608695652173914),
(170.0, 53.04347826086956),
(170.0, 53.47826086956522),
(170.0, 53.91304347826087),
(170.0, 54.347826086956516),
(170.0, 54.78260869565217),
(170.0, 55.21739130434783),
(170.0, 55.65217391304348),
(170.0, 56.086956521739125),
(170.0, 56.52173913043478),
(170.0, 56.95652173913044),
(170.0, 57.391304347826086),
(170.0, 57.826086956521735),
(170.0, 58.26086956521739),
(170.0, 58.69565217391305),
(170.0, 59.130434782608695),
(170.0, 59.565217391304344),
(170.0, 60.0),
(170.94202898550725, 30.0),
(170.94202898550725, 30.434782608695652),
(170.94202898550725, 30.869565217391305),
(170.94202898550725, 31.304347826086957),
(170.94202898550725, 31.73913043478261),
(170.94202898550725, 32.17391304347826),
(170.94202898550725, 32.608695652173914),
(170.94202898550725, 33.04347826086956),
(170.94202898550725, 33.47826086956522),
(170.94202898550725, 33.91304347826087),
(170.94202898550725, 34.34782608695652),
(170.94202898550725, 34.78260869565217),
(170.94202898550725, 35.21739130434783),
(170.94202898550725, 35.65217391304348),
(170.94202898550725, 36.08695652173913),
(170.94202898550725, 36.52173913043478),
(170.94202898550725, 36.95652173913044),
(170.94202898550725, 37.391304347826086),
(170.94202898550725, 37.82608695652174),
(170.94202898550725, 38.26086956521739),
(170.94202898550725, 38.69565217391305),
(170.94202898550725, 39.130434782608695),
(170.94202898550725, 39.565217391304344),
(170.94202898550725, 40.0),
(170.94202898550725, 40.434782608695656),
(170.94202898550725, 40.869565217391305),
(170.94202898550725, 41.30434782608695),
(170.94202898550725, 41.73913043478261),
(170.94202898550725, 42.17391304347826),
(170.94202898550725, 42.608695652173914),
(170.94202898550725, 43.04347826086956),
(170.94202898550725, 43.47826086956522),
(170.94202898550725, 43.91304347826087),
(170.94202898550725, 44.34782608695652),
(170.94202898550725, 44.78260869565217),
(170.94202898550725, 45.21739130434783),
(170.94202898550725, 45.65217391304348),
(170.94202898550725, 46.086956521739125),
(170.94202898550725, 46.52173913043478),
(170.94202898550725, 46.95652173913044),
(170.94202898550725, 47.391304347826086),
(170.94202898550725, 47.826086956521735),
(170.94202898550725, 48.26086956521739),
(170.94202898550725, 48.69565217391305),
(170.94202898550725, 49.130434782608695),
(170.94202898550725, 49.565217391304344),
(170.94202898550725, 50.0),
(170.94202898550725, 50.434782608695656),
(170.94202898550725, 50.869565217391305),
(170.94202898550725, 51.30434782608695),
(170.94202898550725, 51.73913043478261),
(170.94202898550725, 52.173913043478265),
(170.94202898550725, 52.608695652173914),
(170.94202898550725, 53.04347826086956),
(170.94202898550725, 53.47826086956522),
(170.94202898550725, 53.91304347826087),
(170.94202898550725, 54.347826086956516),
(170.94202898550725, 54.78260869565217),
(170.94202898550725, 55.21739130434783),
(170.94202898550725, 55.65217391304348),
(170.94202898550725, 56.086956521739125),
(170.94202898550725, 56.52173913043478),
(170.94202898550725, 56.95652173913044),
(170.94202898550725, 57.391304347826086),
(170.94202898550725, 57.826086956521735),
(170.94202898550725, 58.26086956521739),
(170.94202898550725, 58.69565217391305),
(170.94202898550725, 59.130434782608695),
(170.94202898550725, 59.565217391304344),
(170.94202898550725, 60.0),
(171.8840579710145, 30.0),
(171.8840579710145, 30.434782608695652),
(171.8840579710145, 30.869565217391305),
(171.8840579710145, 31.304347826086957),
(171.8840579710145, 31.73913043478261),
(171.8840579710145, 32.17391304347826),
(171.8840579710145, 32.608695652173914),
(171.8840579710145, 33.04347826086956),
(171.8840579710145, 33.47826086956522),
(171.8840579710145, 33.91304347826087),
(171.8840579710145, 34.34782608695652),
(171.8840579710145, 34.78260869565217),
(171.8840579710145, 35.21739130434783),
(171.8840579710145, 35.65217391304348),
(171.8840579710145, 36.08695652173913),
(171.8840579710145, 36.52173913043478),
(171.8840579710145, 36.95652173913044),
(171.8840579710145, 37.391304347826086),
(171.8840579710145, 37.82608695652174),
(171.8840579710145, 38.26086956521739),
(171.8840579710145, 38.69565217391305),
(171.8840579710145, 39.130434782608695),
(171.8840579710145, 39.565217391304344),
(171.8840579710145, 40.0),
(171.8840579710145, 40.434782608695656),
(171.8840579710145, 40.869565217391305),
(171.8840579710145, 41.30434782608695),
(171.8840579710145, 41.73913043478261),
(171.8840579710145, 42.17391304347826),
(171.8840579710145, 42.608695652173914),
(171.8840579710145, 43.04347826086956),
(171.8840579710145, 43.47826086956522),
(171.8840579710145, 43.91304347826087),
(171.8840579710145, 44.34782608695652),
(171.8840579710145, 44.78260869565217),
(171.8840579710145, 45.21739130434783),
(171.8840579710145, 45.65217391304348),
(171.8840579710145, 46.086956521739125),
(171.8840579710145, 46.52173913043478),
(171.8840579710145, 46.95652173913044),
(171.8840579710145, 47.391304347826086),
(171.8840579710145, 47.826086956521735),
(171.8840579710145, 48.26086956521739),
(171.8840579710145, 48.69565217391305),
(171.8840579710145, 49.130434782608695),
(171.8840579710145, 49.565217391304344),
(171.8840579710145, 50.0),
(171.8840579710145, 50.434782608695656),
(171.8840579710145, 50.869565217391305),
(171.8840579710145, 51.30434782608695),
(171.8840579710145, 51.73913043478261),
(171.8840579710145, 52.173913043478265),
(171.8840579710145, 52.608695652173914),
(171.8840579710145, 53.04347826086956),
(171.8840579710145, 53.47826086956522),
(171.8840579710145, 53.91304347826087),
(171.8840579710145, 54.347826086956516),
(171.8840579710145, 54.78260869565217),
(171.8840579710145, 55.21739130434783),
(171.8840579710145, 55.65217391304348),
(171.8840579710145, 56.086956521739125),
(171.8840579710145, 56.52173913043478),
(171.8840579710145, 56.95652173913044),
(171.8840579710145, 57.391304347826086),
(171.8840579710145, 57.826086956521735),
(171.8840579710145, 58.26086956521739),
(171.8840579710145, 58.69565217391305),
(171.8840579710145, 59.130434782608695),
(171.8840579710145, 59.565217391304344),
(171.8840579710145, 60.0),
(172.82608695652175, 30.0),
(172.82608695652175, 30.434782608695652),
(172.82608695652175, 30.869565217391305),
(172.82608695652175, 31.304347826086957),
(172.82608695652175, 31.73913043478261),
(172.82608695652175, 32.17391304347826),
(172.82608695652175, 32.608695652173914),
(172.82608695652175, 33.04347826086956),
(172.82608695652175, 33.47826086956522),
(172.82608695652175, 33.91304347826087),
(172.82608695652175, 34.34782608695652),
(172.82608695652175, 34.78260869565217),
(172.82608695652175, 35.21739130434783),
(172.82608695652175, 35.65217391304348),
(172.82608695652175, 36.08695652173913),
(172.82608695652175, 36.52173913043478),
(172.82608695652175, 36.95652173913044),
(172.82608695652175, 37.391304347826086),
(172.82608695652175, 37.82608695652174),
(172.82608695652175, 38.26086956521739),
(172.82608695652175, 38.69565217391305),
(172.82608695652175, 39.130434782608695),
(172.82608695652175, 39.565217391304344),
(172.82608695652175, 40.0),
(172.82608695652175, 40.434782608695656),
(172.82608695652175, 40.869565217391305),
(172.82608695652175, 41.30434782608695),
(172.82608695652175, 41.73913043478261),
(172.82608695652175, 42.17391304347826),
(172.82608695652175, 42.608695652173914),
(172.82608695652175, 43.04347826086956),
(172.82608695652175, 43.47826086956522),
(172.82608695652175, 43.91304347826087),
(172.82608695652175, 44.34782608695652),
(172.82608695652175, 44.78260869565217),
(172.82608695652175, 45.21739130434783),
(172.82608695652175, 45.65217391304348),
(172.82608695652175, 46.086956521739125),
(172.82608695652175, 46.52173913043478),
(172.82608695652175, 46.95652173913044),
(172.82608695652175, 47.391304347826086),
(172.82608695652175, 47.826086956521735),
(172.82608695652175, 48.26086956521739),
(172.82608695652175, 48.69565217391305),
(172.82608695652175, 49.130434782608695),
(172.82608695652175, 49.565217391304344),
(172.82608695652175, 50.0),
(172.82608695652175, 50.434782608695656),
(172.82608695652175, 50.869565217391305),
(172.82608695652175, 51.30434782608695),
(172.82608695652175, 51.73913043478261),
(172.82608695652175, 52.173913043478265),
(172.82608695652175, 52.608695652173914),
(172.82608695652175, 53.04347826086956),
(172.82608695652175, 53.47826086956522),
(172.82608695652175, 53.91304347826087),
(172.82608695652175, 54.347826086956516),
(172.82608695652175, 54.78260869565217),
(172.82608695652175, 55.21739130434783),
(172.82608695652175, 55.65217391304348),
(172.82608695652175, 56.086956521739125),
(172.82608695652175, 56.52173913043478),
(172.82608695652175, 56.95652173913044),
(172.82608695652175, 57.391304347826086),
(172.82608695652175, 57.826086956521735),
(172.82608695652175, 58.26086956521739),
(172.82608695652175, 58.69565217391305),
(172.82608695652175, 59.130434782608695),
(172.82608695652175, 59.565217391304344),
(172.82608695652175, 60.0),
(173.768115942029, 30.0),
(173.768115942029, 30.434782608695652),
(173.768115942029, 30.869565217391305),
(173.768115942029, 31.304347826086957),
(173.768115942029, 31.73913043478261),
(173.768115942029, 32.17391304347826),
(173.768115942029, 32.608695652173914),
(173.768115942029, 33.04347826086956),
(173.768115942029, 33.47826086956522),
(173.768115942029, 33.91304347826087),
(173.768115942029, 34.34782608695652),
(173.768115942029, 34.78260869565217),
(173.768115942029, 35.21739130434783),
(173.768115942029, 35.65217391304348),
(173.768115942029, 36.08695652173913),
(173.768115942029, 36.52173913043478),
(173.768115942029, 36.95652173913044),
(173.768115942029, 37.391304347826086),
(173.768115942029, 37.82608695652174),
(173.768115942029, 38.26086956521739),
(173.768115942029, 38.69565217391305),
(173.768115942029, 39.130434782608695),
(173.768115942029, 39.565217391304344),
(173.768115942029, 40.0),
(173.768115942029, 40.434782608695656),
(173.768115942029, 40.869565217391305),
(173.768115942029, 41.30434782608695),
(173.768115942029, 41.73913043478261),
(173.768115942029, 42.17391304347826),
(173.768115942029, 42.608695652173914),
(173.768115942029, 43.04347826086956),
(173.768115942029, 43.47826086956522),
(173.768115942029, 43.91304347826087),
(173.768115942029, 44.34782608695652),
(173.768115942029, 44.78260869565217),
(173.768115942029, 45.21739130434783),
(173.768115942029, 45.65217391304348),
(173.768115942029, 46.086956521739125),
(173.768115942029, 46.52173913043478),
(173.768115942029, 46.95652173913044),
(173.768115942029, 47.391304347826086),
(173.768115942029, 47.826086956521735),
(173.768115942029, 48.26086956521739),
(173.768115942029, 48.69565217391305),
(173.768115942029, 49.130434782608695),
(173.768115942029, 49.565217391304344),
(173.768115942029, 50.0),
(173.768115942029, 50.434782608695656),
(173.768115942029, 50.869565217391305),
(173.768115942029, 51.30434782608695),
(173.768115942029, 51.73913043478261),
(173.768115942029, 52.173913043478265),
(173.768115942029, 52.608695652173914),
(173.768115942029, 53.04347826086956),
(173.768115942029, 53.47826086956522),
(173.768115942029, 53.91304347826087),
(173.768115942029, 54.347826086956516),
(173.768115942029, 54.78260869565217),
(173.768115942029, 55.21739130434783),
(173.768115942029, 55.65217391304348),
(173.768115942029, 56.086956521739125),
(173.768115942029, 56.52173913043478),
(173.768115942029, 56.95652173913044),
(173.768115942029, 57.391304347826086),
(173.768115942029, 57.826086956521735),
(173.768115942029, 58.26086956521739),
(173.768115942029, 58.69565217391305),
(173.768115942029, 59.130434782608695),
(173.768115942029, 59.565217391304344),
(173.768115942029, 60.0),
(174.71014492753622, 30.0),
(174.71014492753622, 30.434782608695652),
(174.71014492753622, 30.869565217391305),
(174.71014492753622, 31.304347826086957),
(174.71014492753622, 31.73913043478261),
(174.71014492753622, 32.17391304347826),
(174.71014492753622, 32.608695652173914),
(174.71014492753622, 33.04347826086956),
(174.71014492753622, 33.47826086956522),
(174.71014492753622, 33.91304347826087),
(174.71014492753622, 34.34782608695652),
(174.71014492753622, 34.78260869565217),
(174.71014492753622, 35.21739130434783),
(174.71014492753622, 35.65217391304348),
(174.71014492753622, 36.08695652173913),
(174.71014492753622, 36.52173913043478),
(174.71014492753622, 36.95652173913044),
(174.71014492753622, 37.391304347826086),
(174.71014492753622, 37.82608695652174),
(174.71014492753622, 38.26086956521739),
(174.71014492753622, 38.69565217391305),
(174.71014492753622, 39.130434782608695),
(174.71014492753622, 39.565217391304344),
(174.71014492753622, 40.0),
(174.71014492753622, 40.434782608695656),
(174.71014492753622, 40.869565217391305),
(174.71014492753622, 41.30434782608695),
(174.71014492753622, 41.73913043478261),
(174.71014492753622, 42.17391304347826),
(174.71014492753622, 42.608695652173914),
(174.71014492753622, 43.04347826086956),
(174.71014492753622, 43.47826086956522),
(174.71014492753622, 43.91304347826087),
(174.71014492753622, 44.34782608695652),
(174.71014492753622, 44.78260869565217),
(174.71014492753622, 45.21739130434783),
(174.71014492753622, 45.65217391304348),
(174.71014492753622, 46.086956521739125),
(174.71014492753622, 46.52173913043478),
(174.71014492753622, 46.95652173913044),
(174.71014492753622, 47.391304347826086),
(174.71014492753622, 47.826086956521735),
(174.71014492753622, 48.26086956521739),
(174.71014492753622, 48.69565217391305),
(174.71014492753622, 49.130434782608695),
(174.71014492753622, 49.565217391304344),
(174.71014492753622, 50.0),
(174.71014492753622, 50.434782608695656),
(174.71014492753622, 50.869565217391305),
(174.71014492753622, 51.30434782608695),
(174.71014492753622, 51.73913043478261),
(174.71014492753622, 52.173913043478265),
(174.71014492753622, 52.608695652173914),
(174.71014492753622, 53.04347826086956),
(174.71014492753622, 53.47826086956522),
(174.71014492753622, 53.91304347826087),
(174.71014492753622, 54.347826086956516),
(174.71014492753622, 54.78260869565217),
(174.71014492753622, 55.21739130434783),
(174.71014492753622, 55.65217391304348),
(174.71014492753622, 56.086956521739125),
(174.71014492753622, 56.52173913043478),
(174.71014492753622, 56.95652173913044),
(174.71014492753622, 57.391304347826086),
(174.71014492753622, 57.826086956521735),
(174.71014492753622, 58.26086956521739),
(174.71014492753622, 58.69565217391305),
(174.71014492753622, 59.130434782608695),
(174.71014492753622, 59.565217391304344),
(174.71014492753622, 60.0),
(175.65217391304347, 30.0),
(175.65217391304347, 30.434782608695652),
(175.65217391304347, 30.869565217391305),
(175.65217391304347, 31.304347826086957),
(175.65217391304347, 31.73913043478261),
(175.65217391304347, 32.17391304347826),
(175.65217391304347, 32.608695652173914),
(175.65217391304347, 33.04347826086956),
(175.65217391304347, 33.47826086956522),
(175.65217391304347, 33.91304347826087),
(175.65217391304347, 34.34782608695652),
(175.65217391304347, 34.78260869565217),
(175.65217391304347, 35.21739130434783),
(175.65217391304347, 35.65217391304348),
(175.65217391304347, 36.08695652173913),
(175.65217391304347, 36.52173913043478),
(175.65217391304347, 36.95652173913044),
(175.65217391304347, 37.391304347826086),
(175.65217391304347, 37.82608695652174),
(175.65217391304347, 38.26086956521739),
(175.65217391304347, 38.69565217391305),
(175.65217391304347, 39.130434782608695),
(175.65217391304347, 39.565217391304344),
(175.65217391304347, 40.0),
(175.65217391304347, 40.434782608695656),
(175.65217391304347, 40.869565217391305),
(175.65217391304347, 41.30434782608695),
(175.65217391304347, 41.73913043478261),
(175.65217391304347, 42.17391304347826),
(175.65217391304347, 42.608695652173914),
(175.65217391304347, 43.04347826086956),
(175.65217391304347, 43.47826086956522),
(175.65217391304347, 43.91304347826087),
(175.65217391304347, 44.34782608695652),
(175.65217391304347, 44.78260869565217),
(175.65217391304347, 45.21739130434783),
(175.65217391304347, 45.65217391304348),
(175.65217391304347, 46.086956521739125),
(175.65217391304347, 46.52173913043478),
(175.65217391304347, 46.95652173913044),
(175.65217391304347, 47.391304347826086),
(175.65217391304347, 47.826086956521735),
(175.65217391304347, 48.26086956521739),
(175.65217391304347, 48.69565217391305),
(175.65217391304347, 49.130434782608695),
(175.65217391304347, 49.565217391304344),
(175.65217391304347, 50.0),
(175.65217391304347, 50.434782608695656),
(175.65217391304347, 50.869565217391305),
(175.65217391304347, 51.30434782608695),
(175.65217391304347, 51.73913043478261),
(175.65217391304347, 52.173913043478265),
(175.65217391304347, 52.608695652173914),
(175.65217391304347, 53.04347826086956),
(175.65217391304347, 53.47826086956522),
(175.65217391304347, 53.91304347826087),
(175.65217391304347, 54.347826086956516),
(175.65217391304347, 54.78260869565217),
(175.65217391304347, 55.21739130434783),
(175.65217391304347, 55.65217391304348),
(175.65217391304347, 56.086956521739125),
(175.65217391304347, 56.52173913043478),
(175.65217391304347, 56.95652173913044),
(175.65217391304347, 57.391304347826086),
(175.65217391304347, 57.826086956521735),
(175.65217391304347, 58.26086956521739),
(175.65217391304347, 58.69565217391305),
(175.65217391304347, 59.130434782608695),
(175.65217391304347, 59.565217391304344),
(175.65217391304347, 60.0),
(176.59420289855072, 30.0),
(176.59420289855072, 30.434782608695652),
(176.59420289855072, 30.869565217391305),
(176.59420289855072, 31.304347826086957),
(176.59420289855072, 31.73913043478261),
(176.59420289855072, 32.17391304347826),
(176.59420289855072, 32.608695652173914),
(176.59420289855072, 33.04347826086956),
(176.59420289855072, 33.47826086956522),
(176.59420289855072, 33.91304347826087),
(176.59420289855072, 34.34782608695652),
(176.59420289855072, 34.78260869565217),
(176.59420289855072, 35.21739130434783),
(176.59420289855072, 35.65217391304348),
(176.59420289855072, 36.08695652173913),
(176.59420289855072, 36.52173913043478),
(176.59420289855072, 36.95652173913044),
(176.59420289855072, 37.391304347826086),
(176.59420289855072, 37.82608695652174),
(176.59420289855072, 38.26086956521739),
(176.59420289855072, 38.69565217391305),
(176.59420289855072, 39.130434782608695),
(176.59420289855072, 39.565217391304344),
(176.59420289855072, 40.0),
(176.59420289855072, 40.434782608695656),
(176.59420289855072, 40.869565217391305),
(176.59420289855072, 41.30434782608695),
(176.59420289855072, 41.73913043478261),
(176.59420289855072, 42.17391304347826),
(176.59420289855072, 42.608695652173914),
(176.59420289855072, 43.04347826086956),
(176.59420289855072, 43.47826086956522),
(176.59420289855072, 43.91304347826087),
(176.59420289855072, 44.34782608695652),
(176.59420289855072, 44.78260869565217),
(176.59420289855072, 45.21739130434783),
(176.59420289855072, 45.65217391304348),
(176.59420289855072, 46.086956521739125),
(176.59420289855072, 46.52173913043478),
(176.59420289855072, 46.95652173913044),
(176.59420289855072, 47.391304347826086),
(176.59420289855072, 47.826086956521735),
(176.59420289855072, 48.26086956521739),
(176.59420289855072, 48.69565217391305),
(176.59420289855072, 49.130434782608695),
(176.59420289855072, 49.565217391304344),
(176.59420289855072, 50.0),
(176.59420289855072, 50.434782608695656),
(176.59420289855072, 50.869565217391305),
(176.59420289855072, 51.30434782608695),
(176.59420289855072, 51.73913043478261),
(176.59420289855072, 52.173913043478265),
(176.59420289855072, 52.608695652173914),
(176.59420289855072, 53.04347826086956),
(176.59420289855072, 53.47826086956522),
(176.59420289855072, 53.91304347826087),
(176.59420289855072, 54.347826086956516),
(176.59420289855072, 54.78260869565217),
(176.59420289855072, 55.21739130434783),
(176.59420289855072, 55.65217391304348),
(176.59420289855072, 56.086956521739125),
(176.59420289855072, 56.52173913043478),
(176.59420289855072, 56.95652173913044),
(176.59420289855072, 57.391304347826086),
(176.59420289855072, 57.826086956521735),
(176.59420289855072, 58.26086956521739),
(176.59420289855072, 58.69565217391305),
(176.59420289855072, 59.130434782608695),
(176.59420289855072, 59.565217391304344),
(176.59420289855072, 60.0),
(177.53623188405797, 30.0),
(177.53623188405797, 30.434782608695652),
(177.53623188405797, 30.869565217391305),
(177.53623188405797, 31.304347826086957),
(177.53623188405797, 31.73913043478261),
(177.53623188405797, 32.17391304347826),
(177.53623188405797, 32.608695652173914),
(177.53623188405797, 33.04347826086956),
(177.53623188405797, 33.47826086956522),
(177.53623188405797, 33.91304347826087),
(177.53623188405797, 34.34782608695652),
(177.53623188405797, 34.78260869565217),
(177.53623188405797, 35.21739130434783),
(177.53623188405797, 35.65217391304348),
(177.53623188405797, 36.08695652173913),
(177.53623188405797, 36.52173913043478),
(177.53623188405797, 36.95652173913044),
(177.53623188405797, 37.391304347826086),
(177.53623188405797, 37.82608695652174),
(177.53623188405797, 38.26086956521739),
(177.53623188405797, 38.69565217391305),
(177.53623188405797, 39.130434782608695),
(177.53623188405797, 39.565217391304344),
(177.53623188405797, 40.0),
(177.53623188405797, 40.434782608695656),
(177.53623188405797, 40.869565217391305),
(177.53623188405797, 41.30434782608695),
(177.53623188405797, 41.73913043478261),
(177.53623188405797, 42.17391304347826),
(177.53623188405797, 42.608695652173914),
(177.53623188405797, 43.04347826086956),
(177.53623188405797, 43.47826086956522),
(177.53623188405797, 43.91304347826087),
(177.53623188405797, 44.34782608695652),
(177.53623188405797, 44.78260869565217),
(177.53623188405797, 45.21739130434783),
(177.53623188405797, 45.65217391304348),
(177.53623188405797, 46.086956521739125),
(177.53623188405797, 46.52173913043478),
(177.53623188405797, 46.95652173913044),
(177.53623188405797, 47.391304347826086),
(177.53623188405797, 47.826086956521735),
(177.53623188405797, 48.26086956521739),
(177.53623188405797, 48.69565217391305),
(177.53623188405797, 49.130434782608695),
(177.53623188405797, 49.565217391304344),
(177.53623188405797, 50.0),
(177.53623188405797, 50.434782608695656),
(177.53623188405797, 50.869565217391305),
(177.53623188405797, 51.30434782608695),
(177.53623188405797, 51.73913043478261),
(177.53623188405797, 52.173913043478265),
(177.53623188405797, 52.608695652173914),
(177.53623188405797, 53.04347826086956),
(177.53623188405797, 53.47826086956522),
(177.53623188405797, 53.91304347826087),
(177.53623188405797, 54.347826086956516),
(177.53623188405797, 54.78260869565217),
(177.53623188405797, 55.21739130434783),
(177.53623188405797, 55.65217391304348),
(177.53623188405797, 56.086956521739125),
(177.53623188405797, 56.52173913043478),
(177.53623188405797, 56.95652173913044),
(177.53623188405797, 57.391304347826086),
(177.53623188405797, 57.826086956521735),
(177.53623188405797, 58.26086956521739),
(177.53623188405797, 58.69565217391305),
(177.53623188405797, 59.130434782608695),
(177.53623188405797, 59.565217391304344),
(177.53623188405797, 60.0),
(178.47826086956522, 30.0),
(178.47826086956522, 30.434782608695652),
(178.47826086956522, 30.869565217391305),
(178.47826086956522, 31.304347826086957),
(178.47826086956522, 31.73913043478261),
(178.47826086956522, 32.17391304347826),
(178.47826086956522, 32.608695652173914),
(178.47826086956522, 33.04347826086956),
(178.47826086956522, 33.47826086956522),
(178.47826086956522, 33.91304347826087),
(178.47826086956522, 34.34782608695652),
(178.47826086956522, 34.78260869565217),
(178.47826086956522, 35.21739130434783),
(178.47826086956522, 35.65217391304348),
(178.47826086956522, 36.08695652173913),
(178.47826086956522, 36.52173913043478),
(178.47826086956522, 36.95652173913044),
(178.47826086956522, 37.391304347826086),
(178.47826086956522, 37.82608695652174),
(178.47826086956522, 38.26086956521739),
(178.47826086956522, 38.69565217391305),
(178.47826086956522, 39.130434782608695),
(178.47826086956522, 39.565217391304344),
(178.47826086956522, 40.0),
(178.47826086956522, 40.434782608695656),
(178.47826086956522, 40.869565217391305),
(178.47826086956522, 41.30434782608695),
(178.47826086956522, 41.73913043478261),
(178.47826086956522, 42.17391304347826),
(178.47826086956522, 42.608695652173914),
(178.47826086956522, 43.04347826086956),
(178.47826086956522, 43.47826086956522),
(178.47826086956522, 43.91304347826087),
(178.47826086956522, 44.34782608695652),
(178.47826086956522, 44.78260869565217),
(178.47826086956522, 45.21739130434783),
(178.47826086956522, 45.65217391304348),
(178.47826086956522, 46.086956521739125),
(178.47826086956522, 46.52173913043478),
(178.47826086956522, 46.95652173913044),
(178.47826086956522, 47.391304347826086),
(178.47826086956522, 47.826086956521735),
(178.47826086956522, 48.26086956521739),
(178.47826086956522, 48.69565217391305),
(178.47826086956522, 49.130434782608695),
(178.47826086956522, 49.565217391304344),
(178.47826086956522, 50.0),
(178.47826086956522, 50.434782608695656),
(178.47826086956522, 50.869565217391305),
(178.47826086956522, 51.30434782608695),
(178.47826086956522, 51.73913043478261),
(178.47826086956522, 52.173913043478265),
(178.47826086956522, 52.608695652173914),
(178.47826086956522, 53.04347826086956),
(178.47826086956522, 53.47826086956522),
(178.47826086956522, 53.91304347826087),
(178.47826086956522, 54.347826086956516),
(178.47826086956522, 54.78260869565217),
(178.47826086956522, 55.21739130434783),
(178.47826086956522, 55.65217391304348),
(178.47826086956522, 56.086956521739125),
(178.47826086956522, 56.52173913043478),
(178.47826086956522, 56.95652173913044),
(178.47826086956522, 57.391304347826086),
(178.47826086956522, 57.826086956521735),
(178.47826086956522, 58.26086956521739),
(178.47826086956522, 58.69565217391305),
(178.47826086956522, 59.130434782608695),
(178.47826086956522, 59.565217391304344),
(178.47826086956522, 60.0),
(179.42028985507247, 30.0),
(179.42028985507247, 30.434782608695652),
(179.42028985507247, 30.869565217391305),
(179.42028985507247, 31.304347826086957),
(179.42028985507247, 31.73913043478261),
(179.42028985507247, 32.17391304347826),
(179.42028985507247, 32.608695652173914),
(179.42028985507247, 33.04347826086956),
(179.42028985507247, 33.47826086956522),
(179.42028985507247, 33.91304347826087),
(179.42028985507247, 34.34782608695652),
(179.42028985507247, 34.78260869565217),
(179.42028985507247, 35.21739130434783),
(179.42028985507247, 35.65217391304348),
(179.42028985507247, 36.08695652173913),
(179.42028985507247, 36.52173913043478),
(179.42028985507247, 36.95652173913044),
(179.42028985507247, 37.391304347826086),
(179.42028985507247, 37.82608695652174),
(179.42028985507247, 38.26086956521739),
(179.42028985507247, 38.69565217391305),
(179.42028985507247, 39.130434782608695),
(179.42028985507247, 39.565217391304344),
(179.42028985507247, 40.0),
(179.42028985507247, 40.434782608695656),
(179.42028985507247, 40.869565217391305),
(179.42028985507247, 41.30434782608695),
(179.42028985507247, 41.73913043478261),
(179.42028985507247, 42.17391304347826),
(179.42028985507247, 42.608695652173914),
(179.42028985507247, 43.04347826086956),
(179.42028985507247, 43.47826086956522),
(179.42028985507247, 43.91304347826087),
(179.42028985507247, 44.34782608695652),
(179.42028985507247, 44.78260869565217),
(179.42028985507247, 45.21739130434783),
(179.42028985507247, 45.65217391304348),
(179.42028985507247, 46.086956521739125),
(179.42028985507247, 46.52173913043478),
(179.42028985507247, 46.95652173913044),
(179.42028985507247, 47.391304347826086),
(179.42028985507247, 47.826086956521735),
(179.42028985507247, 48.26086956521739),
(179.42028985507247, 48.69565217391305),
(179.42028985507247, 49.130434782608695),
(179.42028985507247, 49.565217391304344),
(179.42028985507247, 50.0),
(179.42028985507247, 50.434782608695656),
(179.42028985507247, 50.869565217391305),
(179.42028985507247, 51.30434782608695),
(179.42028985507247, 51.73913043478261),
(179.42028985507247, 52.173913043478265),
(179.42028985507247, 52.608695652173914),
(179.42028985507247, 53.04347826086956),
(179.42028985507247, 53.47826086956522),
(179.42028985507247, 53.91304347826087),
(179.42028985507247, 54.347826086956516),
(179.42028985507247, 54.78260869565217),
(179.42028985507247, 55.21739130434783),
(179.42028985507247, 55.65217391304348),
(179.42028985507247, 56.086956521739125),
(179.42028985507247, 56.52173913043478),
(179.42028985507247, 56.95652173913044),
(179.42028985507247, 57.391304347826086),
(179.42028985507247, 57.826086956521735),
(179.42028985507247, 58.26086956521739),
(179.42028985507247, 58.69565217391305),
(179.42028985507247, 59.130434782608695),
(179.42028985507247, 59.565217391304344),
(179.42028985507247, 60.0),
(180.36231884057972, 30.0),
(180.36231884057972, 30.434782608695652),
(180.36231884057972, 30.869565217391305),
(180.36231884057972, 31.304347826086957),
(180.36231884057972, 31.73913043478261),
(180.36231884057972, 32.17391304347826),
(180.36231884057972, 32.608695652173914),
(180.36231884057972, 33.04347826086956),
(180.36231884057972, 33.47826086956522),
(180.36231884057972, 33.91304347826087),
(180.36231884057972, 34.34782608695652),
(180.36231884057972, 34.78260869565217),
(180.36231884057972, 35.21739130434783),
(180.36231884057972, 35.65217391304348),
(180.36231884057972, 36.08695652173913),
(180.36231884057972, 36.52173913043478),
(180.36231884057972, 36.95652173913044),
(180.36231884057972, 37.391304347826086),
(180.36231884057972, 37.82608695652174),
(180.36231884057972, 38.26086956521739),
(180.36231884057972, 38.69565217391305),
(180.36231884057972, 39.130434782608695),
(180.36231884057972, 39.565217391304344),
(180.36231884057972, 40.0),
(180.36231884057972, 40.434782608695656),
(180.36231884057972, 40.869565217391305),
(180.36231884057972, 41.30434782608695),
(180.36231884057972, 41.73913043478261),
(180.36231884057972, 42.17391304347826),
(180.36231884057972, 42.608695652173914),
(180.36231884057972, 43.04347826086956),
(180.36231884057972, 43.47826086956522),
(180.36231884057972, 43.91304347826087),
(180.36231884057972, 44.34782608695652),
(180.36231884057972, 44.78260869565217),
(180.36231884057972, 45.21739130434783),
(180.36231884057972, 45.65217391304348),
(180.36231884057972, 46.086956521739125),
(180.36231884057972, 46.52173913043478),
(180.36231884057972, 46.95652173913044),
(180.36231884057972, 47.391304347826086),
(180.36231884057972, 47.826086956521735),
(180.36231884057972, 48.26086956521739),
(180.36231884057972, 48.69565217391305),
(180.36231884057972, 49.130434782608695),
(180.36231884057972, 49.565217391304344),
(180.36231884057972, 50.0),
(180.36231884057972, 50.434782608695656),
(180.36231884057972, 50.869565217391305),
(180.36231884057972, 51.30434782608695),
(180.36231884057972, 51.73913043478261),
(180.36231884057972, 52.173913043478265),
(180.36231884057972, 52.608695652173914),
(180.36231884057972, 53.04347826086956),
(180.36231884057972, 53.47826086956522),
(180.36231884057972, 53.91304347826087),
(180.36231884057972, 54.347826086956516),
(180.36231884057972, 54.78260869565217),
(180.36231884057972, 55.21739130434783),
(180.36231884057972, 55.65217391304348),
(180.36231884057972, 56.086956521739125),
(180.36231884057972, 56.52173913043478),
(180.36231884057972, 56.95652173913044),
(180.36231884057972, 57.391304347826086),
(180.36231884057972, 57.826086956521735),
(180.36231884057972, 58.26086956521739),
(180.36231884057972, 58.69565217391305),
(180.36231884057972, 59.130434782608695),
(180.36231884057972, 59.565217391304344),
(180.36231884057972, 60.0),
(181.30434782608697, 30.0),
(181.30434782608697, 30.434782608695652),
(181.30434782608697, 30.869565217391305),
(181.30434782608697, 31.304347826086957),
(181.30434782608697, 31.73913043478261),
(181.30434782608697, 32.17391304347826),
(181.30434782608697, 32.608695652173914),
(181.30434782608697, 33.04347826086956),
(181.30434782608697, 33.47826086956522),
(181.30434782608697, 33.91304347826087),
(181.30434782608697, 34.34782608695652),
(181.30434782608697, 34.78260869565217),
(181.30434782608697, 35.21739130434783),
(181.30434782608697, 35.65217391304348),
(181.30434782608697, 36.08695652173913),
(181.30434782608697, 36.52173913043478),
(181.30434782608697, 36.95652173913044),
(181.30434782608697, 37.391304347826086),
(181.30434782608697, 37.82608695652174),
(181.30434782608697, 38.26086956521739),
(181.30434782608697, 38.69565217391305),
(181.30434782608697, 39.130434782608695),
(181.30434782608697, 39.565217391304344),
(181.30434782608697, 40.0),
(181.30434782608697, 40.434782608695656),
(181.30434782608697, 40.869565217391305),
(181.30434782608697, 41.30434782608695),
(181.30434782608697, 41.73913043478261),
(181.30434782608697, 42.17391304347826),
(181.30434782608697, 42.608695652173914),
(181.30434782608697, 43.04347826086956),
(181.30434782608697, 43.47826086956522),
(181.30434782608697, 43.91304347826087),
(181.30434782608697, 44.34782608695652),
(181.30434782608697, 44.78260869565217),
(181.30434782608697, 45.21739130434783),
(181.30434782608697, 45.65217391304348),
(181.30434782608697, 46.086956521739125),
(181.30434782608697, 46.52173913043478),
(181.30434782608697, 46.95652173913044),
(181.30434782608697, 47.391304347826086),
(181.30434782608697, 47.826086956521735),
(181.30434782608697, 48.26086956521739),
(181.30434782608697, 48.69565217391305),
(181.30434782608697, 49.130434782608695),
(181.30434782608697, 49.565217391304344),
(181.30434782608697, 50.0),
(181.30434782608697, 50.434782608695656),
(181.30434782608697, 50.869565217391305),
(181.30434782608697, 51.30434782608695),
(181.30434782608697, 51.73913043478261),
(181.30434782608697, 52.173913043478265),
(181.30434782608697, 52.608695652173914),
(181.30434782608697, 53.04347826086956),
(181.30434782608697, 53.47826086956522),
(181.30434782608697, 53.91304347826087),
(181.30434782608697, 54.347826086956516),
(181.30434782608697, 54.78260869565217),
(181.30434782608697, 55.21739130434783),
(181.30434782608697, 55.65217391304348),
(181.30434782608697, 56.086956521739125),
(181.30434782608697, 56.52173913043478),
(181.30434782608697, 56.95652173913044),
(181.30434782608697, 57.391304347826086),
(181.30434782608697, 57.826086956521735),
(181.30434782608697, 58.26086956521739),
(181.30434782608697, 58.69565217391305),
(181.30434782608697, 59.130434782608695),
(181.30434782608697, 59.565217391304344),
(181.30434782608697, 60.0),
(182.2463768115942, 30.0),
(182.2463768115942, 30.434782608695652),
(182.2463768115942, 30.869565217391305),
(182.2463768115942, 31.304347826086957),
(182.2463768115942, 31.73913043478261),
(182.2463768115942, 32.17391304347826),
(182.2463768115942, 32.608695652173914),
(182.2463768115942, 33.04347826086956),
(182.2463768115942, 33.47826086956522),
(182.2463768115942, 33.91304347826087),
(182.2463768115942, 34.34782608695652),
(182.2463768115942, 34.78260869565217),
(182.2463768115942, 35.21739130434783),
(182.2463768115942, 35.65217391304348),
(182.2463768115942, 36.08695652173913),
(182.2463768115942, 36.52173913043478),
(182.2463768115942, 36.95652173913044),
(182.2463768115942, 37.391304347826086),
(182.2463768115942, 37.82608695652174),
(182.2463768115942, 38.26086956521739),
(182.2463768115942, 38.69565217391305),
(182.2463768115942, 39.130434782608695),
(182.2463768115942, 39.565217391304344),
(182.2463768115942, 40.0),
(182.2463768115942, 40.434782608695656),
(182.2463768115942, 40.869565217391305),
(182.2463768115942, 41.30434782608695),
(182.2463768115942, 41.73913043478261),
(182.2463768115942, 42.17391304347826),
(182.2463768115942, 42.608695652173914),
(182.2463768115942, 43.04347826086956),
(182.2463768115942, 43.47826086956522),
(182.2463768115942, 43.91304347826087),
(182.2463768115942, 44.34782608695652),
(182.2463768115942, 44.78260869565217),
(182.2463768115942, 45.21739130434783),
(182.2463768115942, 45.65217391304348),
(182.2463768115942, 46.086956521739125),
(182.2463768115942, 46.52173913043478),
(182.2463768115942, 46.95652173913044),
(182.2463768115942, 47.391304347826086),
(182.2463768115942, 47.826086956521735),
(182.2463768115942, 48.26086956521739),
(182.2463768115942, 48.69565217391305),
(182.2463768115942, 49.130434782608695),
(182.2463768115942, 49.565217391304344),
(182.2463768115942, 50.0),
(182.2463768115942, 50.434782608695656),
(182.2463768115942, 50.869565217391305),
(182.2463768115942, 51.30434782608695),
(182.2463768115942, 51.73913043478261),
(182.2463768115942, 52.173913043478265),
(182.2463768115942, 52.608695652173914),
(182.2463768115942, 53.04347826086956),
(182.2463768115942, 53.47826086956522),
(182.2463768115942, 53.91304347826087),
(182.2463768115942, 54.347826086956516),
(182.2463768115942, 54.78260869565217),
(182.2463768115942, 55.21739130434783),
(182.2463768115942, 55.65217391304348),
(182.2463768115942, 56.086956521739125),
(182.2463768115942, 56.52173913043478),
(182.2463768115942, 56.95652173913044),
(182.2463768115942, 57.391304347826086),
(182.2463768115942, 57.826086956521735),
(182.2463768115942, 58.26086956521739),
(182.2463768115942, 58.69565217391305),
(182.2463768115942, 59.130434782608695),
(182.2463768115942, 59.565217391304344),
(182.2463768115942, 60.0),
(183.18840579710144, 30.0),
(183.18840579710144, 30.434782608695652),
(183.18840579710144, 30.869565217391305),
(183.18840579710144, 31.304347826086957),
(183.18840579710144, 31.73913043478261),
(183.18840579710144, 32.17391304347826),
(183.18840579710144, 32.608695652173914),
(183.18840579710144, 33.04347826086956),
(183.18840579710144, 33.47826086956522),
(183.18840579710144, 33.91304347826087),
(183.18840579710144, 34.34782608695652),
(183.18840579710144, 34.78260869565217),
(183.18840579710144, 35.21739130434783),
(183.18840579710144, 35.65217391304348),
(183.18840579710144, 36.08695652173913),
(183.18840579710144, 36.52173913043478),
(183.18840579710144, 36.95652173913044),
(183.18840579710144, 37.391304347826086),
(183.18840579710144, 37.82608695652174),
(183.18840579710144, 38.26086956521739),
...]
There are 4900 tuples in this list (\(70 \cdot 70\)).
len(list(product(x, y)))
4900
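To see where the count and the ordering come from, here is a minimal sketch on a much smaller example. The lists xs and ys are hypothetical stand-ins for the 70-element x and y used above.

```python
from itertools import product

# Hypothetical small inputs, standing in for the 70-element x and y above.
xs = [170.0, 202.5, 235.0]
ys = [30.0, 60.0]

pairs = list(product(xs, ys))

# The number of pairs is len(xs) * len(ys), just as 70 * 70 = 4900 above.
print(len(pairs))

# The second coordinate cycles fastest: the first coordinate only changes
# after every value in ys has been paired with it.
print(pairs)
```

This ordering is why the long list above repeats the same first coordinate 70 times before moving on to the next one.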
We convert this into a DataFrame with 4900 rows and two columns.
df_art = pd.DataFrame(list(product(x, y)))
df_art
| | 0 | 1 |
|---|---|---|
| 0 | 170.0 | 30.000000 |
| 1 | 170.0 | 30.434783 |
| 2 | 170.0 | 30.869565 |
| 3 | 170.0 | 31.304348 |
| 4 | 170.0 | 31.739130 |
| ... | ... | ... |
| 4895 | 235.0 | 58.260870 |
| 4896 | 235.0 | 58.695652 |
| 4897 | 235.0 | 59.130435 |
| 4898 | 235.0 | 59.565217 |
| 4899 | 235.0 | 60.000000 |
4900 rows × 2 columns
We give the columns the same names as our input features.
df_art.columns = cols
df_art
| | flipper_length_mm | bill_length_mm |
|---|---|---|
| 0 | 170.0 | 30.000000 |
| 1 | 170.0 | 30.434783 |
| 2 | 170.0 | 30.869565 |
| 3 | 170.0 | 31.304348 |
| 4 | 170.0 | 31.739130 |
| ... | ... | ... |
| 4895 | 235.0 | 58.260870 |
| 4896 | 235.0 | 58.695652 |
| 4897 | 235.0 | 59.130435 |
| 4898 | 235.0 | 59.565217 |
| 4899 | 235.0 | 60.000000 |
4900 rows × 2 columns
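As an aside, the column names can also be supplied when the DataFrame is created, using the columns argument of pd.DataFrame. The sketch below uses small hypothetical grids (3 and 4 points) rather than the 70-point grids above, and the name df_small is only for illustration.

```python
from itertools import product

import numpy as np
import pandas as pd

cols = ["flipper_length_mm", "bill_length_mm"]

# Hypothetical coarse grids; the lecture uses 70 points in each direction.
x_small = np.linspace(170, 235, 3)
y_small = np.linspace(30, 60, 4)

# Passing columns= names the columns at construction time,
# so no separate assignment to .columns is needed.
df_small = pd.DataFrame(list(product(x_small, y_small)), columns=cols)
print(df_small.shape)
print(df_small.head())
```

Either approach gives the same result; assigning to df_art.columns afterwards, as we did above, makes the renaming step more visible.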
Add a corresponding “pred” column to df_art.
We now add a prediction column. (A warning would be raised if we didn’t have the same column names as when we fit the classifier using clf.fit.)
df_art['pred'] = clf.predict(df_art[cols])
Here is the new DataFrame. For example, our classifier predicts that a penguin with flipper length 170 and bill length 30 is an Adelie penguin. (Notice that there is probably no actual penguin with these measurements.)
df_art
| | flipper_length_mm | bill_length_mm | pred |
|---|---|---|---|
| 0 | 170.0 | 30.000000 | Adelie |
| 1 | 170.0 | 30.434783 | Adelie |
| 2 | 170.0 | 30.869565 | Adelie |
| 3 | 170.0 | 31.304348 | Adelie |
| 4 | 170.0 | 31.739130 | Adelie |
| ... | ... | ... | ... |
| 4895 | 235.0 | 58.260870 | Gentoo |
| 4896 | 235.0 | 58.695652 | Gentoo |
| 4897 | 235.0 | 59.130435 | Gentoo |
| 4898 | 235.0 | 59.565217 | Gentoo |
| 4899 | 235.0 | 60.000000 | Gentoo |
4900 rows × 3 columns
Make another Altair scatter plot of the predicted species, this time using df_art.
The most important thing to recognize from the following picture is that there are three regions (corresponding to the three classes) and that the regions are separated by linear boundaries. This partly explains why LogisticRegression is defined in the linear_model module of scikit-learn.
alt.Chart(df_art).mark_circle().encode(
    x=alt.X(cols[0], scale=alt.Scale(zero=False)),
    y=alt.Y(cols[1], scale=alt.Scale(zero=False)),
    color="pred:N"
)
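One way to see why the boundaries are linear, assuming the soft-max model from the Model section: the predicted class is the one with the largest probability, and because soft-max preserves the ordering of the scores \(z_k\), this is the class with the largest linear score. The boundary between two classes is where their scores are equal, which is a linear equation in the inputs. The sketch below checks this numerically. The weights W and intercepts b are made-up values (not the fitted parameters of clf), and the intercept is kept separate instead of using the augmented vector \(\tilde{x}\).

```python
import numpy as np

# Made-up parameters for K = 3 classes and 2 features (NOT fitted values).
W = np.array([[0.5, -0.2, -0.3],
              [-0.4, 0.6, -0.2]])   # shape (2, 3): one column per class
b = np.array([0.1, -0.3, 0.2])      # one intercept per class


def softmax(z):
    """Row-wise soft-max; subtracting the row max avoids overflow."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))       # artificial input points

scores = X @ W + b                   # linear scores z_k for each point
probs = softmax(scores)              # predicted probabilities

pred_from_probs = probs.argmax(axis=1)
pred_from_scores = scores.argmax(axis=1)

# The score gap between classes 0 and 1 is itself linear in the inputs,
# so the set where it equals zero (the boundary) is a straight line.
gap_01 = X @ (W[:, 0] - W[:, 1]) + (b[0] - b[1])

print(np.array_equal(pred_from_probs, pred_from_scores))
print(np.allclose(gap_01, scores[:, 0] - scores[:, 1]))
```

The same reasoning applies to every pair of classes, which is consistent with the three regions in the chart meeting along straight lines.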