Week 1 Wednesday

Keyboard Shortcut Add new code block below current one: + J (MAC) ctrl (+ shift) + J (WINDOWS & LINUX)

  • Make a 10x3 NumPy array arr of random integers between 0 (inclusive) and 5 (exclusive). Here, we will specify the size keyword argument using a tuple rather than an int. Use seed=100 so we all have the same values.

import numpy as np
rng = np.random.default_rng(seed=100)
arr = rng.integers(0, 5, size=(10,3))
arr
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])
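As a small sketch of the size keyword argument (the names rng_a, rng_b, flat, and grid are introduced here for illustration only): an int gives a one-dimensional array, a tuple gives one dimension per entry, and reusing the seed reproduces the same values.

```python
import numpy as np

rng_a = np.random.default_rng(seed=100)
rng_b = np.random.default_rng(seed=100)

flat = rng_a.integers(0, 5, size=10)        # int size: 1-D array of length 10
grid = rng_b.integers(0, 5, size=(10, 3))   # tuple size: 10 rows, 3 columns

print(flat.shape)   # (10,)
print(grid.shape)   # (10, 3)

# A fresh generator built from the same seed produces the same values.
again = np.random.default_rng(seed=100).integers(0, 5, size=(10, 3))
same = np.array_equal(again, grid)
print(same)         # True
```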
  • Define a variable col that is equal to the 0-th column of arr

col = arr[:,0]
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])

What if we had used two sets of square brackets, as we would for a list of lists? Break the expression into pieces: arr[:] gets the entire array (“every row”), and then [0] gets the row at index 0. So the result is the top row, not the 0-th column.

arr[:][0] #arr[:] every row
array([3, 4, 0])
arr[:][2] #index 2 row
array([2, 0, 2])
arr[2,:]
array([2, 0, 2])
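The difference between the two bracket styles can be sketched on a smaller array. The names nested, row0, col0, and col0_from_list are hypothetical and used only for this illustration.

```python
import numpy as np

arr = np.array([[3, 4, 0],
                [2, 0, 1],
                [2, 0, 2]])
nested = arr.tolist()          # the same data as a list of lists

# Chained brackets: arr[:] is the whole array, then [0] picks row 0.
row0 = arr[:][0]
# One pair of brackets with a comma: every row, column 0.
col0 = arr[:, 0]

print(row0)   # [3 4 0]
print(col0)   # [3 2 2]

# A list of lists has no comma syntax, so a column needs a comprehension.
col0_from_list = [row[0] for row in nested]
print(col0_from_list)   # [3, 2, 2]
```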
  • Create the subarray of arr containing the rows which begin with a 2.

We can see what number each row starts with by using col which we defined above.

col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])

Find where col is equal to 2.

col == 2
array([False,  True,  True, False,  True, False, False, False, False,
        True])

Use Boolean indexing to extract all the rows that begin with 2.

subarr = arr[col == 2,:]
subarr
array([[2, 0, 1],
       [2, 0, 2],
       [2, 3, 4],
       [2, 2, 2]])
arr[:][col == 2] # arr[:] every row
array([[2, 0, 1],
       [2, 0, 2],
       [2, 3, 4],
       [2, 2, 2]])
arr[col == 2]
array([[2, 0, 1],
       [2, 0, 2],
       [2, 3, 4],
       [2, 2, 2]])
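As a sketch of what the Boolean mask is doing, the example below rebuilds arr by hand (so it does not depend on the random generator) and uses np.flatnonzero to list the row indices where the mask is True. The names mask and idx are illustrative only.

```python
import numpy as np

arr = np.array([[3, 4, 0], [2, 0, 1], [2, 0, 2], [4, 4, 2], [2, 3, 4],
                [4, 0, 3], [3, 0, 2], [4, 3, 1], [1, 3, 0], [2, 2, 2]])

mask = arr[:, 0] == 2          # one True/False per row
idx = np.flatnonzero(mask)     # positions of the True entries

print(mask.sum())   # 4 rows begin with a 2
print(idx)          # [1 2 4 9]

# Indexing with the mask or with the matching row indices gives the same rows.
print(np.array_equal(arr[mask], arr[idx]))   # True
```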

More complex example of Boolean indexing

Define arr as above. We will then create the subarray of arr containing the rows which have at least two 2s using the following strategy.

  • Make a 10x3 Boolean array indicating where arr is equal to 2.

arr == 2
array([[False, False, False],
       [ True, False, False],
       [ True, False,  True],
       [False, False,  True],
       [ True, False, False],
       [False, False, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True]])
(arr == 2).shape
(10, 3)
  • Use the sum method with axis=1 to find how many 2s there are in each row.

help(np.sum)
Help on function sum in module numpy:

sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Sum of array elements over a given axis.
    
    Parameters
    ----------
    a : array_like
        Elements to sum.
    axis : None or int or tuple of ints, optional
        Axis or axes along which a sum is performed.  The default,
        axis=None, will sum all of the elements of the input array.  If
        axis is negative it counts from the last to the first axis.
    
        .. versionadded:: 1.7.0
    
        If axis is a tuple of ints, a sum is performed on all of the axes
        specified in the tuple instead of a single axis or all the axes as
        before.
    dtype : dtype, optional
        The type of the returned array and of the accumulator in which the
        elements are summed.  The dtype of `a` is used by default unless `a`
        has an integer dtype of less precision than the default platform
        integer.  In that case, if `a` is signed then the platform integer
        is used while if `a` is unsigned then an unsigned integer of the
        same precision as the platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must have
        the same shape as the expected output, but the type of the output
        values will be cast if necessary.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the input array.
    
        If the default value is passed, then `keepdims` will not be
        passed through to the `sum` method of sub-classes of
        `ndarray`, however any non-default value will be.  If the
        sub-class' method does not implement `keepdims` any
        exceptions will be raised.
    initial : scalar, optional
        Starting value for the sum. See `~numpy.ufunc.reduce` for details.
    
        .. versionadded:: 1.15.0
    
    where : array_like of bool, optional
        Elements to include in the sum. See `~numpy.ufunc.reduce` for details.
    
        .. versionadded:: 1.17.0
    
    Returns
    -------
    sum_along_axis : ndarray
        An array with the same shape as `a`, with the specified
        axis removed.   If `a` is a 0-d array, or if `axis` is None, a scalar
        is returned.  If an output array is specified, a reference to
        `out` is returned.
    
    See Also
    --------
    ndarray.sum : Equivalent method.
    
    add.reduce : Equivalent functionality of `add`.
    
    cumsum : Cumulative sum of array elements.
    
    trapz : Integration of array values using the composite trapezoidal rule.
    
    mean, average
    
    Notes
    -----
    Arithmetic is modular when using integer types, and no error is
    raised on overflow.
    
    The sum of an empty array is the neutral element 0:
    
    >>> np.sum([])
    0.0
    
    For floating point numbers the numerical precision of sum (and
    ``np.add.reduce``) is in general limited by directly adding each number
    individually to the result causing rounding errors in every step.
    However, often numpy will use a  numerically better approach (partial
    pairwise summation) leading to improved precision in many use-cases.
    This improved precision is always provided when no ``axis`` is given.
    When ``axis`` is given, it will depend on which axis is summed.
    Technically, to provide the best speed possible, the improved precision
    is only used when the summation is along the fast axis in memory.
    Note that the exact precision may vary depending on other parameters.
    In contrast to NumPy, Python's ``math.fsum`` function uses a slower but
    more precise approach to summation.
    Especially when summing a large number of lower precision floating point
    numbers, such as ``float32``, numerical errors can become significant.
    In such cases it can be advisable to use `dtype="float64"` to use a higher
    precision for the output.
    
    Examples
    --------
    >>> np.sum([0.5, 1.5])
    2.0
    >>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
    1
    >>> np.sum([[0, 1], [0, 5]])
    6
    >>> np.sum([[0, 1], [0, 5]], axis=0)
    array([0, 6])
    >>> np.sum([[0, 1], [0, 5]], axis=1)
    array([1, 5])
    >>> np.sum([[0, 1], [np.nan, 5]], where=[False, True], axis=1)
    array([1., 5.])
    
    If the accumulator is too small, overflow occurs:
    
    >>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
    -128
    
    You can also start the sum with a value other than zero:
    
    >>> np.sum([10], initial=5)
    15
ba = (arr == 2)
np.sum(ba, axis = 1) # axis = 1 means taking sum of all elements (cols) in each row
array([0, 1, 2, 1, 1, 0, 1, 0, 0, 3])
ba.sum(axis = 1)
array([0, 1, 2, 1, 1, 0, 1, 0, 0, 3])

The axis argument is the axis along which the sum is performed. With axis=0 we sum down the rows, which produces one total per column.

np.sum(ba, axis = 0) 
array([4, 1, 4])
np.sum(ba, axis =2) # we do not have the third dimension
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
Cell In [18], line 1
----> 1 np.sum(ba, axis =2)

File <__array_function__ internals>:180, in sum(*args, **kwargs)

File /shared-libs/python3.9/py/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2298, in sum(a, axis, dtype, out, keepdims, initial, where)
   2295         return out
   2296     return res
-> 2298 return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
   2299                       initial=initial, where=where)

File /shared-libs/python3.9/py/lib/python3.9/site-packages/numpy/core/fromnumeric.py:86, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     83         else:
     84             return reduction(axis=axis, out=out, **passkwargs)
---> 86 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

AxisError: axis 2 is out of bounds for array of dimension 2
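The axis behaviour above, including the out-of-bounds case, can be sketched on a small Boolean array. The array below is made up for illustration. The except clause relies on AxisError being a subclass of both ValueError and IndexError.

```python
import numpy as np

ba = np.array([[True, False],
               [True, True],
               [False, False]])   # shape (3, 2)

print(ba.sum(axis=0))   # one total per column: [2 1]
print(ba.sum(axis=1))   # one total per row:    [1 2 0]
print(ba.sum())         # total over all entries: 3

# A 2-D array only has axes 0 and 1, so axis=2 raises an error.
caught = False
try:
    ba.sum(axis=2)
except (ValueError, IndexError) as err:
    caught = True
    print(type(err).__name__)
```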
  • Use Boolean indexing to create the subarray of arr containing only the rows which have at least two 2s.

num2 = ba.sum(axis = 1)
num2 >= 2
array([False, False,  True, False, False, False, False, False, False,
        True])
num2 > 1
array([False, False,  True, False, False, False, False, False, False,
        True])

Keep the rows corresponding to the True values, that is, we keep the rows that have at least two 2s.

subar2 = arr[num2 > 1]
subar2
array([[2, 0, 2],
       [2, 2, 2]])
arr
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])

We’ve seen that we can use Boolean arrays to keep certain rows. We can also use a list of indices. Here we get the rows at indices 0, 5, and 9.

arr[[0, 5, 9]] # same result as arr[[0, 5, 9], :] or arr[:][[0, 5, 9]]
array([[3, 4, 0],
       [4, 0, 3],
       [2, 2, 2]])
arr[[0,5,6],:]
array([[3, 4, 0],
       [4, 0, 3],
       [3, 0, 2]])
arr[0,5,6]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [27], line 1
----> 1 arr[0,5,6]

IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed

Get the row at index 9 repeated three times.

arr[[9, 9, 9]] # repeat last row three times
array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])
arr
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])

Why did we use the double square brackets above? The outer square brackets are for indexing. The inner square brackets are for a list. Why do we need the list inside? Here is what happens if we omit the inner square brackets. The 0 gets us to the row at index 0, and the 1 gets to the element at index 1 in that row. (Remember that numbering in Python starts at 0.)

arr[0,1]
4
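A minimal sketch of the distinction, on a smaller hand-written array: a list inside the brackets selects several rows, while comma-separated integers select one index per dimension. The names rows, entry, and reordered are illustrative only.

```python
import numpy as np

arr = np.array([[3, 4, 0],
                [2, 0, 1],
                [2, 0, 2]])

rows = arr[[0, 1]]     # list inside the brackets: rows 0 and 1
entry = arr[0, 1]      # two integers: row 0, then column 1

print(rows.shape)      # (2, 3)
print(entry)           # 4

# A list can also reorder or repeat rows and columns.
reordered = arr[[2, 0], :]
print(reordered)
print(arr[:, [0, 0]].shape)   # (3, 2)
```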

Get the column at index 0 repeated twice.

arr[:,[0, 0]]
array([[3, 3],
       [2, 2],
       [2, 2],
       [4, 4],
       [2, 2],
       [4, 4],
       [3, 3],
       [4, 4],
       [1, 1],
       [2, 2]])
type([1,2,3,4,5])
list
arrb = np.array([1,2,3,4,5])
type(arrb)
numpy.ndarray

Another example of the axis keyword argument

  • What is the result of evaluating the following?

  • arr.max()

  • arr.max(axis=0)

  • arr.max(axis=1)

help(np.max)
Help on function amax in module numpy:

amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Return the maximum of an array or maximum along an axis.
    
    Parameters
    ----------
    a : array_like
        Input data.
    axis : None or int or tuple of ints, optional
        Axis or axes along which to operate.  By default, flattened input is
        used.
    
        .. versionadded:: 1.7.0
    
        If this is a tuple of ints, the maximum is selected over multiple axes,
        instead of a single axis or all the axes as before.
    out : ndarray, optional
        Alternative output array in which to place the result.  Must
        be of the same shape and buffer length as the expected output.
        See :ref:`ufuncs-output-type` for more details.
    
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the input array.
    
        If the default value is passed, then `keepdims` will not be
        passed through to the `amax` method of sub-classes of
        `ndarray`, however any non-default value will be.  If the
        sub-class' method does not implement `keepdims` any
        exceptions will be raised.
    
    initial : scalar, optional
        The minimum value of an output element. Must be present to allow
        computation on empty slice. See `~numpy.ufunc.reduce` for details.
    
        .. versionadded:: 1.15.0
    
    where : array_like of bool, optional
        Elements to compare for the maximum. See `~numpy.ufunc.reduce`
        for details.
    
        .. versionadded:: 1.17.0
    
    Returns
    -------
    amax : ndarray or scalar
        Maximum of `a`. If `axis` is None, the result is a scalar value.
        If `axis` is given, the result is an array of dimension
        ``a.ndim - 1``.
    
    See Also
    --------
    amin :
        The minimum value of an array along a given axis, propagating any NaNs.
    nanmax :
        The maximum value of an array along a given axis, ignoring any NaNs.
    maximum :
        Element-wise maximum of two arrays, propagating any NaNs.
    fmax :
        Element-wise maximum of two arrays, ignoring any NaNs.
    argmax :
        Return the indices of the maximum values.
    
    nanmin, minimum, fmin
    
    Notes
    -----
    NaN values are propagated, that is if at least one item is NaN, the
    corresponding max value will be NaN as well. To ignore NaN values
    (MATLAB behavior), please use nanmax.
    
    Don't use `amax` for element-wise comparison of 2 arrays; when
    ``a.shape[0]`` is 2, ``maximum(a[0], a[1])`` is faster than
    ``amax(a, axis=0)``.
    
    Examples
    --------
    >>> a = np.arange(4).reshape((2,2))
    >>> a
    array([[0, 1],
           [2, 3]])
    >>> np.amax(a)           # Maximum of the flattened array
    3
    >>> np.amax(a, axis=0)   # Maxima along the first axis
    array([2, 3])
    >>> np.amax(a, axis=1)   # Maxima along the second axis
    array([1, 3])
    >>> np.amax(a, where=[False, True], initial=-1, axis=0)
    array([-1,  3])
    >>> b = np.arange(5, dtype=float)
    >>> b[2] = np.NaN
    >>> np.amax(b)
    nan
    >>> np.amax(b, where=~np.isnan(b), initial=-1)
    4.0
    >>> np.nanmax(b)
    4.0
    
    You can use an initial value to compute the maximum of an empty slice, or
    to initialize it to a different value:
    
    >>> np.amax([[-50], [10]], axis=-1, initial=0)
    array([ 0, 10])
    
    Notice that the initial value is used as one of the elements for which the
    maximum is determined, unlike for the default argument Python's max
    function, which is only used for empty iterables.
    
    >>> np.amax([5], initial=6)
    6
    >>> max([5], default=6)
    5
arr
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])

If we call max() without an axis argument, it returns the overall maximum of the array. This max (like sum above) is an example of a method. A method is a function that is attached to an object; the max method here and the sum method above are attached to NumPy array objects.

arr.max()
4

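The axis argument specifies the axis along which to operate. (Why does “rows” correspond to axis=0? The shape of arr is 10x3, and the 10, which sits at position 0, counts the rows.) With axis=0 the maximum is taken down the rows, giving one value per column; with axis=1 it is taken across the columns, giving one value per row.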

arr.max(axis = 0)
array([4, 4, 4])
arr.max(axis = 1)
array([4, 2, 2, 4, 4, 4, 3, 4, 3, 2])
  • In how many rows of arr is the maximum entry in that row 2 or less?

Find the maximum in each row and create a Boolean array indicating whether that maximum is less than or equal to 2. Summing the Boolean array then counts the True values, because True counts as 1 and False as 0.

maxrow = arr.max(axis = 1)
maxrow <= 2
array([False,  True,  True, False, False, False, False, False, False,
        True])
(maxrow <= 2).sum()
3
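As a sketch, the same count can be reached in a few equivalent ways. The array is rebuilt by hand so the example does not depend on the random generator, and the names count, count_alt, and count_all are illustrative only.

```python
import numpy as np

arr = np.array([[3, 4, 0], [2, 0, 1], [2, 0, 2], [4, 4, 2], [2, 3, 4],
                [4, 0, 3], [3, 0, 2], [4, 3, 1], [1, 3, 0], [2, 2, 2]])

maxrow = arr.max(axis=1)

# Summing a Boolean array counts its True entries.
count = (maxrow <= 2).sum()
# np.count_nonzero does the same job.
count_alt = np.count_nonzero(maxrow <= 2)
# "The row maximum is at most 2" means "every entry in the row is at most 2".
count_all = (arr <= 2).all(axis=1).sum()

print(count, count_alt, count_all)   # 3 3 3
```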

Functions in Python

  • Write a function getsub which takes two inputs, a NumPy array arr and an integer n, and as output returns the subarray of arr containing all rows with at least two entries equal to n.
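One possible sketch of getsub, reusing the Boolean-indexing strategy from earlier in these notes. It is not necessarily the version written in class. The local name counts and the test array are assumptions made for illustration.

```python
import numpy as np

def getsub(arr, n):
    """Return the rows of arr that contain at least two entries equal to n."""
    counts = (arr == n).sum(axis=1)   # number of entries equal to n in each row
    return arr[counts >= 2]           # keep the rows where that count is 2 or more

arr = np.array([[3, 4, 0], [2, 0, 1], [2, 0, 2], [4, 4, 2], [2, 3, 4],
                [4, 0, 3], [3, 0, 2], [4, 3, 1], [1, 3, 0], [2, 2, 2]])

print(getsub(arr, 2))         # rows [2, 0, 2] and [2, 2, 2]
print(getsub(arr, 4))         # row [4, 4, 2]
print(getsub(arr, 0).shape)   # (0, 3): no row has two 0s
```

If no row qualifies, the result is an empty array that still has three columns, so code that uses the output can rely on its second dimension.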