Week 1 Monday

  • The Friday file has been posted and some explanations were added.

  • HW 1 distributed today.

  • I’ll have office hours on Mondays 2-3pm in RH540R.

Boolean arrays and Boolean indexing

import numpy as np
# Instantiate a random number generator object
rng = np.random.default_rng()
help(np.random.default_rng)
Help on built-in function default_rng in module numpy.random._generator:

default_rng(...)
    Construct a new Generator with the default BitGenerator (PCG64).
    
    Parameters
    ----------
    seed : {None, int, array_like[ints], SeedSequence, BitGenerator, Generator}, optional
        A seed to initialize the `BitGenerator`. If None, then fresh,
        unpredictable entropy will be pulled from the OS. If an ``int`` or
        ``array_like[ints]`` is passed, then it will be passed to
        `SeedSequence` to derive the initial `BitGenerator` state. One may also
        pass in a `SeedSequence` instance.
        Additionally, when passed a `BitGenerator`, it will be wrapped by
        `Generator`. If passed a `Generator`, it will be returned unaltered.
    
    Returns
    -------
    Generator
        The initialized generator object.
    
    Notes
    -----
    If ``seed`` is not a `BitGenerator` or a `Generator`, a new `BitGenerator`
    is instantiated. This function does not manage a default global instance.
    
    Examples
    --------
    ``default_rng`` is the recommended constructor for the random number class
    ``Generator``. Here are several ways we can construct a random 
    number generator using ``default_rng`` and the ``Generator`` class. 
    
    Here we use ``default_rng`` to generate a random float:
    
    >>> import numpy as np
    >>> rng = np.random.default_rng(12345)
    >>> print(rng)
    Generator(PCG64)
    >>> rfloat = rng.random()
    >>> rfloat
    0.22733602246716966
    >>> type(rfloat)
    <class 'float'>
     
    Here we use ``default_rng`` to generate 3 random integers between 0 
    (inclusive) and 10 (exclusive):
        
    >>> import numpy as np
    >>> rng = np.random.default_rng(12345)
    >>> rints = rng.integers(low=0, high=10, size=3)
    >>> rints
    array([6, 2, 7])
    >>> type(rints[0])
    <class 'numpy.int64'>
    
    Here we specify a seed so that we have reproducible results:
    
    >>> import numpy as np
    >>> rng = np.random.default_rng(seed=42)
    >>> print(rng)
    Generator(PCG64)
    >>> arr1 = rng.random((3, 3))
    >>> arr1
    array([[0.77395605, 0.43887844, 0.85859792],
           [0.69736803, 0.09417735, 0.97562235],
           [0.7611397 , 0.78606431, 0.12811363]])
    
    If we exit and restart our Python interpreter, we'll see that we
    generate the same random numbers again:
    
    >>> import numpy as np
    >>> rng = np.random.default_rng(seed=42)
    >>> arr2 = rng.random((3, 3))
    >>> arr2
    array([[0.77395605, 0.43887844, 0.85859792],
           [0.69736803, 0.09417735, 0.97562235],
           [0.7611397 , 0.78606431, 0.12811363]])

Here we use rng to make a length 10 NumPy array of random integers between 0 (inclusive) and 5 (exclusive).

rng = np.random.default_rng()
arr = rng.integers(0,5,size = 10)
arr
array([4, 4, 1, 3, 2, 4, 4, 4, 3, 3])
arr
array([[3, 4, 0],
       [2, 0, 1]])

When we do not specify a seed, we get different results every time.

arr = rng.integers(0,5,size = 10)
arr
array([2, 3, 3, 1, 2, 3, 2, 2, 0, 2])
  • How can we guarantee consistent (or reproducible) random integers?

# Generator(PCG64) at SOME_MEMORY_ADDRESS
rng 
Generator(PCG64) at 0x7F00F834BBA0
type(rng)
numpy.random._generator.Generator

When we use a fixed seed keyword argument, we get the same output every time.

rng = np.random.default_rng(seed=50)
arr = rng.integers(0,5,size = 10)
arr
array([3, 3, 3, 4, 4, 2, 1, 4, 4, 1])
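
As a quick check of this claim, the sketch below builds two separate generators from the same seed and confirms that they produce identical arrays. The names rng_a, rng_b, first, and second are only for illustration and do not appear elsewhere in these notes.

```python
import numpy as np

# Two independent generators built from the same seed
rng_a = np.random.default_rng(seed=50)
rng_b = np.random.default_rng(seed=50)

first = rng_a.integers(0, 5, size=10)
second = rng_b.integers(0, 5, size=10)

# Same seed and same call, so the arrays match entry by entry
print(np.array_equal(first, second))
```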

If we change to a different seed, we get a new output.

rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 10)
arr
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 5)
arr
array([2, 4, 3, 2, 1])

If we call rng.integers again without re-creating the generator, we get new integers, because the generator continues from where it left off.

arr = rng.integers(0,5,size = 5)
arr
array([3, 3, 2, 0, 1])
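
To see that the second call continues the same sequence rather than restarting it, the sketch below compares two consecutive draws of 5 with a single draw of 10 from a freshly seeded generator, which is the pattern visible in the outputs above. The names first_five, next_five, fresh, and all_ten are illustrative. This checks only the specific calls used here; it should not be read as a guarantee for arbitrary mixes of generator methods.

```python
import numpy as np

rng = np.random.default_rng(seed=110)
first_five = rng.integers(0, 5, size=5)
next_five = rng.integers(0, 5, size=5)   # continues the stream, does not restart

fresh = np.random.default_rng(seed=110)
all_ten = fresh.integers(0, 5, size=10)

# The two short draws, placed end to end, match the single long draw
print(np.array_equal(np.concatenate([first_five, next_five]), all_ten))
```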
  • Make a Boolean array indicating where the array is equal to 2.

To get consistent results, it helps to put all of these lines into the same cell.

rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 10)
arr
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])

Be sure you understand how these Boolean values correspond to the values in the array. Also, notice that we are using two equals signs, not one, to compare for elementwise equality. (One equals sign is for assignment.)

arr == 2
array([ True, False, False,  True, False, False, False,  True, False,
       False])
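
Here is a smaller version of the same comparison, using a hand-written array (called small, an illustrative name) so that each True can be matched to its entry by eye.

```python
import numpy as np

small = np.array([2, 4, 3, 2, 1])

# == compares each entry with 2 and returns a Boolean array of the same shape
mask = (small == 2)
print(mask)        # [ True False False  True False]
print(mask.dtype)  # bool
```

A single equals sign inside the parentheses would be a syntax error, since = is assignment and not a comparison.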
  • Count how many of these entries are equal to 2.

Because True is treated like 1 and False is treated like 0, we can count the number of True values (in this case, that is the number of 2 values in the original array) by using sum.

Here we use the built-in Python function sum.

# use sum (built-in python function)
sum(arr == 2)
3
type(arr)
numpy.ndarray
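
A minimal sketch of why summing works as a count: in arithmetic, True behaves like 1 and False like 0. The array flags below is made up for illustration.

```python
import numpy as np

print(True + True)    # 2
print(int(False))     # 0

flags = np.array([True, False, True, True])
total = sum(flags)    # adds 1 for every True entry
print(total)          # 3
```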

Here we use the NumPy function np.sum.

np.sum(arr == 2)
3
help(np.sum)
Help on function sum in module numpy:

sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Sum of array elements over a given axis.
    
    Parameters
    ----------
    a : array_like
        Elements to sum.
    axis : None or int or tuple of ints, optional
        Axis or axes along which a sum is performed.  The default,
        axis=None, will sum all of the elements of the input array.  If
        axis is negative it counts from the last to the first axis.
    
        .. versionadded:: 1.7.0
    
        If axis is a tuple of ints, a sum is performed on all of the axes
        specified in the tuple instead of a single axis or all the axes as
        before.
    dtype : dtype, optional
        The type of the returned array and of the accumulator in which the
        elements are summed.  The dtype of `a` is used by default unless `a`
        has an integer dtype of less precision than the default platform
        integer.  In that case, if `a` is signed then the platform integer
        is used while if `a` is unsigned then an unsigned integer of the
        same precision as the platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must have
        the same shape as the expected output, but the type of the output
        values will be cast if necessary.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the input array.
    
        If the default value is passed, then `keepdims` will not be
        passed through to the `sum` method of sub-classes of
        `ndarray`, however any non-default value will be.  If the
        sub-class' method does not implement `keepdims` any
        exceptions will be raised.
    initial : scalar, optional
        Starting value for the sum. See `~numpy.ufunc.reduce` for details.
    
        .. versionadded:: 1.15.0
    
    where : array_like of bool, optional
        Elements to include in the sum. See `~numpy.ufunc.reduce` for details.
    
        .. versionadded:: 1.17.0
    
    Returns
    -------
    sum_along_axis : ndarray
        An array with the same shape as `a`, with the specified
        axis removed.   If `a` is a 0-d array, or if `axis` is None, a scalar
        is returned.  If an output array is specified, a reference to
        `out` is returned.
    
    See Also
    --------
    ndarray.sum : Equivalent method.
    
    add.reduce : Equivalent functionality of `add`.
    
    cumsum : Cumulative sum of array elements.
    
    trapz : Integration of array values using the composite trapezoidal rule.
    
    mean, average
    
    Notes
    -----
    Arithmetic is modular when using integer types, and no error is
    raised on overflow.
    
    The sum of an empty array is the neutral element 0:
    
    >>> np.sum([])
    0.0
    
    For floating point numbers the numerical precision of sum (and
    ``np.add.reduce``) is in general limited by directly adding each number
    individually to the result causing rounding errors in every step.
    However, often numpy will use a  numerically better approach (partial
    pairwise summation) leading to improved precision in many use-cases.
    This improved precision is always provided when no ``axis`` is given.
    When ``axis`` is given, it will depend on which axis is summed.
    Technically, to provide the best speed possible, the improved precision
    is only used when the summation is along the fast axis in memory.
    Note that the exact precision may vary depending on other parameters.
    In contrast to NumPy, Python's ``math.fsum`` function uses a slower but
    more precise approach to summation.
    Especially when summing a large number of lower precision floating point
    numbers, such as ``float32``, numerical errors can become significant.
    In such cases it can be advisable to use `dtype="float64"` to use a higher
    precision for the output.
    
    Examples
    --------
    >>> np.sum([0.5, 1.5])
    2.0
    >>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
    1
    >>> np.sum([[0, 1], [0, 5]])
    6
    >>> np.sum([[0, 1], [0, 5]], axis=0)
    array([0, 6])
    >>> np.sum([[0, 1], [0, 5]], axis=1)
    array([1, 5])
    >>> np.sum([[0, 1], [np.nan, 5]], where=[False, True], axis=1)
    array([1., 5.])
    
    If the accumulator is too small, overflow occurs:
    
    >>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
    -128
    
    You can also start the sum with a value other than zero:
    
    >>> np.sum([10], initial=5)
    15

Another way is to use the NumPy array method sum on the Boolean array:

(arr == 2).sum() 
3

Sometimes it is more elegant to save the intermediate values along the way, rather than copy-pasting. Here we save the Boolean array with the variable name ba.

ba = (arr == 2)  # Boolean array marking the entries equal to 2
ba.sum()
3

Notice that ba really is a NumPy array.

type(ba)
numpy.ndarray
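
The counting approaches above all agree, which the sketch below confirms on the same array (typed in by hand here so the block runs on its own). It also includes np.count_nonzero, a NumPy function that was not used in class but counts the True entries directly.

```python
import numpy as np

arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
ba = (arr == 2)

count_builtin = sum(ba)                # built-in Python sum
count_function = np.sum(ba)            # NumPy function
count_method = ba.sum()                # NumPy array method
count_nonzero = np.count_nonzero(ba)   # not from class: counts True entries

print(count_builtin, count_function, count_method, count_nonzero)  # 3 3 3 3
```

For large arrays the NumPy versions are generally preferable to the built-in sum, which loops over the entries one at a time in Python.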
  • Make a Boolean array indicating where arr is strictly greater than 1 and less than or equal to 3.

Here we check where it’s strictly greater than 1.

arr > 1
array([ True,  True,  True,  True, False,  True,  True,  True, False,
       False])

Here we check where it’s less than or equal to 3.

arr <= 3
array([ True, False,  True,  True,  True,  True,  True,  True,  True,
        True])

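We now want to check where both conditions are satisfied. Without parentheses this fails: & has higher precedence than the comparison operators, so Python evaluates 1 & arr first and then treats the rest as a chained comparison, which needs a single truth value for an entire array.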

arr > 1 & arr <= 3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [33], line 1
----> 1 arr > 1 & arr <= 3

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
(arr > 1) and (arr <= 3)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [38], line 1
----> 1 (arr > 1) and (arr <= 3)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
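
The sketch below reproduces both errors inside try/except so the cell keeps running, and shows the piece that Python evaluates first in the version without parentheses. The names inner and errors are illustrative.

```python
import numpy as np

arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])

# & binds more tightly than > and <=, so this piece is evaluated first
inner = 1 & arr
print(inner)   # [0 0 1 0 1 1 1 0 0 1]

errors = []
try:
    arr > 1 & arr <= 3          # parsed as arr > (1 & arr) <= 3
except ValueError as err:
    errors.append(str(err))

try:
    (arr > 1) and (arr <= 3)    # `and` needs one True/False per operand
except ValueError as err:
    errors.append(str(err))

print(len(errors))  # 2
```

In both cases Python ends up asking whether a whole array is True or False, which NumPy refuses to answer for arrays with more than one element.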
  • Thanks to Katie for finding the Python Operators reference. It mentions that the logical operator and returns True if both statements are true, such as:

x = 5

(x > 3) and (x < 10)
True
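  • For NumPy arrays we instead use the bitwise operator &, which combines the two Boolean arrays elementwise (each comparison needs its own parentheses), such as: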

ba2 = (arr > 1) & (arr <= 3)
ba2
array([ True, False,  True,  True, False,  True,  True,  True, False,
       False])
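
As a cross-check, NumPy also provides the function np.logical_and, which was not used in class. The sketch below confirms that it gives the same Boolean array as & on this example; the variable names are illustrative.

```python
import numpy as np

arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])

with_ampersand = (arr > 1) & (arr <= 3)
with_function = np.logical_and(arr > 1, arr <= 3)

print(np.array_equal(with_ampersand, with_function))  # True
```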
  • Using Boolean indexing, produce the subarray of arr containing the values which are strictly greater than 1 and less than or equal to 3.

arr
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
arr[ba2]
array([2, 3, 2, 3, 3, 2])
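
To make clear what Boolean indexing is doing, the sketch below compares arr[ba2] with a plain Python list comprehension that applies the same test to one entry at a time. The names subarray and by_hand are illustrative.

```python
import numpy as np

arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
ba2 = (arr > 1) & (arr <= 3)

subarray = arr[ba2]                              # Boolean indexing
by_hand = [x for x in arr if x > 1 and x <= 3]   # one entry at a time

print(subarray)                     # [2 3 2 3 3 2]
print(len(subarray) == ba2.sum())   # True
```

Inside the list comprehension, and is fine because x is a single number, not an array. The length of the subarray equals the number of True values in ba2.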
  • Make a 10x3 NumPy array arr2 of random integers between 0 (inclusive) and 5 (exclusive). Here, we will specify the size keyword argument using a tuple rather than an int. Use seed=100 so we all have the same values.

rng = np.random.default_rng(seed = 100)
arr2 = rng.integers(0,5, size = (10,3))
arr2
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])
type(arr2)
numpy.ndarray
arr2.shape
(10, 3)
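
A short check of what the tuple passed to size controls. The sketch below uses the name grid (illustrative) and tests only the shape and the range of values, not the specific integers.

```python
import numpy as np

rng = np.random.default_rng(seed=100)
grid = rng.integers(0, 5, size=(10, 3))   # tuple: 10 rows, 3 columns

print(grid.shape)   # (10, 3)
print(grid.ndim)    # 2
print(((grid >= 0) & (grid < 5)).all())   # True
```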
  • Define a variable col that is equal to the 0-th column of arr2.

(Let’s try to consistently start counting at 0 in this class.)

arr2
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])
col = arr2[:,0]
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])
# get row 1
row1 = arr2[1,:]
row1
array([2, 0, 1])
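
The same slicing pattern on a small array whose entries are easy to predict. The array demo is made up for illustration; the colon means "everything along this axis".

```python
import numpy as np

demo = np.arange(12).reshape(4, 3)
# demo is
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

col0 = demo[:, 0]   # every row, column 0
row1 = demo[1, :]   # row 1, every column

print(col0)                    # [0 3 6 9]
print(row1)                    # [3 4 5]
print(col0.shape, row1.shape)  # (4,) (3,)
```

Both results are one-dimensional arrays, so a column pulled out this way no longer "looks" vertical.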