Week 1 Monday
The Friday file has been posted and some explanations were added.
HW 1 is distributed today.
I’ll have office hours on Mondays 2-3pm in RH540R.
Boolean arrays and Boolean indexing
import numpy as np
# Instantiate a random number generator object
rng = np.random.default_rng()
help(np.random.default_rng)
Help on built-in function default_rng in module numpy.random._generator:
default_rng(...)
Construct a new Generator with the default BitGenerator (PCG64).
Parameters
----------
seed : {None, int, array_like[ints], SeedSequence, BitGenerator, Generator}, optional
A seed to initialize the `BitGenerator`. If None, then fresh,
unpredictable entropy will be pulled from the OS. If an ``int`` or
``array_like[ints]`` is passed, then it will be passed to
`SeedSequence` to derive the initial `BitGenerator` state. One may also
pass in a `SeedSequence` instance.
Additionally, when passed a `BitGenerator`, it will be wrapped by
`Generator`. If passed a `Generator`, it will be returned unaltered.
Returns
-------
Generator
The initialized generator object.
Notes
-----
If ``seed`` is not a `BitGenerator` or a `Generator`, a new `BitGenerator`
is instantiated. This function does not manage a default global instance.
Examples
--------
``default_rng`` is the recommended constructor for the random number class
``Generator``. Here are several ways we can construct a random
number generator using ``default_rng`` and the ``Generator`` class.
Here we use ``default_rng`` to generate a random float:
>>> import numpy as np
>>> rng = np.random.default_rng(12345)
>>> print(rng)
Generator(PCG64)
>>> rfloat = rng.random()
>>> rfloat
0.22733602246716966
>>> type(rfloat)
<class 'float'>
Here we use ``default_rng`` to generate 3 random integers between 0
(inclusive) and 10 (exclusive):
>>> import numpy as np
>>> rng = np.random.default_rng(12345)
>>> rints = rng.integers(low=0, high=10, size=3)
>>> rints
array([6, 2, 7])
>>> type(rints[0])
<class 'numpy.int64'>
Here we specify a seed so that we have reproducible results:
>>> import numpy as np
>>> rng = np.random.default_rng(seed=42)
>>> print(rng)
Generator(PCG64)
>>> arr1 = rng.random((3, 3))
>>> arr1
array([[0.77395605, 0.43887844, 0.85859792],
[0.69736803, 0.09417735, 0.97562235],
[0.7611397 , 0.78606431, 0.12811363]])
If we exit and restart our Python interpreter, we'll see that we
generate the same random numbers again:
>>> import numpy as np
>>> rng = np.random.default_rng(seed=42)
>>> arr2 = rng.random((3, 3))
>>> arr2
array([[0.77395605, 0.43887844, 0.85859792],
[0.69736803, 0.09417735, 0.97562235],
[0.7611397 , 0.78606431, 0.12811363]])
Here we use rng to make a length-10 NumPy array of random integers between 0 (inclusive) and 5 (exclusive).
rng = np.random.default_rng()
arr = rng.integers(0,5,size = 10)
arr
array([4, 4, 1, 3, 2, 4, 4, 4, 3, 3])
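The "0 (inclusive) and 5 (exclusive)" convention can be checked empirically. The following is only a sketch: the seed 0 and the sample size 10_000 are arbitrary choices made for illustration, and the endpoint keyword argument of rng.integers is not used elsewhere in these notes.

```python
import numpy as np

# The seed value 0 is arbitrary; it only makes this sketch repeatable.
rng = np.random.default_rng(seed=0)

# Draw many integers so that every allowed value almost surely appears.
sample = rng.integers(0, 5, size=10_000)

print(sample.min(), sample.max())  # smallest and largest values drawn
print(np.unique(sample))           # the distinct values that occur

# endpoint=True makes the upper bound inclusive instead of exclusive.
sample_inclusive = rng.integers(0, 5, size=10_000, endpoint=True)
print(np.unique(sample_inclusive))
```

With the default settings the value 5 never appears, which is what "exclusive" means here.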
When we do not specify a seed, we get different results every time.
arr = rng.integers(0,5,size = 10)
arr
array([2, 3, 3, 1, 2, 3, 2, 2, 0, 2])
How can we guarantee consistent (or reproducible) random integers?
# Generator(PCG64) at SOME_MEMORY_ADDRESS
rng
Generator(PCG64) at 0x7F00F834BBA0
type(rng)
numpy.random._generator.Generator
When we use a fixed seed keyword argument, we get the same output every time.
rng = np.random.default_rng(seed=50)
arr = rng.integers(0,5,size = 10)
arr
array([3, 3, 3, 4, 4, 2, 1, 4, 4, 1])
If we change to a different seed, we get a new output.
rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 10)
arr
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
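One way to convince ourselves of the role of the seed is to build the generator twice and compare the results. This is a minimal sketch; the helper name draw is made up for this illustration and is not part of NumPy.

```python
import numpy as np

def draw(seed):
    """Hypothetical helper: fresh generator from `seed`, then 10 integers in [0, 5)."""
    rng = np.random.default_rng(seed=seed)
    return rng.integers(0, 5, size=10)

first = draw(50)
second = draw(50)
other = draw(110)

print(np.array_equal(first, second))  # same seed: the arrays agree
print(np.array_equal(first, other))   # different seed: typically they do not
```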
rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 5)
arr
array([2, 4, 3, 2, 1])
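If we call rng.integers again without re-creating rng, we get new integers, because the generator continues from its current state. The five integers below are the last five entries of the length-10 array produced above with the same seed.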
arr = rng.integers(0,5,size = 5)
arr
array([3, 3, 2, 0, 1])
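The behavior above comes from the generator keeping internal state between calls. As a small sketch of that idea, re-creating the generator with the same seed restarts the sequence, while calling it again without re-creating it moves further along the sequence.

```python
import numpy as np

rng = np.random.default_rng(seed=110)
first = rng.integers(0, 5, size=5)   # first five values from this generator
second = rng.integers(0, 5, size=5)  # the generator state has advanced

# Re-creating the generator with the same seed restarts the sequence.
rng = np.random.default_rng(seed=110)
restarted = rng.integers(0, 5, size=5)

print(np.array_equal(first, restarted))  # True
```

This is why it helps to keep the default_rng line and the integers line in the same cell when we want the same array every time the cell runs.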
Make a Boolean array indicating where the array is equal to 2.
To get consistent results, it helps to put all of these lines into the same cell.
rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 10)
arr
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
Be sure you understand how these Boolean values correspond to the values in the array. Also, notice that we are using two equals signs, not one, to compare for elementwise equality. (One equals sign is for assignment.)
arr == 2
array([ True, False, False, True, False, False, False, True, False,
False])
Count how many of these entries are equal to 2.
Because True is treated like 1 and False is treated like 0, we can count the number of True values (in this case, that is the number of 2 values in the original array) by using sum.
Here we use the built-in Python function sum.
# use sum (built-in python function)
sum(arr == 2)
3
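To see why summing a Boolean array counts its True entries, here is a short sketch of how Python and NumPy treat True and False in arithmetic. The array name flags is only an example value chosen for this illustration.

```python
import numpy as np

print(True + True)               # 2
print(int(True), int(False))     # 1 0
print(sum([True, False, True]))  # 2

# The same idea for a small Boolean NumPy array.
flags = np.array([True, False, False, True])
print(flags.astype(int))         # [1 0 0 1]

count = int(flags.sum())
print(count)                     # 2
```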
type(arr)
numpy.ndarray
Here we use the NumPy function np.sum.
np.sum(arr == 2)
3
help(np.sum)
Help on function sum in module numpy:
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Sum of array elements over a given axis.
Parameters
----------
a : array_like
Elements to sum.
axis : None or int or tuple of ints, optional
Axis or axes along which a sum is performed. The default,
axis=None, will sum all of the elements of the input array. If
axis is negative it counts from the last to the first axis.
.. versionadded:: 1.7.0
If axis is a tuple of ints, a sum is performed on all of the axes
specified in the tuple instead of a single axis or all the axes as
before.
dtype : dtype, optional
The type of the returned array and of the accumulator in which the
elements are summed. The dtype of `a` is used by default unless `a`
has an integer dtype of less precision than the default platform
integer. In that case, if `a` is signed then the platform integer
is used while if `a` is unsigned then an unsigned integer of the
same precision as the platform integer is used.
out : ndarray, optional
Alternative output array in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
If the default value is passed, then `keepdims` will not be
passed through to the `sum` method of sub-classes of
`ndarray`, however any non-default value will be. If the
sub-class' method does not implement `keepdims` any
exceptions will be raised.
initial : scalar, optional
Starting value for the sum. See `~numpy.ufunc.reduce` for details.
.. versionadded:: 1.15.0
where : array_like of bool, optional
Elements to include in the sum. See `~numpy.ufunc.reduce` for details.
.. versionadded:: 1.17.0
Returns
-------
sum_along_axis : ndarray
An array with the same shape as `a`, with the specified
axis removed. If `a` is a 0-d array, or if `axis` is None, a scalar
is returned. If an output array is specified, a reference to
`out` is returned.
See Also
--------
ndarray.sum : Equivalent method.
add.reduce : Equivalent functionality of `add`.
cumsum : Cumulative sum of array elements.
trapz : Integration of array values using the composite trapezoidal rule.
mean, average
Notes
-----
Arithmetic is modular when using integer types, and no error is
raised on overflow.
The sum of an empty array is the neutral element 0:
>>> np.sum([])
0.0
For floating point numbers the numerical precision of sum (and
``np.add.reduce``) is in general limited by directly adding each number
individually to the result causing rounding errors in every step.
However, often numpy will use a numerically better approach (partial
pairwise summation) leading to improved precision in many use-cases.
This improved precision is always provided when no ``axis`` is given.
When ``axis`` is given, it will depend on which axis is summed.
Technically, to provide the best speed possible, the improved precision
is only used when the summation is along the fast axis in memory.
Note that the exact precision may vary depending on other parameters.
In contrast to NumPy, Python's ``math.fsum`` function uses a slower but
more precise approach to summation.
Especially when summing a large number of lower precision floating point
numbers, such as ``float32``, numerical errors can become significant.
In such cases it can be advisable to use `dtype="float64"` to use a higher
precision for the output.
Examples
--------
>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
1
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])
>>> np.sum([[0, 1], [np.nan, 5]], where=[False, True], axis=1)
array([1., 5.])
If the accumulator is too small, overflow occurs:
>>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
-128
You can also start the sum with a value other than zero:
>>> np.sum([10], initial=5)
15
Another way to compute the sum of the elements in a NumPy array is to call the array method sum.
(arr == 2).sum()
3
Sometimes it is more elegant to save the intermediate values along the way, rather than copy-pasting. Here we save the Boolean array with the variable name ba.
ba = (arr == 2)  # Boolean array: True where the entry equals 2
ba.sum()
3
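The three approaches used so far should always agree. The sketch below compares them on the same array as above, typed in by hand so that it does not depend on the random number generator. It also includes np.count_nonzero, which was not used in class but is another common way to count True values.

```python
import numpy as np

# Same values as the seeded array above, entered by hand.
arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
ba = (arr == 2)

counts = {
    "built-in sum": int(sum(ba)),
    "np.sum": int(np.sum(ba)),
    "method sum": int(ba.sum()),
    "np.count_nonzero": int(np.count_nonzero(ba)),
}
print(counts)  # every value is 3
```

For large arrays the NumPy versions are usually much faster than the built-in sum, which loops over the entries one at a time in Python.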
Notice that ba really is a NumPy array.
type(ba)
numpy.ndarray
Make a Boolean array indicating where arr is strictly greater than 1 and less than or equal to 3.
Here we check where it’s strictly greater than 1.
arr > 1
array([ True, True, True, True, False, True, True, True, False,
False])
Here we check where it’s less than or equal to 3.
arr <= 3
array([ True, False, True, True, True, True, True, True, True,
True])
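We now want to check where both are satisfied. Without parentheses, this fails: & binds more tightly than the comparison operators, so Python evaluates 1 & arr first and then treats the rest as a chained comparison, which requires the truth value of a whole array.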
arr > 1 & arr <= 3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [33], line 1
----> 1 arr > 1 & arr <= 3
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
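The keyword and fails as well, even with parentheses, because it needs a single truth value for each whole array.
(arr > 1) and (arr <= 3)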
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [38], line 1
----> 1 (arr > 1) and (arr <= 3)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Thanks to Katie for finding the Python Operators reference. It mentions that for logical operators we can use and, which returns True if both statements are true, such as:
x = 5
(x > 3) and (x < 10)
True
For bitwise operators we use &, which NumPy applies elementwise to Boolean arrays, such as:
ba2 = (arr > 1) & (arr <= 3)
ba2
array([ True, False, True, True, False, True, True, True, False,
False])
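The two errors and the working version can be put side by side. The following sketch uses the same array values as above, entered by hand, and catches the errors so that the whole block runs. The function np.logical_and is not used elsewhere in these notes; it is shown only as an equivalent spelling of the elementwise "and".

```python
import numpy as np

arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])

# Without parentheses, `1 & arr` is evaluated first (bitwise AND with 1).
print(1 & arr)  # [0 0 1 0 1 1 1 0 0 1]

errors = []
try:
    arr > 1 & arr <= 3          # parsed as the chain arr > (1 & arr) <= 3
except ValueError as err:
    errors.append(str(err))
try:
    (arr > 1) and (arr <= 3)    # `and` needs one truth value per array
except ValueError as err:
    errors.append(str(err))
print(len(errors))  # 2

# Parentheses plus & gives the elementwise result.
mask = (arr > 1) & (arr <= 3)
same = np.array_equal(mask, np.logical_and(arr > 1, arr <= 3))
print(same)  # True
```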
Using Boolean indexing, produce the subarray of arr containing the values which are strictly greater than 1 and less than or equal to 3.
arr
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
arr[ba2]
array([2, 3, 2, 3, 3, 2])
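Boolean indexing keeps exactly the entries where the Boolean array is True, in their original order. As a rough sketch of what happens, the block below compares it with an explicit Python loop and also lists the positions that were kept. The names mask, selected, positions and by_loop are chosen for this illustration, and np.flatnonzero was not used in class.

```python
import numpy as np

arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
mask = (arr > 1) & (arr <= 3)

selected = arr[mask]              # Boolean indexing
positions = np.flatnonzero(mask)  # indices where mask is True

# The same selection written as an explicit loop, for comparison.
by_loop = np.array([x for x, keep in zip(arr, mask) if keep])

print(selected)   # [2 3 2 3 3 2]
print(positions)  # [0 2 3 5 6 7]
print(np.array_equal(selected, by_loop))  # True
```

The length of the result equals the number of True values in the mask, so it matches mask.sum().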
Make a 10x3 NumPy array arr2 of random integers between 0 (inclusive) and 5 (exclusive). Here, we will specify the size keyword argument using a tuple rather than an int. Use seed=100 so we all have the same values.
rng = np.random.default_rng(seed = 100)
arr2 = rng.integers(0,5, size = (10,3))
arr2
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
type(arr2)
numpy.ndarray
arr2.shape
(10, 3)
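The difference between passing an int and passing a tuple to size shows up in the shape of the result. A brief sketch, with the variable names flat and grid chosen just for this example:

```python
import numpy as np

rng = np.random.default_rng(seed=100)

flat = rng.integers(0, 5, size=10)       # int: one-dimensional array
grid = rng.integers(0, 5, size=(10, 3))  # tuple: (rows, columns)

print(flat.shape, flat.ndim)  # (10,) 1
print(grid.shape, grid.ndim)  # (10, 3) 2
print(grid.size)              # 30 entries in total
```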
Define a variable col that is equal to the 0-th column of arr2.
(Let’s try to consistently start counting at 0 in this class.)
arr2
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
col = arr2[:,0]
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])
# get row 1
row1 = arr2[1,:]
row1
array([2, 0, 1])
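In the slicing syntax arr2[rows, columns], a colon means "everything along this axis". The sketch below repeats the column and row selections on the same values, entered by hand, and checks the resulting shapes. The last selection, which uses the slice 0:1 to keep the column two-dimensional, is an extra illustration that was not covered above.

```python
import numpy as np

# Same values as arr2 above, entered by hand.
arr2 = np.array([[3, 4, 0], [2, 0, 1], [2, 0, 2], [4, 4, 2], [2, 3, 4],
                 [4, 0, 3], [3, 0, 2], [4, 3, 1], [1, 3, 0], [2, 2, 2]])

col = arr2[:, 0]       # every row, column 0
row1 = arr2[1, :]      # row 1, every column
col_2d = arr2[:, 0:1]  # a slice instead of an int keeps two dimensions

print(col.shape)     # (10,)
print(row1.shape)    # (3,)
print(col_2d.shape)  # (10, 1)
```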