Week 1 Wednesday
Keyboard Shortcut
Add new code block below current one: ⌘ + J (Mac), ctrl (+ shift) + J (Windows & Linux)
Make a 10x3 NumPy array arr of random integers between 0 (inclusive) and 5 (exclusive). Here, we will specify the size keyword argument using a tuple rather than an int. Use seed=100 so we all have the same values.
import numpy as np
rng = np.random.default_rng(seed=100)
arr = rng.integers(0, 5, size=(10,3))
arr
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
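As a small check of the two ideas above (a tuple for size, and a fixed seed), the following sketch builds two generators from the same seed and compares what they produce. The names rng_a, rng_b, a, and b are only for illustration.

```python
import numpy as np

# Two independent generators started from the same seed.
rng_a = np.random.default_rng(seed=100)
rng_b = np.random.default_rng(seed=100)

# size=(10, 3) is a tuple: 10 rows and 3 columns.
a = rng_a.integers(0, 5, size=(10, 3))
b = rng_b.integers(0, 5, size=(10, 3))

print(a.shape)               # (10, 3)
print(np.array_equal(a, b))  # True: same seed, same values
print(a.min() >= 0, a.max() <= 4)  # True True: 0 is included, 5 is excluded
```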
Define a variable col that is equal to the 0-th column of arr
col = arr[:,0]
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])
What if we had used two sets of square brackets, as we would with a list of lists? Break the expression into pieces: arr[:] gets the entire array (“every row”), and then [0] gets the top row, not the 0-th column.
arr[:][0] #arr[:] every row
array([3, 4, 0])
arr[:][2] #index 2 row
array([2, 0, 2])
arr[2,:]
array([2, 0, 2])
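To compare the two styles of indexing side by side, here is a minimal sketch on a small hand-written array (demo and the other names are made up for this example, so the values do not depend on the seed).

```python
import numpy as np

# A small hand-written array, so the values do not depend on a random seed.
demo = np.array([[3, 4, 0],
                 [2, 0, 1],
                 [2, 0, 2]])

# demo[:] is the whole array again, so a second [0] just picks row 0.
first_row = demo[:][0]

# A single pair of brackets with a comma indexes rows and columns together.
first_col = demo[:, 0]

# With a list of lists, nested[:][0] is likewise the first inner list (a row).
nested = demo.tolist()
first_row_list = nested[:][0]
# Getting a column from a list of lists needs a loop or a comprehension.
first_col_list = [row[0] for row in nested]

print(first_row)       # [3 4 0]
print(first_col)       # [3 2 2]
print(first_row_list)  # [3, 4, 0]
print(first_col_list)  # [3, 2, 2]
```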
Create the subarray of arr containing the rows which begin with a 2.
We can see what number each row starts with by using col, which we defined above.
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])
Find where col is equal to 2.
col == 2
array([False, True, True, False, True, False, False, False, False,
True])
Use Boolean indexing to extract all the rows that begin with 2.
subarr = arr[col == 2,:]
subarr
array([[2, 0, 1],
[2, 0, 2],
[2, 3, 4],
[2, 2, 2]])
arr[:][col == 2] # arr[:] every row
array([[2, 0, 1],
[2, 0, 2],
[2, 3, 4],
[2, 2, 2]])
arr[col == 2]
array([[2, 0, 1],
[2, 0, 2],
[2, 3, 4],
[2, 2, 2]])
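The three spellings above give the same rows. Here is a small sketch of the same pattern on a hand-written array; demo, mask, and others are illustrative names, and the use of ~ to flip a Boolean array is an addition not used elsewhere in these notes.

```python
import numpy as np

demo = np.array([[2, 0, 1],
                 [4, 4, 2],
                 [2, 3, 4],
                 [1, 3, 0]])

# One Boolean per row: True where the row starts with 2.
mask = demo[:, 0] == 2

# All three spellings keep the rows where mask is True.
a = demo[mask]
b = demo[mask, :]
c = demo[:][mask]

# ~mask flips the Booleans, keeping the rows that do NOT start with 2.
others = demo[~mask]

print(mask)    # [ True False  True False]
print(a)       # the rows [2, 0, 1] and [2, 3, 4]
print(others)  # the rows [4, 4, 2] and [1, 3, 0]
```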
More complex example of Boolean indexing
Define arr as above. We will then create the subarray of arr containing the rows which have at least two 2s using the following strategy.
Make a 10x3 Boolean array indicating where arr is equal to 2.
arr == 2
array([[False, False, False],
[ True, False, False],
[ True, False, True],
[False, False, True],
[ True, False, False],
[False, False, False],
[False, False, True],
[False, False, False],
[False, False, False],
[ True, True, True]])
(arr == 2).shape
(10, 3)
Use the sum method with axis=1 to find how many 2s there are in each row.
help(np.sum)
Help on function sum in module numpy:
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Sum of array elements over a given axis.
Parameters
----------
a : array_like
Elements to sum.
axis : None or int or tuple of ints, optional
Axis or axes along which a sum is performed. The default,
axis=None, will sum all of the elements of the input array. If
axis is negative it counts from the last to the first axis.
.. versionadded:: 1.7.0
If axis is a tuple of ints, a sum is performed on all of the axes
specified in the tuple instead of a single axis or all the axes as
before.
dtype : dtype, optional
The type of the returned array and of the accumulator in which the
elements are summed. The dtype of `a` is used by default unless `a`
has an integer dtype of less precision than the default platform
integer. In that case, if `a` is signed then the platform integer
is used while if `a` is unsigned then an unsigned integer of the
same precision as the platform integer is used.
out : ndarray, optional
Alternative output array in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
If the default value is passed, then `keepdims` will not be
passed through to the `sum` method of sub-classes of
`ndarray`, however any non-default value will be. If the
sub-class' method does not implement `keepdims` any
exceptions will be raised.
initial : scalar, optional
Starting value for the sum. See `~numpy.ufunc.reduce` for details.
.. versionadded:: 1.15.0
where : array_like of bool, optional
Elements to include in the sum. See `~numpy.ufunc.reduce` for details.
.. versionadded:: 1.17.0
Returns
-------
sum_along_axis : ndarray
An array with the same shape as `a`, with the specified
axis removed. If `a` is a 0-d array, or if `axis` is None, a scalar
is returned. If an output array is specified, a reference to
`out` is returned.
See Also
--------
ndarray.sum : Equivalent method.
add.reduce : Equivalent functionality of `add`.
cumsum : Cumulative sum of array elements.
trapz : Integration of array values using the composite trapezoidal rule.
mean, average
Notes
-----
Arithmetic is modular when using integer types, and no error is
raised on overflow.
The sum of an empty array is the neutral element 0:
>>> np.sum([])
0.0
For floating point numbers the numerical precision of sum (and
``np.add.reduce``) is in general limited by directly adding each number
individually to the result causing rounding errors in every step.
However, often numpy will use a numerically better approach (partial
pairwise summation) leading to improved precision in many use-cases.
This improved precision is always provided when no ``axis`` is given.
When ``axis`` is given, it will depend on which axis is summed.
Technically, to provide the best speed possible, the improved precision
is only used when the summation is along the fast axis in memory.
Note that the exact precision may vary depending on other parameters.
In contrast to NumPy, Python's ``math.fsum`` function uses a slower but
more precise approach to summation.
Especially when summing a large number of lower precision floating point
numbers, such as ``float32``, numerical errors can become significant.
In such cases it can be advisable to use `dtype="float64"` to use a higher
precision for the output.
Examples
--------
>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
1
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])
>>> np.sum([[0, 1], [np.nan, 5]], where=[False, True], axis=1)
array([1., 5.])
If the accumulator is too small, overflow occurs:
>>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
-128
You can also start the sum with a value other than zero:
>>> np.sum([10], initial=5)
15
ba = (arr == 2)
np.sum(ba, axis = 1) # axis = 1 means taking sum of all elements (cols) in each row
array([0, 1, 2, 1, 1, 0, 1, 0, 0, 3])
ba.sum(axis = 1)
array([0, 1, 2, 1, 1, 0, 1, 0, 0, 3])
The axis argument is the axis along which the sum is performed. Using axis=0 adds down the rows, giving one total per column.
np.sum(ba, axis = 0)
array([4, 1, 4])
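One way to keep the two axes straight is to look at the shape of the result: the axis passed to sum is the one that disappears. A minimal sketch on a hand-written 2x3 array (demo and the result names are illustrative):

```python
import numpy as np

demo = np.array([[1, 2, 3],
                 [4, 5, 6]])   # shape (2, 3): 2 rows, 3 columns

# axis=0 collapses the row axis: one total per column, shape (3,).
col_totals = demo.sum(axis=0)

# axis=1 collapses the column axis: one total per row, shape (2,).
row_totals = demo.sum(axis=1)

# No axis: everything is added into a single number.
total = demo.sum()

print(col_totals, col_totals.shape)  # [5 7 9] (3,)
print(row_totals, row_totals.shape)  # [ 6 15] (2,)
print(total)                         # 21
```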
np.sum(ba, axis =2) # we do not have the third dimension
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
Cell In [18], line 1
----> 1 np.sum(ba, axis =2)
File <__array_function__ internals>:180, in sum(*args, **kwargs)
File /shared-libs/python3.9/py/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2298, in sum(a, axis, dtype, out, keepdims, initial, where)
2295 return out
2296 return res
-> 2298 return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
2299 initial=initial, where=where)
File /shared-libs/python3.9/py/lib/python3.9/site-packages/numpy/core/fromnumeric.py:86, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
83 else:
84 return reduction(axis=axis, out=out, **passkwargs)
---> 86 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
AxisError: axis 2 is out of bounds for array of dimension 2
Use Boolean indexing to create the subarray of arr containing only the rows which have at least two 2s.
num2 = ba.sum(axis = 1)
num2 >= 2
array([False, False, True, False, False, False, False, False, False,
True])
num2 > 1
array([False, False, True, False, False, False, False, False, False,
True])
Keep the rows corresponding to the True values, that is, we keep the rows that have at least two 2s.
subar2 = arr[num2 > 1]
subar2
array([[2, 0, 2],
[2, 2, 2]])
arr
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
We’ve seen that we can use Boolean arrays to keep certain rows. We can also use a list of indices. Here we get the rows at indices 0, 5, and 9.
arr[[0, 5, 9]] # equivalent: arr[[0, 5, 9], :] or arr[:][[0, 5, 9]]
array([[3, 4, 0],
[4, 0, 3],
[2, 2, 2]])
arr[[0,5,6],:]
array([[3, 4, 0],
[4, 0, 3],
[3, 0, 2]])
arr[0,5,6]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In [27], line 1
----> 1 arr[0,5,6]
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
Get the row at index 9 repeated three times.
arr[[9, 9, 9]] # repeat last row three times
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
arr
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
Why did we use the double square brackets above? The outer square brackets are for indexing. The inner square brackets are for a list. Why do we need the list inside? Here is what happens if we omit the inner square brackets. The 0 gets us to the row at index 0, and the 1 gets to the element at index 1 in that row. (Remember that numbering in Python starts at 0.)
arr[0,1]
4
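The difference between a comma-separated pair of integers and a list of integers can be seen in this short sketch (demo, entry, rows, and reordered are names chosen for the example):

```python
import numpy as np

demo = np.array([[3, 4, 0],
                 [2, 0, 1],
                 [2, 0, 2]])

# Two integers separated by a comma: row 0, column 1 -> a single entry.
entry = demo[0, 1]

# A list inside the brackets: rows 0 and 1 -> a 2x3 subarray.
rows = demo[[0, 1]]

# A list may repeat indices or put them in a different order.
reordered = demo[[2, 0, 2]]

print(entry)       # 4
print(rows.shape)  # (2, 3)
print(reordered)   # rows [2, 0, 2], [3, 4, 0], [2, 0, 2]
```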
Get the column at index 0 repeated twice.
arr[:,[0, 0]]
array([[3, 3],
[2, 2],
[2, 2],
[4, 4],
[2, 2],
[4, 4],
[3, 3],
[4, 4],
[1, 1],
[2, 2]])
type([1,2,3,4,5])
list
arrb = np.array([1,2,3,4,5])
type(arrb)
numpy.ndarray
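The distinction between a list and a NumPy array matters for the Boolean indexing used earlier. As a rough illustration (values and arr_values are made-up names), comparing each type to a number behaves differently:

```python
import numpy as np

values = [1, 2, 3, 2, 5]
arr_values = np.array(values)

# Comparing a list to a number compares the whole list: a single False.
list_result = (values == 2)

# Comparing an array to a number compares entry by entry.
array_result = (arr_values == 2)

print(list_result)   # False
print(array_result)  # [False  True False  True False]
```

Only the array version produces one Boolean per entry, which is what Boolean indexing needs.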
Another example of the axis keyword argument
What is the result of evaluating the following?
arr.max()
arr.max(axis=0)
arr.max(axis=1)
help(np.max)
Help on function amax in module numpy:
amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the maximum of an array or maximum along an axis.
Parameters
----------
a : array_like
Input data.
axis : None or int or tuple of ints, optional
Axis or axes along which to operate. By default, flattened input is
used.
.. versionadded:: 1.7.0
If this is a tuple of ints, the maximum is selected over multiple axes,
instead of a single axis or all the axes as before.
out : ndarray, optional
Alternative output array in which to place the result. Must
be of the same shape and buffer length as the expected output.
See :ref:`ufuncs-output-type` for more details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
If the default value is passed, then `keepdims` will not be
passed through to the `amax` method of sub-classes of
`ndarray`, however any non-default value will be. If the
sub-class' method does not implement `keepdims` any
exceptions will be raised.
initial : scalar, optional
The minimum value of an output element. Must be present to allow
computation on empty slice. See `~numpy.ufunc.reduce` for details.
.. versionadded:: 1.15.0
where : array_like of bool, optional
Elements to compare for the maximum. See `~numpy.ufunc.reduce`
for details.
.. versionadded:: 1.17.0
Returns
-------
amax : ndarray or scalar
Maximum of `a`. If `axis` is None, the result is a scalar value.
If `axis` is given, the result is an array of dimension
``a.ndim - 1``.
See Also
--------
amin :
The minimum value of an array along a given axis, propagating any NaNs.
nanmax :
The maximum value of an array along a given axis, ignoring any NaNs.
maximum :
Element-wise maximum of two arrays, propagating any NaNs.
fmax :
Element-wise maximum of two arrays, ignoring any NaNs.
argmax :
Return the indices of the maximum values.
nanmin, minimum, fmin
Notes
-----
NaN values are propagated, that is if at least one item is NaN, the
corresponding max value will be NaN as well. To ignore NaN values
(MATLAB behavior), please use nanmax.
Don't use `amax` for element-wise comparison of 2 arrays; when
``a.shape[0]`` is 2, ``maximum(a[0], a[1])`` is faster than
``amax(a, axis=0)``.
Examples
--------
>>> a = np.arange(4).reshape((2,2))
>>> a
array([[0, 1],
[2, 3]])
>>> np.amax(a) # Maximum of the flattened array
3
>>> np.amax(a, axis=0) # Maxima along the first axis
array([2, 3])
>>> np.amax(a, axis=1) # Maxima along the second axis
array([1, 3])
>>> np.amax(a, where=[False, True], initial=-1, axis=0)
array([-1, 3])
>>> b = np.arange(5, dtype=float)
>>> b[2] = np.NaN
>>> np.amax(b)
nan
>>> np.amax(b, where=~np.isnan(b), initial=-1)
4.0
>>> np.nanmax(b)
4.0
You can use an initial value to compute the maximum of an empty slice, or
to initialize it to a different value:
>>> np.amax([[-50], [10]], axis=-1, initial=0)
array([ 0, 10])
Notice that the initial value is used as one of the elements for which the
maximum is determined, unlike for the default argument Python's max
function, which is only used for empty iterables.
>>> np.amax([5], initial=6)
6
>>> max([5], default=6)
5
arr
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
If we use max() without any axis argument, it returns the overall maximum in the array. This max (like sum above) is an example of a method. Methods are a type of function in Python, but they are functions which are attached to an object. This max method and the sum method above are attached to NumPy array objects.
arr.max()
4
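To make the method-versus-function distinction concrete, the sketch below calls max and sum both ways on a small hand-written array (demo and the result names are illustrative). The help output above says that ndarray.sum is the equivalent method of np.sum; the same pairing is assumed here for max.

```python
import numpy as np

demo = np.array([[3, 4, 0],
                 [2, 0, 1]])

# Method form: max is looked up on the array object itself.
method_result = demo.max()

# Function form: the array is passed in as the first argument.
function_result = np.max(demo)

# Keyword arguments such as axis work the same way in both forms.
method_rows = demo.sum(axis=1)
function_rows = np.sum(demo, axis=1)

print(method_result, function_result)  # 4 4
print(method_rows, function_rows)      # [7 3] [7 3]
```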
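The axis argument denotes the axis along which to operate. (Why does “rows” correspond to axis=0? For example, when we say 10x3, the 10 refers to rows, and 10 is at index 0 of the shape (10, 3).) Using axis=0 takes the maximum down the rows, giving one value per column; using axis=1 gives one value per row.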
arr.max(axis = 0)
array([4, 4, 4])
arr.max(axis = 1)
array([4, 2, 2, 4, 4, 4, 3, 4, 3, 2])
In how many rows of arr is the maximum entry in that row 2 or less?
Find the maximum in each row and create a Boolean array, indicating if the maximum is less than or equal to 2 or not.
maxrow = arr.max(axis = 1)
maxrow <= 2
array([False, True, True, False, False, False, False, False, False,
True])
(maxrow <= 2).sum()
3
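Calling sum on a Boolean array counts its True entries, because True is treated as 1 and False as 0. The sketch below repeats the count using the row maxima copied from the output above, and compares it with np.count_nonzero, an alternative that is not used in these notes.

```python
import numpy as np

# Row maxima copied from the output of arr.max(axis = 1) above.
row_max = np.array([4, 2, 2, 4, 4, 4, 3, 4, 3, 2])

small = row_max <= 2

# True counts as 1 and False as 0, so sum() counts the True entries.
count_by_sum = small.sum()

# np.count_nonzero gives the same count.
count_by_nonzero = np.count_nonzero(small)

print(count_by_sum, count_by_nonzero)  # 3 3
```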
Functions in Python
Write a function getsub which takes two inputs, a NumPy array arr and an integer n, and as output returns the subarray of arr containing all rows with at least two entries equal to n.
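One possible way to write such a function is sketched below. It reuses the strategy from the Boolean indexing example (compare, sum with axis=1, then index) and is only one of several reasonable solutions; the array demo is made up for testing.

```python
import numpy as np

def getsub(arr, n):
    """Return the rows of arr that contain at least two entries equal to n."""
    # Count, row by row, how many entries equal n.
    counts = (arr == n).sum(axis=1)
    # Keep only the rows whose count is 2 or more.
    return arr[counts >= 2]

demo = np.array([[3, 4, 0],
                 [2, 0, 2],
                 [4, 4, 2],
                 [2, 2, 2]])

print(getsub(demo, 2))  # rows [2, 0, 2] and [2, 2, 2]
print(getsub(demo, 4))  # row [4, 4, 2]
```

If no row qualifies, the function returns an empty array that still has 3 columns.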