Boolean arrays and Boolean indexing

import numpy as np
# Instantiate a random number generator object
rng = np.random.default_rng()
Here we use rng to make a length 10 NumPy array of random integers between 0 (inclusive) and 5 (exclusive).

rng = np.random.default_rng()
arr = rng.integers(0,5,size = 10)
array([4, 4, 1, 3, 2, 4, 4, 4, 3, 3])

When we do not specify a seed, we get different results every time.

arr = rng.integers(0,5,size = 10)
array([2, 3, 3, 1, 2, 3, 2, 2, 0, 2])
  • How can we guarantee consistent (or reproducible) random integers?

Generator(PCG64) at 0x7F00F834BBA0

When we use a fixed seed keyword argument, we get the same output every time.

rng = np.random.default_rng(seed=50)
arr = rng.integers(0,5,size = 10)
array([3, 3, 3, 4, 4, 2, 1, 4, 4, 1])
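
As a quick check of the seed behavior, the sketch below builds two separate generators with the same seed and confirms that they produce identical arrays. The names rng_a, rng_b, first, second, and same are illustrative and do not appear in the original notebook.

```python
import numpy as np

# Two independent generators created with the same seed
rng_a = np.random.default_rng(seed=50)
rng_b = np.random.default_rng(seed=50)

first = rng_a.integers(0, 5, size=10)
second = rng_b.integers(0, 5, size=10)

# Same seed and same sequence of calls: the arrays match elementwise
same = np.array_equal(first, second)
print(same)
```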

If we change to a different seed, we get a new output.

rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 10)
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 5)
array([2, 4, 3, 2, 1])

If we run the same line again without re-creating rng, we will get new integers, because the generator continues from where it left off.

arr = rng.integers(0,5,size = 5)
array([3, 3, 2, 0, 1])
  • Make a Boolean array indicating where the array is equal to 2.

To get consistent results, it helps to put all of these lines into the same cell.

rng = np.random.default_rng(seed=110)
arr = rng.integers(0,5,size = 10)
array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])

Be sure you understand how these Boolean values correspond to the values in the array. Also, notice that we are using two equals signs, not one, to compare for elementwise equality. (One equals sign is for assignment.)

arr == 2
array([ True, False, False,  True, False, False, False,  True, False,
       False])
  • Count how many of these entries are equal to 2.

Because True is treated like 1 and False is treated like 0, we can count the number of True values (in this case, that is the number of 2 values in the original array) by using sum.

Here we use the built-in Python function sum.

# Use sum (the built-in Python function)
sum(arr == 2)

Here we use the NumPy function np.sum.

np.sum(arr == 2)

Another way to compute the sum of the elements in a NumPy array is the array method sum.

(arr == 2).sum() 
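
To compare the counting approaches side by side, here is a minimal sketch. It assumes the array values shown in the seed=110 output above, typed in by hand so the example is self-contained. The variable names and the extra call to np.count_nonzero are additions for illustration.

```python
import numpy as np

# Values copied from the seed=110 output shown earlier
arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
mask = (arr == 2)

count_builtin = sum(mask)                # built-in sum, loops over the elements
count_function = np.sum(mask)            # NumPy function
count_method = mask.sum()                # NumPy array method
count_nonzero = np.count_nonzero(mask)   # counts the True entries directly

print(count_builtin, count_function, count_method, count_nonzero)
```

All four approaches give the same count. On large arrays the NumPy versions are usually preferable to the built-in sum, which iterates one element at a time.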

Sometimes it is more elegant to save the intermediate values along the way, rather than copy-pasting. Here we save the Boolean array with the variable name ba.

ba = (arr == 2)  # Boolean array marking the entries equal to 2

Notice that ba really is a NumPy array.
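
One way to confirm this is to inspect the type, dtype, and shape of ba. The sketch below assumes the same array values as the seed=110 output above, entered by hand.

```python
import numpy as np

# Values copied from the seed=110 output shown earlier
arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
ba = (arr == 2)

print(type(ba))   # the array class, numpy.ndarray
print(ba.dtype)   # bool: each entry is True or False
print(ba.shape)   # same shape as arr
```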

  • Make a Boolean array indicating where arr is strictly greater than 1 and less than or equal to 3.

Here we check where it’s strictly greater than 1.

arr > 1
array([ True,  True,  True,  True, False,  True,  True,  True, False,
       False])

Here we check where it’s less than or equal to 3.

arr <= 3
array([ True, False,  True,  True,  True,  True,  True,  True,  True,
        True])

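We now want to check where both are satisfied. Without parentheses this fails: & binds more tightly than the comparison operators, so Python reads the expression as arr > (1 & arr) <= 3, a chained comparison that needs a single truth value for an entire array.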

arr > 1 & arr <= 3
ValueError                                Traceback (most recent call last)
Cell In [33], line 1
----> 1 arr > 1 & arr <= 3

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
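
Replacing & with the keyword and does not help, because and needs a single True or False from each side rather than an array.

(arr > 1) and (arr <= 3)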
ValueError                                Traceback (most recent call last)
Cell In [38], line 1
----> 1 (arr > 1) and (arr <= 3)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
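
To see where the first error comes from, the following sketch evaluates the piece that Python computes first, 1 & arr, and then catches the ValueError raised by the unparenthesized expression. The array values are assumed from the seed=110 output above, and the names inner and message are illustrative.

```python
import numpy as np

# Values copied from the seed=110 output shown earlier
arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])

# & has higher precedence than > and <=, so this part is evaluated first
inner = 1 & arr
print(inner)

# The full expression becomes the chained comparison arr > inner <= 3,
# which Python expands to (arr > inner) and (inner <= 3)
try:
    arr > 1 & arr <= 3
    message = None
except ValueError as err:
    message = str(err)

print(message)
```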
  • Thanks to Katie for finding the Python Operators reference. It mentions that for logical operators we can use and, which returns True if both statements are true, such as:

x = 5

(x > 3) and (x < 10)
  • For bitwise operators we use &, which NumPy applies elementwise to Boolean arrays, such as:

ba2 = (arr > 1) & (arr <= 3)
array([ True, False,  True,  True, False,  True,  True,  True, False,
       False])
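
As a cross-check, NumPy also provides the function np.logical_and, which should give the same result as & on Boolean arrays. This is a sketch using the array values from the seed=110 output above. The name ba2_alt is illustrative.

```python
import numpy as np

# Values copied from the seed=110 output shown earlier
arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])

# Parentheses make each comparison run before the elementwise "and"
ba2 = (arr > 1) & (arr <= 3)

# Equivalent spelling with a named function
ba2_alt = np.logical_and(arr > 1, arr <= 3)

print(np.array_equal(ba2, ba2_alt))
```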
  • Using Boolean indexing, produce the subarray of arr containing the values which are strictly greater than 1 and less than or equal to 3.

array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
array([2, 3, 2, 3, 3, 2])
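
The two outputs above are arr and the selected subarray. A minimal sketch of the indexing step that produces the second from the first, assuming the same array values entered by hand (the name sub is illustrative):

```python
import numpy as np

# Values copied from the seed=110 output shown earlier
arr = np.array([2, 4, 3, 2, 1, 3, 3, 2, 0, 1])
ba2 = (arr > 1) & (arr <= 3)

# Boolean indexing: keep only the entries of arr where ba2 is True
sub = arr[ba2]
print(sub)

# The subarray has one entry for each True in the Boolean array
print(len(sub) == ba2.sum())
```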
  • Make a 10x3 NumPy array arr2 of random integers between 0 (inclusive) and 5 (exclusive). Here, we will specify the size keyword argument using a tuple rather than an int. Use seed=100 so we all have the same values.

rng = np.random.default_rng(seed = 100)
arr2 = rng.integers(0,5, size = (10,3))
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])
(10, 3)
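
The sketch below shows how the tuple passed as size determines the shape of the result, and checks that every entry falls in the requested range.

```python
import numpy as np

rng = np.random.default_rng(seed=100)

# A tuple for size gives a two-dimensional array: 10 rows and 3 columns
arr2 = rng.integers(0, 5, size=(10, 3))

print(arr2.shape)   # the tuple that was passed as size
print(arr2.ndim)    # number of dimensions

# Every entry is at least 0 and strictly less than 5
in_range = bool(((arr2 >= 0) & (arr2 < 5)).all())
print(in_range)
```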
  • Define a variable col that is equal to the 0-th column of arr2.

(Let’s try to consistently start counting at 0 in this class.)

col = arr2[:,0]
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])
# Get row 1 (the second row, since we start counting at 0)
row1 = arr2[1,:]
array([2, 0, 1])
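
For reference, here is a self-contained sketch of the column and row slicing above. It assumes the arr2 values shown in the seed=100 output, typed in by hand rather than regenerated.

```python
import numpy as np

# Values copied from the seed=100 output shown earlier
arr2 = np.array([
    [3, 4, 0],
    [2, 0, 1],
    [2, 0, 2],
    [4, 4, 2],
    [2, 3, 4],
    [4, 0, 3],
    [3, 0, 2],
    [4, 3, 1],
    [1, 3, 0],
    [2, 2, 2],
])

col = arr2[:, 0]    # every row, column 0
row1 = arr2[1, :]   # row 1, every column

# Both results are one-dimensional arrays
print(col.shape, row1.shape)
```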