How (not) to copy a NumPy array
The Python code below puzzled me for a while. It initializes a NumPy array a and then assigns it in three different ways, storing the results in b, c, and d.
import numpy as np
a = np.arange(3,5)
#a = [3, 4]  # uncomment to repeat the experiment with a plain list
b = a
c = a[:]
d = a.copy()
print(b is a) # True
print(c is a) # False
print(d is a) # False
print(a, b, c, d) #[3 4] [3 4] [3 4] [3 4]
a[0] = -11.
print(a, b, c, d) #[-11   4] [-11   4] [-11   4] [3 4]
Based on these assignments, I expected b to be the very same object as a, and c and d to be different objects. The three identity checks (b is a, c is a, d is a) confirmed my expectations.
Now, when I changed the first element of a to -11, I expected the first element of b to change as well, since b is just another name for the same object, but those of c and d to stay the same. To my great surprise, however, the first element of c had changed as well! This makes NumPy arrays behave very differently from lists, where a[:] creates a new list (uncomment the line #a = [3, 4] in the code to see the difference).
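The contrast between lists and arrays can be sketched side by side. This is a minimal illustration, not part of the original experiment; the names lst, lst_slice, arr, and arr_slice are made up for this example.

```python
import numpy as np

# Plain list: slicing with [:] builds a new list.
lst = [3, 4]
lst_slice = lst[:]
lst[0] = -11
print(lst, lst_slice)                    # [-11, 4] [3, 4]

# NumPy array: slicing with [:] returns a view of the same data.
arr = np.arange(3, 5)
arr_slice = arr[:]
arr[0] = -11
print(arr.tolist(), arr_slice.tolist())  # [-11, 4] [-11, 4]
```

The list slice keeps its old value, while the array slice follows the change made through arr.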
With some help from Stack Overflow and the SciPy Cookbook, I discovered that slicing a NumPy array with [:] does not copy the data; it returns a so-called view of the same data. This means that even though a and c are different objects, as confirmed by c is a printing False, they still share the same underlying data.
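One way to check whether two arrays share data, rather than inferring it from a surprising print, is to look at the base attribute or to call np.shares_memory. The snippet below is a small sketch that reuses the names a, c, and d from the example above.

```python
import numpy as np

a = np.arange(3, 5)
c = a[:]       # view: a new array object over the same buffer
d = a.copy()   # copy: a new array object with its own buffer

print(c is a)                  # False: different Python objects
print(c.base is a)             # True: c borrows its data from a
print(np.shares_memory(a, c))  # True

print(d.base is None)          # True: d owns its data
print(np.shares_memory(a, d))  # False
```

So the is operator only says whether two names refer to the same array object; it says nothing about whether the underlying data is shared.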
The use case for an assignment of the form c = a[:] is not entirely clear from the current example. A better one is to create a variable a_even_indices = a[::2], which gives access to only the even indices of a, so that a simple assignment such as a_even_indices[:] = 3 modifies those elements of a in place.
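A short sketch of that use case, assuming a slightly longer array than in the original example so that the effect is visible:

```python
import numpy as np

a = np.arange(6)            # values 0, 1, 2, 3, 4, 5
a_even_indices = a[::2]     # view on indices 0, 2, 4

a_even_indices[:] = 3       # writes through the view into a
print(a.tolist())           # [3, 1, 3, 3, 3, 5]

# Without [:], the assignment only rebinds the name and leaves a alone.
a_even_indices = 7
print(a.tolist())           # [3, 1, 3, 3, 3, 5]
```

Note the [:] on the left-hand side: it is what turns the statement into an element-wise write instead of a plain rebinding of the name.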
My most important lesson: to get an independent copy of a NumPy array's data, always use the copy function that NumPy provides (a.copy() or np.copy(a)).
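One caveat, which goes slightly beyond the example above: for arrays with dtype=object, NumPy's copy duplicates only the references, so the contained Python objects are still shared. In that case copy.deepcopy from the standard library is the usual way out. The sketch below uses made-up names (d1, d2, nested, shallow, deep).

```python
import copy
import numpy as np

# Numeric arrays: both spellings give an independent copy of the data.
a = np.arange(3, 5)
d1 = a.copy()
d2 = np.copy(a)
a[0] = -11
print(a.tolist(), d1.tolist(), d2.tolist())  # [-11, 4] [3, 4] [3, 4]

# Object arrays: copy() duplicates the references, not the objects.
nested = np.empty(2, dtype=object)
nested[0] = [1, 2]
nested[1] = [3]
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested[0].append(99)
print(shallow[0])  # [1, 2, 99]  (same list object as nested[0])
print(deep[0])     # [1, 2]
```

For ordinary numeric arrays, which is the situation in this post, a.copy() is all that is needed.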