This repository demonstrates the advantages of using NumPy arrays over Python lists in terms of speed, memory efficiency, convenience, and advanced indexing features.
-
- Python Lists
- NumPy Arrays
- Why NumPy is faster
- Why Python lists are slow
-
- Python Lists
- NumPy Arrays
-
- Element-wise operations
- Broadcasting
- Advanced Indexing
- Python List Example
- NumPy Array Example
-
- Normal Indexing and Slicing
- Fancy Indexing
- Boolean Indexing
-
- Broadcasting Rules
- Examples of Broadcasting
- Broadcasting Works
- Broadcasting Fails
- Key Takeaways
-
- Sigmoid Function
- Mean Squared Error (MSE)
-
- y = x Function Plot
- y = x² Function Plot
- y = sin(x) Function Plot
- y = xlog(x) Function Plot
- Sigmoid Function Plot
import time

a = [i for i in range(10000000)]
b = [i for i in range(10000000, 20000000)]
c = []

# Add the two lists element by element with an explicit loop
start = time.time()
for i in range(len(a)):
    c.append(a[i] + b[i])
print(time.time() - start)  # Output: 3.2699835300445557 seconds
import time
import numpy as np

a = np.arange(10000000)
b = np.arange(10000000, 20000000)

# The same addition as a single vectorized operation
start = time.time()
c = a + b
print(time.time() - start)  # Output: 0.06481003761291504 seconds
Result: NumPy arrays are significantly faster, about 50 times faster in this example (3.27 s versus 0.065 s). Exact timings vary from machine to machine.
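A single time.time() measurement can be noisy. As a rough sketch, the same comparison can be repeated with timeit from the standard library. The size n, the repeat counts, and the helper names add_lists and add_arrays are assumptions made for illustration, not part of the measurements above.

```python
import timeit

import numpy as np

n = 200_000  # assumed size, smaller than above so the sketch runs quickly
a_list = list(range(n))
b_list = list(range(n, 2 * n))
a_arr = np.arange(n)
b_arr = np.arange(n, 2 * n)


def add_lists():
    # Element-wise addition with a Python-level loop
    return [x + y for x, y in zip(a_list, b_list)]


def add_arrays():
    # Element-wise addition in a single vectorized call
    return a_arr + b_arr


# Take the best of several runs to reduce noise from other processes
list_time = min(timeit.repeat(add_lists, number=5, repeat=3))
array_time = min(timeit.repeat(add_arrays, number=5, repeat=3))
print(f"list: {list_time:.4f} s, NumPy: {array_time:.4f} s")
```

The ratio between the two numbers depends on the machine, the Python version, and the array size, so treat it as an indication rather than a fixed figure.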
- NumPy stores its data in a C-style array: a fixed-size, contiguous block of memory holding elements of a single type. This lets operations run in compiled loops instead of in the Python interpreter.
- A NumPy array is not a referential array: it stores the values themselves in memory, not the addresses of separate objects.
- A Python list is a dynamic array: when it runs out of capacity, it allocates a larger block and copies the existing contents into it, and this resizing costs time as the list grows.
- A Python list is a referential array: it stores the addresses of its items rather than the items themselves, so every access has to follow a reference to fetch the value from elsewhere in memory.
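The difference between the two storage models can be inspected directly. The following is a minimal sketch with arbitrary example values; it only reads attributes that NumPy arrays and Python objects already expose.

```python
import sys

import numpy as np

arr = np.arange(5, dtype=np.int64)
# The values sit back to back in one buffer: 8 bytes each, 8 bytes apart
print(arr.dtype, arr.itemsize, arr.strides, arr.nbytes)

lst = [10**6, 10**6 + 1, 10**6 + 2]
# Each list slot holds a reference to a separate int object,
# and every one of those objects carries its own header
print([sys.getsizeof(x) for x in lst])

# A list can mix types precisely because it only stores references
mixed = [1, "two", 3.0]
print([type(x).__name__ for x in mixed])
```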
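import sys

a = [i for i in range(10000000)]
# getsizeof counts only the list's own array of references, not the int objects it points to
sys.getsizeof(a)  # Output: 81528048 bytes

# Caveat: int8 holds only -128..127, so it cannot represent these values faithfully;
# it is used here to show how a small dtype shrinks the cost to 1 byte per element
a = np.arange(10000000, dtype=np.int8)
sys.getsizeof(a)  # Output: 10000104 bytes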
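Result: NumPy arrays can use much less memory than Python lists (about 8 times less in this example), and the size of each element is under your control through the dtype.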
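For a comparison where both containers hold the same values, the sketch below uses int64 and also counts the int objects that the list refers to. The size n is an assumption chosen to keep the example quick, and the list total is a rough estimate, since CPython shares some small integer objects.

```python
import sys

import numpy as np

n = 100_000  # assumed size for illustration
lst = list(range(n))

list_container = sys.getsizeof(lst)              # the reference array only
list_items = sum(sys.getsizeof(x) for x in lst)  # the int objects themselves
arr = np.arange(n, dtype=np.int64)               # int64 can hold every value here

print("list container:", list_container)
print("list container + items:", list_container + list_items)
print("NumPy data buffer:", arr.nbytes)
```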
NumPy arrays provide more convenience than Python lists for tasks that involve complex numerical operations, matrix manipulations, and high-performance computations due to their optimized structure and built-in functions. Here’s why:
- Element-wise operations: In NumPy, you can directly perform operations on arrays element-wise without having to loop over individual elements like you would in Python lists.
- Broadcasting: NumPy allows operations between arrays of different shapes, automatically adjusting the shapes where necessary (broadcasting), which is not available with Python lists.
- Advanced indexing: NumPy allows sophisticated slicing and indexing techniques, making data manipulation easier.
- Performance: NumPy is implemented in C and optimized for performance, making it much faster for large data sets than Python lists.
With Python lists, element-wise addition requires manually looping through each element, which adds complexity and decreases performance for large datasets.
# Python lists
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
# Adding two lists element-wise (manual loop)
c = []
for i in range(len(a)):
    c.append(a[i] + b[i])
print("Result using Python list:", c)
Result using Python list: [6, 8, 10, 12]
In NumPy, the operation is much simpler and faster since you can directly add two arrays without looping.
import numpy as np
# NumPy arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
# Adding two arrays element-wise (direct operation)
c = a + b
print("Result using NumPy array:", c)
Result using NumPy array: [ 6  8 10 12]
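A list comprehension with zip shortens the list version, but it still loops in Python, and the + operator on lists means something different. A small sketch contrasting the two, using the same values as above:

```python
import numpy as np

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

# Lists: a comprehension is shorter than append-in-a-loop, but still loops in Python
list_sum = [x + y for x, y in zip(a, b)]

# Note that + on two lists concatenates them instead of adding element-wise
concatenated = a + b

# Arrays: arithmetic operators apply element by element
arr_a, arr_b = np.array(a), np.array(b)
arr_sum = arr_a + arr_b
arr_prod = arr_a * arr_b

print(list_sum)      # [6, 8, 10, 12]
print(concatenated)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(arr_prod)      # [ 5 12 21 32]
```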
NumPy provides more efficient ways to work with arrays using advanced indexing techniques.
a = np.arange(24).reshape(6, 4)
print(a[1, 2])      # Output: 6 (row 1, column 2)
print(a[1:3, 1:3])  # Output: array([[5, 6], [9, 10]]) (rows 1-2, columns 1-2)
Fancy indexing is used to fetch rows or columns that do not follow a regular pattern and therefore cannot be selected with a slice. Instead of a slice, you pass a list of the indices you want.
print(a[:, [0, 2, 3]])
# Output:
# array([[ 0, 2, 3],
# [ 4, 6, 7],
# [ 8, 10, 11],
# [12, 14, 15],
# [16, 18, 19],
# [20, 22, 23]])
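Fancy indexing works on rows as well, and two index lists can be combined to pick individual elements. A short sketch on the same 6x4 array:

```python
import numpy as np

a = np.arange(24).reshape(6, 4)

# A list of row indices picks whole rows, in the order given
rows = a[[0, 2, 5]]

# Two index lists are paired up: this picks a[0, 1] and a[2, 3]
pairs = a[[0, 2], [1, 3]]

print(rows)
print(pairs)  # [ 1 11]
```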
a = np.random.randint(1, 100, 24).reshape(6, 4)
print(a[a > 50]) # Output: All elements greater than 50
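Because the array above is random, its output changes on every run. The sketch below uses fixed values instead so the result is reproducible, and shows how two conditions can be combined; the specific conditions are arbitrary examples.

```python
import numpy as np

a = np.arange(24).reshape(6, 4)  # fixed values instead of random ones

mask = a > 10  # boolean array with the same shape as a

# Combine conditions with & (and) or | (or); each condition needs parentheses
evens_over_10 = a[(a > 10) & (a % 2 == 0)]

print(mask.shape)     # (6, 4)
print(evens_over_10)  # [12 14 16 18 20 22]
```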
Broadcasting allows operations between arrays of different shapes by virtually stretching the smaller array to match the shape of the larger one.
- Make the two arrays have the same number of dimensions. If the numbers of dimensions differ, add new dimensions of size 1 to the front of the shape of the array with fewer dimensions.
- Make each dimension of the two arrays the same size. If the sizes along a dimension do not match, a dimension of size 1 is stretched to the size of the other array. If the sizes differ and neither of them is 1, the arrays cannot be broadcast and an error is raised.
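The two rules can be written out as a small function that computes the resulting shape. This is an illustrative sketch only: broadcast_shape is a hypothetical helper written for this README, not a NumPy function, and the last line cross-checks it against the shape NumPy actually produces.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError (hypothetical helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with leading dimensions of size 1
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        # Rule 2: sizes must match, or one of them must be 1
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)


print(broadcast_shape((2, 3), (3,)))    # (2, 3)
print(broadcast_shape((6, 1), (1, 4)))  # (6, 4)

# Cross-check against what NumPy produces for the same shapes
assert broadcast_shape((2, 3), (3,)) == (np.zeros((2, 3)) + np.zeros(3)).shape
```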
import numpy as np
a = np.array([1, 2, 3])
b = 5
result = a + b
print(result)
Result: [6 7 8]
The scalar b is broadcast across the array a, adding 5 to each element.
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]]) # Shape (2, 3)
b = np.array([10, 20, 30]) # Shape (3,)
result = a + b
print(result)
Result:
[[11 22 33]
[14 25 36]]
The 1D array b is broadcast across each row of a to match its shape, and the addition happens element-wise.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([10, 20])
result = a + b # This will raise an error
Result: ValueError (the operands could not be broadcast together)
Here, the shapes of a (3,) and b (2,) are incompatible for broadcasting because the sizes 3 and 2 differ and neither of them is 1.
import numpy as np
a = np.array([[1, 2], [3, 4]]) # Shape (2, 2)
b = np.array([10, 20, 30]) # Shape (3,)
result = a + b # This will raise an error
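Result: ValueError (the operands could not be broadcast together)
The shapes (2, 2) and (3,) are not compatible for broadcasting: (3,) is padded to (1, 3), and the last dimensions, 2 and 3, differ with neither of them being 1.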
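When shapes are incompatible, one possible fix is to add an axis of size 1 so that the rules apply. The sketch below reuses the (3,) and (2,) arrays from the first failing example; whether the resulting 2x3 table is what you want depends on the problem, so this is one option rather than the general remedy.

```python
import numpy as np

a = np.array([1, 2, 3])  # shape (3,)
b = np.array([10, 20])   # shape (2,)

# The direct addition fails, as described above
try:
    a + b
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Giving b a trailing axis of size 1 makes the shapes compatible:
# (3,) is padded to (1, 3), and (2, 1) with (1, 3) stretches to (2, 3)
b_col = b[:, np.newaxis]  # shape (2, 1)
result = a + b_col

print(result.shape)  # (2, 3)
print(result)
```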
- Broadcasting works when, after padding with leading dimensions of size 1, every pair of dimensions is either equal or contains a 1 that can be stretched.
- When dimensions are incompatible for broadcasting, a ValueError is raised.
NumPy allows for mathematical operations to be applied directly to arrays.
def sigmoid(array):
    return 1 / (1 + np.exp(-array))
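As a quick usage sketch, the function can be applied to a whole array at once. The input values are arbitrary; the function is repeated here so the example runs on its own.

```python
import numpy as np


def sigmoid(array):
    return 1 / (1 + np.exp(-array))


x = np.array([-2.0, 0.0, 2.0])
y = sigmoid(x)

# Every output lies between 0 and 1, and sigmoid(0) is exactly 0.5
print(y)
```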
def mse(actual, predicted):
    return np.mean((actual - predicted) ** 2)
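A small worked example makes the arithmetic visible. The arrays below are made-up values; the function is repeated so the example is self-contained.

```python
import numpy as np


def mse(actual, predicted):
    return np.mean((actual - predicted) ** 2)


actual = np.array([1.0, 2.0, 3.0])
predicted = np.array([1.0, 2.0, 5.0])

# The squared errors are [0, 0, 4], so the mean is 4 / 3
error = mse(actual, predicted)
print(error)
```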
a = np.array([1, 2, 3, 4, np.nan, 6])
print(a[~np.isnan(a)]) # Output: array([1., 2., 3., 4., 6.])
np.isnan() checks every element and returns True where the value is NaN and False otherwise. The ~ operator inverts that mask, so indexing with it keeps only the non-NaN values.
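Dropping NaN values is one option; replacing them or using NaN-aware functions are others. The sketch below shows all three on the same array, with 0.0 as an arbitrary fill value chosen for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, np.nan, 6])

mask = np.isnan(a)               # True only at the NaN position
cleaned = a[~mask]               # drop the NaN
filled = np.where(mask, 0.0, a)  # or replace it with a fill value (0.0 is an arbitrary choice)

print(mask)
print(np.mean(a))     # nan, because NaN propagates through arithmetic
print(np.nanmean(a))  # 3.2, the mean of the non-NaN values
```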
NumPy arrays are well-suited for generating data to plot graphs using Matplotlib.
# Plotting 2D graphs
import matplotlib.pyplot as plt

# y = x
x = np.linspace(-10, 10, 100)
y = x
plt.plot(x, y)
plt.show()

# y = x^2
x = np.linspace(-10, 10, 100)
y = x**2
plt.plot(x, y)
plt.show()

# y = sin(x)
x = np.linspace(-10, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()

# y = x log(x), which is defined only for x > 0, so the range starts just above zero
x = np.linspace(0.01, 10, 100)
y = x * np.log(x)
plt.plot(x, y)
plt.show()
import matplotlib.pyplot as plt
x = np.linspace(-10, 10, 100)
y = 1 / (1 + np.exp(-x))
plt.plot(x, y)
plt.show()