I. What is NumPy?¶
NumPy is the core library for scientific computing in Python, providing efficient multidimensional array objects (ndarray) and a large collection of mathematical functions. Random number generation is one of NumPy’s common functionalities, widely used in data analysis, machine learning, simulation experiments, and other scenarios. The np.random submodule is dedicated to random number generation, with rand and randn being the two most basic and frequently used functions.
II. Installing and Importing NumPy¶
First, ensure NumPy is installed (if not, use pip install numpy). Import NumPy in your code:
import numpy as np
III. Basic Concepts of Random Number Generation¶
NumPy’s random numbers are pseudorandom numbers (generated by algorithms, reproducible with a fixed seed). Common distributions include:
- Uniform Distribution: Each value has an equal probability (similar to rolling a die, where each face has equal probability).
- Normal Distribution (Gaussian Distribution): Data clusters around the mean, forming a “bell curve” (similar to human heights, most values near the average, with few extreme values).
IV. np.random.rand: Generating Uniformly Distributed Random Numbers¶
np.random.rand(d0, d1, ..., dn) generates random numbers from a uniform distribution in the interval [0, 1). The parameters d0, d1, ... are separate integers giving the size of each dimension (e.g., rows, columns), not a single shape tuple.
Example 1: 1D Array¶
# Generate a 1D array of length 5 (shape: (5,))
arr1 = np.random.rand(5)
print(arr1)
# Sample output (results vary on each run): [0.3456 0.1234 0.7890 0.5678 0.9012]
Example 2: Multidimensional Array¶
# Generate a 2x3 matrix (shape: (2, 3))
arr2 = np.random.rand(2, 3)
print(arr2)
# Sample output:
# [[0.1234 0.5678 0.9012]
# [0.3456 0.7890 0.2345]]
Key Features:¶
- All elements lie in [0, 1).
- If no parameters are provided (np.random.rand()), it returns a scalar random number in [0, 1).
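The two features above can be checked directly. The following is a minimal sketch; the variable names (samples, in_range, scalar) and the sample size are illustrative choices, not part of the original text.

```python
import numpy as np

# Draw 10,000 uniform samples (sample size chosen only for illustration)
samples = np.random.rand(10_000)

# Every value should satisfy 0 <= x < 1
in_range = bool(np.all((samples >= 0.0) & (samples < 1.0)))
print(in_range)

# With no arguments, rand returns a single Python float instead of an array
scalar = np.random.rand()
print(type(scalar), scalar)
```

The exact numbers differ on every run, but the range check should always succeed.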
V. np.random.randn: Generating Standard Normal Distributed Random Numbers¶
np.random.randn(d0, d1, ..., dn) generates random numbers from a standard normal distribution (mean = 0, standard deviation = 1). The parameters have the same meaning as in rand.
Example 1: Single Random Number¶
# Generate 1 standard normal random number
num = np.random.randn()
print(num)
# Sample output: 0.5678 (can be positive or negative; about 68% of values fall between -1 and 1)
Example 2: 2x2 Matrix¶
# Generate a 2x2 matrix of standard normal values
mat = np.random.randn(2, 2)
print(mat)
# Sample output:
# [[-0.1234 0.5678]
# [ 0.9012 -0.3456]]
Key Features:¶
- Data clusters around 0; about 68% of values lie in [-1, 1], and values beyond ±3 are rare (roughly 0.3%).
- To adjust mean/standard deviation: use μ + σ * np.random.randn(d0, d1, ...) (where μ is the target mean and σ is the target standard deviation).
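The shift-and-scale formula can be sketched as follows. The target values (a mean of 170 and a standard deviation of 8, loosely echoing the height analogy) and the variable names are assumptions made up for this illustration.

```python
import numpy as np

# Hypothetical target parameters, chosen only for illustration
mu, sigma = 170.0, 8.0

# Scale standard normal samples by sigma, then shift them by mu
heights = mu + sigma * np.random.randn(100_000)

# The sample mean and standard deviation should land close to the targets
print("mean:", heights.mean())
print("std: ", heights.std())
```

With this many samples the printed values should sit very near 170 and 8, although they will not match exactly and will change between runs.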
VI. Comparison of rand and randn¶
| Feature | np.random.rand() | np.random.randn() |
|---|---|---|
| Distribution | Uniform ([0, 1)) | Standard normal (mean=0, std=1) |
| Value Range | All elements in [0, 1) | Can be positive/negative; about 68% of values in [-1, 1] |
| Parameters | Dimensions as separate integers (e.g., m, n) | Dimensions as separate integers (e.g., m, n) |
| Purpose | Generate equally probable random data | Simulate natural data fluctuations (e.g., noise) |
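To see the differences in the table numerically, one option is to draw a large sample from each function and compare summary statistics. This is only a sketch; the sample size and variable names are arbitrary choices for the example.

```python
import numpy as np

n = 100_000  # arbitrary sample size for the comparison
u = np.random.rand(n)    # uniform on [0, 1)
z = np.random.randn(n)   # standard normal

print("rand:  min=%.3f max=%.3f mean=%.3f" % (u.min(), u.max(), u.mean()))
print("randn: min=%.3f max=%.3f mean=%.3f" % (z.min(), z.max(), z.mean()))

# Fraction of normal samples within one standard deviation of the mean
frac_within_one = np.mean(np.abs(z) <= 1.0)
# Fraction of normal samples more than three standard deviations away
frac_beyond_three = np.mean(np.abs(z) > 3.0)
print("within [-1, 1]:", frac_within_one)
print("beyond +/-3:   ", frac_beyond_three)
```

The uniform samples stay inside [0, 1) with a mean near 0.5, while the normal samples are unbounded in principle but rarely stray far from 0.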
VII. Practical Tip: Fixing the Random Seed¶
To make random results reproducible (e.g., for repeatable experiments), fix the seed with np.random.seed():
np.random.seed(0) # Fixed seed ensures consistent random results
print(np.random.rand(2, 2))
# Output:
# [[0.5488135  0.71518937]
#  [0.60276338 0.54488318]]
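A quick way to confirm that seeding works is to reset the seed and draw again. The sketch below uses illustrative variable names (first, second, third) that do not appear in the original text.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(2, 2)

np.random.seed(0)               # reset to the same seed
second = np.random.rand(2, 2)   # repeats the same draws as `first`

third = np.random.rand(2, 2)    # no reseed: the sequence simply continues

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```

Seeding fixes the whole sequence of draws, so the seed has to be set again right before each draw that should repeat.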
VIII. Summary¶
- np.random.rand(d0, d1, ...) generates uniformly distributed random arrays in [0, 1), suitable for scenarios requiring equal probability (e.g., initializing weights).
- np.random.randn(d0, d1, ...) generates standard normal distributed random arrays, ideal for simulating natural data (e.g., noise, testing algorithm robustness).
Recommendation: Beginners can generate arrays of different shapes to observe distribution characteristics and quickly distinguish between the two functions.
Exercises¶
Try generating the following random arrays and observe results:
1. np.random.rand(3, 4) (3x4 uniform matrix)
2. np.random.randn(1000) (1000 standard normal numbers; check if mean and std are close to 0 and 1)
3. Fix the seed with np.random.seed(1), generate np.random.rand(2), and compare the results across runs.