Why Is NumPy File I/O Needed?¶
When working with Numpy to process data, we often need to temporarily store intermediate results or permanently save data for later use. For example, in a machine learning project, you may need to save model parameters after training; or in data analysis, you may need to reuse previously processed arrays. Numpy provides functions like save and load to easily implement the persistence of array data (i.e., storing and reading data).
Numpy’s save and load: Persistence of a Single Array¶
1. np.save(): Save a Single Array to a File¶
np.save() saves a single Numpy array into a binary file with the extension .npy. The syntax is:
np.save(filename, array)
Example:
import numpy as np
# Create an example array
arr = np.array([1, 2, 3, 4, 5])
# Save the array to a file (automatically generates my_array.npy)
np.save('my_array', arr)
2. np.load(): Load an Array from a File¶
np.load() reads array data from a .npy file. The syntax is:
np.load(filename)
Example:
# Load the saved array
loaded_arr = np.load('my_array.npy')
# Verify if the array content is consistent
print("Original array:", arr)
print("Loaded array:", loaded_arr)
print("Are arrays equal?", np.array_equal(arr, loaded_arr)) # Output: True
Key Point: File Extension¶
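- np.save() automatically appends the .npy extension when the file name does not already end in it, e.g., np.save('data', arr) creates data.npy.
- np.load() does not add the extension for you, so pass the full file name (e.g., data.npy) and make sure the path is correct and the file exists.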
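The extension behavior described above can be checked with a small sketch. The temporary directory and the file names "data" and "other.npy" are illustrative choices, not part of the original example.

```python
import os
import tempfile

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

with tempfile.TemporaryDirectory() as tmp_dir:
    # No extension given: np.save appends ".npy" to the name
    np.save(os.path.join(tmp_dir, "data"), arr)
    # Extension already present: the name is used unchanged
    np.save(os.path.join(tmp_dir, "other.npy"), arr)
    saved_files = sorted(os.listdir(tmp_dir))
    # np.load needs the full file name, including ".npy"
    reloaded = np.load(os.path.join(tmp_dir, "data.npy"))

print(saved_files)                     # ['data.npy', 'other.npy']
print(np.array_equal(arr, reloaded))   # True
```

Writing into a temporary directory keeps the demonstration from leaving files behind; in a real project you would pass your own path.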
Numpy’s savez: Persistence of Multiple Arrays¶
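When you need to save multiple arrays together, use np.savez(), which bundles them into a single .npz file. The .npz file is an uncompressed zip archive containing one .npy file per array; if you want compression, use np.savez_compressed() instead.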
1. np.savez(): Save Multiple Arrays¶
np.savez(filename, name1=array1, name2=array2, ...)
Example:
# Create two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([[4, 5, 6], [7, 8, 9]])
# Save multiple arrays to my_arrays.npz (the keyword names 'arr1' and 'arr2' become the keys)
np.savez('my_arrays', arr1=arr1, arr2=arr2)
2. np.load(): Load Multiple Arrays¶
Loading a .npz file returns a dictionary-like object, where arrays can be accessed by their names (keys).
Example:
# Load multiple arrays
loaded_data = np.load('my_arrays.npz')
# View all array names
print("Array names:", loaded_data.files) # Output: ['arr1', 'arr2']
# Access arrays by key name
print("arr1:", loaded_data['arr1'])
print("arr2:", loaded_data['arr2'])
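Two details of .npz handling are worth a short sketch: arrays passed to np.savez() positionally (without keyword names) are stored under the automatic keys arr_0, arr_1, and so on, and the object returned by np.load() can be used in a with statement so the archive is closed when you are done. The temporary directory below is an illustrative choice.

```python
import os
import tempfile

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([[4, 5, 6], [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_arrays.npz")
    # Positional arrays get the automatic names arr_0, arr_1, ...
    np.savez(path, arr1, arr2)
    # The loaded archive is a context manager; it is closed on exit
    with np.load(path) as archive:
        names = list(archive.files)
        first = archive["arr_0"]    # each array is read from disk on access
        second = archive["arr_1"]

print(names)          # ['arr_0', 'arr_1']
print(first)          # [1 2 3]
print(second.shape)   # (2, 3)
```

Keyword names are usually the better choice because they make the file self-describing; the automatic names are mainly a fallback.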
Comparison: savez vs save¶
- save saves only one array, suitable for simple scenarios.
- savez saves multiple arrays, ideal for grouped data storage (e.g., model parameters + data labels).
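One related point: np.savez() stores its arrays without compression, while np.savez_compressed() takes the same arguments and compresses the archive. The following sketch compares file sizes, assuming an all-zeros array (chosen because it compresses very well; real data will shrink less) and illustrative file names.

```python
import os
import tempfile

import numpy as np

# An all-zeros array is highly compressible; this is an illustrative best case
data = np.zeros((200, 200))

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.npz")
    packed_path = os.path.join(tmp_dir, "packed.npz")

    np.savez(plain_path, data=data)              # uncompressed archive
    np.savez_compressed(packed_path, data=data)  # compressed archive

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Both files load the same way and return the same values
    with np.load(packed_path) as archive:
        restored = archive["data"]

print("savez bytes:", plain_size)
print("savez_compressed bytes:", packed_size)
print("Same content?", np.array_equal(data, restored))
```

Compression trades extra CPU time on save and load for a smaller file, so it tends to pay off for large or repetitive arrays.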
Practical Example: Complete Data Processing Workflow¶
Scenario: Save and Load Experimental Data¶
import numpy as np
# 1. Generate simulated data
data1 = np.random.rand(100) # 100 random numbers
data2 = np.random.randn(50, 50) # 50x50 random normal distribution array
# 2. Save data
np.save('data1.npy', data1)
np.savez('data2_50x50.npz', matrix=data2)
# 3. Load data for later use
loaded_data1 = np.load('data1.npy')
loaded_data2 = np.load('data2_50x50.npz')['matrix']
# 4. Perform data analysis
print("Mean of data1:", np.mean(loaded_data1))
print("Shape of data2:", loaded_data2.shape)
print("Max of data2:", np.max(loaded_data2))
Additional: Text Format Saving (savetxt/loadtxt)¶
For saving arrays as plain text (e.g., CSV format), use np.savetxt() and np.loadtxt():
# Save as CSV (comma-separated)
np.savetxt('data.csv', data1, delimiter=',')
# Load text file
loaded_csv = np.loadtxt('data.csv', delimiter=',')
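Note: Text format is human-readable and easy to open in other tools, but the files are larger and slower to read and write, and np.savetxt() only accepts 1-D and 2-D arrays. Binary formats (.npy/.npz) preserve the exact data type and shape and are faster.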
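The data-type difference can be seen in a short round trip. This is a minimal sketch with an illustrative int32 array and file names; it relies on np.loadtxt() returning floats unless a dtype is given.

```python
import os
import tempfile

import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "ints.npy")
    csv_path = os.path.join(tmp_dir, "ints.csv")

    # Binary round trip: dtype and shape are stored in the file
    np.save(npy_path, ints)
    from_npy = np.load(npy_path)

    # Text round trip: write integers with fmt="%d"; the file holds only digits
    np.savetxt(csv_path, ints, delimiter=",", fmt="%d")
    from_csv = np.loadtxt(csv_path, delimiter=",")                      # default dtype is float
    from_csv_int = np.loadtxt(csv_path, delimiter=",", dtype=np.int32)  # dtype restored by hand

print(from_npy.dtype)       # int32
print(from_csv.dtype)       # float64
print(from_csv_int.dtype)   # int32
```

With text files the reader has to know the intended dtype in advance, which is one reason .npy/.npz is the safer choice for intermediate results.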
Summary¶
- Single array: Use np.save() and np.load() to generate .npy files.
- Multiple arrays: Use np.savez() and np.load() to generate .npz files.
- Text format: Use savetxt() and loadtxt() for CSV/other plain text (human-readable).
With these methods, you can flexibly handle data persistence in Numpy for various machine learning and data analysis tasks.