Why Is NumPy File I/O Needed?

When processing data with NumPy, you often need to store intermediate results temporarily or save data permanently for later use. In a machine learning project, for example, you may want to save model parameters after training; in data analysis, you may want to reuse arrays you have already processed. NumPy provides functions such as np.save() and np.load() to persist array data, that is, to write arrays to disk and read them back.

NumPy’s save and load: Persistence of a Single Array

1. np.save(): Save a Single Array to a File

np.save() saves a single NumPy array to a binary file with the .npy extension. The syntax is:

np.save(filename, array)

Example:

import numpy as np

# Create an example array
arr = np.array([1, 2, 3, 4, 5])

# Save the array to a file (automatically generates my_array.npy)
np.save('my_array', arr)

2. np.load(): Load an Array from a File

np.load() reads array data from a .npy file. The syntax is:

np.load(filename)

Example:

# Load the saved array
loaded_arr = np.load('my_array.npy')

# Verify if the array content is consistent
print("Original array:", arr)
print("Loaded array:", loaded_arr)
print("Are arrays equal?", np.array_equal(arr, loaded_arr))  # Output: True

Key Point: File Extension

  • np.save() appends the .npy extension when the file name does not already end with it, e.g., np.save('data', arr) creates data.npy.
  • np.load() does not add the extension for you: pass the full file name (e.g., 'data.npy') and make sure the path is correct and the file exists.
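
The extension behavior can be checked with a small sketch like the one below. The temporary directory and the file names 'data' and 'other.npy' are assumptions chosen for illustration, so that the example leaves no files behind.

```python
import os
import tempfile

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Work in a throwaway directory so no files are left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    # No extension given: np.save appends .npy
    np.save(os.path.join(tmp_dir, 'data'), arr)
    # Extension already present: the name is used unchanged
    np.save(os.path.join(tmp_dir, 'other.npy'), arr)

    saved_files = sorted(os.listdir(tmp_dir))
    print(saved_files)  # ['data.npy', 'other.npy']

    # np.load needs the full name, including the extension
    restored = np.load(os.path.join(tmp_dir, 'data.npy'))
    print(np.array_equal(arr, restored))  # True
```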

NumPy’s savez: Persistence of Multiple Arrays

When you need to save several arrays together, use np.savez(), which bundles them into a single uncompressed .npz archive. If file size matters, np.savez_compressed() takes the same arguments and compresses the archive.

1. np.savez(): Save Multiple Arrays

np.savez(filename, name1=array1, name2=array2, ...)

Example:

# Create two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([[4, 5, 6], [7, 8, 9]])

# Save multiple arrays to my_arrays.npz (the keyword names 'arr1' and 'arr2' become the keys)
np.savez('my_arrays', arr1=arr1, arr2=arr2)

2. np.load(): Load Multiple Arrays

Loading a .npz file returns a dictionary-like NpzFile object. Each array is read from the file when you access it by its name (key).

Example:

# Load multiple arrays
loaded_data = np.load('my_arrays.npz')

# View all array names
print("Array names:", loaded_data.files)  # Output: ['arr1', 'arr2']

# Access arrays by key name
print("arr1:", loaded_data['arr1'])
print("arr2:", loaded_data['arr2'])
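
Because the object returned for a .npz file keeps the archive open, it can be worth closing it once the arrays have been read. The sketch below shows one way to do this with a with-block; the temporary directory and the `arrays` dictionary are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([[4, 5, 6], [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_arrays.npz')
    np.savez(path, arr1=arr1, arr2=arr2)

    # The with-block closes the underlying file when it ends
    with np.load(path) as data:
        # Copy every array into an ordinary dict while the file is open
        arrays = {name: data[name] for name in data.files}

# The arrays remain usable after the file has been closed
print(sorted(arrays))        # ['arr1', 'arr2']
print(arrays['arr2'].shape)  # (2, 3)
```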

Comparison: savez vs save

  • save saves only one array, suitable for simple scenarios.
  • savez saves multiple arrays, ideal for grouped data storage (e.g., model parameters + data labels).
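
Two details of np.savez() are easy to miss: arrays passed positionally are stored under automatic names (arr_0, arr_1, ...), and the archive is not compressed unless you use np.savez_compressed(). A minimal sketch, assuming an all-zero array (which compresses very well) and hypothetical file names:

```python
import os
import tempfile

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.zeros((200, 200))  # highly repetitive, so it compresses well

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, 'plain.npz')
    packed_path = os.path.join(tmp_dir, 'packed.npz')

    # Positional arguments get automatic names: arr_0, arr_1, ...
    np.savez(plain_path, arr1, arr2)
    # Same arrays with keyword names, stored with zip compression
    np.savez_compressed(packed_path, first=arr1, second=arr2)

    with np.load(plain_path) as plain:
        plain_keys = sorted(plain.files)
    with np.load(packed_path) as packed:
        packed_keys = sorted(packed.files)
        second = packed['second']

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

print(plain_keys)   # ['arr_0', 'arr_1']
print(packed_keys)  # ['first', 'second']
print("Compressed file is smaller:", packed_size < plain_size)
```

How much space compression saves depends heavily on the data; random floating-point values, for instance, shrink far less than the zeros used here.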

Practical Example: Complete Data Processing Workflow

Scenario: Save and Load Experimental Data

import numpy as np

# 1. Generate simulated data
data1 = np.random.rand(100)  # 100 uniform random numbers in [0, 1)
data2 = np.random.randn(50, 50)  # 50x50 array drawn from the standard normal distribution

# 2. Save data
np.save('data1.npy', data1)
np.savez('data2_50x50.npz', matrix=data2)

# 3. Load data for later use
loaded_data1 = np.load('data1.npy')
loaded_data2 = np.load('data2_50x50.npz')['matrix']

# 4. Perform data analysis
print("Mean of data1:", np.mean(loaded_data1))
print("Shape of data2:", loaded_data2.shape)
print("Max of data2:", np.max(loaded_data2))

Additional: Text Format Saving (savetxt/loadtxt)

For saving arrays as plain text (e.g., CSV format), use np.savetxt() and np.loadtxt():

# Save as CSV (comma-separated)
np.savetxt('data.csv', data1, delimiter=',')

# Load text file
loaded_csv = np.loadtxt('data.csv', delimiter=',')

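Note: Text files are human-readable and easy to open in other tools, but they are larger and slower to read and write, and they do not record the array's dtype (np.loadtxt() returns floats by default). np.savetxt() also handles only 1D and 2D arrays. The binary formats (.npy/.npz) preserve dtype and shape exactly and are faster.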
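
The difference in data type handling can be seen with a small integer array. This is a sketch under assumed file names; fmt='%d' is used so that the CSV contains plain integers rather than the default scientific notation.

```python
import os
import tempfile

import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, 'ints.npy')
    csv_path = os.path.join(tmp_dir, 'ints.csv')

    np.save(npy_path, ints)
    np.savetxt(csv_path, ints, delimiter=',', fmt='%d')

    from_npy = np.load(npy_path)
    # loadtxt parses values as floats unless a dtype is given
    from_csv = np.loadtxt(csv_path, delimiter=',')
    from_csv_int = np.loadtxt(csv_path, delimiter=',', dtype=np.int32)

print(from_npy.dtype)      # int32
print(from_csv.dtype)      # float64
print(from_csv_int.dtype)  # int32
```

With text files, the reader has to know and restate the intended dtype; with .npy files, it travels with the data.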

Summary

  • Single array: Use np.save() to write a .npy file and np.load() to read it back.
  • Multiple arrays: Use np.savez() to write a .npz file and np.load() to read it back.
  • Text format: Use np.savetxt() and np.loadtxt() for CSV and other plain-text files (human-readable).

With these methods, you can flexibly handle data persistence in NumPy for various machine learning and data analysis tasks.

Xiaoye