pandas is a widely used Python library for data processing. It provides data structures and tools for handling tabular data, time series data, and more. One of its most basic and commonly used data structures is the Series. You can think of a Series as a “labeled one-dimensional array”: it holds the data itself plus a label for each data point, and the full set of labels is called the “index”.
I. Creating a Series
To use Series, first import the pandas library, conventionally under the alias pd (import pandas as pd). Here are several common ways to create a Series:
1. Creating from a List (Default Index)
If you have a regular Python list, you can directly pass it to pd.Series() to create a Series. For example:
import pandas as pd
# Define a list
data = [10, 20, 30, 40]
# Create a Series (default index is 0,1,2,3)
s = pd.Series(data)
print(s)
Output:
0 10
1 20
2 30
3 40
dtype: int64
The numbers on the left (0, 1, 2, 3) are the Series’ index (default starts at 0), and the right side is the data. To name the Series, use the name parameter:
s = pd.Series(data, name="Scores")
print(s)
The output will then include Name: Scores on its last line, next to the dtype (Name: Scores, dtype: int64).
2. Creating from a Dictionary (Keys as Index)
If the data is in key-value pairs (e.g., a Python dictionary), pd.Series() automatically uses the dictionary’s keys as the index and the values as the data. For example:
data_dict = {"Chinese": 90, "Math": 85, "English": 95, "Physics": 80}
s = pd.Series(data_dict)
print(s)
Output:
Chinese 90
Math 85
English 95
Physics 80
dtype: int64
Here, the index is the dictionary’s keys (Chinese, Math, etc.), and the data is the corresponding values.
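When creating a Series from a dictionary, you can also pass an explicit index to pick and reorder keys. The sketch below reuses the scores dictionary from above; the "History" label is a hypothetical key added only to show what happens when a requested label is not in the dictionary.

```python
import pandas as pd

# Same subject scores as in the example above
data_dict = {"Chinese": 90, "Math": 85, "English": 95, "Physics": 80}

# index= selects and reorders keys; "History" (hypothetical) is not in the dict
s = pd.Series(data_dict, index=["Math", "Chinese", "History"])
print(s)

# The missing label is filled with NaN, which forces a float dtype
print(s.isna())
```

Because NaN is a floating-point value, the resulting Series has dtype float64 even though the dictionary held only integers.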
3. Creating from a Scalar Value (Repeated Generation)
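To create a Series where all elements are the same value, pass a scalar together with an index; the scalar is repeated once for every index label, so the index determines the length. For example, create a Series of length 5 with value 10:
s = pd.Series(10, index=range(5)) # range(5) supplies the index 0-4, and therefore the length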
print(s)
Output:
0 10
1 10
2 10
3 10
4 10
dtype: int64
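The index passed alongside a scalar does not have to be numeric. As a small sketch, the student names and the "Balance" name below are made up for illustration:

```python
import pandas as pd

# Hypothetical labels: every student starts with the same balance
students = ["Ann", "Ben", "Cara"]
s = pd.Series(0.0, index=students, name="Balance")
print(s)

# One element per label, all equal to the scalar
print(len(s))
```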
II. Basic Properties of Series
Each Series has several core properties to help you understand its structure:
- values: Retrieve the Series’ data itself (returns a numpy array)
print(s.values) # e.g.: [90 85 95 80]
- index: Retrieve the Series’ index (labels)
print(s.index) # e.g.: Index(['Chinese', 'Math', 'English', 'Physics'], dtype='object')
- name: The name of the Series (if specified during creation)
print(s.name) # e.g.: "Scores"
- shape: Returns the shape (for a 1D Series, shape is (length,))
print(s.shape) # e.g.: (4,) (4 elements)
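The properties above can be inspected together on one Series. This sketch rebuilds the scores Series so that it runs on its own; dtype and size are two further attributes beyond the four listed, included here as a convenience.

```python
import pandas as pd

s = pd.Series(
    {"Chinese": 90, "Math": 85, "English": 95, "Physics": 80},
    name="Scores",
)

print(s.values)  # the underlying NumPy array of data
print(s.index)   # the labels
print(s.name)    # Scores
print(s.shape)   # (4,)
print(s.dtype)   # element type of the data
print(s.size)    # number of elements
```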
III. Understanding and Manipulating Indexes
The index is the “soul” of a Series, allowing quick data location via labels.
1. Custom Index
The default index is 0,1,2…, but you can customize the index (e.g., strings, dates) for readability. For example:
dates = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
sales = [150, 200, 180, 220, 190]
s = pd.Series(sales, index=dates, name="Sales (yuan)")
print(s)
Output:
Monday 150
Tuesday 200
Wednesday 180
Thursday 220
Friday 190
Name: Sales (yuan), dtype: int64
2. Accessing Data by Index
- By Label (loc): Use loc when you know the label. For example:
print(s.loc["Wednesday"]) # Output: 180 (retrieve data with label "Wednesday")
- By Position (iloc): Use iloc when you know the position. For example:
print(s.iloc[2]) # Output: 180 (retrieve the 3rd element, index 2)
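The difference between loc and iloc matters most when the labels are themselves integers. A minimal sketch, assuming hypothetical order IDs as labels:

```python
import pandas as pd

# Hypothetical order IDs used as integer labels
s = pd.Series([150, 200, 180], index=[101, 102, 103])

by_label = s.loc[102]    # label 102 -> 200
by_position = s.iloc[0]  # first element -> 150
print(by_label, by_position)

# 0 is a valid position but not a label in this index, so loc raises KeyError
try:
    s.loc[0]
except KeyError:
    print("no label 0 in the index")
```

Writing loc or iloc explicitly makes it clear whether a number is meant as a label or as a position.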
3. Slicing Operations
- Label Slicing (includes end label): Use loc, e.g.:
print(s.loc["Tuesday":"Thursday"]) # Output: Tuesday:200, Wednesday:180, Thursday:220
- Position Slicing (excludes end position): Use iloc, e.g.:
print(s.iloc[1:4]) # Output: Tuesday:200, Wednesday:180, Thursday:220
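The inclusive and exclusive endpoints are easy to mix up, so here is a short side-by-side sketch using the same weekday sales data:

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
s = pd.Series([150, 200, 180, 220, 190], index=days)

label_slice = s.loc["Tuesday":"Thursday"]  # end label "Thursday" is included
pos_slice = s.iloc[1:3]                    # end position 3 is excluded

print(label_slice.index.tolist())  # ['Tuesday', 'Wednesday', 'Thursday']
print(pos_slice.index.tolist())    # ['Tuesday', 'Wednesday']
```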
4. Modifying Indexes
Individual index labels cannot be changed by assignment because an Index object is immutable, but you can replace the entire index at once:
s.index = ["Mon", "Tue", "Wed", "Thu", "Fri"] # Replace all index labels
print(s)
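If only some labels need to change, one option is Series.rename with a mapping, which returns a new Series and leaves the original untouched. The sketch below uses a shortened three-day Series for illustration:

```python
import pandas as pd

s = pd.Series([150, 200, 180], index=["Monday", "Tuesday", "Wednesday"])

# Index objects are immutable, so item assignment fails
try:
    s.index[0] = "Mon"
except TypeError as err:
    print("TypeError:", err)

# rename() returns a new Series; labels missing from the mapping are kept
renamed = s.rename({"Monday": "Mon"})
print(renamed.index.tolist())  # ['Mon', 'Tuesday', 'Wednesday']
```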
IV. Data Operations and Analysis
1. Basic Statistical Methods
A Series has built-in methods for the sum, mean, maximum, and so on:
s = pd.Series([10, 20, 30, 40])
print(s.sum()) # 100 (sum)
print(s.mean()) # 25.0 (mean)
print(s.max()) # 40 (maximum)
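A few related methods are often useful alongside these. The sketch below adds letter labels (an assumption for illustration) so that idxmax has a meaningful label to return:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s.min())       # 10
print(s.median())    # 25.0
print(s.idxmax())    # d (the label of the largest value)
print(s.describe())  # count, mean, std, min, quartiles, max in one call
```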
2. Conditional Filtering
Filter data using boolean conditions:
s = pd.Series([10, 20, 30, 40])
filtered = s[s > 25] # Filter data greater than 25
print(filtered) # Output: 30, 40
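The comparison itself produces a boolean Series (a mask), and several conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because of operator precedence. A short sketch:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

mask = s > 25
print(mask.tolist())     # [False, False, True, True]

between = s[(s >= 20) & (s < 40)]  # both conditions must hold
outside = s[(s < 20) | (s > 30)]   # either condition may hold
print(between.tolist())  # [20, 30]
print(outside.tolist())  # [10, 40]
```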
V. Practical Case: Comprehensive Exercise
Suppose we need to record a store’s weekly foot traffic, stored and analyzed using a Series:
import pandas as pd
# 1. Create a foot traffic Series indexed by day of the week
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
passengers = [500, 480, 520, 600, 550, 700, 650]
s = pd.Series(passengers, index=days, name="Foot Traffic")
# 2. View the data
print("=== Weekly Foot Traffic Data ===")
print(s)
# 3. Access Saturday's foot traffic by label
print("\nSaturday's Foot Traffic:", s.loc["Saturday"]) # Output: 700
# 4. Calculate total and average foot traffic
total = s.sum()
avg = s.mean()
print(f"\nTotal: {total} people, Average: {avg:.1f} people per day")
# 5. Filter days with foot traffic exceeding 600
busy_days = s[s > 600]
print("\nDays with foot traffic > 600:")
print(busy_days)
Output:
=== Weekly Foot Traffic Data ===
Monday 500
Tuesday 480
Wednesday 520
Thursday 600
Friday 550
Saturday 700
Sunday 650
Name: Foot Traffic, dtype: int64
Saturday's Foot Traffic: 700
Total: 4000 people, Average: 571.4 people per day
Days with foot traffic > 600:
Saturday 700
Sunday 650
Name: Foot Traffic, dtype: int64
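As an optional extension of this exercise, the same Series can answer a few more questions using only the techniques covered so far plus sort_values and idxmax. This is one possible sketch, not part of the original exercise:

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
s = pd.Series([500, 480, 520, 600, 550, 700, 650], index=days, name="Foot Traffic")

# Label of the busiest day
busiest = s.idxmax()
print("Busiest day:", busiest)

# Days ranked from busiest to quietest
ranked = s.sort_values(ascending=False)
print(ranked.head(3))

# Label slices include both endpoints
weekday_avg = s.loc["Monday":"Friday"].mean()
weekend_avg = s.loc["Saturday":"Sunday"].mean()
print(f"Weekday average: {weekday_avg:.1f}, Weekend average: {weekend_avg:.1f}")
```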
VI. Summary
The Series is the most basic data structure in pandas, and mastering it is key for data processing. Key points:
- Series = Data + Label (Index): Indexes make data easier to manage.
- Flexible Creation: Supports lists, dictionaries, and scalars for diverse data sources.
- Index Access: Use loc (label) and iloc (position) to avoid confusion.
- Built-in Stats & Filtering: Quickly perform initial data analysis with built-in methods.
Practice with different scenarios (e.g., date indexes, string indexes) until index operations and the data processing logic feel familiar.