pandas is a very popular data processing library in Python, providing many convenient data structures and tools to easily handle tabular data, time series data, and more. One of the most basic and commonly used data structures in pandas is the Series. You can think of a Series as a “labeled one-dimensional array”—it not only contains the data itself but also a “label” (referred to as the “index”) that identifies each data point.

一、Creating a Series

To use Series, first import the pandas library, conventionally under the alias pd (import pandas as pd). Here are several common ways to create a Series:

1. Creating from a List (Default Index)

If you have a regular Python list, you can directly pass it to pd.Series() to create a Series. For example:

import pandas as pd

# Define a list
data = [10, 20, 30, 40]
# Create a Series (default index is 0,1,2,3)
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64

The numbers on the left (0, 1, 2, 3) are the Series’ index (default starts at 0), and the right side is the data. To name the Series, use the name parameter:

s = pd.Series(data, name="Scores")
print(s)

The last line of the output then reads Name: Scores, dtype: int64: the name appears in the footer alongside the dtype.

2. Creating from a Dictionary (Keys as Index)

If the data is in key-value pairs (e.g., a Python dictionary), pd.Series() automatically uses the dictionary’s keys as the index and the values as the data. For example:

data_dict = {"Chinese": 90, "Math": 85, "English": 95, "Physics": 80}
s = pd.Series(data_dict)
print(s)

Output:

Chinese    90
Math       85
English    95
Physics    80
dtype: int64

Here, the index is the dictionary’s keys (Chinese, Math, etc.), and the data is the corresponding values.
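When a dictionary is combined with an explicit index argument, pandas looks up each requested label among the dictionary keys. The sketch below illustrates this; the label "Chemistry" is a made-up example that is deliberately missing from the dictionary.

```python
import pandas as pd

data_dict = {"Chinese": 90, "Math": 85, "English": 95, "Physics": 80}

# index= selects and reorders entries by key.
# "Chemistry" (hypothetical label) is not a key in data_dict, so its value
# becomes NaN, which also changes the dtype from int64 to float64.
subset = pd.Series(data_dict, index=["Math", "Chinese", "Chemistry"])
print(subset)
```

This is handy when you only need some of the keys, or need them in a fixed order.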

3. Creating from a Scalar Value (Repeated Generation)

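To create a Series in which every element has the same value, pass a scalar together with an index. The scalar is repeated once for each index label, so the length of the index sets the length of the Series. For example, create a Series of length 5 with value 10:

s = pd.Series(10, index=range(5))  # index=range(5) gives labels 0-4, so the Series has 5 elements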
print(s)

Output:

0    10
1    10
2    10
3    10
4    10
dtype: int64
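The index used with a scalar does not have to be numeric. As a small sketch, the fruit labels and the name stock below are invented for illustration:

```python
import pandas as pd

# The scalar 0.0 is broadcast to every label in the index;
# the number of labels determines the length of the Series.
stock = pd.Series(0.0, index=["apples", "pears", "plums"])
print(stock)
```

This pattern is a convenient way to initialize counters or totals before filling them in.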

二、Basic Properties of Series

Each Series has several core properties to help you understand its structure:
- values: Retrieve the Series’ data itself (returns a NumPy array)
  print(s.values)  # e.g.: [90 85 95 80]
- index: Retrieve the Series’ index (labels)
  print(s.index)  # e.g.: Index(['Chinese', 'Math', 'English', 'Physics'], dtype='object')
- name: The name of the Series (if specified during creation)
  print(s.name)  # e.g.: "Scores"
- shape: Returns the shape (for a 1D Series, shape is (length,))
  print(s.shape)  # e.g.: (4,) (4 elements)
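The properties listed above can be checked together on one Series. The sketch below rebuilds the scores Series with a name so that every property has something to show; dtype and size are two further standard attributes not covered in the list.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    {"Chinese": 90, "Math": 85, "English": 95, "Physics": 80},
    name="Scores",
)

print(type(s.values))  # the data is held in a NumPy array
print(list(s.index))   # the labels, converted to a plain list
print(s.name)          # the name given at creation
print(s.shape)         # one-element tuple: (length,)
print(s.dtype)         # element type of the data
print(s.size)          # number of elements
```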

三、Understanding and Manipulating Indexes

The index is the “soul” of a Series, allowing quick data location via labels.

1. Custom Index

The default index is 0,1,2…, but you can customize the index (e.g., strings, dates) for readability. For example:

dates = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
sales = [150, 200, 180, 220, 190]
s = pd.Series(sales, index=dates, name="Sales (yuan)")
print(s)

Output:

Monday       150
Tuesday      200
Wednesday    180
Thursday     220
Friday       190
Name: Sales (yuan), dtype: int64

2. Accessing Data by Index

  • By Label (loc): Use loc when you know the label. For example:
  print(s.loc["Wednesday"])  # Output: 180 (retrieve data with label "Wednesday")
  • By Position (iloc): Use iloc when you know the position. For example:
  print(s.iloc[2])  # Output: 180 (retrieve the 3rd element, position 2)
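The difference between loc and iloc matters most when the labels are themselves integers. The following sketch uses made-up labels 10, 20, 30 to show that the two accessors then answer different questions:

```python
import pandas as pd

# Integer labels that do not match the positions 0, 1, 2
s_int = pd.Series([150, 200, 180], index=[10, 20, 30])

by_label = s_int.loc[20]     # the element whose label is 20
by_position = s_int.iloc[2]  # the third element, whatever its label

print(by_label)     # 200
print(by_position)  # 180

# There is no label 2, so s_int.loc[2] would raise a KeyError
print(2 in s_int.index)  # False
```

Using loc and iloc explicitly, rather than plain square brackets, keeps this distinction visible in the code.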

3. Slicing Operations

  • Label Slicing (includes end label): Use loc, e.g.:
  print(s.loc["Tuesday":"Thursday"])  # Output: Tuesday:200, Wednesday:180, Thursday:220
  • Position Slicing (excludes end position): Use iloc, e.g.:
  print(s.iloc[1:4])  # Output: Tuesday:200, Wednesday:180, Thursday:220
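To make the inclusive versus exclusive endpoint rule concrete, this sketch rebuilds the weekday sales Series and compares slice lengths:

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
s = pd.Series([150, 200, 180, 220, 190], index=days)

label_slice = s.loc["Tuesday":"Thursday"]  # end label included
pos_slice = s.iloc[1:3]                    # end position excluded

print(len(label_slice))  # 3: Tuesday, Wednesday, Thursday
print(len(pos_slice))    # 2: Tuesday, Wednesday

# To get the same rows by position, the stop must be one past the last row
print(label_slice.equals(s.iloc[1:4]))  # True
```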

4. Modifying Indexes

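An Index object is immutable, so you cannot assign a new value to a single label in place (s.index[0] = "Mon" raises a TypeError). You can, however, replace the whole index at once with a list of the same length: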

s.index = ["Mon", "Tue", "Wed", "Thu", "Fri"]  # Replace all indexes
print(s)
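If only some labels need to change, one option is Series.rename, which accepts a mapping from old labels to new ones and returns a new Series. The sketch below uses a shortened three-day Series for illustration:

```python
import pandas as pd

s = pd.Series([150, 200, 180], index=["Monday", "Tuesday", "Wednesday"])

# Assigning to a single position of the index is not allowed
in_place_failed = False
try:
    s.index[0] = "Mon"
except TypeError as exc:
    in_place_failed = True
    print("Cannot modify one label in place:", exc)

# rename() maps old labels to new ones; labels not in the mapping are kept.
# It returns a new Series and leaves the original untouched.
renamed = s.rename({"Monday": "Mon"})
print(renamed)
```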

四、Data Operations and Analysis

1. Basic Statistical Methods

Series has built-in methods for the sum, mean, maximum, and other common statistics:

s = pd.Series([10, 20, 30, 40])
print(s.sum())   # 100 (sum)
print(s.mean())  # 25.0 (mean)
print(s.max())   # 40 (maximum)

2. Conditional Filtering

Filter data using boolean conditions:

s = pd.Series([10, 20, 30, 40])
filtered = s[s > 25]  # Filter data greater than 25
print(filtered)  # Output: 30, 40
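Several conditions can be combined with & (and) and | (or). Each condition needs its own parentheses, because these operators bind more tightly than the comparisons. A short sketch on the same data:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

between = s[(s > 15) & (s < 35)]   # both conditions must hold
extremes = s[(s < 15) | (s > 35)]  # either condition may hold
selected = s[s.isin([10, 30])]     # membership in a list of values

print(between.tolist())   # [20, 30]
print(extremes.tolist())  # [10, 40]
print(selected.tolist())  # [10, 30]
```

The filtered results keep their original index labels, which makes it easy to trace each value back to its source row.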

五、Practical Case: Comprehensive Exercise

Suppose we need to record a store’s weekly foot traffic, stored and analyzed using a Series:

import pandas as pd

# 1. Create a foot traffic Series with date index
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
passengers = [500, 480, 520, 600, 550, 700, 650]
s = pd.Series(passengers, index=days, name="Foot Traffic")

# 2. View the data
print("=== Weekly Foot Traffic Data ===")
print(s)

# 3. Access Saturday's foot traffic by label
print("\nSaturday's Foot Traffic:", s.loc["Saturday"])  # Output: 700

# 4. Calculate total and average foot traffic
total = s.sum()
avg = s.mean()
print(f"\nTotal: {total} people, Average: {avg:.1f} people per day")

# 5. Filter days with foot traffic exceeding 600
busy_days = s[s > 600]
print("\nDays with foot traffic > 600:")
print(busy_days)

Output:

=== Weekly Foot Traffic Data ===
Monday       500
Tuesday      480
Wednesday    520
Thursday     600
Friday       550
Saturday     700
Sunday       650
Name: Foot Traffic, dtype: int64

Saturday's Foot Traffic: 700

Total: 4000 people, Average: 571.4 people per day

Days with foot traffic > 600:
Saturday    700
Sunday      650
Name: Foot Traffic, dtype: int64
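The same foot traffic Series supports a few follow-up questions. The sketch below is one possible extension of the exercise, not part of the original walkthrough; the variable names are illustrative.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
passengers = [500, 480, 520, 600, 550, 700, 650]
s = pd.Series(passengers, index=days, name="Foot Traffic")

# Labels of the busiest and quietest days
busiest = s.idxmax()
quietest = s.idxmin()
print(busiest, quietest)  # Saturday Tuesday

# Rank days from busiest to quietest and keep the top three
top3 = s.sort_values(ascending=False).head(3)
print(top3)

# Days above the weekly average (about 571.4)
above_avg = s[s > s.mean()]
print(above_avg)
```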

六、Summary

The Series is the most basic data structure in pandas, and mastering it is key for data processing. Key points:
- Series = Data + Label (Index): Indexes make data easier to manage.
- Flexible Creation: Supports lists, dictionaries, and scalars for diverse data sources.
- Index Access: Use loc (label) and iloc (position) to avoid confusion.
- Built-in Stats & Filtering: Quickly perform initial data analysis with built-in methods.

Practice with different scenarios (e.g., date indexes, string indexes) until index operations and the data processing logic feel familiar.

Xiaoye