What Is a pandas Index?
In pandas, an Index is like the row numbers and column headers of an Excel spreadsheet: it labels each row and column so that data can be identified and located. For example, in a table of student information, “Student ID” or “Name” on the left acts as the row index, while “Subject” and “Score” at the top form the column index. It functions as a “data ID card,” enabling quick data location, sorting, and management.
Why Are Indexes Important?
- Quick Data Location: Directly access rows/columns via the index instead of searching through all data.
- Foundation for Sorting: Rows can be reordered by their index labels or by their values.
- Basis for Data Merging: Indexes act as “keys” to match data when combining different DataFrames.
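The first and third points can be illustrated with a small sketch. The product names and prices here are made-up sample data, and `pd.concat` is only one of several ways to combine objects by index:

```python
import pandas as pd

# Hypothetical sample data: two Series keyed by product name, in different row orders
sales = pd.Series([50, 30, 80], index=["apple", "banana", "orange"], name="sales")
prices = pd.Series([3.0, 4.5, 2.0], index=["orange", "apple", "banana"], name="price")

# Quick location: fetch one value directly by its index label
apple_sales = sales.loc["apple"]
print(apple_sales)

# Merging: rows are matched by index label, not by position
combined = pd.concat([sales, prices], axis=1)
print(combined)
```

Even though the two Series list the products in different orders, each price ends up next to the sales figure that carries the same label.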
I. Data Sorting: Reordering by Index or Values
pandas offers two kinds of sorting: by index labels (sort_index) and by values (sort_values). Both return a new, sorted object by default and leave the original unchanged.
1. Series Sorting (Single-Column Data)
Suppose we have a Series recording product sales, where the index is the product name and the value is the sales volume:
import pandas as pd
# Create a sample Series: index = product names, values = sales volume
s = pd.Series([50, 30, 80, 20], index=['苹果', '香蕉', '橙子', '草莓'])
print("Original Series:")
print(s)
Output:
Original Series:
苹果 50
香蕉 30
橙子 80
草莓 20
dtype: int64
(1) Sorting by Index (sort_index)
Sort by product name (index) in ascending/descending order:
# Sort by index in ascending order (default: ascending=True)
sorted_by_index_asc = s.sort_index()
print("\nSorted by index (ascending):")
print(sorted_by_index_asc)
# Sort by index in descending order (set ascending=False)
sorted_by_index_desc = s.sort_index(ascending=False)
print("\nSorted by index (descending):")
print(sorted_by_index_desc)
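Output:
Sorted by index (ascending):
橙子 80
苹果 50
草莓 20
香蕉 30
dtype: int64
Sorted by index (descending):
香蕉 30
草莓 20
苹果 50
橙子 80
dtype: int64
Note: pandas compares string labels by Unicode code point, so Chinese labels are not sorted in pinyin order.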
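By default, sort_index compares string labels by Unicode code point. If pinyin order is wanted instead, one option is the `key` parameter of `sort_index` (available in pandas 1.1 and later). The sketch below assumes a small hand-written `pinyin` dictionary, which is purely illustrative and not part of pandas:

```python
import pandas as pd

s = pd.Series([50, 30, 80, 20], index=["苹果", "香蕉", "橙子", "草莓"])

# Illustrative, hand-written mapping from label to pinyin (an assumption, not a pandas feature)
pinyin = {"苹果": "pingguo", "香蕉": "xiangjiao", "橙子": "chengzi", "草莓": "caomei"}

# Default behaviour: labels are compared by Unicode code point
by_code_point = s.sort_index()

# key receives the whole Index and must return an Index-like of the same length
by_pinyin = s.sort_index(key=lambda idx: idx.map(pinyin))

print(list(by_code_point.index))
print(list(by_pinyin.index))  # ['草莓', '橙子', '苹果', '香蕉']
```

For real data, the mapping would have to come from a dedicated transliteration library or a locale-aware collation key.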
(2) Sorting by Value (sort_values)
Sort by sales volume (value) in ascending/descending order:
# Sort by value in ascending order (default: ascending=True)
sorted_by_value_asc = s.sort_values()
print("\nSorted by value (ascending):")
print(sorted_by_value_asc)
# Sort by value in descending order (set ascending=False)
sorted_by_value_desc = s.sort_values(ascending=False)
print("\nSorted by value (descending):")
print(sorted_by_value_desc)
Output:
Sorted by value (ascending):
草莓 20
香蕉 30
苹果 50
橙子 80
dtype: int64
Sorted by value (descending):
橙子 80
苹果 50
香蕉 30
草莓 20
dtype: int64
2. DataFrame Sorting (Multi-Column Data)
A DataFrame is a 2D table. It can be sorted by its row index (sort_index) or by the values in one or more columns (sort_values with the by parameter).
Suppose we have a student grade table:
# Create a sample DataFrame: index = student names, columns = subjects and scores
df = pd.DataFrame({
'语文': [85, 78, 92],
'数学': [90, 82, 88],
'英语': [75, 90, 85]
}, index=['小明', '小红', '小刚'])
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
语文 数学 英语
小明 85 90 75
小红 78 82 90
小刚 92 88 85
(1) Sorting by Column Values (sort_values)
For example, sort by “语文” (Chinese) scores in ascending order:
# Sort by the "语文" column in ascending order
df_sorted = df.sort_values(by='语文')
print("\nSorted by 语文 (ascending):")
print(df_sorted)
Output:
Sorted by 语文 (ascending):
语文 数学 英语
小红 78 82 90
小明 85 90 75
小刚 92 88 85
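sort_values also accepts a list of columns, which is useful for breaking ties. The following is a minimal sketch with hypothetical scores and ASCII labels (chosen so the result is easy to check by eye):

```python
import pandas as pd

# Hypothetical scores; students A and C tie on math
scores = pd.DataFrame(
    {"math": [90, 82, 90], "chinese": [85, 78, 92]},
    index=["A", "B", "C"],
)

# Sort by math (highest first); break ties by chinese (lowest first)
ranked = scores.sort_values(by=["math", "chinese"], ascending=[False, True])
print(ranked)
```

Each entry in `ascending` applies to the column at the same position in `by`, so A (90, 85) comes before C (90, 92), and B comes last.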
(2) Sorting by Row Index (sort_index)
Sort by row index (student names) in descending order. pandas compares string labels by Unicode code point, not by pinyin:
# Sort by row index in descending order (code point order: 小红 > 小明 > 小刚)
df_sorted_index = df.sort_index(ascending=False)
print("\nSorted by index (descending):")
print(df_sorted_index)
Output:
Sorted by index (descending):
语文 数学 英语
小红 78 82 90
小明 85 90 75
小刚 92 88 85
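sort_index works on column labels as well when `axis=1` is passed. A small sketch, using hypothetical ASCII labels so that the resulting order is predictable:

```python
import pandas as pd

# Hypothetical table with unsorted row and column labels
table = pd.DataFrame(
    {"math": [90, 82], "chinese": [85, 78], "english": [75, 90]},
    index=["b", "a"],
)

# axis=1 sorts the column labels; the row order is left untouched
by_column_label = table.sort_index(axis=1)
print(by_column_label)
```

The default, `axis=0`, sorts the row labels instead.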
II. Renaming Indexes: Modifying Row/Column Labels
Default indexes (such as 0, 1, 2) or existing labels may not be descriptive enough. Row and column labels can be renamed with the rename method.
1. Renaming Row Indexes (rename method)
Suppose we want to replace the student names with group labels: 第一组 (Group 1), 第二组 (Group 2), and 第三组 (Group 3):
# Rename row indexes (original: 小明, 小红, 小刚 → new: 第一组, 第二组, 第三组)
df_renamed = df.rename(index={'小明': '第一组', '小红': '第二组', '小刚': '第三组'})
print("\nAfter renaming row indexes:")
print(df_renamed)
Output:
After renaming row indexes:
语文 数学 英语
第一组 85 90 75
第二组 78 82 90
第三组 92 88 85
2. Renaming Column Indexes (rename method)
To rename columns (e.g., “语文” → “Chinese”, “数学” → “Math”):
# Rename column indexes (original: 语文, 数学, 英语 → new: Chinese, Math, English)
df_renamed_cols = df.rename(columns={'语文': 'Chinese', '数学': 'Math', '英语': 'English'})
print("\nAfter renaming column indexes:")
print(df_renamed_cols)
Output:
After renaming column indexes:
Chinese Math English
小明 85 90 75
小红 78 82 90
小刚 92 88 85
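One thing to watch with rename: keys that do not match any existing label are silently ignored by default. The sketch below, using a cut-down version of the grade table, shows that behaviour and the `errors="raise"` option that turns a mismatch into a `KeyError`:

```python
import pandas as pd

df = pd.DataFrame({"语文": [85], "数学": [90]}, index=["小明"])

# "国文" is not a column of df, so the default errors="ignore" leaves df's labels as they are
unchanged = df.rename(columns={"国文": "Chinese"})
print(list(unchanged.columns))  # ['语文', '数学']

# errors="raise" reports the unmatched key instead of ignoring it
try:
    df.rename(columns={"国文": "Chinese"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

Passing `errors="raise"` can help catch typos in label names early.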
3. Directly Modifying Indexes (inplace parameter)
To modify the original data directly (instead of creating a new DataFrame), pass inplace=True to rename, or assign a new list of labels to df.columns or df.index as shown here:
# Directly modify column names (the new list must have one name per existing column)
df.columns = ['Chinese', 'Math', 'English'] # Overwrite column names
print("\nAfter directly modifying column names:")
print(df)
# Directly modify row indexes
df.index = [1, 2, 3] # Overwrite row indexes
print("\nAfter directly modifying row indexes:")
print(df)
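The assignments above change df itself. A comparable effect with rename requires `inplace=True`; a minimal sketch on a small stand-in table:

```python
import pandas as pd

df = pd.DataFrame({"语文": [85, 78]}, index=["小明", "小红"])

# With inplace=True, rename modifies df and returns None
result = df.rename(columns={"语文": "Chinese"}, inplace=True)

print(result)            # None
print(list(df.columns))  # ['Chinese']
```

Because the call returns None, writing `df = df.rename(..., inplace=True)` would replace df with None, which is a common mistake.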
III. Notes
- Distinguish Row/Column Indexes: df.index is the row index, df.columns is the column index. Specify which to sort/rename.
- Index Length Consistency: The new index list must match the original number of rows/columns; otherwise, an error occurs.
- inplace Parameter: Defaults to inplace=False (generates a new DataFrame). Set to True to modify the original data (use with caution!).
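The length requirement can be checked with a short sketch. Assigning a list with the wrong number of labels raises a `ValueError`; the table here is a small stand-in for the grade table:

```python
import pandas as pd

df = pd.DataFrame({"语文": [85, 78], "数学": [90, 82]}, index=["小明", "小红"])

# df has two columns, so a one-element list of names is rejected
try:
    df.columns = ["Chinese"]
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```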
Summary
Indexes are a core tool in pandas data processing. Mastering sorting (by index/value) and renaming (rows/columns) will help you manage data efficiently. Practice with simple examples (like the student grade table above) to quickly become familiar with these operations!
Now open your Python editor, create your own DataFrame, and try sorting and renaming!