Sorting is one of the most common operations in data processing. A sorted table makes it easy to find the largest and smallest values, or to arrange records in a specific order for later analysis. pandas, a widely used Python data processing library, provides the sort_values method for this purpose. This article walks through the method step by step, from the basics to practical use.

一、Why Is Sorting Needed?

Imagine a student grade sheet that lists each student's name along with their Chinese, Math, and English scores. To find out who has the highest Chinese score from the raw data, you might have to compare the rows one by one. Sorting by the Chinese score in descending order with sort_values instead arranges the rows from the highest score to the lowest, so the answer is clear at a glance.

二、Basics of the sort_values Method

The core job of sort_values is to sort a DataFrame by the values of one or more columns (a Series has the same method, but without the by argument). Its basic usage, showing only the most common parameters, is:

df.sort_values(by, ascending=True, inplace=False, axis=0)

Parameter Explanation:
- by: Required. The column name, or list of column names, to sort by.
- ascending: Optional. The sort direction: True (default) sorts in ascending order, False in descending order. When sorting by multiple columns, a list with one value per column can be passed (e.g., [True, False]).
- inplace: Optional. Whether to modify the original DataFrame. False (default) returns a new sorted DataFrame and leaves the original unchanged; True sorts the original DataFrame itself.
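- axis: Optional. The axis to sort along. 0 (default) reorders the rows according to the values in the given column(s); 1 reorders the columns according to the values in the given row(s), in which case by takes index labels. This is less commonly used, and beginners can skip the details for now.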
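
Because axis is the least intuitive of these parameters, here is a minimal sketch of what the two settings do. The small scores table, indexed by student name, is made up for illustration and is not the article's sample data; it holds only numeric columns so that the values within a row can be compared with each other.

```python
import pandas as pd

# Hypothetical numeric-only table, indexed by student name
scores = pd.DataFrame(
    {'语文': [85, 76], '数学': [92, 85], '英语': [78, 90]},
    index=['张三', '李四'],
)

# axis=0 (default): reorder the rows by the values in the 语文 column
rows_sorted = scores.sort_values(by='语文')

# axis=1: reorder the columns by the values in the row labelled 张三
cols_sorted = scores.sort_values(by='张三', axis=1)

print(rows_sorted.index.tolist())    # row order after sorting by 语文
print(cols_sorted.columns.tolist())  # column order after sorting by 张三's scores
```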

三、Basic Sorting Examples

The examples below use a small student grade sheet as sample data, with the columns 姓名 (name), 语文 (Chinese), 数学 (Math), and 英语 (English):

import pandas as pd

# Create sample data
data = {
    '姓名': ['张三', '李四', '王五', '赵六'],
    '语文': [85, 76, 90, 88],
    '数学': [92, 85, 78, 95],
    '英语': [78, 90, 82, 89]
}
df = pd.DataFrame(data)

1. Single Column Ascending Sort

Sort by “Chinese” scores from smallest to largest (ascending):

sorted_df = df.sort_values(by='语文')  # Ascending order by Chinese scores (default ascending=True)
print(sorted_df)

Output:

   姓名  语文  数学  英语
1  李四  76  85  90
0  张三  85  92  78
3  赵六  88  95  89
2  王五  90  78  82

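As the output shows, “李四” (76 points) has the lowest Chinese score and comes first, while “王五” (90 points) has the highest and comes last. The numbers in the leftmost column are the original index labels, which stay attached to their rows after sorting.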

2. Single Column Descending Sort

Sort by “Math” scores from largest to smallest (descending):

sorted_df = df.sort_values(by='数学', ascending=False)  # ascending=False for descending order
print(sorted_df)

Output:

   姓名  语文  数学  英语
3  赵六  88  95  89
0  张三  85  92  78
1  李四  76  85  90
2  王五  90  78  82

“赵六” (95 points) has the highest Math score and is first; “王五” (78 points) has the lowest and is last.
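
sort_values is also available on a Series, for example a single column selected from the DataFrame. The sketch below rebuilds two columns of the sample data so that it runs on its own; the names chinese, chinese_desc, and top_student are chosen here for illustration. A Series has only one set of values, so no by argument is passed.

```python
import pandas as pd

df = pd.DataFrame({
    '姓名': ['张三', '李四', '王五', '赵六'],
    '语文': [85, 76, 90, 88],
})

# Select one column as a Series, labelled by student name
chinese = df.set_index('姓名')['语文']

# Series.sort_values takes no `by`; ascending works as it does for a DataFrame
chinese_desc = chinese.sort_values(ascending=False)
print(chinese_desc)

# The first label after a descending sort belongs to the top scorer
top_student = chinese_desc.index[0]
print(top_student)
```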

四、Multi-Column Sorting

When values in one column are the same, sorting can continue based on another column. For example, first sort by “Chinese” scores ascending, and if Chinese scores are the same, then sort by “Math” scores descending:

sorted_df = df.sort_values(by=['语文', '数学'], ascending=[True, False])
print(sorted_df)

Output:

   姓名  语文  数学  英语
1  李四  76  85  90
0  张三  85  92  78
3  赵六  88  95  89
2  王五  90  78  82

Explanation:
- First, sort by “Chinese” ascending: “李四” (76) is first, followed by “张三” (85), “赵六” (88), and “王五” (90).
- Rows with equal Chinese scores would then be ordered by “Math” descending. There are no ties in this example, so the second key has no effect here.
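
Since the sample data has no tied Chinese scores, the second sort key never comes into play. The sketch below uses a modified copy of the data (the scores are changed for illustration only) in which 张三 and 李四 share the same Chinese score, so the Math column decides their order.

```python
import pandas as pd

# Illustrative data: 张三 and 李四 are tied on 语文
tied = pd.DataFrame({
    '姓名': ['张三', '李四', '王五', '赵六'],
    '语文': [85, 85, 90, 76],
    '数学': [80, 92, 78, 95],
})

# 语文 ascending first; ties on 语文 are broken by 数学 descending
sorted_tied = tied.sort_values(by=['语文', '数学'], ascending=[True, False])
print(sorted_tied)
```

Within the tied pair, 李四 (Math 92) is placed ahead of 张三 (Math 80) because the second key is sorted in descending order.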

五、Modifying Original Data (inplace Parameter)

By default, sort_values does not modify the original DataFrame but returns a new sorted DataFrame. To modify the original data directly, set inplace=True:

df.sort_values(by='英语', ascending=False, inplace=True)  # Directly modify df
print(df)  # Original df has been modified

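Note: inplace=True overwrites the row order of the original DataFrame, and according to the pandas documentation the call itself then returns None, so its result should not be assigned back to df. It is recommended to keep the default inplace=False so that the original data is preserved.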
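
The difference between the two modes can be checked directly. This is a small sketch on a reduced copy of the sample data; the variable names original, sorted_copy, and working are chosen here for illustration.

```python
import pandas as pd

original = pd.DataFrame({
    '姓名': ['张三', '李四', '王五', '赵六'],
    '英语': [78, 90, 82, 89],
})

# Default (inplace=False): a new DataFrame is returned, `original` keeps its order
sorted_copy = original.sort_values(by='英语', ascending=False)
print(original['姓名'].tolist())     # unchanged row order
print(sorted_copy['姓名'].tolist())  # sorted row order

# inplace=True reorders the object itself, so work on an explicit copy
# when the unsorted data is still needed
working = original.copy()
working.sort_values(by='英语', ascending=False, inplace=True)
print(working['姓名'].tolist())
```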

六、Practical Case: Sorting by Total Score

To compare students’ overall performance, first calculate a “Total” score, then sort by that total in descending order:

# Add a "Total" column
df['总分'] = df['语文'] + df['数学'] + df['英语']

# Sort by total score descending
sorted_df = df.sort_values(by='总分', ascending=False)
print(sorted_df)

Output:

   姓名  语文  数学  英语  总分
3  赵六  88  95  89  272
0  张三  85  92  78  255
1  李四  76  85  90  251
2  王五  90  78  82  250

It can be seen that “赵六” has the highest total score (272), and “王五” has the lowest (250).
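
A sorted result combines naturally with head to take the top N rows. The sketch below rebuilds the sample data so that it runs independently and picks the two highest totals. nlargest is shown as an alternative pandas method for the same task; it is not part of the article's own workflow.

```python
import pandas as pd

df = pd.DataFrame({
    '姓名': ['张三', '李四', '王五', '赵六'],
    '语文': [85, 76, 90, 88],
    '数学': [92, 85, 78, 95],
    '英语': [78, 90, 82, 89],
})
df['总分'] = df['语文'] + df['数学'] + df['英语']

# Top 2 by total score: sort descending, then keep the first two rows
top2 = df.sort_values(by='总分', ascending=False).head(2)
print(top2[['姓名', '总分']])

# Alternative: nlargest selects the n rows with the largest values directly
top2_alt = df.nlargest(2, '总分')
print(top2_alt[['姓名', '总分']])
```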

七、Summary and Notes

  1. Core Parameters: by specifies the column(s) to sort by, ascending controls the direction, and inplace determines whether the original data is modified.
  2. Multi-Column Sorting: Both by and ascending accept lists; when ascending is a list, its length must match the number of columns in by.
  3. Data Safety: Prefer inplace=False (the default) to avoid accidentally modifying the original data.
  4. Resetting the Index: Chain reset_index(drop=True) after sorting to renumber the rows from 0 (e.g., df.sort_values(...).reset_index(drop=True)).
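
As a sketch of the index reset in item 4: after sorting, each row keeps its original index label, so the labels appear out of order. reset_index(drop=True) renumbers them from 0. The ignore_index=True argument of sort_values (available in pandas 1.0 and later) is an alternative that should give the same result in one call. The data is a reduced copy of the sample sheet.

```python
import pandas as pd

df = pd.DataFrame({
    '姓名': ['张三', '李四', '王五', '赵六'],
    '语文': [85, 76, 90, 88],
})

sorted_df = df.sort_values(by='语文', ascending=False)
print(sorted_df.index.tolist())  # original labels, now out of order

# Renumber the rows 0..n-1 and discard the old labels
renumbered = sorted_df.reset_index(drop=True)

# Same outcome in a single call (requires pandas >= 1.0)
renumbered_alt = df.sort_values(by='语文', ascending=False, ignore_index=True)
print(renumbered_alt)
```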

The examples above cover the basic usage of sort_values and a few practical techniques. Sorting is a fundamental data processing skill that later analyses, such as Top-N selection and group statistics, often rely on. Practicing the different scenarios (single column, multiple columns, descending order, inplace) is the quickest way to become proficient with it.

Xiaoye