import pandas as pd
I. Creating Data: Starting from “Empty” to “Existing”
The core data structures in pandas are Series (one-dimensional data) and DataFrame (two-dimensional tabular data, similar to an Excel sheet). Let’s start with the simplest ways to create them.
1. Creating a Series (One-Dimensional Data)
A Series can be understood as a “labeled list”: each value has a label, and the labels together are called the index.
Example Code:
# Create a Series with a list (default index: 0,1,2...)
s = pd.Series([10, 20, 30, 40])
print(s)
# Output:
# 0 10
# 1 20
# 2 30
# 3 40
# dtype: int64
# Custom index (e.g., using dates or letters)
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
# Output:
# a 10
# b 20
# c 30
# dtype: int64
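To make the “labeled list” idea concrete, here is a minimal sketch of reading values back out of a Series. The variable names (scores, by_label, by_position, subset) are illustrative and not part of the example above.

```python
import pandas as pd

# Build a Series from a dict: the keys become the index labels
scores = pd.Series({"a": 10, "b": 20, "c": 30})

by_label = scores["b"]          # look up a value by its index label
by_position = scores.iloc[1]    # look up a value by its integer position
subset = scores[["a", "c"]]     # select several labels at once

print(by_label, by_position)    # 20 20
print(subset.tolist())          # [10, 30]
```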
2. Creating a DataFrame (Two-Dimensional Tabular Data)
A DataFrame is the most commonly used structure, similar to an Excel spreadsheet, composed of multiple Series (each column is a Series).
Example Code:
# Method 1: Create with a dictionary (keys = column names, values = column data)
data = {
"Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
"Age": [18, 19, 20],
"Score": [90, 85, 95]
}
df = pd.DataFrame(data)
print(df)
# Output:
# Name Age Score
# 0 Xiaoming 18 90
# 1 Xiaohong 19 85
# 2 Xiaogang 20 95
# Method 2: Create with a 2D list (column names must be specified)
df = pd.DataFrame([
["Xiaoming", 18, 90],
["Xiaohong", 19, 85],
["Xiaogang", 20, 95]
], columns=["Name", "Age", "Score"])
print(df) # Output is the same as above
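As a further sketch, a DataFrame can also be built from a list of dictionaries (one dictionary per row), and selecting a single column shows that each column is indeed a Series. The names records, df_records, and age_column are chosen here for illustration.

```python
import pandas as pd

# One dict per row; the keys become the column names
records = [
    {"Name": "Xiaoming", "Age": 18, "Score": 90},
    {"Name": "Xiaohong", "Age": 19, "Score": 85},
    {"Name": "Xiaogang", "Age": 20, "Score": 95},
]
df_records = pd.DataFrame(records)

# Selecting one column returns a Series
age_column = df_records["Age"]
print(type(age_column).__name__)  # Series
print(df_records.shape)           # (3, 3)
```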
II. Viewing Data: Quickly Understanding “What’s in the Table”
After creating data, first use basic methods to check the “appearance” and “information” of the data to avoid errors in subsequent operations.
1. Viewing the First/Last N Rows (Quick Preview)
- head(n): View the first n rows (default n=5)
- tail(n): View the last n rows (default n=5)
Example Code:
# View the first 2 rows
print(df.head(2))
# Output:
# Name Age Score
# 0 Xiaoming 18 90
# 1 Xiaohong 19 85
# View the last 1 row
print(df.tail(1))
# Output:
# Name Age Score
# 2 Xiaogang 20 95
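The three-row table is too small to show the default of n=5, so the following sketch uses a hypothetical eight-row DataFrame (numbers) to illustrate it, along with the negative-n form of head().

```python
import pandas as pd

# A hypothetical 8-row table, just to make the defaults visible
numbers = pd.DataFrame({"n": range(8)})

first_default = numbers.head()       # no argument: first 5 rows
last_default = numbers.tail()        # no argument: last 5 rows
all_but_last_two = numbers.head(-2)  # negative n: every row except the last 2

print(len(first_default))            # 5
print(last_default["n"].tolist())    # [3, 4, 5, 6, 7]
print(len(all_but_last_two))         # 6
```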
2. Viewing Data Statistics
- info(): Check data types and non-null value counts (a quick way to spot missing data)
- describe(): Summary statistics for numerical columns (count, mean, standard deviation, min/max, etc.)
- columns: View column names
- index: View row indices
Example Code:
# View basic information (data types, non-null values)
# info() prints its report directly and returns None, so it does not need print()
df.info()
# Output:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 3 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 Name 3 non-null object
# 1 Age 3 non-null int64
# 2 Score 3 non-null int64
# dtypes: int64(2), object(1)
# memory usage: 200.0+ bytes
# View summary statistics (by default, describe() covers only the numerical columns)
print(df.describe())
# Output:
# Age Score
# count 3.000000 3.000000
# mean 19.000000 90.000000
# std 1.000000 5.000000
# min 18.000000 85.000000
# 25% 18.500000 87.500000
# 50% 19.000000 90.000000
# 75% 19.500000 92.500000
# max 20.000000 95.000000
# View column names and row indices
print("Column names:", df.columns) # Output: Column names: Index(['Name', 'Age', 'Score'], dtype='object')
print("Row indices:", df.index) # Output: Row indices: RangeIndex(start=0, stop=3, step=1)
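The example table has no missing values, so the “quickly check for missing data” use of these tools is not visible above. A minimal sketch, assuming a hypothetical table df_missing with one missing score, shows how a gap appears in the dtype and in a per-column count:

```python
import pandas as pd

# Hypothetical data with one missing score
df_missing = pd.DataFrame({
    "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
    "Score": [90, None, 95],
})

# Count missing values per column
missing_per_column = df_missing.isna().sum()

print(df_missing.shape)              # (3, 2)
print(df_missing["Score"].dtype)     # float64 (the missing value is stored as NaN)
print(missing_per_column["Score"])   # 1
print(df_missing["Score"].mean())    # 92.5 (NaN is skipped by default)
```

Calling df_missing.info() on this table would report 2 non-null values for Score against 3 entries overall, which is the same signal in a different form.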
III. Modifying Data: Making Data “Work for Me”
The key skills are adding, deleting, modifying, and querying data. In each case the core idea is the same: locate the data first, then change it. The following are common operations.
1. Modifying Single/Multiple Cell Values
Use loc[row_label, column_name] or iloc[row_position, column_position] to locate and assign values.
- loc: Positioning by label (e.g., row labels, column names)
- iloc: Positioning by position (integer starting from 0)
Example Code:
# Modify the "Age" column value in the first row (using loc for label positioning)
df.loc[0, "Age"] = 17
print(df)
# Output:
# Name Age Score
# 0 Xiaoming 17 90
# 1 Xiaohong 19 85
# 2 Xiaogang 20 95
# Modify the "Score" value in the third row (using iloc with integer positions: row position 2, column position 2)
df.iloc[2, 2] = 96
print(df)
# Output:
# Name Age Score
# 0 Xiaoming 17 90
# 1 Xiaohong 19 85
# 2 Xiaogang 20 96
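With the default index 0, 1, 2, labels and positions happen to coincide, which hides the difference between loc and iloc. The sketch below uses a hypothetical DataFrame (df_demo) whose integer labels differ from its positions, and also shows how to update several cells in one statement.

```python
import pandas as pd

# Hypothetical table whose row labels (10, 20, 30) differ from positions (0, 1, 2)
df_demo = pd.DataFrame(
    {"Age": [18, 19, 20], "Score": [90, 85, 95]},
    index=[10, 20, 30],
)

df_demo.loc[10, "Age"] = 17   # label 10 is the first row
df_demo.iloc[2, 1] = 96       # row position 2, column position 1 (Score)

# Update several cells at once: a list of row labels and one column
df_demo.loc[[10, 20], "Score"] = [91, 86]

# Conditional update: a boolean mask selects the rows to change
df_demo.loc[df_demo["Age"] >= 19, "Score"] += 1

print(df_demo["Age"].tolist())    # [17, 19, 20]
print(df_demo["Score"].tolist())  # [91, 87, 97]

# loc slices include the end label; iloc slices exclude the end position
print(len(df_demo.loc[10:20]))    # 2
print(len(df_demo.iloc[0:1]))     # 1
```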
2. Adding a New Column (Adding New Data Columns)
Assign values directly to a new column name. The values can be a constant or a calculation based on existing columns.
Example Code:
# Add a "Class" column (direct assignment)
df["Class"] = "Class 1"
print(df)
# Output:
# Name Age Score Class
# 0 Xiaoming 17 90 Class 1
# 1 Xiaohong 19 85 Class 1
# 2 Xiaogang 20 96 Class 1
# Add a "Total" column (calculated from an existing numerical column)
df["Total"] = df["Score"] + 10 # Example calculation: total = score + 10
print(df)
# Output:
# Name Age Score Class Total
# 0 Xiaoming 17 90 Class 1 100
# 1 Xiaohong 19 85 Class 1 95
# 2 Xiaogang 20 96 Class 1 106
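Direct assignment changes the DataFrame in place. As an alternative sketch, assign() returns a new DataFrame with the extra column and leaves the original untouched. The column name Passed and the threshold of 90 are hypothetical, chosen only for illustration.

```python
import pandas as pd

df_scores = pd.DataFrame({"Name": ["Xiaoming", "Xiaohong"], "Score": [90, 85]})

# assign() returns a new DataFrame; df_scores itself is not modified
# "Passed" and the threshold 90 are hypothetical
df_with_flag = df_scores.assign(Passed=df_scores["Score"] >= 90)

print(df_with_flag["Passed"].tolist())  # [True, False]
print(list(df_scores.columns))          # ['Name', 'Score']
```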
3. Deleting a Column (Removing Unneeded Columns)
Use the drop() method. axis=1 means a column is deleted (the default, axis=0, deletes rows), and inplace=True modifies the original DataFrame directly instead of returning a new one.
Example Code:
# Delete the "Total" column (axis=1 indicates column, inplace=True directly modifies df)
df.drop("Total", axis=1, inplace=True)
print(df)
# Output:
# Name Age Score Class
# 0 Xiaoming 17 90 Class 1
# 1 Xiaohong 19 85 Class 1
# 2 Xiaogang 20 96 Class 1
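For comparison, here is a minimal sketch of drop() without inplace=True. It returns a new DataFrame, and the columns= and index= keywords can be used in place of axis. The names df_full, df_trimmed, and df_no_first are illustrative.

```python
import pandas as pd

df_full = pd.DataFrame({"Name": ["Xiaoming"], "Age": [18], "Score": [90]})

# Without inplace=True, drop() returns a new DataFrame
df_trimmed = df_full.drop(columns=["Age", "Score"])  # same as drop([...], axis=1)

print(list(df_trimmed.columns))  # ['Name']
print(list(df_full.columns))     # ['Name', 'Age', 'Score'] (original unchanged)

# Rows are dropped by index label
df_no_first = df_full.drop(index=0)
print(len(df_no_first))          # 0
```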
4. Modifying Row/Column Indices (Adjusting Labels)
Assign a new list of labels directly to index or columns to replace all of them, or use rename() to change only selected labels.
Example Code:
# Modify row indices (original indices: 0,1,2 → change to "row1", "row2", "row3")
df.index = ["row1", "row2", "row3"]
print(df)
# Output:
# Name Age Score Class
# row1 Xiaoming 17 90 Class 1
# row2 Xiaohong 19 85 Class 1
# row3 Xiaogang 20 96 Class 1
# Rename column names (using rename method, inplace=True directly modifies)
df.rename(columns={"Class": "ClassName"}, inplace=True)
print(df)
# Output:
# Name Age Score ClassName
# row1 Xiaoming 17 90 Class 1
# row2 Xiaohong 19 85 Class 1
# row3 Xiaogang 20 96 Class 1
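Another common way to adjust labels is to promote an existing column to the row index. The sketch below uses a hypothetical DataFrame (df_people) to show set_index(), rename() with an index mapping, and reset_index(). The new label "XM" is made up for the example.

```python
import pandas as pd

df_people = pd.DataFrame({"Name": ["Xiaoming", "Xiaohong"], "Age": [18, 19]})

# Use the "Name" column as the row index, so rows can be looked up by name
by_name = df_people.set_index("Name")
print(by_name.loc["Xiaohong", "Age"])  # 19

# rename() with index= changes only the labels listed in the mapping
renamed = by_name.rename(index={"Xiaoming": "XM"})  # "XM" is a made-up label
print(list(renamed.index))             # ['XM', 'Xiaohong']

# reset_index() turns the index back into an ordinary column
restored = by_name.reset_index()
print(list(restored.columns))          # ['Name', 'Age']
```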
Summary
Through this article, we’ve learned the basic operations of pandas data processing:
- Data Creation: Build 1D/2D data using Series and DataFrame
- Data Viewing: Quickly understand data characteristics via head/tail/info/describe
- Data Modification: Use loc/iloc for positioning and modification, add/delete columns, adjust indices
The core of pandas operations is “locating data”. Remember the difference between loc (label) and iloc (position). With more practice, you’ll master these skills! For more complex operations (e.g., grouping, merging), you can learn further later. Now, go ahead and try it yourself!