Must-Read for Beginners! Basic Operations in pandas: Creating, Viewing, and Modifying Data

import pandas as pd

I. Creating Data: Starting from “Empty” to “Existing”

The two core data structures in pandas are Series (one-dimensional data) and DataFrame (two-dimensional tabular data, similar to an Excel sheet). Let’s start with the simplest creation methods.

1. Creating a Series (One-Dimensional Data)

A Series can be understood as a “labeled list”, where the labels are called “indices”.

Example Code:

# Create a Series with a list (default index: 0,1,2...)
s = pd.Series([10, 20, 30, 40])
print(s)
# Output:
# 0    10
# 1    20
# 2    30
# 3    40
# dtype: int64

# Custom index (e.g., using dates or letters)
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
# Output:
# a    10
# b    20
# c    30
# dtype: int64
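
The “labeled list” idea is easiest to see when reading values back. The following is a minimal sketch that reuses the values from the example above; the variable names are illustrative, and building a Series from a dictionary is an extra option not shown in the original examples.

```python
import pandas as pd

# Same data as above, with letter labels as the index
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Look up a value by its label
value_by_label = s["b"]

# Look up a value by its integer position (0-based)
value_by_position = s.iloc[1]

# A dictionary also works: keys become the index, values become the data
s_from_dict = pd.Series({"a": 10, "b": 20, "c": 30})

print(value_by_label, value_by_position)
print(s.equals(s_from_dict))
```

Both lookups return the same element here because label "b" sits at position 1.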

2. Creating a DataFrame (Two-Dimensional Tabular Data)

A DataFrame is the most commonly used structure, similar to an Excel spreadsheet, composed of multiple Series (each column is a Series).

Example Code:

# Method 1: Create with a dictionary (keys = column names, values = column data)
data = {
    "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
    "Age": [18, 19, 20],
    "Score": [90, 85, 95]
}
df = pd.DataFrame(data)
print(df)
# Output:
#        Name  Age  Score
# 0  Xiaoming   18     90
# 1  Xiaohong   19     85
# 2  Xiaogang   20     95

# Method 2: Create with a 2D list (pass column names via columns=; otherwise they default to 0, 1, 2)
df = pd.DataFrame([
    ["Xiaoming", 18, 90],
    ["Xiaohong", 19, 85],
    ["Xiaogang", 20, 95]
], columns=["Name", "Age", "Score"])
print(df)  # Output is the same as above
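
To illustrate the statement that each column of a DataFrame is a Series, here is a small sketch built on the same data. The list-of-dictionaries construction (one dictionary per row) is a further option added here for comparison; it is not part of the original examples.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
    "Age": [18, 19, 20],
    "Score": [90, 85, 95],
})

# Selecting one column returns a Series that shares the DataFrame's row index
ages = df["Age"]
print(type(ages).__name__)
print(ages.tolist())

# Alternative: a list of dictionaries, one dictionary per row
rows = [
    {"Name": "Xiaoming", "Age": 18, "Score": 90},
    {"Name": "Xiaohong", "Age": 19, "Score": 85},
    {"Name": "Xiaogang", "Age": 20, "Score": 95},
]
df_from_rows = pd.DataFrame(rows)

# Both construction methods should produce the same table
print(df.equals(df_from_rows))
```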

II. Viewing Data: Quickly Understanding “What’s in the Table”

After creating data, first use basic methods to check the “appearance” and “information” of the data to avoid errors in subsequent operations.

1. Viewing the First/Last N Rows (Quick Preview)

head(n): View the first n rows (default n=5)
tail(n): View the last n rows (default n=5)

Example Code:

# View the first 2 rows
print(df.head(2))
# Output:
#        Name  Age  Score
# 0  Xiaoming   18     90
# 1  Xiaohong   19     85

# View the last 1 row
print(df.tail(1))
# Output:
#        Name  Age  Score
# 2  Xiaogang   20     95
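
With only three rows, the default of n=5 is hard to see. The sketch below uses a hypothetical 8-row table (the name numbers and the column n are made up for illustration) and also reads the shape attribute, which is an addition to the methods listed above.

```python
import pandas as pd

# A hypothetical 8-row table, just large enough to show the default of 5
numbers = pd.DataFrame({"n": range(8)})

first_rows = numbers.head()     # no argument: first 5 rows
last_rows = numbers.tail(3)     # last 3 rows

print(numbers.shape)            # (number of rows, number of columns)
print(first_rows["n"].tolist())
print(last_rows["n"].tolist())
```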

2. Viewing Data Statistics

info(): Check data types and non-null value counts (quickly check for missing data)
describe(): Summary statistics for numerical columns (count, mean, standard deviation, min/max, etc.)
columns: View column names
index: View row indices

Example Code:

# View basic information (data types, non-null values)
# Note: info() prints its report directly and returns None, so do not wrap it in print()
df.info()
# Output:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    3 non-null      object
#  1   Age     3 non-null      int64
#  2   Score   3 non-null      int64
# dtypes: int64(2), object(1)
# memory usage: 200.0+ bytes

# View summary statistics (by default, describe() covers only the numerical columns)
print(df.describe())
# Output:
#         Age  Score
# count   3.0    3.0
# mean   19.0   90.0
# std     1.0    5.0
# min    18.0   85.0
# 25%    18.5   87.5
# 50%    19.0   90.0
# 75%    19.5   92.5
# max    20.0   95.0

# View column names and row indices
print("Column names:", df.columns)  # Output: Column names: Index(['Name', 'Age', 'Score'], dtype='object')
print("Row indices:", df.index)    # Output: Row indices: RangeIndex(start=0, stop=3, step=1)
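
As a rough sketch of how describe() treats different column types, the example below compares the default call with include="all", which also summarizes text columns such as "Name". It rebuilds the same three-row table so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
    "Age": [18, 19, 20],
    "Score": [90, 85, 95],
})

# By default, describe() summarizes only the numerical columns
numeric_summary = df.describe()
print(numeric_summary.columns.tolist())

# include="all" also summarizes non-numerical columns such as "Name"
full_summary = df.describe(include="all")
print(full_summary.columns.tolist())

# Individual statistics are also available as methods on a single column
score_mean = df["Score"].mean()
score_std = df["Score"].std()   # sample standard deviation (divides by n - 1)
print(score_mean, score_std)
```

The single-column methods are handy when only one number is needed rather than the whole summary table.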

III. Modifying Data: Making Data “Work for Me”

The key skill is the usual “add, delete, modify, query” set of operations, and all of them come down to locating data first and then changing it. The following are common operations.

1. Modifying Single/Multiple Cell Values

Use loc[row_label, column_name] or iloc[row_position, column_position] to locate cells and assign values.
- loc: Selects by label (e.g., row labels, column names)
- iloc: Selects by integer position (starting from 0)

Example Code:

# Modify the "Age" column value in the first row (using loc for label positioning)
df.loc[0, "Age"] = 17
print(df)
# Output:
#        Name  Age  Score
# 0  Xiaoming   17     90
# 1  Xiaohong   19     85
# 2  Xiaogang   20     95

# Modify the "Score" column value in the third row (using iloc with integer positions: row 2, column 2)
df.iloc[2, 2] = 96
print(df)
# Output:
#        Name  Age  Score
# 0  Xiaoming   17     90
# 1  Xiaohong   19     85
# 2  Xiaogang   20     96
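
With the default index 0, 1, 2, labels and positions coincide, which hides the difference between loc and iloc. The sketch below uses hypothetical row labels 10, 20, 30 (an assumption made purely for illustration) to separate the two, and then shows two ways of modifying several cells at once.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
        "Age": [18, 19, 20],
        "Score": [90, 85, 95],
    },
    index=[10, 20, 30],  # hypothetical labels that differ from positions
)

# loc uses the label 20; iloc uses the position 1; both point at the same row
name_by_label = df.loc[20, "Name"]
name_by_position = df.iloc[1, 0]
print(name_by_label, name_by_position)

# Modify several cells at once: a list of row labels and one column
df.loc[[10, 30], "Score"] = 100

# Modify the cells that satisfy a condition (boolean mask over the rows)
df.loc[df["Age"] >= 20, "Age"] = 21

print(df)
```

Calling df.loc[1, "Name"] on this table would raise a KeyError, because no row carries the label 1.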

2. Adding a New Column (Adding New Data Columns)

Assign values directly to a new column name; the values can also be computed from existing columns.

Example Code:

# Add a "Class" column (direct assignment)
df["Class"] = "Class 1"
print(df)
# Output:
#        Name  Age  Score    Class
# 0  Xiaoming   17     90  Class 1
# 1  Xiaohong   19     85  Class 1
# 2  Xiaogang   20     96  Class 1

# Add a "Total" column (calculated from an existing numerical column)
df["Total"] = df["Score"] + 10  # Total = Score + 10 for every row (note the result can exceed 100)
print(df)
# Output:
#        Name  Age  Score    Class  Total
# 0  Xiaoming   17     90  Class 1    100
# 1  Xiaohong   19     85  Class 1     95
# 2  Xiaogang   20     96  Class 1    106
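
Computed columns are not limited to arithmetic. As a hedged sketch, the example below derives a boolean column from a comparison and uses assign(), which returns a new DataFrame instead of changing the original. The column names Passed and ScoreRatio and the threshold of 90 are hypothetical choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
    "Age": [17, 19, 20],
    "Score": [90, 85, 96],
})

# A new column computed element-wise from an existing column
df["Passed"] = df["Score"] >= 90

# assign() returns a new DataFrame and leaves df itself untouched
df_with_ratio = df.assign(ScoreRatio=df["Score"] / 100)

print(df.columns.tolist())
print(df_with_ratio.columns.tolist())
```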

3. Deleting a Column (Removing Unneeded Columns)

Use the drop() method. axis=1 means a column is being deleted, and inplace=True modifies the original data directly; without inplace=True, drop() returns a new DataFrame and leaves the original unchanged.

Example Code:

# Delete the "Total" column (axis=1 indicates column, inplace=True directly modifies df)
df.drop("Total", axis=1, inplace=True)
print(df)
# Output:
#        Name  Age  Score    Class
# 0  Xiaoming   17     90  Class 1
# 1  Xiaohong   19     85  Class 1
# 2  Xiaogang   20     96  Class 1
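
For comparison, here is a minimal sketch of drop() without inplace=True. It uses the columns= keyword, which is equivalent to passing the labels together with axis=1; the variable names trimmed and names_only are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
    "Age": [17, 19, 20],
    "Score": [90, 85, 96],
    "Total": [100, 95, 106],
})

# Without inplace=True, drop() returns a new DataFrame; df is unchanged
trimmed = df.drop(columns=["Total"])

# Several columns can be dropped in one call
names_only = df.drop(columns=["Age", "Score", "Total"])

print(df.columns.tolist())
print(trimmed.columns.tolist())
print(names_only.columns.tolist())
```

Keeping the original intact this way makes it easier to rerun a step without rebuilding the data.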

4. Modifying Row/Column Indices (Adjusting Labels)

Assign directly to df.index or df.columns to replace all labels, or use rename() to change individual labels.

Example Code:

# Modify row indices (original indices: 0,1,2 → change to "row1", "row2", "row3")
df.index = ["row1", "row2", "row3"]
print(df)
# Output:
#           Name  Age  Score    Class
# row1  Xiaoming   17     90  Class 1
# row2  Xiaohong   19     85  Class 1
# row3  Xiaogang   20     96  Class 1

# Rename column names (using rename method, inplace=True directly modifies)
df.rename(columns={"Class": "ClassName"}, inplace=True)
print(df)
# Output:
#           Name  Age  Score ClassName
# row1  Xiaoming   17     90   Class 1
# row2  Xiaohong   19     85   Class 1
# row3  Xiaogang   20     96   Class 1
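
Row labels do not have to be typed in by hand. As a sketch of a related approach not covered above, set_index() promotes an existing column to the row index and reset_index() reverses it; rename() also accepts an index= mapping for individual row labels. The new label "XM" is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Xiaoming", "Xiaohong", "Xiaogang"],
    "Age": [17, 19, 20],
})

# Use an existing column as the row index (returns a new DataFrame)
by_name = df.set_index("Name")
age_of_xiaohong = by_name.loc["Xiaohong", "Age"]
print(age_of_xiaohong)

# rename() can relabel individual rows through the index argument
renamed = by_name.rename(index={"Xiaoming": "XM"})
print(renamed.index.tolist())

# reset_index() turns the index back into an ordinary column
restored = by_name.reset_index()
print(restored.columns.tolist())
```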

Summary

Through this article, we’ve learned the basic operations of pandas data processing:
- Data Creation: Build 1D/2D data using Series and DataFrame
- Data Viewing: Quickly understand data characteristics via head/tail/info/describe
- Data Modification: Use loc/iloc for positioning and modification, add/delete columns, adjust indices

The core of pandas operations is “locating data”. Remember the difference between loc (label) and iloc (position). With more practice, you’ll master these skills! More complex operations (e.g., grouping, merging) can be picked up later. Now, go ahead and try it yourself!
