Implementing the Shell Sort Algorithm with Python

Shell Sort is an improved version of Insertion Sort: it raises efficiency by "coarsely sorting" first and "finely sorting" last, grouping elements at decreasing intervals. The core idea is to pick an initial increment (e.g., half the array length), split the array into groups whose elements are spaced by that increment, and run Insertion Sort within each group. The increment is then halved and the process repeats until the increment reaches 1, which completes the "fine sorting." The key insight is that grouping reduces element movement: large-interval passes make the array nearly sorted early, so the final increment-1 Insertion Sort pass finishes quickly. The average time complexity depends on the gap sequence (roughly O(n^1.3) in practice for common sequences), the worst case is O(n²), and the space complexity is O(1). Shell Sort suits moderately sized arrays with uneven element distribution and is an efficient in-place sorting algorithm.
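
A minimal Python sketch of this scheme, using the simple gap-halving sequence described above (function and variable names are illustrative):

```python
def shell_sort(arr):
    """In-place Shell Sort using the simple gap-halving sequence."""
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # Gapped insertion sort: each element is compared with
        # earlier elements that are `gap` positions apart.
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]  # shift the larger element forward
                j -= gap
            arr[j] = temp
        gap //= 2  # "coarse" passes first, finishing with gap == 1
    return arr

print(shell_sort([12, 34, 54, 2, 3]))  # [2, 3, 12, 34, 54]
```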

Read More
Implementing the Insertion Sort Algorithm with Python

This article introduces the insertion sort algorithm, whose core idea is to insert elements one by one into a sorted subarray, much like inserting a playing card into an already ordered hand. The basic approach: starting from the second element of the array, treat each element as the one to be inserted, compare it with the sorted subarray from back to front to find the appropriate position, and insert it there, so the subarray stays ordered at all times. In the Python implementation, the outer loop walks over the elements to be inserted (starting from index 1), while the inner while loop compares and shifts elements backward; a temporary variable `temp` holds the current element, which is finally written into its correct position. The sort is in-place and uses only one temporary variable, giving a space complexity of O(1). Time complexity: best case (array already sorted) O(n), worst case (reverse order) O(n²). It suits small-scale or nearly sorted data, is simple to implement, and is stable.
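
A minimal Python sketch matching this description, with `temp` holding the element to be inserted:

```python
def insertion_sort(arr):
    """In-place insertion sort; only one temporary variable is used."""
    for i in range(1, len(arr)):          # element to be inserted
        temp = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > temp:   # scan the sorted part back to front
            arr[j + 1] = arr[j]           # shift larger elements backward
            j -= 1
        arr[j + 1] = temp                 # drop into its correct slot
    return arr

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```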

Read More
Implementing the QuickSort Algorithm in Python

Quick Sort is based on the "divide and conquer" principle: select a pivot value, partition the array around it, and recursively sort the subarrays. The basic idea: choose a pivot (e.g., the first element of the array), partition the array into elements less than and greater than the pivot, then recurse on each part. The partitioning step is the crux: with left and right pointers, the right pointer moves left to find an element smaller than the pivot while the left pointer moves right to find one larger; the two are swapped, and the process repeats until the pointers meet, at which point the pivot is swapped into its final position. In the Python implementation, the `partition` function determines the pivot position and `quick_sort` recursively processes the left and right subarrays; test code verifies the result. Complexity: average O(n log n) (when partitioning is balanced), worst case O(n²) (e.g., an already sorted array with the first element as pivot, which random pivot selection mitigates). Quick Sort is efficient, practical, and widely used; understanding its partition logic and recursion is key to mastering sorting algorithms.
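
A compact Python sketch of this first-element-pivot, two-pointer scheme (names are illustrative):

```python
def partition(arr, low, high):
    """Two-pointer partition with the first element as pivot."""
    pivot = arr[low]
    left, right = low, high
    while left < right:
        while left < right and arr[right] >= pivot:  # seek element < pivot
            right -= 1
        while left < right and arr[left] <= pivot:   # seek element > pivot
            left += 1
        if left < right:
            arr[left], arr[right] = arr[right], arr[left]
    # Pointers met: swap the pivot into its final position.
    arr[low], arr[left] = arr[left], arr[low]
    return left

def quick_sort(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)
        quick_sort(arr, low, p - 1)
        quick_sort(arr, p + 1, high)
    return arr

print(quick_sort([3, 6, 1, 8, 2, 9]))  # [1, 2, 3, 6, 8, 9]
```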

Read More
Implementing the Bubble Sort Algorithm with Python

Bubble Sort is based on the "bubble rising" principle. Its core idea is to repeatedly compare adjacent elements and swap those in the wrong order, letting larger elements gradually "bubble" to the end of the array until the whole array is sorted. The steps: traverse the array for multiple rounds; in each round, compare adjacent pairs and swap any that are out of order, so after each round the largest unsorted element lands in its final position; if a round performs no swaps, the array is already sorted and the process terminates early. In the Python implementation, the outer loop controls the number of rounds (at most n-1), the inner loop compares and swaps adjacent elements, and a `swapped` flag optimizes the termination condition. The worst-case time complexity is O(n²) (completely reversed array), the best case is O(n) (sorted array, with the optimization), the space complexity is O(1), and the sort is stable. Bubble Sort is simple and intuitive, suits small-scale data, and serves as a foundation for understanding sorting: its principle and Python implementation make the core compare-and-swap logic easy to grasp.
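
A short Python sketch with the `swapped` early-exit optimization:

```python
def bubble_sort(arr):
    """Bubble sort with the early-exit optimization."""
    n = len(arr)
    for i in range(n - 1):              # at most n-1 rounds
        swapped = False
        for j in range(n - 1 - i):      # last i elements are already in place
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                 # no swaps: array already sorted
            break
    return arr

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```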

Read More
Implementing Radix Sort Algorithm in Java

Radix sort is a non-comparison integer sorting algorithm that processes digits from least significant to most significant. It distributes each number into "buckets" based on the current digit, then collects them back into the array in bucket order, repeating until all digits are processed; it is efficient for integers with a small number of digits. The basic idea is "distribute-collect-repeat": distribute numbers into buckets by the current digit (units, tens, etc.), collect them back in bucket order, and repeat for every digit. Taking the array [5, 3, 8, 12, 23, 100] as an example, three rounds (units, tens, hundreds) sort it completely. In the Java code, the number of digits of the maximum value determines the number of passes, `(num / radix) % 10` extracts the current digit, and ArrayLists serve as buckets for distribution and collection. The time complexity is O(d(n+k)) (where d is the digit count of the maximum number and k=10), and the space complexity is O(n+k). The algorithm is stable and suits integer sorting; negative numbers can be split into positive and negative groups, sorted separately, and merged.
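
The article's implementation is in Java; as a rough illustration of the same "distribute-collect-repeat" logic, here is a Python sketch for non-negative integers:

```python
def radix_sort(arr):
    """LSD radix sort: distribute into ten digit buckets, collect in
    bucket order, and repeat once per digit of the maximum value."""
    if not arr:
        return arr
    radix = 1
    max_val = max(arr)
    while max_val // radix > 0:                           # one pass per digit
        buckets = [[] for _ in range(10)]
        for num in arr:
            buckets[(num // radix) % 10].append(num)      # distribute
        arr = [num for bucket in buckets for num in bucket]  # collect
        radix *= 10
    return arr

print(radix_sort([5, 3, 8, 12, 23, 100]))  # [3, 5, 8, 12, 23, 100]
```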

Read More
Implementing Bucket Sort Algorithm in Java

Bucket sort is a non-comparison-based sorting algorithm. Its core idea is to distribute data into several "buckets", sort each bucket locally, and then merge the sorted buckets. It suits data that is roughly uniformly distributed over a modest range (e.g., integers within a controllable range). The steps: determine the number and range of buckets (e.g., for integers in 0 to max, one simple choice is max+1 buckets), create the bucket containers, traverse the elements and place each into its bucket, sort each bucket internally (e.g., with insertion sort or a built-in method), and finally concatenate the buckets in order. The time complexity is ideally O(n) when elements spread evenly across buckets, and the space complexity is O(n). Its strength is high efficiency on uniformly distributed data; its weaknesses are wasted space when the data range is large and degraded efficiency when the distribution is skewed.
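
A hedged Python sketch of the idea (the bucket count and range-based index are illustrative choices; the article's version is in Java):

```python
def bucket_sort(arr, bucket_count=10):
    """Distribute values into evenly sized ranges, sort each bucket
    locally, then concatenate the buckets in order."""
    if len(arr) < 2:
        return arr
    lo, hi = min(arr), max(arr)
    width = (hi - lo) / bucket_count or 1    # range covered by each bucket
    buckets = [[] for _ in range(bucket_count)]
    for x in arr:
        idx = min(int((x - lo) / width), bucket_count - 1)
        buckets[idx].append(x)               # distribute into its bucket
    result = []
    for b in buckets:
        result.extend(sorted(b))             # sort each bucket internally
    return result

print(bucket_sort([29, 25, 3, 49, 9, 37, 21, 43]))
# [3, 9, 21, 25, 29, 37, 43, 49]
```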

Read More
Implementing the Counting Sort Algorithm in Java

Counting sort is a simple and intuitive non-comparison sorting algorithm. It determines element positions by counting occurrences and taking prefix sums, and it suits scenarios with a small element range (e.g., integers), many repeated values, and a need for stable sorting. The core idea: determine the element range (find min and max), count the occurrences of each value, compute prefix sums to obtain each value's final position, and then traverse the original array from the end to build the sorted result; the back-to-front traversal preserves the relative order of equal elements, which is what makes the sort stable. Implementation steps: handle edge cases (empty or single-element arrays need no sorting), find min/max, create a count array and tally occurrences, accumulate prefix sums, and fill the output from the end. The time complexity is O(n+k) (where n is the array length and k the value range), and the space complexity is O(n+k). Typical scenarios include small integer ranges (e.g., scores, ages), many duplicates, and the need for stability. Because it sorts by counting and accumulation rather than comparison, it is a good first algorithm for understanding non-comparison sorting.
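
A Python sketch of the count / prefix-sum / back-to-front scheme (the article itself uses Java):

```python
def counting_sort(arr):
    """Stable counting sort using prefix sums over the value range."""
    if len(arr) < 2:
        return arr[:]
    lo, hi = min(arr), max(arr)
    counts = [0] * (hi - lo + 1)
    for x in arr:                      # tally occurrences
        counts[x - lo] += 1
    for i in range(1, len(counts)):    # prefix sums give final positions
        counts[i] += counts[i - 1]
    result = [0] * len(arr)
    for x in reversed(arr):            # back-to-front keeps equal keys stable
        counts[x - lo] -= 1
        result[counts[x - lo]] = x
    return result

print(counting_sort([4, 2, 2, 8, 3, 3, 1]))  # [1, 2, 2, 3, 3, 4, 8]
```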

Read More
Implementing the Merge Sort Algorithm in Java

Merge sort is an efficient sorting algorithm based on the divide-and-conquer paradigm, with three core steps: divide, conquer, and merge. It recursively splits the array into single-element subarrays, sorts these subarrays, and finally merges two ordered subarrays into a fully ordered array. In Java implementation, the `mergeSort` method recursively divides the array into left and right halves, sorts each half, and then calls the `merge` method to combine them. The `merge` method uses three pointers to traverse the left and right subarrays, compares elements, and fills the result array, while directly copying remaining elements. Algorithm complexity: Time complexity is O(n log n) (each merge operation takes O(n) time, with log n recursive levels), space complexity is O(n) (requires extra space for storing merged results), and it is a stable sort (relative order of equal elements is preserved). Merge sort has a clear logic and is suitable for large-scale data sorting. It serves as a classic example of divide-and-conquer algorithms, efficiently sorting by recursively splitting and merging ordered subarrays.
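
A minimal Python sketch of divide-and-merge (the article's code is Java; this version returns a new list rather than sorting in place):

```python
def merge_sort(arr):
    """Divide, recursively sort both halves, then merge two ordered lists."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):  # compare the heads of both halves
        if left[i] <= right[j]:              # <= keeps the sort stable
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])                  # copy any remaining elements
    result.extend(right[j:])
    return result

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```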

Read More
Implementing Heap Sort Algorithm in Java

Heap sort is an efficient sorting algorithm based on the heap data structure, with a time complexity of O(n log n) and a space complexity of O(1); it sorts in place and suits large-scale data. A heap is a special complete binary tree, either a max-heap (every parent is greater than its children) or a min-heap; heap sort uses a max-heap. The core idea: repeatedly take the maximum at the top of the heap, place it at the end of the array, adjust the remaining elements back into a max-heap, and repeat until the array is sorted. The implementation has three parts: building the max-heap (calling heapify on each node, starting from the last non-leaf node); heap adjustment (recursively fixing a subtree to restore the max-heap property); and the sorting pass (swapping the heap top with the last element, shrinking the heap, and re-adjusting). The core function heapify restores a subtree to a max-heap by comparing parent and children recursively; buildMaxHeap constructs the full max-heap starting from the last non-leaf node; the main function ties these steps together. Heap sort achieves ordering through efficient heap adjustment, suits space-constrained scenarios, and is an efficient choice for sorting large-scale data.
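
A Python sketch of the heapify / build / sort steps (the article implements this in Java; names are illustrative):

```python
def heapify(arr, n, i):
    """Sift the value at index i down so the subtree rooted at i
    satisfies the max-heap property (heap occupies arr[:n])."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)        # continue down the affected subtree

def heap_sort(arr):
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):  # build max-heap from last non-leaf
        heapify(arr, n, i)
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]  # move the current max to the end
        heapify(arr, end, 0)                 # restore the heap on the prefix
    return arr

print(heap_sort([4, 10, 3, 5, 1]))  # [1, 3, 4, 5, 10]
```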

Read More
Implementing the Selection Sort Algorithm in Java

Selection sort is a simple and intuitive sorting algorithm. Its core idea is to repeatedly select the smallest (or largest) element from the unsorted portion and place it at the end of the sorted portion until the entire array is sorted. The basic approach involves an outer loop to determine the end position of the sorted portion, and an inner loop to find the minimum value in the unsorted portion, followed by swapping this minimum value with the element at the current position of the outer loop. In Java implementation, the `selectionSort` method is implemented with two nested loops: the outer loop iterates through the array (with `i` ranging from 0 to `n-2`), and the inner loop (with `j` ranging from `i+1` to `n-1`) finds the index `minIndex` of the minimum value in the unsorted portion. Finally, the element at position `i` is swapped with the element at `minIndex`. Taking the array `{64, 25, 12, 22, 11}` as an example, the sorted array `[11, 12, 22, 25, 64]` is gradually constructed through each round of swaps. The time complexity is O(n²), making it suitable for small-scale data. This algorithm has a simple logic and easy-to-implement code, serving as a typical example for understanding the basic sorting concepts.
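
A Python rendering of the two-loop scheme (the article's code is Java):

```python
def selection_sort(arr):
    """Repeatedly select the minimum of the unsorted part and swap it
    to the end of the sorted part."""
    n = len(arr)
    for i in range(n - 1):             # boundary of the sorted part
        min_index = i
        for j in range(i + 1, n):      # scan the unsorted part
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return arr

print(selection_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
```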

Read More
Implementing Shell Sort Algorithm with Java

Shell Sort is an improved version of Insertion Sort that cuts down the element movements needed to fix distant inversions by grouping elements. The core idea is to introduce a step size (Gap) that splits the array into Gap interleaved subsequences; after insertion-sorting each subsequence, the Gap is gradually reduced until it reaches 1 (at which point the pass is a standard Insertion Sort). Algorithm steps: initialize Gap to half the array length, insertion-sort each subsequence, then shrink the Gap and repeat until the Gap = 1 pass finishes. In the Java implementation, the outer loop halves the Gap starting from n/2; the inner loop walks the elements, storing the current one in a temporary variable and shifting larger elements forward until the insertion point is found. Testing with the array {12, 34, 54, 2, 3} yields the sorted output [2, 3, 12, 34, 54]. By pre-ordering elements in groups, Shell Sort improves on plain insertion sort, and tuning the gap sequence (e.g., 3k+1) can further boost performance.
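
Rather than repeating the gap-halving version sketched earlier, here is a Python sketch of the 3k+1 (Knuth) gap sequence the article mentions as an optimization:

```python
def shell_sort_knuth(arr):
    """Shell Sort using the Knuth gap sequence 1, 4, 13, 40, ...
    (gap = 3*gap + 1), often faster than plain halving."""
    n = len(arr)
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1          # largest Knuth gap below n/3
    while gap >= 1:
        for i in range(gap, n):    # gapped insertion sort
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 3                  # step down the sequence
    return arr

print(shell_sort_knuth([12, 34, 54, 2, 3]))  # [2, 3, 12, 34, 54]
```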

Read More
Implementing the Insertion Sort Algorithm in Java

Insertion sort is a simple and intuitive sorting algorithm. Its core idea is to insert unsorted elements one by one into their correct positions in the sorted part, much like organizing playing cards; it suits small-scale data and is simple to implement. Basic idea: starting from the second element, mark the current element as the one to be inserted, compare it with the sorted part from back to front, shift each larger sorted element backward, and stop when the insertion position is found; repeat until all elements are processed. In the Java implementation, the element to be inserted is saved first, and a loop of comparisons and backward shifts completes the insertion. Complexity: best O(n) (already sorted), worst and average O(n²); space O(1) (in-place); the sort is stable. Its core is gradual insertion with a simple implementation, and its stability and in-place nature make it a solid performer on small-scale data.

Read More
Implementing QuickSort Algorithm in Java

QuickSort is based on the divide-and-conquer approach. Its core involves selecting a pivot element to partition the array into elements less than and greater than the pivot, followed by recursively sorting the subarrays. With an average time complexity of O(n log n), it is a commonly used and efficient sorting algorithm. **Basic Steps**: 1. Select a pivot (e.g., the rightmost element). 2. Partition the array based on the pivot. 3. Recursively sort the left and right subarrays. **Partition Logic**: Using the rightmost element as the pivot, define an index `i` to point to the end of the "less than pivot" region. Traverse the array, swapping elements smaller than the pivot into this region. Finally, move the pivot to its correct position. The Java code implements this logic. The time complexity is O(n log n) on average and O(n²) in the worst case, with an average space complexity of O(log n). A notable drawback is that QuickSort is an unstable sort, and its worst-case performance can be poor, so optimizing the pivot selection is crucial to improve performance.
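
This rightmost-pivot partition (often called the Lomuto scheme) differs from the two-pointer version in the Python article above; a Python sketch of the logic as described:

```python
def lomuto_partition(arr, low, high):
    """Lomuto partition: rightmost element as pivot; `i` marks the
    end of the 'less than pivot' region."""
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:                 # grow the "< pivot" region
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]  # place the pivot
    return i + 1

def quick_sort_lomuto(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = lomuto_partition(arr, low, high)
        quick_sort_lomuto(arr, low, p - 1)
        quick_sort_lomuto(arr, p + 1, high)
    return arr

print(quick_sort_lomuto([10, 7, 8, 9, 1, 5]))  # [1, 5, 7, 8, 9, 10]
```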

Read More
Implementing the Bubble Sort Algorithm in Java

Bubble Sort is a basic sorting algorithm whose core idea is to repeatedly compare adjacent elements and swap their positions, allowing larger elements to "bubble up" to the end of the array (in ascending order). Its sorting process is completed through multiple iterations: each iteration determines the position of the largest element in the current unsorted portion and moves it to the end until the array is sorted. In Java implementation, the outer loop controls the number of sorting rounds (at most n-1 rounds), while the inner loop compares adjacent elements and performs swaps. A key optimization is using a `swapped` flag; if no swaps occur in a round, the algorithm terminates early, reducing the best-case time complexity to O(n). The worst and average-case time complexities are O(n²), with a space complexity of O(1) (in-place sorting). Despite its simple and intuitive principle, which makes it suitable for teaching the core concepts of sorting, bubble sort is inefficient and only applicable for small-scale data or educational scenarios. For large-scale data sorting, more efficient algorithms like Quick Sort are typically used.

Read More
Introduction to PyTorch Neural Networks: Fully Connected Layers and Backpropagation Principles

This article introduces the basics of PyTorch neural networks, focusing on fully connected layers and backpropagation. A fully connected layer connects every neuron of the previous layer to every neuron of the current layer; its output is the weight matrix times the input, plus a bias vector. Forward propagation is the forward computation from the input layer through fully connected layers and activation functions to the output, for example in a two-layer network: input → fully connected → ReLU → fully connected → output. Backpropagation is the core of neural network learning: it adjusts parameters via gradient descent, using the chain rule to compute the gradient of the loss with respect to each parameter backward from the output layer. PyTorch's autograd records the computation graph automatically and performs the gradient calculation. The cycle is forward propagation, loss calculation, backpropagation (`loss.backward()`), and parameter update (with an optimizer such as SGD). Key concepts: fully connected layers combine features, forward propagation computes predictions, backpropagation minimizes the loss via gradient descent, and automatic differentiation removes manual gradient math. Understanding these principles helps with model debugging and optimization.
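
A minimal runnable sketch of the forward → loss → backward → update cycle on a two-layer network (layer sizes and data are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Two-layer network: input -> fully connected -> ReLU -> fully connected
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(16, 4)          # a batch of 16 samples
target = torch.randn(16, 2)

output = model(x)               # forward propagation
loss = criterion(output, target)
optimizer.zero_grad()           # clear old gradients
loss.backward()                 # backpropagation via autograd
optimizer.step()                # gradient-descent parameter update
print(loss.item())
```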

Read More
Quick Start with PyTorch: Tensor Dimension Transformation and Common Operations

This article introduces the core knowledge of PyTorch tensors, including basics, dimension transformations, common operations, and exercise suggestions. Tensors are the basic structure for storing data in PyTorch, similar to NumPy arrays, and support GPU acceleration and automatic differentiation. They can be created using `torch.tensor()` from lists/numbers, `torch.from_numpy()` from NumPy arrays, or built-in functions to generate tensors of all zeros, ones, or random values. Dimension transformation is a key operation: `reshape()` flexibly adjusts the shape (keeping the total number of elements unchanged), `squeeze()` removes singleton dimensions, `unsqueeze()` adds singleton dimensions, and `transpose()`/`permute()` swap dimensions. Common operations include basic arithmetic operations, matrix multiplication with `matmul()`, broadcasting (automatic dimension expansion for operations), and aggregation operations such as `sum()`, `mean()`, and `max()`. The article suggests consolidating tensor operations through exercises, such as dimension adjustment, broadcasting mechanisms, and dimension swapping, to master the "shape language" and lay a foundation for subsequent model construction.
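
A few of these operations in one runnable snippet (shapes are arbitrary examples):

```python
import torch

t = torch.arange(24)            # shape: (24,)
a = t.reshape(2, 3, 4)          # total element count must stay 24
b = a.unsqueeze(0)              # (1, 2, 3, 4): add a singleton dim
c = b.squeeze(0)                # back to (2, 3, 4)
d = a.transpose(0, 2)           # swap two dims -> (4, 3, 2)
e = a.permute(2, 0, 1)          # reorder all dims -> (4, 2, 3)

# Broadcasting: (2, 3, 4) + (4,) expands the vector over the batch
f = a + torch.tensor([1, 2, 3, 4])
print(a.shape, b.shape, d.shape, e.shape, f.shape)
print(a.sum(), a.float().mean(), a.max())   # aggregation ops
```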

Read More
PyTorch Basics Tutorial: Practical Data Loading with Dataset and DataLoader

Data loading is a crucial step in machine learning training, and PyTorch's `Dataset` and `DataLoader` are the core tools for managing it efficiently. `Dataset` is an abstract base class for data storage: subclasses implement `__getitem__` (read a single sample) and `__len__` (total sample count), or `TensorDataset` can wrap tensor data directly. `DataLoader` handles batching, with parameters such as `batch_size`, `shuffle` (randomize order), and `num_workers` (parallel loading with worker processes) to improve training throughput. In practice, taking MNIST as an example, image data can be loaded via `torchvision` and iterated efficiently through `Dataset` and `DataLoader`. Note that under Windows, `num_workers` should be set to 0 to avoid worker-process issues. Use `shuffle=True` for training data, and `shuffle=False` for validation/test sets to keep evaluation reproducible. Key steps: 1. define a `Dataset` to hold the data; 2. create a `DataLoader` with the desired parameters; 3. iterate over the `DataLoader` to feed batches into the model. These two components are the cornerstones of data handling; once mastered, they adapt to almost any loading requirement.
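
A minimal sketch with `TensorDataset` and `DataLoader` (the data here is random placeholder tensors):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(100, 3)             # 100 samples, 3 features each
labels = torch.randint(0, 2, (100,))       # binary labels

dataset = TensorDataset(features, labels)  # wraps tensors as a Dataset
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

for batch_x, batch_y in loader:            # yields (16, 3) / (16,) batches
    print(batch_x.shape, batch_y.shape)
    break
```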

Read More
Playing with PyTorch from Scratch: Data Visualization and Model Evaluation Techniques

This article introduces core data visualization and model evaluation skills in PyTorch to support efficient model debugging. For visualization, Matplotlib helps inspect data distributions (e.g., histograms of MNIST samples and labels), while TensorBoard monitors training (e.g., scalar curves, model graphs). For evaluation, classification tasks center on accuracy and confusion matrices (e.g., an MNIST classification example), while regression tasks use MSE and MAE. In practice, visualization surfaces problems (e.g., confusion between "8" and "9") and guides iterative model improvement. Advanced applications include GAN visualization and real-time metric computation. Mastering these skills speeds up problem localization and data understanding, laying a foundation for developing more complex models.
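
As a rough illustration, accuracy and a confusion matrix can be computed directly from prediction and label tensors (random stand-in data below):

```python
import torch

num_classes = 10
preds = torch.randint(0, num_classes, (1000,))   # stand-in predictions
labels = torch.randint(0, num_classes, (1000,))  # stand-in true labels

accuracy = (preds == labels).float().mean().item()

# Confusion matrix: rows = true class, columns = predicted class
confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
for t, p in zip(labels, preds):
    confusion[t, p] += 1

print(f"accuracy: {accuracy:.3f}")
print(confusion)  # off-diagonal peaks reveal confused pairs (e.g., 8 vs 9)
```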

Read More
PyTorch Beginner's Guide: Understanding Model Construction with Simple Examples

This PyTorch beginner's tutorial covers the core knowledge points: PyTorch is Python-based, with dynamic computation graphs as a headline advantage and simple installation (`pip install torch`). The core data structure is the Tensor, which supports GPU acceleration and can be created, manipulated (addition, subtraction, multiplication, division, matrix multiplication), and converted to/from NumPy. Automatic differentiation (autograd) is enabled via `requires_grad=True`; for example, the derivative of \( y = x^2 + 3x \) at \( x = 2 \) is 7. A linear regression model is defined by inheriting `nn.Module`, with forward propagation implementing \( y = wx + b \). For data preparation, simulated data (\( y = 2x + 3 + \text{noise} \)) is generated and loaded in batches with `TensorDataset` and `DataLoader`. Training uses MSE loss and the SGD optimizer, with gradient zeroing, backpropagation, and parameter updates in the loop. After 1000 epochs, the results are validated and visualized, with learned parameters close to the true values. The full pipeline covers tensor operations, automatic differentiation, model construction, data loading, and training optimization, and scales up to more complex models.
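
A condensed sketch of the described pipeline (noise level, learning rate, and epoch count are illustrative):

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)   # learns w and b

    def forward(self, x):
        return self.linear(x)           # y = w*x + b

# Simulated data: y = 2x + 3 + noise
x = torch.rand(100, 1) * 10
y = 2 * x + 3 + torch.randn(100, 1) * 0.5

model = LinearRegression()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    optimizer.zero_grad()               # gradients accumulate otherwise
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

w, b = model.linear.weight.item(), model.linear.bias.item()
print(f"w={w:.2f} (true 2), b={b:.2f} (true 3)")
```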

Read More
Beginner-Friendly: Basics of PyTorch Loss Functions and Training Loops

This article introduces the roles and implementation of loss functions and training loops in machine learning. Loss functions measure the gap between model predictions and true labels, while training loops adjust parameters to minimize loss for model learning. Common loss functions include: Mean Squared Error (MSE) for regression tasks (e.g., housing price prediction), accessible via `nn.MSELoss()` in PyTorch, and Cross-Entropy Loss for classification tasks (e.g., cat-dog recognition), accessible via `nn.CrossEntropyLoss()`. The core four steps of a training loop are: forward propagation (model prediction) → loss calculation → backpropagation (gradient computation) → parameter update (optimizer adjustment). It is critical to zero out gradients before backpropagation. Using linear regression as an example, the article generates simulated data, defines a linear model, trains it with MSE loss and the Adam optimizer, and iteratively optimizes parameters. Key considerations include: gradient zeroing, switching between training/inference modes, optimizer selection (e.g., Adam), and batch training with DataLoader. Mastering these concepts enables models to learn patterns from data, laying the foundation for complex models.
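
A short sketch showing both loss functions in isolation, with the four-step loop summarized in comments (the tensors are toy examples; `model`, `criterion`, and `optimizer` in the comments are assumed defined as in the article):

```python
import torch
import torch.nn as nn

# Regression: MSE compares continuous predictions with targets
mse = nn.MSELoss()
pred = torch.tensor([2.5, 0.0])
target = torch.tensor([3.0, -0.5])
print(mse(pred, target))            # mean of squared differences: 0.25

# Classification: CrossEntropyLoss takes raw logits + class indices
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)          # batch of 4, 3 classes
labels = torch.tensor([0, 2, 1, 2])
print(ce(logits, labels))

# The four-step training loop:
#   pred = model(x); loss = criterion(pred, y)   # forward + loss
#   optimizer.zero_grad()                        # zero gradients first
#   loss.backward(); optimizer.step()            # backward + update
```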

Read More
Introduction to PyTorch Optimizers: Practical Implementation of Optimization Algorithms like SGD and Adam

Optimizers are the "navigation system" of deep learning: core tools for updating model parameters and minimizing loss functions, guiding the model from "high-loss" peaks to "low-loss" valleys much like navigation on a mountain descent. Their task is to adjust parameters so the model improves on the training data. Different optimizers target different scenarios: basic SGD (Stochastic Gradient Descent) is simple but converges slowly and needs manual hyperparameter tuning; SGD+Momentum adds "inertia" to accelerate convergence; Adam combines momentum with adaptive learning rates, performs very well with default parameters, and is the first choice for most tasks; AdamW adds weight decay (L2-style regularization) to Adam, which helps prevent overfitting. PyTorch's `torch.optim` module provides all of these: SGD suits simple models, SGD+Momentum helps when gradients fluctuate (e.g., RNNs), Adam adapts to most tasks (e.g., CNNs, Transformers), and AdamW works well for small datasets or complex models. In a practical comparison on linear regression (e.g., `y=2x+3`), Adam converges faster with a smoother loss curve and parameters closer to the true values, while SGD tends to oscillate. Beginners are advised to start with Adam, and if finer parameter control is required...
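
A sketch of how the optimizers mentioned are constructed in `torch.optim` (the hyperparameters shown are common starting values, not prescriptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# The optimizers discussed, side by side (same construction interface):
sgd      = torch.optim.SGD(model.parameters(), lr=0.01)
momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam     = torch.optim.Adam(model.parameters(), lr=0.001)
adamw    = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)

# Any of them drives the same training loop unchanged:
x, y = torch.randn(32, 10), torch.randn(32, 1)
criterion = nn.MSELoss()
optimizer = adam                   # swap in sgd / momentum / adamw freely
for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
print(loss.item())
```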

Read More
Learning PyTorch from Scratch: A Basic Explanation of Activation Functions and Convolutional Layers

Neural networks need non-linear transformations to fit complex relationships, and activation functions introduce this non-linearity. Common choices:

- **ReLU**: `y = max(0, x)`; cheap to compute, mitigates the vanishing gradient problem, and is the most widely used (PyTorch: `nn.ReLU()`).
- **Sigmoid**: `y = 1/(1+exp(-x))`; outputs in (0,1), useful for binary classification, but suffers from vanishing gradients (PyTorch: `nn.Sigmoid()`).
- **Tanh**: `y = (exp(x)-exp(-x))/(exp(x)+exp(-x))`; outputs in (-1,1) with zero mean, easier to train than Sigmoid but still prone to vanishing gradients (PyTorch: `nn.Tanh()`).

Convolutional layers are a core component of CNNs and extract local features via convolution kernels. Key concepts: the input (e.g., RGB images of shape `(batch, in_channels, H, W)`), the convolution kernel (a small matrix), the stride (how many pixels the kernel slides), and padding (zero-padding at the edges to control output size). They are implemented in PyTorch via `nn.Conv2d`, whose critical parameters include `in_channels` (input channels), `out_channels` (number of kernels), `kernel_size`, `stride`, and `padding`.
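
A runnable shape check for `nn.Conv2d` followed by ReLU (channel counts and image size are arbitrary):

```python
import torch
import torch.nn as nn

# 3-channel 32x32 image batch -> 16 feature maps; padding=1 with a 3x3
# kernel and stride 1 preserves the spatial size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=1, padding=1)
relu = nn.ReLU()

x = torch.randn(8, 3, 32, 32)      # (batch, in_channels, H, W)
out = relu(conv(x))                # non-linearity after convolution
print(out.shape)                   # torch.Size([8, 16, 32, 32])
```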

Read More
Beginner's Guide to PyTorch: A Practical Tutorial on Data Loading and Preprocessing

Data loading and preprocessing are crucial foundations for training deep learning models, and PyTorch implements them efficiently through `Dataset`, `DataLoader`, and `transforms`. As a data container, `Dataset` defines how samples are retrieved: built-in datasets such as MNIST in `torchvision.datasets` can be used directly, while custom datasets implement `__getitem__` and `__len__`. `DataLoader` handles batch loading, with core parameters including `batch_size`, `shuffle` (set to `True` during training), and `num_workers` (parallel loading via worker processes). Preprocessing is done with `transforms`: `ToTensor` converts images to tensors, `Normalize` standardizes them, and augmentations like `RandomCrop` apply to the training set only; `Compose` chains multiple transformations. In a practical MNIST walkthrough, the full workflow is: define the preprocessing pipeline, load the dataset, and create a `DataLoader`. Key considerations include choosing normalization parameters, restricting data augmentation to the training set, and setting `num_workers=0` under Windows to avoid worker-process errors. Mastering these skills enables efficient data handling and lays the groundwork for model training.
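
A minimal MNIST loading sketch (the normalization constants are the commonly used MNIST mean/std; the path and batch size are illustrative, and the first run downloads the dataset):

```python
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor(),                       # PIL image -> tensor in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST mean / std
])

train_set = datasets.MNIST(root="./data", train=True,
                           download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64,
                          shuffle=True, num_workers=0)  # 0 on Windows

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])
```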

Read More
Mastering PyTorch Basics: A Detailed Explanation of Tensor Operations and Automatic Differentiation

This article introduces the basics of Tensors in PyTorch. Tensors are the fundamental units for storing and manipulating data, similar to NumPy arrays but with GPU acceleration support, making them a core structure of neural networks. Creation methods include converting from lists/NumPy arrays (`torch.tensor()`/`as_tensor()`) and using constructors like `zeros()`/`ones()`/`rand()`. Key attributes include shape (`.shape`/`.size()`), data type (`.dtype`), and device (`.device`), which can be converted via `.to()`. Major operations cover arithmetic (addition, subtraction, multiplication, division, matrix multiplication), indexing/slicing, reshaping (`reshape()`/`squeeze()`/`unsqueeze()`), and concatenation/splitting (`cat()`/`stack()`/`split()`). Autograd is central: `requires_grad=True` enables gradient tracking, `backward()` computes gradients, and `grad` retrieves them. Important considerations include handling gradients of non-leaf nodes, gradient accumulation, and `detach()` for tensor separation. Mastering tensor operations and autograd is foundational for neural network learning.
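
A small autograd sketch covering `requires_grad`, `backward()`, gradient accumulation, and `detach()` (using y = x² + 3x as a toy function):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # track operations on x
y = x ** 2 + 3 * x                         # y = x^2 + 3x
y.backward()                               # compute dy/dx
print(x.grad)                              # tensor(7.) since dy/dx = 2x + 3

# Gradients accumulate across backward() calls; clear before reuse
x.grad.zero_()

# detach() returns a tensor cut off from the computation graph
z = (x * 4).detach()
print(z.requires_grad)                     # False
```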

Read More
Beginner's Guide to PyTorch: Build Your First Neural Network Model Step by Step

This article is an introductory PyTorch tutorial that explains the core workflow by building a fully connected neural network (MLP) on the MNIST dataset. First, install PyTorch (CPU/GPU version), load MNIST via torchvision, convert images to tensors with ToTensor, normalize with Normalize, and batch with DataLoader (batch_size=64). The model is an MLP with a 784-unit input layer (flattened 28×28 images), a 128-unit hidden layer (ReLU activation), and a 10-unit output layer (Softmax), implemented by subclassing nn.Module and defining forward propagation. CrossEntropyLoss is chosen as the loss function, and SGD with lr=0.01 as the optimizer. The model trains for 5 epochs, running forward propagation, loss calculation, backpropagation, and parameter updates in each iteration and printing the loss every 100 batches. For testing, the model is switched to eval mode, gradient computation is disabled, and accuracy on the test set is computed. The tutorial also suggests extensions such as adjusting the network structure, swapping optimizers, or changing datasets.
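
A condensed sketch of the described MLP (random tensors stand in for an MNIST batch; the model returns raw logits, since `nn.CrossEntropyLoss` applies softmax internally):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # input: flattened 28x28 image
        self.fc2 = nn.Linear(128, 10)       # output: 10 digit classes

    def forward(self, x):
        x = x.view(x.size(0), -1)           # flatten each image to 784
        x = torch.relu(self.fc1(x))
        return self.fc2(x)                  # raw logits for CrossEntropyLoss

model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 1, 28, 28)              # stand-in for an MNIST batch
labels = torch.randint(0, 10, (64,))
loss = criterion(model(x), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(loss.item())
```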

Read More