Parallelism (2) – Pipeline vs Tensor Parallelism

This image compares two parallel processing techniques: Pipeline Parallelism and Tensor Parallelism.

Pipeline Parallelism

Core Concept:

  • Sequential work is divided into multiple stages
  • Each GPU is responsible for a specific stage (a → b → c)

Characteristics:

  • Axis: Depth-wise – splits by layers
  • Pattern: Pipeline/conveyor belt with micro-batches
  • Communication: Only at stage boundaries
  • Cost: Bubbles (idle time), requires pipeline tuning

How it works: Data flows through the stages like a conveyor belt, with each GPU processing its assigned stage before passing the result to the next GPU.
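
To make the conveyor-belt pattern and the bubbles concrete, here is a toy schedule printer in plain Python (the stage and micro-batch counts are made up for illustration; this is not a real pipeline scheduler):

```python
# Toy sketch of a 3-stage pipeline over 4 micro-batches:
# at time step t, stage s works on micro-batch t - s.
# The "idle" slots at the start and end are the pipeline bubbles.

n_stages, n_micro = 3, 4
for t in range(n_stages + n_micro - 1):     # total time steps
    row = []
    for s in range(n_stages):
        mb = t - s                          # which micro-batch stage s holds now
        row.append(f"mb{mb}" if 0 <= mb < n_micro else "idle")
    print(f"t={t}: " + " | ".join(row))
```

The idle slots during ramp-up and drain are the bubbles; increasing the number of micro-batches shrinks their share of the total time, which is what pipeline tuning aims for.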


Tensor Parallelism

Core Concept:

  • Weight matrices are partitioned across GPUs in advance
  • All GPUs work simultaneously on different parts of the same computation

Characteristics:

  • Axis: Width-wise – splits inside layers
  • Pattern: Width-wise sharding – splits matrix/attention across GPUs
  • Communication: Occurs at every Transformer layer (forward/backward)
  • Cost: High communication overhead, requires strong NVLink/NVSwitch

How it works: Large matrices are divided into shards, and every GPU processes its shard simultaneously while continuously exchanging partial results over NVLink/NVSwitch.
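
A minimal sketch of this width-wise sharding, using NumPy arrays to stand in for per-GPU shards (a real setup keeps each shard on its own GPU and replaces the concatenation with an all-gather over NVLink/NVSwitch):

```python
# Column-parallel split of one layer's weight matrix across 4 "GPUs".
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))    # activations, replicated on every GPU
W = rng.standard_normal((16, 32))   # one layer's full weight matrix

shards = np.split(W, 4, axis=1)     # GPU k holds its own column block of W
partials = [x @ Wk for Wk in shards]    # each GPU multiplies its shard
y = np.concatenate(partials, axis=1)    # "all-gather" at the layer boundary

assert np.allclose(y, x @ W)        # matches the unsharded matmul
```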


Key Differences

Aspect         | Pipeline               | Tensor
Split Method   | Layer-wise (vertical)  | Within-layer (horizontal)
GPU Role       | Different tasks        | Parts of the same task
Communication  | Low (stage boundaries) | High (every layer)
Hardware Needs | Standard               | High-speed interconnect required

Summary

Pipeline Parallelism splits models vertically by layers with sequential processing and low communication cost, while Tensor Parallelism splits horizontally within layers for parallel processing but requires high-speed interconnects. These two techniques are often combined in training large-scale AI models to maximize efficiency.
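
As a rough illustration of how that combination lays out, the sketch below maps GPU ranks to a (pipeline stage, tensor shard) pair; the group sizes here are hypothetical, though real frameworks expose similar stage/shard mappings:

```python
# Tensor-parallel groups of width tp nested inside pp pipeline stages.
tp, pp = 4, 2                       # tensor-parallel width, pipeline depth
for rank in range(tp * pp):         # 8 GPUs total
    stage, shard = divmod(rank, tp)
    print(f"GPU {rank}: pipeline stage {stage}, tensor shard {shard}")
```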

#ParallelComputing #DistributedTraining #DeepLearning #GPUOptimization #MachineLearning #ModelParallelism #AIInfrastructure #NeuralNetworks #ScalableAI #HPC

With Claude

Computing in AI (3)

AI Computing Architecture

3 Processing Types

1. Sequential Processing

  • Hardware: General CPU (Intel/ARM)
  • Function: Control flow, I/O, scheduling, data preparation
  • Workload Share: Training 5%, Inference 5%

2. Parallel Stream Processing

  • Hardware: CUDA cores (stream processors)
  • Function: FP32/FP16 vector/scalar operations, memory management
  • Workload Share: Training 10%, Inference 30%

3. Matrix Processing

  • Hardware: Tensor cores (matrix cores)
  • Function: Mixed-precision (FP8/FP16) matrix multiply-accumulate (MMA), sparse matrix operations
  • Workload Share: Training 85%+, Inference 65%+
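
As a hedged sketch of the CUDA-core vs tensor-core split in practice, the snippet below runs the same matmul in FP32 and then under FP16 autocast, which is one common way to route it onto tensor cores (assumes PyTorch and a CUDA-capable GPU):

```python
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

c_fp32 = a @ b                      # FP32 path: CUDA cores

with torch.autocast(device_type="cuda", dtype=torch.float16):
    c_fp16 = a @ b                  # mixed-precision MMA: tensor cores

print(c_fp32.dtype, c_fp16.dtype)   # torch.float32 torch.float16
```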

Key Insight

The majority of AI workloads are concentrated in matrix processing because matrix multiplication is the core operation in deep learning. Tensor cores are the key component for AI performance improvement.

With Claude

How CPU & GPU Work

From Claude with some prompting
This image visually explains the working principles of the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit).

  1. Data Types:
    • Scalar: A single value
    • Vector: One-dimensional array
    • Matrix: Two-dimensional array
    • Tensor: Multi-dimensional array
  2. CPU Work Method:
    • Sequential processing, denoted by '01'
    • Tasks are processed in order, as shown by 1, 2, 3, 4, 5
    • Primarily handles scalar data, processing complex tasks sequentially
  3. GPU Work Method:
    • Parallel processing, represented by a matrix
    • Icons show multiple tasks being processed simultaneously
    • Mainly deals with multi-dimensional data like matrices or tensors, processing many tasks in parallel

The image demonstrates that while CPUs process tasks sequentially, GPUs can handle many tasks simultaneously in parallel. This helps explain which processing unit is more efficient based on the complexity and volume of data. Complex and large-scale data (matrices, tensors) are better suited for GPUs, while simple, sequential tasks are more appropriate for CPUs.
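
One way to feel this difference is to compare an explicit element-by-element loop with a single vectorized array operation; the sketch below uses NumPy purely for illustration:

```python
# Sequential, CPU-style work vs one whole-array op the hardware can
# spread across many lanes/cores at once (parallel, GPU-style).
import numpy as np

data = np.arange(10_000, dtype=np.float32)

out_seq = np.empty_like(data)
for i in range(len(data)):          # one scalar at a time, in order
    out_seq[i] = data[i] * 2.0

out_par = data * 2.0                # one vectorized op over all elements

assert np.allclose(out_seq, out_par)
```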

Scalar, Vector, Matrix, Tensor

From DALL-E with some prompting
This image appears to be a diagram explaining the concepts of scalar, vector, matrix, and tensor in the context of dimensions and data structures:

Scalar: Represented as a zero-dimensional entity; simply a single value.
Vector: Shown as one-dimensional, depicted as an arrow indicating a point with direction and magnitude.
Matrix: Illustrated as two-dimensional, like a grid of connected data points.
Tensor: Described as N-dimensional, suggesting a structure where all elements are interconnected, like a network of points extending beyond two dimensions.

This progression shows how data structures become more complex and capable of representing more intricate relationships as the number of dimensions increases.
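
These four shapes correspond directly to array dimensionality; a minimal NumPy sketch (illustrative only) makes the progression concrete:

```python
import numpy as np

scalar = np.float32(3.14)           # 0-D: a single value
vector = np.array([1.0, 2.0, 3.0])  # 1-D: direction and magnitude
matrix = np.eye(3)                  # 2-D: a grid of values
tensor = np.zeros((2, 3, 4))        # N-D: here, three dimensions

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, "ndim =", np.ndim(arr), "shape =", np.shape(arr))
```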