3 Computing in AI

AI Computing Architecture

Three Processing Types

1. Sequential Processing

  • Hardware: General-purpose CPU (Intel/ARM)
  • Function: Control flow, I/O, scheduling, data preparation
  • Workload Share: Training 5%, Inference 5%

2. Parallel Stream Processing

  • Hardware: CUDA cores (stream processors)
  • Function: FP32/FP16 vector/scalar operations, memory management (see the sketch after this list)
  • Workload Share: Training 10%, Inference 30%

3. Matrix Processing

  • Hardware: Tensor cores (matrix cores)
  • Function: Mixed-precision (FP8/FP16) matrix multiply-accumulate (MMA), sparse matrix operations
  • Workload Share: Training 85%+, Inference 65%+
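The division of labor between the first two tiers can be seen in a minimal CUDA sketch (all names and sizes here are illustrative, not from the original material): the host CPU does the sequential work of setup and launch, while the `vec_add` kernel spreads one instruction across many CUDA cores.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// GPU side (parallel stream processing): each thread adds one element,
// so the same instruction runs across many CUDA cores at once (SIMT).
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;

    // CPU side (sequential processing): control flow, memory setup,
    // data preparation, kernel launch -- the small workload share above.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);  // 4096 blocks x 256 threads
    cudaDeviceSynchronize();

    printf("c[0] = %.1f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```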

Key Insight

The majority of AI workloads are concentrated in matrix processing because matrix multiplication is the core operation of deep learning. Tensor cores are therefore the key component for AI performance improvement, as the sketch below illustrates.
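To ground that insight, here is a minimal sketch of a Tensor core matrix multiply-accumulate using CUDA's warp-level WMMA API (`mma.h`). The 16×16×16 tile size, FP16 inputs with FP32 accumulation, and the kernel/variable names are illustrative assumptions; production frameworks typically reach Tensor cores through libraries such as cuBLAS/cuDNN rather than hand-written WMMA.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes a single 16x16 tile of D = A*B + 0 on Tensor cores
// via warp-level matrix multiply-accumulate (requires sm_70 or newer).
__global__ void wmma_tile(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);    // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // the Tensor core MMA op
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}
```

Launched with a single warp, e.g. `wmma_tile<<<1, 32>>>(dA, dB, dD);`, all 32 threads cooperatively issue the MMA instruction. This warp-wide matrix operation is what dominates the training workload share in the table above.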

With Claude

CUDA

From Claude with some prompting
This image illustrates the architecture of CUDA (Compute Unified Device Architecture), a parallel computing platform and programming model developed by NVIDIA. The main components are as follows:

  1. OS, CPU, USER: Concentric circles on the left represent the operating system, CPU, and user level.
  2. CUDA Framework: The large central box, which includes API control.
  3. SIMT (Single Instruction, Multiple Threads): Indicated by the arrow connecting the CPU to the CUDA cores.
  4. CUDA Cores: Parallel processing units represented by many small squares.
  5. Warp: Labeled “a group of CUDA cores”, i.e., the unit of threads that execute the same instruction together.
  6. Standardized High-Bandwidth (HW Arch): Represents standardized high-bandwidth at the hardware architecture level.
  7. Dynamic Allocation (SW Arch): Indicates dynamic allocation at the software architecture level.
  8. Video Memory Block: Located in the upper right corner.
  9. API Control: Included within the CUDA Framework.

This diagram shows how tasks are allocated from the CPU to the CUDA cores, the parallel processing capability of those cores, and overall system control through memory and the API. CUDA is used to accelerate complex computations by leveraging the massive parallel processing capability of GPUs.
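As a small, hedged illustration of the warp concept from the diagram (the kernel name and launch shape are arbitrary), the following kernel prints how the threads of a block are grouped into warps of 32:

```cuda
#include <cstdio>

// A warp is a group of 32 threads that execute the same instruction in
// lockstep on a group of CUDA cores. Lane 0 of each warp reports where
// its warp begins, making the grouping visible.
__global__ void show_warps() {
    int warp_id = threadIdx.x / warpSize;  // warpSize is 32 on NVIDIA GPUs
    int lane_id = threadIdx.x % warpSize;
    if (lane_id == 0)
        printf("block %d, warp %d starts at thread %d\n",
               blockIdx.x, warp_id, threadIdx.x);
}

int main() {
    show_warps<<<2, 128>>>();  // 2 blocks x 128 threads = 4 warps per block
    cudaDeviceSynchronize();
    return 0;
}
```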

The diagram simplifies the CUDA architecture to give an overview of its key components and their relationships, which makes it suitable for educational purposes or high-level explanations.