From RNN to Transformer

Visual Analysis: RNN vs Transformer

Visual Structure Comparison

RNN (Top): Sequential Chain

  • Linear flow: Circular nodes connected left-to-right
  • Hidden states: Each node processes sequentially
  • Attention weights: Numbers (2,5,11,4,2) show token importance
  • Bottleneck: Must process one token at a time

Transformer (Bottom): Parallel Grid

  • Matrix layout: 5×5 grid of interconnected nodes
  • Self-attention: All tokens connect to all others simultaneously
  • Multi-head: 5 parallel attention heads working together
  • Position encoding: Separate blue boxes handle sequence order
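
Since self-attention on its own is order-agnostic, the position-encoding boxes stand for the extra signal that tells the model where each token sits in the sequence. As a hedged illustration, the sketch below uses the standard sinusoidal scheme from the original Transformer paper (an assumption made here for illustration; the diagram does not specify which scheme it depicts):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                  # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                     # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                # odd dimensions: cosine
    return pe

# Added to the token embeddings so the otherwise order-blind attention layers see position.
print(sinusoidal_positional_encoding(seq_len=5, d_model=8).shape)  # (5, 8)
```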

Key Visual Insights

Processing Pattern

  • RNN: Linear chain → Sequential dependency
  • Transformer: Interconnected grid → Parallel freedom

Information Flow

  • RNN: Single path with accumulating states
  • Transformer: Multiple simultaneous pathways

Attention Mechanism

  • RNN: Weights applied to existing sequence
  • Transformer: Direct connections between all elements

Design Effectiveness

The diagram succeeds by using:

  • Contrasting layouts to show architectural differences
  • Color coding to highlight attention mechanisms
  • Clear labels (“Sequential” vs “Parallel Processing”)
  • Visual metaphors that make complex concepts intuitive

The grid vs chain visualization immediately conveys why Transformers enable faster, more scalable processing than RNNs.
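
To make the chain-versus-grid contrast concrete, here is a minimal sketch (NumPy, with toy dimensions invented for illustration, not taken from the diagram): the RNN must loop over positions because each hidden state depends on the previous one, while self-attention relates all positions in a single batched matrix operation.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                      # toy dimensions for illustration
x = rng.normal(size=(seq_len, d))      # 5 token embeddings

# RNN-style: sequential chain, each step depends on the previous hidden state.
W_xh, W_hh = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(seq_len):               # cannot be parallelized across positions
    h = np.tanh(x[t] @ W_xh + h @ W_hh)

# Transformer-style: self-attention, all positions interact at once.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d)                         # (5, 5) grid: every token attends to every token
scores -= scores.max(axis=-1, keepdims=True)          # stabilize the softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ V                                     # one parallel matrix product, no time loop

print(h.shape, out.shape)              # (8,) vs (5, 8)
```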

Summary

This diagram effectively illustrates the fundamental shift from sequential to parallel processing in neural network architecture. The visual contrast between the RNN's linear chain and the Transformer's interconnected grid clearly demonstrates why Transformers revolutionized AI by enabling massive parallelization and better handling of long-range dependencies.

With Claude

Computing in AI

AI Computing Architecture

3 Processing Types

1. Sequential Processing

  • Hardware: General-purpose CPU (Intel/ARM)
  • Function: Control flow, I/O, scheduling, data preparation
  • Workload Share: Training 5%, Inference 5%

2. Parallel Stream Processing

  • Hardware: CUDA cores (stream processors)
  • Function: FP32/FP16 vector and scalar operations, memory management
  • Workload Share: Training 10%, Inference 30%

3. Matrix Processing

  • Hardware: Tensor cores (matrix cores)
  • Function: Mixed-precision (FP8/FP16) matrix multiply-accumulate (MMA), sparse matrix operations
  • Workload Share: Training 85%+, Inference 65%+

Key Insight

The majority of AI workloads are concentrated in matrix processing because matrix multiplication is the core operation of deep learning. Tensor cores are therefore the key hardware component for improving AI performance.
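
As a rough, hedged illustration of that insight: a dense layer's forward pass is essentially one matrix multiply plus a bias add, and attention and convolutions are likewise lowered to matrix multiplies, which is exactly the multiply-accumulate pattern tensor cores accelerate. The toy NumPy sketch below (invented sizes; FP16 inputs only to mimic the reduced-precision operands tensor cores consume) shows that pattern:

```python
import numpy as np

batch, d_in, d_out = 32, 1024, 4096

# Inputs and weights kept in half precision, as tensor cores typically consume...
x = np.random.randn(batch, d_in).astype(np.float16)
W = np.random.randn(d_in, d_out).astype(np.float16)
b = np.zeros(d_out, dtype=np.float32)

# ...while accumulation happens in a wider type (mixed precision).
y = x.astype(np.float32) @ W.astype(np.float32) + b   # the dominant MMA workload
print(y.shape)   # (32, 4096): one dense layer = one big matrix multiply
```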

With Claude

Analytical vs Empirical

Analytical vs Empirical Approaches

Analytical Approach

  1. Theory Driven: Based on mathematical theories and logical reasoning
  2. Programmed by Design: Implemented through explicit rules and algorithms
  3. Sequential on CPU: Tasks are processed one at a time, in sequence
  4. Precise & Explainable: Results are accurate and decision-making processes are transparent

Empirical Approach

  1. Data Driven: Based on real data and observations
  2. Learned via Deep Learning: Neural networks automatically learn patterns from data
  3. Parallel on GPU: Multiple tasks are processed simultaneously for improved efficiency
  4. Approximate & Unexplainable: Results are approximations and internal workings are difficult to explain

Summary

This diagram illustrates the key differences between traditional programming methods and modern machine learning approaches. The analytical approach follows clearly defined, human-designed rules and can precisely explain its results, while the empirical approach learns patterns from data and gains efficiency through parallel processing, but leaves the decision-making process as a black box.
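
A small, hedged sketch of the same contrast in code (a toy example, not taken from the diagram): the analytical version writes the rule down explicitly, while the empirical version only recovers an approximation of that rule by fitting observed data.

```python
import numpy as np

# Analytical: the rule is written down explicitly and is fully explainable.
def fahrenheit_analytical(celsius: float) -> float:
    return celsius * 9.0 / 5.0 + 32.0

# Empirical: the same mapping is learned from (noisy) examples via least squares.
rng = np.random.default_rng(0)
c = rng.uniform(-40, 100, size=200)
f = c * 9.0 / 5.0 + 32.0 + rng.normal(scale=0.5, size=c.shape)   # observed data
slope, intercept = np.polyfit(c, f, deg=1)                        # fitted parameters

print(fahrenheit_analytical(25.0))          # exactly 77.0
print(slope * 25.0 + intercept)             # approximately 77, learned from data
```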

With Claude

CPU & GPU Works

From Claude with some prompting
This image explains the working principles of CPU (Central Processing Unit) and GPU (Graphics Processing Unit) in a visual manner.

  1. Data Types:
    • Scalar: A single value
    • Vector: One-dimensional array
    • Matrix: Two-dimensional array
    • Tensor: Multi-dimensional array
  2. CPU Work Method:
    • Sequential processing, denoted by ‘01’
    • Tasks are processed in order, as shown by 1, 2, 3, 4, 5
    • Primarily handles scalar data, processing complex tasks sequentially
  3. GPU Work Method:
    • Parallel processing, represented by a matrix
    • Icons show multiple tasks being processed simultaneously
    • Mainly deals with multi-dimensional data like matrices or tensors, processing many tasks in parallel

The image demonstrates that while CPUs process tasks sequentially, GPUs can handle many tasks in parallel. This helps explain which processing unit is more efficient depending on the complexity and volume of the data: complex, large-scale data (matrices, tensors) is better suited to GPUs, while simple, sequential tasks are more appropriate for CPUs.
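
As a minimal, hedged illustration of the data types and the sequential-versus-parallel contrast described above (NumPy on a CPU, so the "parallel" version is merely vectorized, but it mirrors the kind of whole-array work a GPU executes in parallel):

```python
import numpy as np

# The four data types from the diagram, distinguished by number of dimensions.
scalar = np.float32(3.0)                        # 0-D: a single value
vector = np.arange(5, dtype=np.float32)         # 1-D array
matrix = np.ones((3, 4), dtype=np.float32)      # 2-D array
tensor = np.zeros((2, 3, 4), dtype=np.float32)  # N-D (multi-dimensional) array

# CPU-style: process elements one at a time, in order (1, 2, 3, ...).
total = 0.0
for value in vector:
    total += value

# GPU-style: express the work as whole-array (matrix/tensor) operations.
total_vectorized = vector.sum()
batched = tensor @ matrix.T                     # many multiply-adds issued together

print(total, total_vectorized, batched.shape)   # 10.0 10.0 (2, 3, 3)
```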