AI Chips

This image presents a comprehensive overview of the AI chip ecosystem, categorizing different approaches and technologies:

Major AI Chip Categories

GPU-Based Solutions:

  • Nvidia H100/B200 and AMD MI series: currently the most widely used GPUs for AI training and inference
  • General GPU architecture: Traditional general-purpose GPU architectures

Specialized AI Chips:

  • Cerebras AI (WSE): Wafer-Scale Engine where the entire wafer functions as one chip
  • Google TPU: Google’s Tensor Processing Unit
  • MS Azure Maia: Microsoft’s cloud-optimized AI chip
  • Amazon (Inferentia/Trainium): Amazon’s dedicated inference and training chips

Technical Features

Memory Technologies:

  • High-Bandwidth Memory (HBM): Advanced memory technology including HBM2E
  • Massive On-Chip SRAM: Large-capacity on-chip memory with external MemoryX
  • Ultra-Low Latency On-Chip Fabric (SwarmX): High-speed on-chip interconnect

Networking Technologies:

  • NVLink/NVSwitch (Nvidia) and Infinity Fabric (AMD): proprietary high-speed GPU-to-GPU interconnects
  • Inter-Chip Interconnect (ICI) and Ethernet-based fabrics: scale-out connections including RoCE-like and UEC (Ultra Ethernet Consortium) protocols
  • NeuroLink: Advanced chip-to-chip communication

Design Approaches:

  • Single Wafer-Scale Engine: Entire wafer as one chip with immense on-chip memory/bandwidth
  • Simplified Distributed Training: Wafer-scale design enabling simplified distributed training
  • ASIC for special AI function: Application-specific integrated circuits optimized for AI workloads
  • Optimization for Cloud Solutions with ASIC: Cloud-optimized ASIC implementations

This diagram effectively illustrates the evolution from general-purpose GPUs to specialized AI chips, showcasing how different companies are pursuing distinct technological approaches to meet the demanding requirements of AI workloads. The ecosystem demonstrates various strategies including memory optimization, interconnect technologies, and architectural innovations.

With Claude

Digital Twin with LLM

This image demonstrates the revolutionary applicability of Digital Twin enhanced by LLM integration.

Three Core Components of Digital Twin

Digital Twin consists of three essential elements:

  1. Modeling – Creating digital replicas of physical objects
  2. Data – Real-time sensor data and operational information collection
  3. Simulation – Predictive analysis and scenario testing

Traditional Limitations and LLM’s Revolutionary Solution

Previous Challenges: Modeling results were expressed only through abstract diagram labels such as “Visual Effect” and “easy viewing of complexity,” making practical interpretation difficult.

LLM as a Game Changer:

  • Multimodal Interpretation: Transforms complex 3D models, data patterns, and simulation results into intuitive natural language explanations
  • Retrieval Interpretation: Instantly extracts key insights from vast datasets and converts them into human-understandable formats
  • Replacing Scarce Human Interpretation: the LLM provides expert-level analytical capability, enabling continuous 24/7 monitoring
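
A minimal sketch of how this interpretation step might look in code (the asset, field names, and llm_client call are illustrative assumptions, not something shown in the diagram):

```python
# Hypothetical sketch: fold the twin's sensor data and simulation output into a
# prompt and hand it to an LLM for a plain-language summary. The LLM call itself
# is left as a placeholder rather than a specific vendor API.
def build_interpretation_prompt(sensor_data: dict, simulation_result: dict) -> str:
    """Combine the twin's data layer and simulation layer into one prompt."""
    lines = ["You are analyzing a digital twin of an industrial asset.",
             "Latest sensor readings:"]
    lines += [f"  {k}: {v}" for k, v in sensor_data.items()]
    lines.append("Simulation forecast:")
    lines += [f"  {k}: {v}" for k, v in simulation_result.items()]
    lines.append("Explain the asset's condition and recommended actions in plain language.")
    return "\n".join(lines)

prompt = build_interpretation_prompt(
    {"vibration_rms_mm_s": 7.2, "bearing_temp_C": 84},
    {"remaining_useful_life_h": 310, "dominant_failure_mode": "outer-race wear"},
)
# summary = llm_client.generate(prompt)  # placeholder for whichever LLM API is used
```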

Future Value of Digital Twin

With LLM integration, Digital Twin evolves from a simple visualization tool into an intelligent decision-making partner. This becomes the core driver for maximizing operational efficiency and continuous innovation, accelerating digital transformation across industries.

Ultimately, this diagram emphasizes that LLM is the key technology that unlocks the true potential of Digital Twin, demonstrating its necessity and serving as the foundation for sustained operational improvement and future development.

With Claude

AI from the base

This diagram contrasts two approaches: traditional rule-based systems, which achieve 100% accuracy within a limited scope using human-designed logic, and AI systems, which handle massive datasets through neural networks and probabilistic reasoning. While traditional methods guarantee exact results in narrow domains, AI offers scalable, adaptive solutions for complex real-world problems, despite requiring significant energy and operating with uncertainty rather than absolute certainty.

Upper Process (Traditional Approach):

  • Data → Human Rule Creation: Based on binary data, humans design clear logical rules
  • Mathematical Operations (√(x+y)): Precise and deterministic calculations
  • “BASE”: Foundation system with 100% certainty
  • Human-created rules guarantee complete accuracy (100%) but operate only within limited scope

Lower Process (AI-Based Approach):

  • Large-Scale Data Processing: Capable of handling vastly more extensive and complex data than traditional methods
  • Neural Network Pattern Learning: Discovers complex patterns and relationships that are difficult for humans to explicitly define
  • Adaptive Learning: The circular arrow (⚡) represents continuous improvement and adaptability to new situations
  • Advantages of Probabilistic Reasoning: Flexibility to handle uncertain and complex real-world problems
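
A toy contrast of the two pipelines described above (the √(x+y) rule comes from the diagram; the “probabilistic” version is only an illustrative stand-in for a learned model, not an actual neural network):

```python
# Rule-based: exact but valid only inside its narrow, human-defined scope.
# "AI-style": a probability-weighted estimate that tolerates noise and adapts,
# at the cost of certainty. Both are deliberately simplified illustrations.
import math
import random

def rule_based(x: float, y: float) -> float:
    """Human-designed rule: exact sqrt(x + y), defined only when x + y >= 0."""
    if x + y < 0:
        raise ValueError("outside the rule's limited scope")
    return math.sqrt(x + y)

def probabilistic_estimate(x: float, y: float, n_samples: int = 100) -> float:
    """Stand-in for a learned model: averages noisy guesses instead of
    computing an exact answer, trading certainty for adaptability."""
    guesses = [math.sqrt(max(x + y, 0.0)) + random.gauss(0, 0.05)
               for _ in range(n_samples)]
    return sum(guesses) / n_samples

print(rule_based(9, 7))              # exactly 4.0
print(probabilistic_estimate(9, 7))  # close to 4.0, but with residual uncertainty
```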

Key Advantages:

  • Traditional Approach: Clear and predictable but limited for complex real-world problems
  • AI Approach: Although probabilistic, it provides the scalability and adaptability to solve complex problems that are difficult for humans to design solutions for; though imperfect, it offers practical answers to diverse and unpredictable real-world situations

AI may not be perfect, but it opens up innovative possibilities in areas that are difficult to approach with traditional methods, serving as a powerful tool for tackling previously intractable problems.

With Claude

Silicon Photonics

This diagram compares PCIe (Electrical Copper Circuit) and Silicon Photonics (Optical Signal) technologies.

PCIe (Left, Yellow Boxes)

  • Signal Transmission: Uses electrons (copper traces)
  • Speed: Gen5 512Gbps (x16), Gen6 ~1Tbps expected
  • Latency: ns~μs-level delays due to electrical resistance and signaling overhead
  • Power Consumption: High (e.g., Gen5 x16 ~20W), increased cooling costs due to heat generation
  • Pros/Cons: Mature standard with low cost, but clear bandwidth/distance limitations

Silicon Photonics (Right, Purple Boxes)

  • Signal Transmission: Uses photons (silicon optical waveguides)
  • Speed: 400Gbps~7Tbps (utilizing WDM technology)
  • Latency: Ultra-low latency (tens of ps, minimal conversion delay)
  • Power Consumption: Low (e.g., 7Tbps ~10W or less), minimal heat with reduced cooling needs
  • Key Benefits:
    • Overcomes electrical circuit limitations
    • Supports 7Tbps-level AI communication
    • Optimized for AI workloads (high speed, low power)
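
A quick energy-per-bit comparison using the figures quoted above (rough arithmetic on the diagram's own numbers, not measured values):

```python
# ~20 W at 512 Gbps (PCIe Gen5 x16) versus ~10 W at 7 Tbps (silicon photonics).
pcie_pj_per_bit = 20 / 512e9 * 1e12       # -> ~39 pJ per bit
photonics_pj_per_bit = 10 / 7e12 * 1e12   # -> ~1.4 pJ per bit
print(f"PCIe Gen5 x16:     {pcie_pj_per_bit:.1f} pJ/bit")
print(f"Silicon photonics: {photonics_pj_per_bit:.1f} pJ/bit")
```

On these numbers, the optical link moves roughly 27x more data per joule, which is the core of the efficiency argument.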

Key Message

Silicon Photonics overcomes the limitations of existing PCIe technology (high power consumption, heat generation, speed limitations), making it a next-generation technology particularly well-suited for AI workloads requiring high-speed data processing.

With Claude

Small Errors in AI

Four Core Characteristics of AI Tasks (Left)

AI systems have distinctive characteristics that make them particularly vulnerable to error amplification:

  • Big Volume: Processing massive amounts of data
  • Long Duration: Extended computational operations over time
  • Parallel Processing: Simultaneous execution of multiple tasks
  • Interdependencies: Complex interconnections where components influence each other

Small Error Amplification (Middle)

Due to these AI characteristics, small initial errors become amplified in two critical ways:

  • Error Propagation & Data Corruption: Minor errors spread throughout the system, significantly impacting overall data quality
  • Delay Propagation & Performance Degradation: Small delays accumulate and cascade, severely affecting entire system performance
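
A back-of-the-envelope sketch of why this matters: even a one-in-a-million per-step error rate becomes a near-certain failure once it is multiplied across thousands of parallel workers and long-running jobs (the numbers below are illustrative assumptions):

```python
# Probability that at least one error occurs anywhere in a job, assuming
# independent failures with per-step probability p_step.
def p_any_failure(p_step: float, workers: int, steps: int) -> float:
    return 1 - (1 - p_step) ** (workers * steps)

print(p_any_failure(1e-6, workers=1, steps=1_000))       # ~0.001   (small job)
print(p_any_failure(1e-6, workers=10_000, steps=1_000))  # ~0.99995 (failure almost certain)
```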

Final Impact (Right)

  • Very High Energy Cost: Errors and performance degradation result in exponentially higher energy consumption than anticipated

Key Message

The four inherent characteristics of AI (big volume, long duration, parallel processing, and interdependencies) create a perfect storm where small errors can amplify exponentially, ultimately leading to enormously high energy costs. This diagram serves as a warning about the critical importance of preventing small errors in AI systems before they cascade into major problems.

With Claude

Human & Data with AI

Data Accumulation Perspective

History → Internet: All knowledge and information accumulated throughout human history is digitized through the internet and converted into AI training data. This consists of multimodal data including text, images, audio, and other formats.

Foundation Model: Large language models (LLMs) and multimodal models are pre-trained based on this vast accumulated data. Examples include GPT, BERT, CLIP, and similar architectures.

Human to AI: Applying Human Cognitive Patterns to AI

1. Chain of Thought (CoT)

  • Implementation of human logical reasoning processes in the Reasoning stage
  • Mimicking human cognitive patterns that break down complex problems into step-by-step solutions
  • Replicating the human approach of “think → analyze → conclude” in AI systems
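
A hedged example of what such a prompt can look like (the question, wording, and llm_client call are made up for illustration):

```python
# Chain-of-thought style prompting: ask the model to reason step by step
# before giving a final answer, mirroring "think -> analyze -> conclude".
question = "A data center has 4 racks with 8 GPUs each; 5 GPUs fail. How many still work?"
cot_prompt = (
    f"{question}\n"
    "Think step by step:\n"
    "1. Compute the total number of GPUs.\n"
    "2. Subtract the failed GPUs.\n"
    "3. State the final answer on its own line."
)
# answer = llm_client.generate(cot_prompt)  # placeholder for any LLM API
```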

2. Mixture of Experts

  • AI implementation of human expert collaboration systems utilized in the Experts domain
  • Architecting the way human specialists collaborate on complex problems into model structures
  • Applying the human method of synthesizing multiple expert opinions for problem-solving into AI
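
A minimal gating sketch of the idea (the experts, router scores, and top-k choice below are illustrative assumptions, not a real trained MoE layer):

```python
# Mixture-of-experts routing: a gate scores the experts and only the top-k
# are consulted, mimicking "ask the right specialists for this problem".
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

experts = {
    "math":   lambda q: f"math expert answer for {q!r}",
    "law":    lambda q: f"law expert answer for {q!r}",
    "vision": lambda q: f"vision expert answer for {q!r}",
}

def route(query: str, router_scores: list, top_k: int = 2):
    """Pick the top-k experts by gate weight and collect their outputs."""
    gates = softmax(router_scores)
    ranked = sorted(zip(experts, gates), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(name, round(weight, 3), experts[name](query)) for name, weight in ranked]

print(route("integrate x^2", router_scores=[2.5, 0.1, 0.3]))
```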

3. Retrieval-Augmented Generation (RAG)

  • Implementing the human process of searching existing knowledge → generating new responses into AI systems
  • Systematizing the human approach of “reference material search → comprehensive judgment”
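
A toy version of the retrieve-then-generate flow (keyword overlap stands in for a vector database; the corpus and LLM call are placeholders):

```python
# RAG sketch: retrieve the most relevant documents, then let the LLM answer
# using only that retrieved context.
def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank documents by word overlap with the query (stand-in for a vector DB)."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

corpus = [
    "Pump P-101 maintenance manual: replace seals every 6 months.",
    "Cafeteria menu for next week.",
    "Incident report: P-101 seal leak detected in March.",
]
context = retrieve("When were the P-101 seals last serviced?", corpus)
prompt = "Answer using only this context:\n" + "\n".join(context)
# answer = llm_client.generate(prompt)  # placeholder for any LLM API
```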

Personal/Enterprise/Sovereign Data Utilization

1. Personal Level

  • Utilizing individual documents, history, preferences, and private data in RAG systems
  • Providing personalized AI assistants and customized services

2. Enterprise Level

  • Integrating organizational internal documents, processes, and business data into RAG systems
  • Implementing enterprise-specific AI solutions and workflow automation

3. Sovereign Level

  • Connecting national or regional strategic data to RAG systems
  • Optimizing national security, policy decisions, and public services

Overall Significance: This architecture represents a Human-Centric AI system that transplants human cognitive abilities and thinking patterns into AI while utilizing multi-layered data from personal to national levels to evolve general-purpose AI (Foundation Models) into intelligent systems specialized for each level. It goes beyond simple data processing to implement human thinking methodologies themselves into next-generation AI systems.

With Claude

Dynamic Voltage and Frequency Scaling (in GPU)

This image illustrates the DVFS (Dynamic Voltage and Frequency Scaling) system workflow, which is a power management technique that dynamically adjusts CPU/GPU voltage and frequency to optimize power consumption.

Key Components and Operation Flow

1. Main Process Flow (Top Row)

  • Workload Init → Workload Analysis → DVFS Policy Decision → Clock Frequency Adjustment → Voltage Adjustment → Workload Execution → Workload Finish

2. Core System Components

Power State Management:

  • Basic power states: P0~P12 (P0 = highest performance, P12 = lowest power)
  • Real-time monitoring through PMU (Power Management Unit)
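
As a practical aside, the quantities the PMU exposes (P-state, clocks, power draw, temperature) can be sampled from the command line; a minimal sketch using nvidia-smi's query interface, assuming the tool is on the PATH:

```python
# Sample the DVFS-relevant telemetry that nvidia-smi exposes per GPU.
import subprocess

fields = "pstate,clocks.sm,clocks.mem,power.draw,temperature.gpu"
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout
for line in out.strip().splitlines():
    print(dict(zip(fields.split(","), (v.strip() for v in line.split(",")))))
```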

Analysis & Decision Phase:

  • Applies the dynamic power formula (P ≈ C·V²·f, where C is effective capacitance, V is voltage, and f is clock frequency) in its decision algorithms
  • Considers thermal limits in analysis
  • Selects new power state (High: P0-P2, Low: P8-P10)
  • P-State changes occur within 10μs~1ms

Frequency Adjustment (PLL – Phase-Locked Loop):

  • Adjusts GPU core and memory clock frequencies
  • Typical range: 1,410MHz~1,200MHz (memory), 1,000MHz~600MHz (core)
  • Adjustment time: 10-100 microseconds

Voltage Adjustment (VRM – Voltage Regulator Module):

  • Adjusts voltage supplied to GPU core and memory
  • Typical range: 1.1V (P0) to 0.8V (P8)
  • VRM stabilizes voltage within tens of microseconds

3. Real-time Feedback Loop

The system operates a continuous feedback loop that readjusts P-states in real-time based on workload changes, maintaining optimal balance between performance and power efficiency.
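
A simplified sketch of such a loop (the P-state table, thresholds, and effective-capacitance constant are illustrative assumptions; dynamic power is modeled with the P ≈ C·V²·f relation mentioned above):

```python
# Toy DVFS policy: choose a P-state from utilization and temperature, then
# estimate the resulting dynamic power for the modeled block.
P_STATES = {          # state: (core clock in MHz, core voltage in V)
    "P0": (1410, 1.10),
    "P2": (1200, 1.00),
    "P8": (700, 0.85),
    "P10": (600, 0.80),
}
EFFECTIVE_CAPACITANCE = 2.0e-9  # farads; illustrative lumped constant

def choose_pstate(utilization: float, temp_c: float) -> str:
    """High load at a safe temperature -> high-performance state; otherwise back off."""
    if temp_c > 83:          # thermal limit forces a lower state
        return "P8"
    if utilization > 0.7:
        return "P0"
    if utilization > 0.3:
        return "P2"
    return "P10"

def dynamic_power_w(state: str) -> float:
    f_hz = P_STATES[state][0] * 1e6
    v = P_STATES[state][1]
    return EFFECTIVE_CAPACITANCE * v * v * f_hz   # P ~ C * V^2 * f

state = choose_pstate(utilization=0.9, temp_c=65)
print(state, f"{dynamic_power_w(state):.1f} W")   # e.g. P0 at ~3.4 W for this toy block
```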

4. Execution Phase

The GPU executes workloads at the new frequency and voltage settings, with the two adjusted asynchronously as conditions change. After completion, the system transitions to low-power states (e.g., P10, P12) to conserve energy.

Summary: Key Benefits of DVFS

DVFS technology is critical for AI data centers as it optimizes GPU efficiency management to achieve maximum overall power efficiency. By intelligently scaling thousands of GPUs based on AI workload demands, DVFS can reduce total data center power consumption by 30-50% while maintaining peak AI performance during training and inference, making it essential for sustainable and cost-effective AI infrastructure at scale.

With Claude