Power Switching

This diagram illustrates two main power switching methods used in electrical systems: ATS (Automatic Transfer Switch) and STS (Static Transfer Switch).

System Configuration

  • Power Sources: Utility grid and Generator
  • Protection: UPS systems
  • Load: Server infrastructure

ATS (Automatic Transfer Switch)

Location: Switchgear Area (Power Distribution Board)

Characteristics:

  • Mechanism: Mechanical breakers/contacts
  • Transfer Time: Several seconds (including generator start-up)
  • Advantages: Relatively simple, lower cost
  • Application: Standard power transfer systems

STS (Static Transfer Switch)

Location: Panelboard Area (Distribution Panel)

Characteristics:

  • Mechanism: Semiconductor devices (SCR, IGBT)
  • Transfer Time: A few milliseconds (near seamless)
  • Advantages: Ensures high-quality power supply
  • Disadvantages: Expensive

Key Differences

  1. Transfer Speed: STS is significantly faster (milliseconds vs seconds)
  2. Technology: ATS uses mechanical switching, STS uses electronic switching
  3. Cost: ATS is more economical
  4. Power Quality: STS provides more stable power delivery
  5. Complexity: STS requires more sophisticated semiconductor control

Applications

  • ATS: Suitable for applications that can tolerate brief power interruptions
  • STS: Critical for sensitive equipment like servers, data centers, and medical facilities requiring uninterrupted power
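The application split above comes down to whether the supply gap during a transfer is shorter than the time the load can ride through with no input. The sketch below compares the two switch types against an assumed 20 ms server power-supply hold-up time, loosely modeled on the ITIC (CBEMA) curve. The names `ASSUMED_RIDE_THROUGH_S`, `TRANSFER_TIME_S` and `load_survives`, the 10 s ATS figure (standing in for "several seconds") and the 4 ms STS figure (standing in for "a few milliseconds") are all illustrative assumptions, not vendor data.

```python
# Illustrative figures only; replace with measured or vendor-specified values.
ASSUMED_RIDE_THROUGH_S = 0.020  # assumed PSU hold-up time at zero input voltage (ITIC-style 20 ms)

TRANSFER_TIME_S = {
    "ATS": 10.0,   # assumed: "several seconds", including generator start-up
    "STS": 0.004,  # assumed: "a few milliseconds"
}


def load_survives(transfer_time_s: float, ride_through_s: float = ASSUMED_RIDE_THROUGH_S) -> bool:
    """True if the supply gap during the transfer is no longer than the load's ride-through time."""
    return transfer_time_s <= ride_through_s


for switch, gap_s in TRANSFER_TIME_S.items():
    verdict = "rides through" if load_survives(gap_s) else "drops unless a UPS bridges the gap"
    print(f"{switch}: {gap_s * 1000:.0f} ms gap -> server {verdict}")
```

Under these assumptions only the STS keeps the servers running unaided, which is consistent with pairing the ATS with UPS backup as the summary below describes.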

Summary: This diagram shows a redundant power system where ATS provides cost-effective backup power switching while STS offers near-instantaneous transfer for critical loads. Both systems work together with UPS backup to ensure continuous power supply to servers and sensitive equipment.

With Claude

LLM Efficiency with Cooling

This image demonstrates the critical impact of cooling stability on both LLM performance and energy efficiency in GPU servers through benchmark results.

Cascading Effects of Unstable Cooling

Problems with Unstable Air Cooling:

  • GPU Temperature: 54-72°C (high and unstable)
  • Thermal throttling occurs – where GPUs automatically reduce clock speeds to prevent overheating, leading to significant performance degradation
  • Result: Double penalty of reduced performance + increased power consumption

Energy Efficiency Impact:

  • Power Consumption: 8.16kW (high)
  • Performance: 46 TFLOPS (degraded)
  • Energy Efficiency: 5.6 TFLOPS/kW (poor performance-to-power ratio)

Benefits of Stable Liquid Cooling

Temperature Stability Achievement:

  • GPU Temperature: 41-50°C (low and stable)
  • No thermal throttling → sustained optimal performance

Energy Efficiency Improvement:

  • Power Consumption: 6.99kW (14% reduction)
  • Performance: 54 TFLOPS (17% improvement)
  • Energy Efficiency: 7.7 TFLOPS/kW (38% improvement)
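The percentages above follow directly from the benchmark's power and performance figures. The short check below recomputes them; the helper names are illustrative. Computed from the unrounded values, the efficiency gain comes out at about 37%; the 38% figure corresponds to dividing the rounded efficiencies (7.7 / 5.6).

```python
def tflops_per_kw(tflops: float, power_kw: float) -> float:
    """Energy efficiency expressed as delivered TFLOPS per kW of power drawn."""
    return tflops / power_kw


def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Benchmark figures quoted above
air = {"tflops": 46.0, "power_kw": 8.16}
liquid = {"tflops": 54.0, "power_kw": 6.99}

eff_air = tflops_per_kw(**air)
eff_liquid = tflops_per_kw(**liquid)

print(f"Air cooling:    {eff_air:.1f} TFLOPS/kW")
print(f"Liquid cooling: {eff_liquid:.1f} TFLOPS/kW")
print(f"Power change:       {pct_change(liquid['power_kw'], air['power_kw']):+.1f}%")
print(f"Performance change: {pct_change(liquid['tflops'], air['tflops']):+.1f}%")
print(f"Efficiency change:  {pct_change(eff_liquid, eff_air):+.1f}%")
```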

Core Mechanisms: How Cooling Affects Energy Efficiency

  1. Thermal Throttling Prevention: Stable cooling allows GPUs to maintain peak performance continuously
  2. Power Efficiency Optimization: Eliminates inefficient power consumption caused by overheating
  3. Performance Consistency: Unstable cooling can cause GPUs to consume 50% of the power budget while delivering only 25% of the performance

Advanced cooling systems can achieve energy savings ranging from 17% to 23% compared to traditional methods. Although cooling is often treated as pure overhead, this benchmark shows that investing in proper cooling substantially improves overall energy efficiency.

Final Summary

Unstable cooling triggers thermal throttling that simultaneously degrades LLM performance while increasing power consumption, creating a dual efficiency loss. Stable liquid cooling achieves 17% performance gains and 14% power savings simultaneously, improving energy efficiency by 38%. In AI infrastructure, adequate cooling investment is essential for optimizing both performance and energy efficiency.

Emergency Power System

This image shows a diagram of an Emergency Power System and the characteristics of each component.

Overall System Structure

At the top, the power grid is connected to servers/data centers, and three backup power options are presented in case of power supply interruption.

Three Backup Power Options

1. Generator

  • Long-term operation: Unlimited operation as long as fuel is available
  • Operation method: Engine rotation → Power generation
  • Type: Diesel engine generator
  • Disadvantages:
    • Start-up delay, so it cannot cover instantaneous power outages on its own
    • Noise and exhaust emissions
    • Periodic testing required
    • Requires integration with ATS (Automatic Transfer Switch)

2. Dynamic UPS

  • Features:
    • Uninterrupted operation: the flywheel bridges the gap until the diesel engine starts, after which long-term operation is possible
    • Flywheel kinetic energy storage
    • Combined generator and diesel engine
  • Advantages: Seamless power supply without STS (Static Transfer Switch)
  • Disadvantages: High initial cost, large footprint, noise

DR (Diesel Rotary) UPS: A special form of Dynamic UPS that provides uninterrupted power through flywheel energy storage technology.
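A flywheel bridges the outage with its kinetic energy, E = ½·I·ω². The sketch below estimates how long that energy can carry a load, assuming the system can only draw energy until the flywheel slows to some minimum speed (so the usable energy is ½·I·(ω1² − ω2²)) and lumping all losses into a single `efficiency` factor. The inertia, speeds and load are hypothetical values chosen for illustration, not data for any particular DR UPS.

```python
import math


def rpm_to_rad_s(rpm: float) -> float:
    """Convert rotational speed from rpm to rad/s."""
    return rpm * 2.0 * math.pi / 60.0


def usable_energy_j(inertia_kg_m2: float, rpm_full: float, rpm_min: float) -> float:
    """Kinetic energy released as the flywheel slows from rpm_full to rpm_min: ½·I·(ω1² − ω2²)."""
    w1, w2 = rpm_to_rad_s(rpm_full), rpm_to_rad_s(rpm_min)
    return 0.5 * inertia_kg_m2 * (w1 ** 2 - w2 ** 2)


def ride_through_s(energy_j: float, load_w: float, efficiency: float = 1.0) -> float:
    """Seconds the stored energy can carry the load (losses lumped into one efficiency factor)."""
    return energy_j * efficiency / load_w


# Hypothetical flywheel: 100 kg·m², 3000 rpm, usable down to 2400 rpm, feeding a 150 kW load
e = usable_energy_j(100.0, 3000.0, 2400.0)
print(f"usable energy ≈ {e / 1e6:.2f} MJ, ride-through ≈ {ride_through_s(e, 150e3):.1f} s")
```

With these numbers the flywheel holds the load for roughly 12 seconds, enough in this sketch for a diesel engine to start and take over.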

3. Static UPS

  • Operation time: Instantaneous/Short-term (typically 5-15 minutes)
  • Power quality: Clean power supply
  • Configuration: Rectifier (AC→DC) → Battery (DC) → Inverter (DC→AC)
  • Features:
    • Millisecond-level instant transfer
  • Disadvantages: Battery life of 3-5 years, replacement costs, heat generation

Key Characteristics Summary

Generators can operate long-term with fuel supply but have start-up delays, while Static UPS provides immediate power but only for short durations. Dynamic UPS (including DR UPS) is a hybrid solution that provides uninterrupted power through flywheel technology while enabling long-term operation when combined with diesel engines. In actual operations, it’s common to use these systems in combination, considering the advantages and disadvantages of each system.
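The combination described above works only if the static UPS battery lasts longer than the generator needs to start and the ATS needs to switch. The sketch below models that hand-over timeline for a sustained outage. The `BackupChain` class is hypothetical, and the 10-minute UPS runtime (within the 5-15 minute range above), 10 s generator start and 0.5 s ATS switching time are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BackupChain:
    """Hypothetical static UPS + generator + ATS chain; all times in seconds."""
    ups_runtime_s: float   # battery autonomy at the current load
    gen_start_s: float     # time for the generator to start and stabilize
    ats_transfer_s: float  # ATS switching time once the generator is ready

    def handover_s(self) -> float:
        """Time after the outage at which the generator takes over the load."""
        return self.gen_start_s + self.ats_transfer_s

    def margin_s(self) -> float:
        """Spare battery time left at hand-over; negative means the load drops."""
        return self.ups_runtime_s - self.handover_s()

    def source_at(self, t: float) -> str:
        """Which source feeds the load t seconds into a sustained utility outage."""
        if t >= self.handover_s():
            return "generator (via ATS)"
        if t < self.ups_runtime_s:
            return "static UPS battery"
        return "none: load dropped"


chain = BackupChain(ups_runtime_s=10 * 60, gen_start_s=10, ats_transfer_s=0.5)
print(chain.margin_s())     # 589.5
print(chain.source_at(5))   # static UPS battery
print(chain.source_at(30))  # generator (via ATS)
```

A chain whose battery runs out before hand-over (a negative margin) leaves the load unpowered in the gap, which is why the UPS runtime is sized with the generator start-up delay in mind.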

Power Circuit Breaker

This image presents a Power Circuit Breaker classification diagram showing the types and characteristics of electrical circuit breakers used in power systems.

System Overview

Power Flow: The diagram illustrates the electrical power path from power plant → transmission lines → circuit breakers → distribution panels.

Circuit Breaker Classification

The breakers are categorized by voltage levels and arc extinguishing methods:

Voltage Classifications

  • Very High Voltage: 66~800kV
  • High Voltage: 3.3~38kV
  • Low Voltage (utilization level): 380~690V, 110~600V, 110~440V

Breaker Types and Arc Extinguishing Methods

  1. GIS/GCB (Gas Insulated Switchgear/Gas Circuit Breaker)
    • 66~800kV
    • Uses pressurized SF6 gas for insulation and arc extinguishing
  2. VCB (Vacuum Circuit Breaker)
    • 3.3~38kV
    • Vacuum arc extinguishing method
  3. ACB (Air Circuit Breaker)
    • 380~690V
    • Air + arc chute method
  4. MCCB (Molded Case Circuit Breaker)
    • 110~600V
    • Air + arc chute method
  5. ELCB (Earth Leakage Circuit Breaker)
    • 110~440V
    • Ground fault (earth leakage) protection; no dedicated arc-extinguishing chamber
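The classification above can be read as a range lookup: given a system voltage, which breaker types does the chart list? The sketch below encodes the chart's voltage ranges and returns every matching type. The function name and example voltages are illustrative. Note that voltage alone only narrows the choice: at low voltage the ranges overlap, and in practice the choice between ACB and MCCB depends on current rating, while an ELCB is selected for earth-leakage protection. The chart also lists nothing between 38 kV and 66 kV.

```python
from typing import List

# Voltage ranges (in volts) as listed in the chart above.
BREAKER_RANGES_V = [
    ("GIS/GCB", 66_000, 800_000),
    ("VCB", 3_300, 38_000),
    ("ACB", 380, 690),
    ("MCCB", 110, 600),
    ("ELCB", 110, 440),
]


def candidate_breakers(system_voltage_v: float) -> List[str]:
    """Breaker types whose listed voltage range covers the given system voltage."""
    return [name for name, lo, hi in BREAKER_RANGES_V if lo <= system_voltage_v <= hi]


# Illustrative system voltages
for v in (154_000, 22_900, 400, 220, 50_000):
    print(f"{v:>7} V -> {candidate_breakers(v) or 'no match in this chart'}")
```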

Key Safety Message

The diagram emphasizes “The bigger (Arc) the more dangerous”, highlighting that higher voltages produce larger, more energetic arcs and therefore require more sophisticated arc-extinguishing technologies.

Summary: This technical diagram systematically categorizes power circuit breakers from ultra-high voltage (800kV) to low voltage (110V) applications, demonstrating how arc extinguishing complexity increases with voltage levels. The chart serves as an educational reference showing that higher voltage systems require more advanced safety mechanisms like SF6 gas insulation, while lower voltage applications can use simpler air-based arc interruption methods.

Data Center?

This infographic compares the evolution from servers to data centers, showing the progression of IT infrastructure complexity and operational requirements.

Left – Server

  • Shows individual hardware components: CPU, motherboard, power supply, cooling fans
  • Labeled “No Human Operation,” indicating basic automated functionality

Center – Modular DC

  • Represented by red cubes showing modular architecture
  • Emphasizes “More Bigger” scale and “modular” design
  • Represents an intermediate stage between single servers and full data centers

Right – Data Center

  • Displays multiple server racks and various infrastructure components (networking, power, cooling systems)
  • Marked as “Human & System Operation,” suggesting more complex management requirements

Additional Perspective on Automation Evolution:

While the image shows data centers requiring human intervention, the actual industry trend points toward increasing automation:

  1. Advanced Automation: Large-scale data centers increasingly use AI-driven management systems, automated cooling controls, and predictive maintenance to minimize human intervention.
  2. Lights-Out Operations Goal: Hyperscale data centers from companies like Google, Amazon, and Microsoft ultimately aim for fully automated operations with minimal human presence.
  3. Paradoxical Development: As scale increases, complexity initially requires more human involvement, but advanced automation eventually enables a return toward unmanned operations.

Summary: This diagram illustrates the current transition from simple automated servers to complex data centers requiring human oversight, but the ultimate industry goal is achieving fully automated “lights-out” data center operations. The evolution shows increasing complexity followed by sophisticated automation that eventually reduces the need for human intervention.

Numbers about Cooling

This diagram illustrates the thermodynamic principles and calculation methods for cooling systems, particularly relevant for data center and server room thermal management.

System Components

Left Side (Heat Generation)

  • Power consumption device (Power kW)
  • Time element (energy in kWh = power in kW × operating hours)
  • Heat-generating source (appears to be server/computer systems)

Right Side (Cooling)

  • Cooling system (Cooling kW – Remove ‘Heat’)
  • Cooling control system
  • Coolant circulation system

Core Formula: Q = m×Cp×ΔT (m: mass flow rate, kg/s)

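Heat Generation Side (Red Box)

Air side: Q = ρ×V×Cp×ΔT (the mass flow rate is m = ρ×V)

  • Q: Heat flow rate (J/s = W); essentially all electrical power drawn by the equipment becomes heat, so Q equals the power consumption (kW)
  • V: Volumetric airflow rate (m³/s)
  • ρ: Air density (approximately 1.2 kg/m³)
  • Cp: Specific heat capacity of air at constant pressure (approximately 1005 J/(kg·K))
  • ΔT: Temperature rise of the air across the equipment (K)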
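Rearranging the air-side relation gives the airflow needed to carry a given heat load, V = Q / (ρ·Cp·ΔT). The sketch below uses the approximate air properties from the text; the 10 kW rack and 12 K temperature rise are hypothetical values, and the function name is illustrative.

```python
AIR_DENSITY = 1.2  # kg/m³, approximate value from the text
AIR_CP = 1005.0    # J/(kg·K), approximate value from the text


def required_airflow_m3_s(heat_w: float, delta_t_k: float) -> float:
    """Solve Q = ρ·V·Cp·ΔT for V: airflow needed to carry heat_w at a given air temperature rise."""
    return heat_w / (AIR_DENSITY * AIR_CP * delta_t_k)


# Hypothetical 10 kW rack with a 12 K inlet-to-outlet temperature rise
v = required_airflow_m3_s(10_000, 12)
print(f"{v:.2f} m³/s")  # about 0.69 m³/s
```

Halving the allowed temperature rise doubles the required airflow, since V is inversely proportional to ΔT.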

Cooling Side (Blue Box)

  • Q: Cooling capacity (kW)
  • m: Coolant mass flow rate (kg/s)
  • Cp: Specific heat capacity of coolant (for water, approximately 4.2 kJ/(kg·K))
  • ΔT: Temperature difference between coolant return and supply (K)
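On the coolant side the same relation gives the circulation rate, m = Q / (Cp·ΔT). The sketch below assumes water (Cp ≈ 4186 J/(kg·K), about 4.2 kJ/(kg·K)) and a density of about 1 kg/L for the litre conversion; the 10 kW load and 6 K supply/return difference are hypothetical.

```python
WATER_CP = 4186.0  # J/(kg·K), i.e. about 4.2 kJ/(kg·K)


def coolant_mass_flow_kg_s(heat_w: float, delta_t_k: float, cp: float = WATER_CP) -> float:
    """Solve Q = m·Cp·ΔT for the coolant mass flow rate m."""
    return heat_w / (cp * delta_t_k)


# Hypothetical 10 kW load with a 6 K difference between return and supply water
m_dot = coolant_mass_flow_kg_s(10_000, 6)
litres_per_min = m_dot * 60  # assumes water density ≈ 1 kg/L
print(f"{m_dot:.3f} kg/s ≈ {litres_per_min:.1f} L/min")
```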

System Operation Principle

  1. Heat generated by electronic equipment heats the air
  2. Heated air moves to the cooling system
  3. Circulating coolant absorbs the heat
  4. Cooling control system regulates flow rate or temperature
  5. The cooled air recirculates back to the equipment

Key Design Considerations

The cooling control system balances several key trade-offs:

  • High flow rate vs. high temperature differential: the same heat load Q can be removed with more flow and a smaller ΔT, or with less flow and a larger ΔT
  • Optimal balance between energy efficiency and cooling effectiveness
  • Heat load matching between generation and removal capacity
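The flow-versus-ΔT trade-off and heat load matching can both be read off Q = m×Cp×ΔT. For a fixed heat load, the required flow is inversely proportional to ΔT, so doubling ΔT halves the flow; pump energy falls steeply as flow drops (the pump affinity laws put power roughly proportional to the cube of flow), at the cost of a wider supply/return temperature spread. The sketch below assumes water as the coolant and a hypothetical 10 kW load; the function names are illustrative.

```python
WATER_CP = 4186.0  # J/(kg·K), water


def flow_for(heat_w: float, delta_t_k: float) -> float:
    """Coolant mass flow (kg/s) that removes heat_w at a given ΔT, from Q = m·Cp·ΔT."""
    return heat_w / (WATER_CP * delta_t_k)


def heat_balance_ok(it_load_w: float, cooling_capacity_w: float) -> bool:
    """Heat load matching: removal capacity must be at least the heat generated."""
    return cooling_capacity_w >= it_load_w


load_w = 10_000  # hypothetical IT load in watts
for dt in (4, 6, 8, 10):
    print(f"ΔT = {dt:>2} K -> {flow_for(load_w, dt):.3f} kg/s")

print(heat_balance_ok(load_w, 12_000))
```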

Summary

This diagram demonstrates the fundamental thermodynamic principles for cooling system design, where electrical power consumption directly translates to heat generation that must be removed by the cooling system. The key relationship Q = m×Cp×ΔT applies to both heat generation (air side) and heat removal (coolant side), enabling engineers to calculate required coolant flow rates and temperature differentials. Understanding these heat balance calculations is essential for efficient thermal management in data centers and server environments, ensuring optimal performance while minimizing energy consumption.

Components for AI Work

This diagram visualizes the core concept that all components must be organically connected and work together to successfully operate AI workloads.

Importance of Organic Interconnections

Continuity of Data Flow

  • The data pipeline from Big Data → AI Model → AI Workload must operate seamlessly
  • Bottlenecks at any stage directly impact overall system performance
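When the pipeline stages overlap in time, steady-state throughput is set by the slowest stage, so speeding up any other stage does not help. The sketch below shows this with hypothetical stage names and rates (samples per second); it assumes the stages run concurrently and ignores buffering effects.

```python
from typing import Dict, Tuple


def pipeline_throughput(stage_rates: Dict[str, float]) -> Tuple[str, float]:
    """Steady-state throughput of an overlapped serial pipeline is capped by its slowest stage."""
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical sustained rates in samples/s for each stage
stages = {
    "data loading (SSD)": 12_000,
    "preprocessing (CPU)": 8_000,
    "training step (GPU)": 10_000,
}
print(pipeline_throughput(stages))  # ('preprocessing (CPU)', 8000)
```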

Cooperative Computing Resource Operations

  • GPU/CPU computational power must be balanced with HBM memory bandwidth
  • SSD I/O performance must harmonize with memory-processor data transfer speeds
  • Performance degradation in one component limits the efficiency of the entire system
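One standard way to quantify the compute/memory balance is the roofline model, which is not part of the diagram but fits its point: attainable performance is the lower of peak compute and memory bandwidth × arithmetic intensity (FLOPs per byte moved). Workloads with low arithmetic intensity, such as small-batch inference that streams weights from memory, tend to be memory-bound. The peak and bandwidth figures below are hypothetical, not those of any specific GPU.

```python
def attainable_tflops(peak_tflops: float, mem_bw_tb_s: float, flops_per_byte: float) -> float:
    """Roofline model: performance is capped by peak compute or by bandwidth × arithmetic intensity."""
    return min(peak_tflops, mem_bw_tb_s * flops_per_byte)


PEAK_TFLOPS = 1000.0  # hypothetical accelerator peak
HBM_TB_S = 3.0        # hypothetical HBM bandwidth (TB/s)

# Intensity at which the memory and compute limits meet
ridge = PEAK_TFLOPS / HBM_TB_S

for name, intensity in [("memory-bound kernel", 2.0), ("compute-bound kernel", 500.0)]:
    print(f"{name}: {attainable_tflops(PEAK_TFLOPS, HBM_TB_S, intensity):.0f} TFLOPS")
```

Below the ridge point, adding compute changes nothing and only more memory bandwidth raises performance, which is the "weakest link" effect in numeric form.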

Integrated Software Control Management

  • Load balancing, integration, and synchronization coordinate optimal hardware resource utilization
  • Real-time optimization of workload distribution and resource allocation

Infrastructure-based Stability Assurance

  • Stable power supply ensures continuous operation of all computing resources
  • Cooling systems prevent performance degradation through thermal management of high-performance hardware
  • Facility control maintains consistency of the overall operating environment

Key Insight

In AI systems, the weakest link determines overall performance. For example, no matter how powerful the GPU, if memory bandwidth is insufficient or cooling is inadequate, the entire system cannot achieve its full potential. Therefore, balanced design and integrated management of all components is crucial for AI workload success.

The diagram emphasizes that AI infrastructure is not just about having powerful individual components, but about creating a holistically optimized ecosystem where every element supports and enhances the others.
