UPS & ESS


UPS vs. ESS & Key Safety Technologies

This image illustrates the structural differences between UPS (Uninterruptible Power System) and ESS (Energy Storage System), emphasizing the advanced safety technologies required for ESS due to its “High Power, High Risk” nature.

1. Left Side: System Comparison (UPS vs. ESS)

This section contrasts the purpose and scale of the two systems, highlighting why ESS requires stricter safety measures.

  • UPS (Traditional System)
    • Purpose: Bridges the power gap for a short duration (10–30 mins) until the backup generator starts (Generator Wake-Up Time).
    • Scale: Relatively low capacity (25–500 kWh) and output (100 kW – N MW).
  • ESS (High-Capacity System)
    • Purpose: Stores energy for long durations (4+ hours) for active grid management, such as Peak Shaving.
    • Scale: Handles massive power (~100+ MW) and capacity (~400+ MWh).
    • Risk Factor: Labeled as “High Power, High Risk,” indicating that the sheer energy density makes it significantly more hazardous than UPS.

2. Right Side: 4 Key Safety Technologies for ESS

Since standard UPS technologies (indicated in gray text) are insufficient for ESS, the image outlines four critical technological upgrades (indicated in bold text).

① Battery Management System (BMS)

  • (From) Simple voltage monitoring and cut-off.
  • [To] Active Balancing & Precise State Estimation: Requires algorithms that actively balance cell voltages and accurately calculate SOC (State of Charge) and SOH (State of Health).

② Thermal Management System

  • (From) Simple air cooling or fans.
  • [To] Forced Air (HVAC) / Liquid Cooling: Due to high heat generation, robust air conditioning (HVAC) or direct Liquid Cooling systems are necessary.

③ Fire Detection & Suppression

  • (From) Detecting smoke after a fire starts.
  • [To] Off-gas Detection & Dedicated Suppression: Detects Off-gas (released before thermal runaway) to prevent fires early, using specialized suppressants like Clean Agents or Water Mist.

④ Physical/Structural Safety

  • (From) Standard metal enclosures.
  • [To] Explosion-proof & Venting Design: Enclosures must withstand explosions and safely vent gases.
  • [To] Fire Propagation Prevention: Includes fire barriers and BPU (Battery Protective Units) to stop fire from spreading between modules.

Summary

  • Scale: ESS handles significantly higher power and capacity (>400 MWh) compared to UPS, serving long-term grid needs rather than short-term backup.
  • Risk: Due to the “High Power, High Risk” nature of ESS, standard safety measures used in UPS are insufficient.
  • Solution: Advanced technologies—such as Liquid Cooling, Off-gas Detection, and Active Balancing BMS—are mandatory to ensure safety and prevent thermal runaway.

#ESS #UPS #BatterySafety #BMS #ThermalManagement #EnergyStorage #FireSafety #Engineering #TechTrends #OffGasDetection

WIth Gemini

Operations by Metrics

1. Big Data Collection & 2. Quality Verification

  • Big Data Collection: Represented by the binary data (top-left) and the “All Data (Metrics)” block (bottom-left).
  • Data Quality Verification: The collected data then passes through the checklist icon (top flow) and the “Verification (with Resolution)” step (bottom flow). This aligns with the quality verification step, including ‘resolution/performance’.

3. Change Data Capture (CDC)

  • Verified data moves to the “Change Only” stage (central pink box).
  • If there are “No Changes,” it results in “No Actions,” illustrating the CDC (Change Data Capture) concept of processing only altered data.
  • The magnifying glass icon in the top flow also visualizes this ‘change detection’ role.

4. State/Numeric Processing & 5. Analysis, Severity Definition

  • State/Numeric Processing: Once changes are detected (after the magnifying glass), the data is split into two types:
    • State Changes (ON/OFF icon): Represents changes in ‘state values’.
    • Numeric Changes (graph icon): Represents changes in ‘numeric values’.
  • Statistical Analysis & Severity Definition:
    • These changes are fed into the “Analysis” step.
    • This stage calculates the “Count of Changes” (statistics on the number of changes) and “Numeric change Diff” (amount of numeric change).
    • The analysis result leads to “Severity Tagging” to define the ‘Severity’ level (e.g., “Critical? Major? Minor?”).

6. Notification & 7. Analysis (Retrieve)

  • Notification: Once the severity is defined, the “Notification” step (bell/email icon) is triggered to alert personnel.
  • Analysis (Retrieve):
    • The notified user then performs the “Retrieve” action.
    • This final step involves querying both the changed data (CDD results) and the original data (source, indicated by the URL in the top-right) to analyze the cause.

Summary

This workflow begins with collecting and verifying all data, then uses CDC to isolate only the changes. These changes (state or numeric) are analyzed for count and difference to assign a severity level. The process concludes with notification and a retrieval step for root cause analysis.

#DataProcessing #DataMonitoring #ChangeDataCapture #CDC #DataAnalysis #SystemMonitoring #Alerting #ITOperations #SeverityAnalysis

With Gemini

DC Cooling (R)

Data Center Cooling System Core Structure

This diagram illustrates an integrated data center cooling system centered on chilled water/cooling water circulation and heat exchange.

Core Cooling Circulation Structure

Primary Loop: Cooling Water Loop

Cooling Tower → Cooling Water → Chiller → (Heat Exchange) → Cooling Tower

  • Cooling Tower: Dissipates heat from cooling water to atmosphere using outdoor air
  • Pump/Header: Controls cooling water pressure and flow rate through circulation pipes
  • Heat Exchange in Chiller: Cooling water exchanges heat with refrigerant to cool the refrigerant

Secondary Loop: Chilled Water Loop

Chiller → Chilled Water → CRAH → (Heat Exchange) → Chiller

  • Chiller: Generates chilled water (7-12°C) through compressor and refrigerant cycle
  • Pump/Header: Circulates chilled water to CRAH units and returns it back
  • Heat Exchange in CRAH: Chilled water exchanges heat with air to cool the air

Tertiary Loop: Cooling Air Loop

CRAH → Cooling Air → Servers → Hot Air → CRAH

  • CRAH (Computer Room Air Handler): Generates cooling air through water-to-air heat exchanger
  • FAN: Forces circulation of cooling air throughout server room
  • Heat Absorption: Air absorbs server heat and returns to CRAH

Heat Exchange Critical Points

Heat Exchange #1: Inside Chiller

  • Cooling Water ↔ Refrigerant: Transfers refrigerant heat to cooling water in condenser
  • Refrigerant ↔ Chilled Water: Absorbs heat from chilled water to refrigerant in evaporator

Heat Exchange #2: CRAH

  • Chilled Water ↔ Air: Transfers air heat to chilled water in water-to-air heat exchanger
  • Chilled water temperature rises → Returns to chiller

Heat Exchange #3: Server Room

  • Hot Air ↔ Servers: Air absorbs heat from servers
  • Temperature-increased air → Returns to CRAH

Energy Efficiency: Free Cooling

Low-Temperature Outdoor Air → Air-to-Water Heat Exchanger → Chilled Water Cooling → Reduced Chiller Load

  • Condition: When outdoor temperature is sufficiently low
  • Effect: Reduces chiller operation and compressor power consumption (up to 30-50%)
  • Method: Utilizes natural cooling through cooling tower or dedicated heat exchanger

Cooling System Control Elements

Cooling Basic Operations Components:

  • Cool Down: Controls water/air temperature reduction
  • Water Circulation: Adjusts flow rate through pump speed/pressure control
  • Heat Exchanges: Optimizes heat exchanger efficiency
  • Plumbing: Manages circulation paths and pressure loss

Heat Flow Summary

Server Heat → Air → CRAH (Heat Exchange) → Chilled Water → Chiller (Heat Exchange) → 
Cooling Water → Cooling Tower → Atmospheric Discharge


Summary

This system efficiently removes server heat to the outdoor atmosphere through three cascading circulation loops (air → chilled water → cooling water) and three strategic heat exchange points (CRAH, Chiller, Cooling Tower). Free cooling optimization reduces energy consumption by up to 50% when outdoor conditions permit. The integrated pump/header network ensures precise flow control across all loops for maximum cooling efficiency.


#DataCenterCooling #ChilledWater #CRAH #FreeCooling #HeatExchange #CoolingTower #ThermalManagement #DataCenterInfrastructure #EnergyEfficiency #HVACSystem #CoolingLoop #WaterCirculation #ServerCooling #DataCenterDesign #GreenDataCenter

With Claude

Power for AI

AI Data Center Power Infrastructure: 3 Key Transformations

Traditional Data Center Power Structure (Baseline)

Power Grid → Transformer → UPS → Server (220V AC)

  • Single power grid connection
  • Standard UPS backup (10-15 minutes)
  • AC power distribution
  • 200-300W per server

3 Critical Changes for AI Data Centers

🔴 1. More Power (Massive Power Supply)

Key Changes:

  • Diversified power sources:
    • SMR (Small Modular Reactor) – Stable baseload power
    • Renewable energy integration
    • Natural gas turbines
    • Long-term backup generators + large fuel tanks

Why: AI chips (GPU/TPU) consume kW to tens of kW per server

  • Traditional server: 200-300W
  • AI server: 5-10 kW (25-50x increase)
  • Total data center power demand: Hundreds of MW scale

🔴 2. Stable Power (Power Quality & Conditioning)

Key Changes:

  • 800V HVDC system – High-voltage DC transmission
  • ESS (Energy Storage System) – Large-scale battery storage
  • Peak Shaving – Peak load control and leveling
  • UPS + Battery/Flywheel – Instantaneous outage protection
  • Power conditioning equipment – Voltage/frequency stabilization

Why: AI workload characteristics

  • Instantaneous power surges (during inference/training startup)
  • High power density (30-100 kW per rack)
  • Power fluctuation sensitivity – Training interruption = days of work lost
  • 24/7 uptime requirements

🔴 3. Server Power (High-Efficiency Direct DC Delivery)

Key Changes:

  • Direct-to-Chip DC power delivery
  • Rack-level battery systems (Lithium/Supercapacitor)
  • High-density power distribution

Why: Maximize efficiency

  • Eliminate AC→DC conversion losses (5-15% efficiency gain)
  • Direct chip-level power supply – Minimize conversion stages
  • Ultra-high rack density support (100+ kW/rack)
  • Even minor voltage fluctuations are critical – Chip-level stabilization needed

Key Differences Summary

CategoryTraditional DCAI Data Center
Power ScaleFew MWHundreds of MW
Rack Density5-10 kW/rack30-100+ kW/rack
Power MethodAC-centricHVDC + Direct DC
Backup PowerUPS (10-15 min)Multi-tier (Generator+ESS+UPS)
Power StabilityStandardExtremely high reliability
Energy SourcesSingle gridMultiple sources (Nuclear+Renewable)

Summary

AI data centers require 25-50x more power per server, demanding massive power infrastructure with diversified sources including SMRs and renewables

Extreme workload stability needs drive multi-tier backup systems (ESS+UPS+Generator) and advanced power conditioning with 800V HVDC

Direct-to-chip DC power delivery eliminates conversion losses, achieving 5-15% efficiency gains critical for 100+ kW/rack densities

#AIDataCenter #DataCenterPower #HVDC #DirectDC #EnergyStorageSystem #PeakShaving #SMR #PowerInfrastructure #HighDensityComputing #GPUPower #DataCenterDesign #EnergyEfficiency #UPS #BackupPower #AIInfrastructure #HyperscaleDataCenter #PowerConditioning #DCPower #GreenDataCenter #FutureOfComputing

With Claude

DC Power(R)

Data Center DC Power System Comprehensive Overview

This diagram illustrates the complete DC (Direct Current) power supply system for a data center infrastructure.

1. Core Components

① Power Source

  • 15.4 KV High Voltage AC Power
  • Received from utility grid
  • Efficient long-distance transmission (Efficient Delivery)
  • High voltage warning indicator (High Warning)

② Primary Transformer

  • Voltage conversion: 15.4 KV → 6.6 KV
  • Function: Steps down high voltage to medium voltage
  • Transformation method: Voltage Step-down
  • Adjusts voltage for internal data center distribution

③ Backup Power #1 – Generator System (Long-Time Backup)

  • Configuration: Diesel generator + Fuel tank
  • Characteristic: Long-duration backup capability
  • Purpose: Continuous power supply during main power outage
  • Advantage: Unlimited operation as long as fuel is supplied

④ Secondary Transformer

  • Voltage conversion: 6.6 KV → 380 V
  • Function: Steps down medium voltage to low voltage
  • Transformation method: Voltage Step-down
  • Provides appropriate voltage for UPS and final loads

⑤ Backup Power #2 – UPS System (Short-Time Backup)

  • Configuration: UPS + Battery
  • Characteristic: Short-duration instantaneous backup
  • Purpose: Ensures uninterrupted power during main-to-generator transition
  • Role: Supplies power during generator startup time (10-30 seconds)

⑥ Final Load (Power Use)

  • Output voltage: 220 V AC or 48 V DC
  • Target: Servers, network equipment, storage systems
  • Feature: Stable IT infrastructure operation with DC power

2. Voltage Conversion Flow

15.4 KV (AC)  →  6.6 KV (AC)  →  380 V (AC)  →  48 V (DC) / 220 V
  [Reception]   [Primary TX]   [Secondary TX]   [Final Conversion]

3. Redundant Backup Architecture

Two-Tier Backup System

Main Power (15.4 KV) ─────┐
                          ├──→ Transform ──→ Load
Generator (Long-term) ────┘
         ↓
    UPS/Battery (Short-term) ──→ Instantaneous uninterrupted guarantee

Backup Strategy:

  • Generator: Hours to days operation (fuel-dependent)
  • UPS: Minutes to tens of minutes operation (battery capacity-dependent)
  • Combined effect: UPS covers generator startup gap to achieve complete uninterrupted power

4. Operating Scenarios

Scenario 1: Normal Operation

Utility power (15.4KV) → Primary transform (6.6KV) → Secondary transform (380V) → UPS → DC load (48V)

Scenario 2: Momentary Power Outage

  1. Main power interruption detected (< 4ms)
  2. UPS battery immediately engaged
  3. Continuous power supply to load with zero interruption

Scenario 3: Extended Power Outage

  1. Main power interruption detected
  2. UPS battery immediately engaged (maintains uninterrupted power)
  3. Generator automatically starts (10-30 seconds required)
  4. Generator reaches rated capacity and replaces main power
  5. Generator power charges UPS + supplies load
  6. Long-term operation with continuous fuel supply

Scenario 4: Generator Failure

  • Limited-time operation within UPS battery capacity
  • Priority operation for critical systems or graceful shutdown

5. Additional Protection and Control Devices

Supplementary devices for system stability and safety:

Circuit Breaker Hierarchy

  • GCB (Generator Circuit Breaker): Primary protection at reception point
  • VCB (Vacuum Circuit Breaker): Vacuum interruption, medium voltage protection
  • ACB (Air Circuit Breaker): Low voltage distribution panel protection
  • MCCB (Molded Case Circuit Breaker): Individual load protection
  • Role: Circuit interruption during overload or short circuit to protect equipment and personnel

Switching Devices

  • STS (Static Transfer Switch): High-speed transfer between main power ↔ generator
  • ATS (Automatic Transfer Switch): Automatic transfer between power sources ( UPS level)
  • ALTS (Automatic Load Transfer Switch): Automatic load transfer ( for 22.9kV class)
  • CCTS: Circuit breaker control and transfer system
  • Role: Automatic/immediate transfer to backup power during power failure

Switching Points (Red circle indicators)

  • Reception point, before/after transformers, backup power injection points
  • Critical points for power path changes and redundancy implementation

6. Key System Features

Uninterruptible Power Supply: Three-stage protection with main power → generator → UPS
Multi-stage Voltage Conversion: Ensures both transmission efficiency and usage safety
Automated Backup Transfer: Automatic switching without human intervention
Hierarchical Protection: Stage-by-stage circuit breakers prevent cascading failures
Scalable Architecture: Modular configuration enables easy capacity expansion


Summary

This DC power system architecture ensures continuous, uninterrupted operation of mission-critical data center infrastructure through a sophisticated combination of redundant power sources, automated failover mechanisms, and multi-layered protection systems. The integration of long-term generator backup and short-term UPS battery systems creates a seamless power continuity solution that can handle any grid interruption scenario. The multi-stage voltage transformation (15.4KV → 6.6KV → 380V → 48V DC) optimizes both transmission efficiency and end-user safety while providing flexibility for diverse IT equipment requirements.


#DataCenter #DCPower #PowerSystems #CriticalInfrastructure #UPS #BackupPower #DataCenterDesign #ElectricalEngineering #PowerDistribution #MissionCritical #DataCenterInfrastructure #FacilityManagement #PowerReliability #UninterruptiblePowerSupply #DataCenterOperations

With Claude

AI goes exponentially with ..

This infographic illustrates how AI’s exponential growth triggers a cascading exponential expansion across all interconnected domains.

Core Concept: Exponential Chain Reaction

Top Process Chain: AI’s exponential growth creates proportionally exponential demands at each stage:

  • AI (LLM)DataComputingPowerCooling

The “≈” symbol indicates that each element grows exponentially in proportion to the others. When AI doubles, the required data, computing, power, and cooling all scale proportionally.

Evidence of Exponential Growth Across Domains

1. AI Networking & Global Data Generation (Top Left)

  • Exponential increase beginning in the 2010s
  • Vertical surge post-2020

2. Data Center Electricity Demand (Center Left)

  • Sharp increase projected between 2026-2030
  • Orange (AI workloads) overwhelms blue (traditional workloads)
  • AI is the primary driver of total power demand growth

3. Power Production Capacity (Center Right)

  • 2005-2030 trends across various energy sources
  • Power generation must scale alongside AI demand

4. AI Computing Usage (Right)

  • Most dramatic exponential growth
  • Modern AI era begins in 2012
  • Doubling every 6 months (extremely rapid exponential growth)
  • Over 300,000x increase since 2012
  • Three exponential growth phases shown (1e+0, 1e+2, 1e+4, 1e+6)

Key Message

This infographic demonstrates that AI development is not an isolated phenomenon but triggers exponential evolution across the entire ecosystem:

  • As AI models advance → Data requirements grow exponentially
  • As data increases → Computing power needs scale exponentially
  • As computing expands → Power consumption rises exponentially
  • As power consumption grows → Cooling systems must expand exponentially

All elements are tightly interconnected, creating a ‘cascading exponential effect’ where exponential growth in one domain simultaneously triggers exponential development and demand across all other domains.


#ArtificialIntelligence #ExponentialGrowth #AIInfrastructure #DataCenters #ComputingPower #EnergyDemand #TechScaling #AIRevolution #DigitalTransformation #Sustainability #TechInfrastructure #MachineLearning #LLM #DataScience #FutureOfAI #TechTrends #TechnologyEvolution

With Claude

Multi-DCs Operation with a LLM (4)

LLM-Based Multi-Datacenter Operation System

System Architecture

3-Stage Processing Pipeline: Collector → Integrator → Analyst

  • Event collection from various protocols
  • Data normalization through local integrators
  • Intelligent analysis via LLM/AI analyzers
  • RAG data expansion through bottom Data Add-On modules

Core Functions

1. Time-Based Event Aggregation Analysis

  • 60-second intervals (adjustable) for event bundling
  • Comprehensive situational analysis instead of individual alarms
  • LLM queries with predefined prompts

Effectiveness:

  • ✅ Resolves alarm fatigue and enables correlation analysis
  • ✅ Improves operational efficiency through periodic comprehensive reports
  • ⚠️ Potential delay in immediate response to critical issues ( -> Using a legacy/local monitoring system )

2. RAG-Based Data Enhancement

  • Extension data: Metrics, manuals, configurations, maintenance records
  • Reuse of past analysis results as learning data
  • Improved accuracy through domain-specific knowledge accumulation

Effectiveness:

  • ✅ Continuous improvement of analysis quality and increased automation
  • ✅ Systematization of operational knowledge and organizational capability enhancement

Innovative Value

  • Paradigm Shift: Reactive → Predictive/Contextual analysis
  • Operational Burden Reduction: Transform massive alarms into meaningful insights
  • Self-Evolution: Continuous learning system through RAG framework

Executive Summary: This system overcomes the limitations of traditional individual alarm approaches and represents an innovative solution that intelligentizes datacenter operations through time-based event aggregation and LLM analysis. As a self-evolving monitoring system that continuously learns and develops through RAG-based data enhancement, it is expected to dramatically improve operational efficiency and analysis accuracy.

With Claude