AI-Driven Proactive Cooling Architecture

The provided image illustrates an AI-Driven Proactive Cooling Architecture: a pipeline that converts operational data into precise thermal-management actions.


1. The Proactive Data Hierarchy

The architecture categorizes data sources along a spectrum, moving from “More Proactive” (predicting future heat) to “Reactive” (measuring existing heat).

  • LLM Job Schedule (Most Proactive): This layer looks at the job queue, node thermal headroom, and resource availability. It allows the system to prepare for heat before the first calculation even begins.
  • LLM Workload: Monitors real-time GPU utilization (%) and token throughput to understand the intensity of the current processing task.
  • GPU / HBM: Captures direct hardware telemetry, including GPU power draw (Watts) and High Bandwidth Memory (HBM) temperatures.
  • Server Internal Temperature: Measures the junction temperature, fan/pump speeds, and the $\Delta T$ (temperature difference) between server inlet and outlet.
  • Floor & Rack Temperature (Reactive): The traditional monitoring layer that identifies hot spots and rack density (kW) once heat has already entered the environment.
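
To make the hierarchy concrete, here is a minimal Python sketch of a single telemetry snapshot spanning all five layers. The class and field names are illustrative assumptions, not a real scheduler, GPU-agent, or DCIM schema.

```python
from dataclasses import dataclass

# Illustrative telemetry snapshot covering the five layers above.
# All field names are hypothetical; real systems expose these via job
# schedulers, GPU telemetry agents, BMCs, and facility (DCIM) sensors.

@dataclass
class JobScheduleSignal:            # most proactive layer
    queued_jobs: int
    requested_gpus: int
    node_thermal_headroom_c: float

@dataclass
class WorkloadSignal:
    gpu_utilization_pct: float
    tokens_per_second: float

@dataclass
class GpuSignal:
    power_draw_w: float
    hbm_temp_c: float

@dataclass
class ServerSignal:
    junction_temp_c: float
    fan_pump_speed_pct: float
    inlet_outlet_delta_t_c: float

@dataclass
class FloorRackSignal:              # reactive layer
    rack_density_kw: float
    hot_spot_temp_c: float

@dataclass
class TelemetrySnapshot:
    schedule: JobScheduleSignal
    workload: WorkloadSignal
    gpu: GpuSignal
    server: ServerSignal
    floor: FloorRackSignal
```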

2. The Analysis and Response Loop

The bottom section of the diagram shows how this multi-layered data is converted into action:

  • Gathering Data: Telemetry from all five layers is aggregated into a central repository.
  • Analysis with ML: A Machine Learning engine processes this data to predict thermal trends. It doesn’t just look at where the temperature is now, but where it will be in the next few minutes based on the workload.
  • Cooling Response: The ML insights trigger physical adjustments in the cooling infrastructure, specifically controlling the $\Delta T$ (Supply/Return) and Flow Rate (LPM – Liters Per Minute) of the coolant.
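
The loop below is a minimal sketch of that gather → analyze → respond cycle, assuming a toy linear-trend "predictor" in place of a real ML model and a simplified water-coolant heat equation to map predicted load to $\Delta T$ and flow-rate setpoints. It does not reflect any specific vendor's control interface.

```python
# Minimal sketch of the gather -> ML analysis -> cooling response loop.
# The linear-trend "model" and the setpoint mapping are simplifying
# assumptions standing in for a real ML predictor and CDU/BMS interface.
from collections import deque

HISTORY = deque(maxlen=60)          # last 60 aggregated heat-load samples (kW)

def predict_heat_load_kw(history, horizon_steps=5):
    """Extrapolate total IT heat load a few minutes ahead (linear trend)."""
    if len(history) < 2:
        return history[-1] if history else 0.0
    slope = history[-1] - history[-2]
    return history[-1] + slope * horizon_steps

def cooling_setpoints(predicted_kw, supply_temp_c=30.0, delta_t_c=10.0):
    """Convert predicted heat load into coolant flow (LPM) for a target delta-T.

    Q[kW] ~= flow[L/s] * 4.186 [kJ/(L*K)] * delta_T[K]   (water-like coolant)
    """
    flow_lps = predicted_kw / (4.186 * delta_t_c)
    return {"supply_temp_c": supply_temp_c,
            "return_temp_c": supply_temp_c + delta_t_c,
            "flow_lpm": flow_lps * 60.0}

def control_step(current_heat_kw):
    HISTORY.append(current_heat_kw)
    predicted = predict_heat_load_kw(HISTORY)
    return cooling_setpoints(predicted)

print(control_step(120.0))   # ramp coolant flow before the job's heat arrives
```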

3. Technical Significance

By shifting the control logic “left” (toward the LLM Job Schedule), data centers can eliminate the thermal lag inherent in traditional systems. This is particularly critical for AI infrastructure, where GPU power consumption can spike almost instantaneously, often faster than traditional mechanical cooling systems can ramp up.


Summary

  1. This architecture shifts cooling from a reactive sensor-based model to a proactive workload-aware model using AI/ML.
  2. It integrates data across the entire stack, from high-level LLM job queues down to chip-level GPU power draw and rack temperatures.
  3. The ML engine predicts thermal demand to dynamically adjust coolant flow rates and supply temperatures, significantly improving energy efficiency and hardware longevity.

#AICooling #DataCenterInfrastructure #ProactiveCooling #GPUManagement #LiquidCooling #LLMOps #ThermalManagement #EnergyEfficiency #SmartDC

With Gemini

AI Model Optimizations


4 Key Methods of Model Optimization

1. Pruning

  • Analogy: Trimming a bonsai tree by cutting off unnecessary branches.
  • Description: This method involves removing redundant or non-essential connections (parameters) within a neural network that do not significantly contribute to the output.
  • Key Benefit: It leads to a significant reduction in model size with minimal impact on accuracy.
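
A minimal sketch of magnitude pruning using PyTorch's built-in pruning utilities; the layer sizes and the 30% sparsity target are arbitrary choices for illustration.

```python
# Hedged sketch: L1 (magnitude) pruning of a small network with PyTorch's
# pruning utilities. The architecture and 30% sparsity are arbitrary.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 30% of weights with the smallest magnitude in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")  # roughly 30% of the weights removed
```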

2. Quantization

  • Analogy: Reducing the resolution of a grid to make it simpler.
  • Description: This technique lowers the numerical precision of the weights and activations (e.g., converting 32-bit floating-point numbers to 8-bit integers).
  • Key Benefit: It drastically improves memory efficiency and increases computation speed, which is essential for mobile or edge devices.
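
A minimal NumPy sketch of the idea: symmetric int8 quantization of an FP32 weight matrix, showing the 4x memory saving and the small reconstruction error. Production toolchains quantize per-channel with calibration data; this only illustrates the precision trade-off.

```python
# Hedged sketch: symmetric int8 quantization of a weight tensor with NumPy.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0           # map the largest |weight| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)   # fp32 weights: 256 KiB
q, scale = quantize_int8(w)                        # int8 weights:   64 KiB
error = np.abs(w - dequantize(q, scale)).mean()
print(f"4x smaller, mean reconstruction error: {error:.5f}")
```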

3. Knowledge Distillation

  • Analogy: Pouring the contents of a large pitcher into a smaller, more efficient cup.
  • Description: A large, complex pre-trained model (the Teacher) transfers its “knowledge” to a smaller, more compact model (the Student).
  • Key Benefit: The Student model achieves performance levels close to the Teacher model but remains much faster and lighter.
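
A minimal PyTorch sketch of the standard soft-label distillation loss: temperature-scaled KL divergence against the Teacher's outputs plus ordinary cross-entropy on the labels. The temperature and mixing weight are typical but arbitrary values.

```python
# Hedged sketch: soft-label knowledge distillation loss.
# T (temperature) and alpha (mixing weight) are common but arbitrary choices.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # Student's softened predictions
        F.softmax(teacher_logits / T, dim=-1),       # Teacher's softened targets
        reduction="batchmean",
    ) * (T * T)                                      # rescale for the temperature
    hard = F.cross_entropy(student_logits, labels)   # usual supervised loss
    return alpha * soft + (1 - alpha) * hard

# During training, the frozen Teacher produces teacher_logits for each batch
# and the Student is updated to minimise this combined loss.
```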

4. Neural Architecture Search (NAS)

  • Analogy: Finding the fastest path through a complex maze.
  • Description: Instead of human engineers designing the network, algorithms automatically search for and design the most optimal architecture for a specific task or hardware.
  • Key Benefit: It automates the creation of the most efficient structure tailored to specific environments or performance requirements.
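
As a toy illustration, the sketch below runs the simplest possible "search": random sampling over a tiny space of MLP depths and widths, scored by a stand-in objective that penalizes model size. Real NAS systems use reinforcement learning, evolutionary search, or differentiable relaxations, and evaluate candidates by actually training them.

```python
# Hedged sketch: random-search NAS over a toy architecture space.
# The scoring function is a stand-in for real validation accuracy.
import random

SEARCH_SPACE = {"depth": [2, 3, 4], "width": [64, 128, 256], "activation": ["relu", "gelu"]}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def score(arch):
    params = arch["depth"] * arch["width"] ** 2               # rough size proxy
    accuracy_proxy = 1 - 1 / (arch["depth"] * arch["width"])  # toy stand-in for accuracy
    return accuracy_proxy - 1e-6 * params                     # penalise large models

best = max((sample_architecture() for _ in range(50)), key=score)
print("selected architecture:", best)
```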

Summary

  1. Efficiency: These techniques reduce AI model size and power consumption while maintaining high performance.
  2. Deployment: Optimization is crucial for running advanced AI on hardware-constrained devices like smartphones and IoT sensors.
  3. Automation: Methods like NAS move beyond manual design to automatically discover highly efficient structures tailored to a given hardware target.

#AI #DeepLearning #ModelOptimization #Pruning #Quantization #KnowledgeDistillation #NAS #EdgeAI #EfficientAI #MachineLearning

With Gemini

The 3 Stages of AI Model Operation


Analysis of the 3 Stages of AI Model Operation

The provided image illustrates the three core stages of how AI models operate: Learning, Inference, and Data Generation.

1. Learning

  • Goal: Knowledge acquisition and parameter updates. This is the stage where the AI “studies” data to find patterns.
  • Mechanism: Bidirectional (Feed-forward + Backpropagation). It processes data to get a result and then goes backward to correct errors by adjusting internal weights.
  • Key Metrics: Accuracy and Loss. The objective is to minimize loss to increase the model’s precision.
  • Resource Requirement: Very High. It requires high-performance server clusters equipped with powerful GPUs like the NVIDIA H100.
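
A minimal PyTorch sketch of this stage: a forward pass, loss computation, and backpropagation with a parameter update. The model, data, and hyperparameters are toy placeholders, not a production training recipe.

```python
# Hedged sketch of the Learning stage: feed-forward + backpropagation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))   # toy batch

for step in range(100):
    logits = model(x)              # feed-forward
    loss = loss_fn(logits, y)      # measure error (Loss)
    optimizer.zero_grad()
    loss.backward()                # backpropagation
    optimizer.step()               # parameter update

accuracy = (model(x).argmax(dim=-1) == y).float().mean()
print(f"loss={loss.item():.3f} accuracy={accuracy.item():.2%}")
```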

2. Inference (Reasoning)

  • Goal: Result prediction, classification, and judgment. This is using a pre-trained model to answer specific questions (e.g., “What is in this picture?”).
  • Mechanism: Unidirectional (Feed-forward). Data simply flows forward through the model to produce an output.
  • Key Metrics: Latency and Efficiency. The focus is on how quickly and cheaply the model can provide an answer.
  • Resource Requirement: Moderate. It is efficient enough to be feasible on “Edge devices” like smartphones or local PCs.
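
The matching inference sketch: gradients disabled, a single feed-forward pass, and a rough latency measurement. The toy model mirrors the training sketch above.

```python
# Hedged sketch of the Inference stage: one feed-forward pass, no gradients.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()                                   # inference mode: no weight updates

x = torch.randn(1, 20)                         # a single query
with torch.no_grad():                          # skip gradient bookkeeping
    start = time.perf_counter()
    prediction = model(x).argmax(dim=-1)
    latency_ms = (time.perf_counter() - start) * 1000

print(f"predicted class: {prediction.item()}, latency: {latency_ms:.2f} ms")
```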

3. Data Generation

  • Goal: New data synthesis. This involves creating entirely new content like text, images, or music (e.g., Generative AI like ChatGPT).
  • Mechanism: Iterative Unidirectional (Repeated Feed-forward). It generates results piece by piece (token by token), feeding each output back in as input for the next step.
  • Key Metrics: Quality, Diversity, and Consistency. The focus is on how natural and varied the generated output is.
  • Resource Requirement: High. Because it involves iterative calculations for every single token, it requires more power than simple inference.
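
A minimal sketch of this iterative loop: a toy autoregressive model (the `TinyLM` class here is a hypothetical stand-in, not a real language model) that appends one token per forward pass by feeding the growing sequence back into itself.

```python
# Hedged sketch of the Data Generation stage: token-by-token generation.
import torch
import torch.nn as nn

VOCAB = 100

class TinyLM(nn.Module):                         # toy stand-in for an LLM
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.head = nn.Linear(32, VOCAB)

    def forward(self, tokens):                   # one feed-forward pass
        return self.head(self.embed(tokens).mean(dim=1))

model, tokens = TinyLM(), torch.tensor([[1, 2, 3]])   # "prompt"

with torch.no_grad():
    for _ in range(10):                          # iterative, one token per pass
        logits = model(tokens)
        next_token = logits.argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)   # feed output back in

print(tokens.tolist())   # prompt plus 10 generated tokens
```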

Summary

  1. AI processes consist of Learning (studying data), Inference (applying knowledge), and Data Generation (creating new content).
  2. Learning requires massive server power for bidirectional updates, while Inference is optimized for speed and can run on everyday devices.
  3. Data Generation synthesizes new information through repetitive, iterative calculations, requiring high resources to maintain quality.

#AI #MachineLearning #GenerativeAI #DeepLearning #TechExplained #AIModel #Inference #DataScience #Learning #DataGeneration

With Gemini

Peak Shaving with Data

Graph Interpretation: Power Peak Shaving in AI Data Centers

This graph illustrates the shift in power consumption patterns from traditional data centers to AI-driven data centers and the necessity of “Peak Shaving” strategies.

1. Standard DC (Green Line – Left)

  • Characteristics: Shows “Stable” power consumption.
  • Interpretation: Traditional server workloads are relatively predictable with low volatility. The power demand stays within a consistent range.

2. Training Job Spike (Purple Line – Middle)

  • Characteristics: Significant fluctuations labeled “Peak Shaving Area.”
  • Interpretation: During AI model training, power demand becomes highly volatile. The spikes (peaks) and valleys represent the intensive GPU cycles required during training phases.

3. AI DC & Massive Job Starting (Red Line – Right)

  • Characteristics: A sharp, vertical-like surge in power usage.
  • Interpretation: As massive AI jobs (LLM training, etc.) start, the power load skyrockets. The graph shows a “Pre-emptive Analysis & Preparation” phase where the system detects the surge before it hits the maximum threshold.

4. ESS Work & Peak Shaving (Purple Dotted Box – Top Right)

  • The Strategy: To handle the “Massive Job Starting,” the system utilizes ESS (Energy Storage Systems).
  • Action: Instead of drawing all power from the main grid (which could cause instability or high costs), the ESS discharges stored energy to “shave” the peak, smoothing out the demand and ensuring the AI DC operates safely.
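
A minimal sketch of that decision logic, assuming a fixed grid limit and a tracked battery state of charge; all numbers (limits, capacities, interval length) are illustrative.

```python
# Hedged sketch of an ESS peak-shaving decision for one dispatch interval:
# if predicted demand exceeds the grid threshold, discharge the battery to
# cover the difference. Thresholds and capacities are illustrative.
def peak_shave_step(predicted_demand_kw, grid_limit_kw, ess_soc_kwh,
                    ess_max_discharge_kw, step_hours=0.25):
    """Return (grid_draw_kw, ess_discharge_kw, new_soc_kwh) for one interval."""
    excess = max(0.0, predicted_demand_kw - grid_limit_kw)
    # Discharge only what the battery can actually deliver this interval.
    discharge = min(excess, ess_max_discharge_kw, ess_soc_kwh / step_hours)
    grid_draw = predicted_demand_kw - discharge
    return grid_draw, discharge, ess_soc_kwh - discharge * step_hours

# A massive training job starts: demand jumps from 800 kW to 1,400 kW against
# a 1,000 kW grid limit, so the ESS shaves the 400 kW peak.
print(peak_shave_step(1400, 1000, ess_soc_kwh=500, ess_max_discharge_kw=600))
```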

Summary

  1. Volatility Shift: AI workloads (GPU-intensive) create much more extreme and unpredictable power spikes compared to standard data center operations.
  2. Proactive Management: Modern AI Data Centers require pre-emptive detection and analysis to prepare for sudden surges in energy demand.
  3. ESS Integration: Energy Storage Systems (ESS) are critical for “Peak Shaving,” providing the necessary power buffer to maintain grid stability and cost efficiency.

#DataCenter #AI #PeakShaving #EnergyStorage #ESS #GPU #PowerManagement #SmartGrid #TechInfrastructure #AIDC #EnergyEfficiency

With Gemini

ML System Engineering

This image illustrates the core pillars of ML System Engineering, outlining the journey from raw data to a responsible, deployed model.


  1. Data Engineering: Data Quality & Skew Prevention
    • Focuses on building robust pipelines to ensure high-quality data. It aims to prevent “training-serving skew,” where the model performs well during training but fails in real-world production due to data inconsistencies (see the skew-check sketch after this list).
  2. Model Optimization: Accuracy vs. Efficiency
    • Involves balancing competing metrics such as model size, memory usage, latency, and accuracy. The goal is to optimize models to meet specific hardware constraints without sacrificing predictive performance.
  3. Training Infrastructure: Distributed Training & Convergence
    • Highlights the technical backbone required to scale AI. It focuses on the seamless integration of hardware, data, and algorithms through distributed systems to ensure models converge efficiently and quickly.
  4. Deployment & Operations: MLOps & Edge-to-Cloud
    • Covers the lifecycle of a model in production. MLOps ensures continuous adaptation and monitoring across various environments, from massive Cloud infrastructures to resource-constrained TinyML (edge) devices.
  5. Ethics & Governance: Fairness & Accountability
    • Treats non-functional requirements like fairness, privacy, and transparency as core engineering priorities. It includes “fairness audits” to ensure the AI operates responsibly and remains accountable to its users.
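
Pillar 1's training-serving skew is the most concrete of these concerns. The sketch below compares per-feature statistics from the training set against a window of live serving traffic and flags drifted features; the tolerance of 0.3 training standard deviations is an illustrative choice, not an industry standard.

```python
# Hedged sketch of a training-serving skew check: flag features whose serving
# mean shifts by more than `tolerance` training standard deviations.
import numpy as np

def skew_report(train: np.ndarray, serving: np.ndarray, tolerance=0.3):
    mu_t, sigma_t = train.mean(axis=0), train.std(axis=0) + 1e-9
    shift = np.abs(serving.mean(axis=0) - mu_t) / sigma_t
    return {i: round(float(s), 2) for i, s in enumerate(shift) if s > tolerance}

train = np.random.normal(0.0, 1.0, size=(10_000, 4))
serving = np.random.normal([0.0, 0.8, 0.0, 0.0], 1.0, size=(1_000, 4))  # feature 1 drifted
print(skew_report(train, serving))   # e.g. {1: 0.79}
```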

Summary

  • ML System Engineering bridges the gap between theoretical research and real-world production by focusing on data integrity and hardware-aware model optimization.
  • It utilizes MLOps and distributed infrastructure to ensure scalable, continuous deployment across diverse environments, from the Cloud to the Edge.
  • The framework establishes Ethics and Governance as fundamental engineering requirements to ensure AI systems are fair, transparent, and accountable.

#MLSystemEngineering #MLOps #ModelOptimization #DataEngineering #DistributedTraining #TinyML #ResponsibleAI #EdgeComputing #AIGovernance

With Gemini