Data in AI DC

This image illustrates a data monitoring system for an AI data center server room. Titled “Data in AI DC Server Room,” it depicts the relationships between key elements being monitored in the data center.

The system consists of four main components, each with detailed metrics:

  1. GPU Workload – Right center
    • Computing Load: GPU utilization rate (%) and type of computational tasks (training vs. inference)
    • Power Consumption: Real-time power consumption of each GPU (W) – Example: NVIDIA H100 GPU consumes up to 700W
    • Workload Pattern: Periodicity of workload (peak/off-peak times) and predictability
    • Memory Usage: GPU memory usage patterns (e.g., HBM3 memory bandwidth usage)
  2. Power Infrastructure – Left
    • Power Usage: Real-time power output and efficiency of UPS, PDU, and transformers
    • Power Quality: Voltage, frequency stability, and power loss rate
    • Power Capacity: Types and proportions of supplied energy sources, and whether available capacity is sufficient for the current workload
  3. Cooling System – Right
    • Cooling Device Status: Air-cooling fan speed (RPM), liquid cooling pump flow rate (LPM), and coolant temperature (°C)
    • Environmental Conditions: Data center internal temperature, humidity, air pressure, and hot/cold zone temperatures – critical for server operations
    • Cooling Efficiency: Power Usage Effectiveness (PUE, total facility power divided by IT equipment power) and the proportion of total power consumed by the cooling system
  4. Server/Rack – Top center
    • Rack Power Density: Power consumption per rack (kW) – Example: GPU server racks range from 30 to 120 kW
    • Temperature Profile: Temperature (°C) of GPUs, CPUs, memory modules, and heat distribution
    • Server Status: Operational state of servers (active/standby) and workload distribution status
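
A minimal sketch of how two of the metrics above could be computed from raw power readings. PUE is defined as total facility power divided by IT equipment power; the function names and all sample readings below are hypothetical and only illustrate the arithmetic.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def cooling_share(cooling_kw: float, total_facility_kw: float) -> float:
    """Fraction of total facility power consumed by the cooling system."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return cooling_kw / total_facility_kw


def rack_power_kw(gpu_watts: list[float], overhead_watts: float = 0.0) -> float:
    """Rack power in kW from per-GPU readings (W) plus other rack load (W)."""
    return (sum(gpu_watts) + overhead_watts) / 1000.0


# Hypothetical readings, chosen only to make the arithmetic easy to follow.
it_kw = 1000.0       # servers, storage, network
cooling_kw = 300.0   # chillers, pumps, fans
other_kw = 100.0     # UPS/PDU losses, lighting
total_kw = it_kw + cooling_kw + other_kw

print(f"PUE: {pue(total_kw, it_kw):.2f}")
print(f"Cooling share: {cooling_share(cooling_kw, total_kw):.1%}")
# 64 GPUs at the 700 W upper bound cited above, GPUs only.
print(f"Rack power: {rack_power_kw([700.0] * 64):.1f} kW")
```

A PUE close to 1.0 means almost all power reaches the IT equipment, so tracking it alongside the cooling share shows how much of the overhead is attributable to cooling.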

The workflow sequence indicated at the bottom of the diagram represents:

  1. ① GPU WORK: Initial execution of AI workloads – GPU computational tasks begin, generating system load
  2. ② with POWER USE: Increased power supply for GPU operations – Power demand increases with GPU workload, and power infrastructure responds accordingly
  3. ③ COOLING WORK: Cooling processes activated in response to heat generation
    • Sensing: Temperature sensors detect server and rack thermal conditions, monitoring hot/cold zone temperature differentials
    • Analysis: Analysis of collected temperature data, determining cooling requirements
    • Action: Automatic adjustment of cooling equipment (fan speed, coolant flow rate, etc.)
  4. ④ SERVER OK: Maintenance of normal server operation through proper power supply and cooling – Temperature and power remain stable, allowing GPU workloads to continue running under optimal conditions
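
The Sensing, Analysis, and Action steps under ③ can be sketched as a simple feedback loop. This is only an illustration under stated assumptions: the proportional control rule, the 12 °C target differential, the gain, and the fan limits are all hypothetical values, not something the diagram specifies.

```python
from statistics import mean

MIN_FAN_PCT = 20.0   # hypothetical lower limit for fan speed
MAX_FAN_PCT = 100.0  # hypothetical upper limit for fan speed


def sense(sensors: dict[str, list[float]]) -> float:
    """Sensing: hot-zone minus cold-zone mean temperature (deg C)."""
    return mean(sensors["hot_zone_c"]) - mean(sensors["cold_zone_c"])


def analyze(delta_c: float, target_delta_c: float = 12.0) -> float:
    """Analysis: a positive error means more cooling is required."""
    return delta_c - target_delta_c


def act(current_fan_pct: float, error_c: float, gain_pct_per_c: float = 5.0) -> float:
    """Action: proportional adjustment of fan speed, clamped to safe limits."""
    new_pct = current_fan_pct + gain_pct_per_c * error_c
    return max(MIN_FAN_PCT, min(MAX_FAN_PCT, new_pct))


def cooling_step(sensors: dict[str, list[float]], current_fan_pct: float) -> float:
    """One pass through the sense -> analyze -> act loop."""
    return act(current_fan_pct, analyze(sense(sensors)))


readings = {"hot_zone_c": [38.0, 40.0], "cold_zone_c": [22.0, 24.0]}
print(cooling_step(readings, current_fan_pct=50.0))  # prints 70.0
```

Here the measured differential is 16 °C against a 12 °C target, so the fan speed rises from 50% to 70%. A production controller would typically add integral and derivative terms and rate limits; the proportional rule is used here only to keep the loop readable.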

The arrows indicate data flow and interrelationships between systems, showing connections from power infrastructure to servers and from cooling systems to servers. This integrated system enables efficient and stable data center operation by detecting increased power demand and heat generation from GPU workloads and adjusting the cooling systems in real time.

With Claude

Monitoring is from changes

Change-Based Monitoring System Analysis

This diagram illustrates a systematic framework for “Monitoring is from changes.” The approach demonstrates a hierarchical structure that begins with simple, certain methods and progresses toward increasingly complex analytical techniques.

Flow of Major Analysis Stages:

  1. One Change Detection:
    • The most fundamental level, identifying simple fluctuations such as numerical changes (5→7).
    • This stage focuses on capturing immediate and clear variations.
  2. Trend Analysis:
    • Recognizes data patterns over time.
    • Moves beyond single changes to understand the directionality and flow of data.
  3. Statistical Analysis:
    • Employs deeper mathematical approaches to interpret data.
    • Utilizes means, variances, correlations, and other statistical measures to derive meaning.
  4. Deep Learning:
    • The most sophisticated analysis stage, using advanced algorithms to discover hidden patterns.
    • Capable of learning complex relationships from large volumes of data.
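
The first three stages can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names are hypothetical, the trend is estimated with an ordinary least-squares slope, and the deep learning stage is omitted because it needs a trained model.

```python
from statistics import mean, pstdev


def changed(previous: float, current: float) -> bool:
    """Stage 1, one change detection: did the value move at all (e.g. 5 -> 7)?"""
    return current != previous


def trend_slope(values: list[float]) -> float:
    """Stage 2, trend analysis: least-squares slope per sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def summary(values: list[float]) -> dict[str, float]:
    """Stage 3, statistical analysis: mean and population standard deviation."""
    return {"mean": mean(values), "std": pstdev(values)}


series = [5, 5, 6, 7, 7, 8]  # hypothetical sensor readings
print(changed(5, 7))             # True
print(trend_slope(series) > 0)   # True: the series is rising
print(summary(series))
```

Each stage consumes more history than the one before it: a single previous value, then a window of samples, then the distribution of that window.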

Evolution Flow of Detection Processes:

  1. Change Detection:
    • The initial stage of detecting basic changes occurring in the system.
    • Identifies numerical variations that deviate from baseline values (e.g., 5→7).
    • Change detection serves as the starting point for the monitoring process and forms the foundation for more complex analyses.
  2. Anomaly Detection:
    • A more advanced stage than change detection that identifies abnormal data points deviating from general patterns or expected ranges.
    • Illustrated in the diagram with a warning icon, representing early signs of potential issues.
    • Utilizes statistical analysis and trend data to detect phenomena outside the normal range.
  3. Abnormal (Error) Detection:
    • The most severe level of detection, identifying actual errors or failures within the system.
    • Shown in the diagram with an X mark, signifying critical issues requiring immediate action.
    • An anomaly may be escalated to a failure when it persists over time or exceeds defined thresholds.
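
The escalation from change to anomaly to error can be sketched as follows. The rules here are assumptions chosen for illustration: an anomaly is a value more than three standard deviations from the baseline mean, and an error is declared after three consecutive anomalies. Neither threshold comes from the diagram.

```python
from statistics import mean, pstdev


def classify(history: list[float], value: float, z_threshold: float = 3.0) -> str:
    """Label one new value as 'normal', 'change', or 'anomaly'."""
    if value == history[-1]:
        return "normal"  # no change from the previous accepted value
    mu, sigma = mean(history), pstdev(history)
    if sigma > 0 and abs(value - mu) / sigma > z_threshold:
        return "anomaly"  # outside the expected range
    return "change"       # changed, but within the expected range


def monitor(baseline: list[float], stream: list[float], persist: int = 3) -> list[str]:
    """Escalate to 'error' once anomalies persist for `persist` samples in a row."""
    history = list(baseline)
    streak = 0
    labels = []
    for value in stream:
        label = classify(history, value)
        if label == "anomaly":
            streak += 1
            if streak >= persist:
                label = "error"
        else:
            streak = 0
            history.append(value)  # only non-anomalous values update the baseline
        labels.append(label)
    return labels


baseline = [4, 6, 5, 7, 5, 6, 4, 5]  # hypothetical normal readings
print(monitor(baseline, [5, 7, 20, 21, 22, 6]))
# ['normal', 'change', 'anomaly', 'anomaly', 'error', 'change']
```

Keeping anomalous values out of the baseline is a deliberate choice in this sketch: otherwise a sustained fault would gradually be absorbed into what counts as normal.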

Supporting Functions:

  • Adding New Relative Data: Continuously collecting relevant data to improve analytical accuracy.
  • Higher Resolution: Utilizing more granular data to enhance analytical precision.

This framework demonstrates a logical progression from simple and certain to gradually more complex analyses. The hierarchical structure of the detection process—from change detection through anomaly detection to error detection—shows how monitoring systems identify and respond to increasingly serious issues.

With Claude

AI together!!

This diagram titled “AI together!!” illustrates a comprehensive architecture for AI-powered question-answering systems, focusing on the integration of user data, tools, and AI models through standardized protocols.

Key Components:

  1. Left Area (Blue) – User Side:
    • Prompt: The entry point for user queries, represented by a UI interface with chat elements
    • RAG (Retrieval Augmented Generation): A system that enhances AI responses by retrieving relevant information from user data sources
    • My Data: User’s personal data repositories shown as spreadsheets and databases
    • My Tool: Custom tools that can be integrated into the workflow
  2. Right Area (Purple) – AI Model Side:
    • AI Model (foundation): The core AI foundation model represented by a robot icon
    • MOE (Mixture Of Experts): An architecture in which a gating (router) network sends each input to a subset of specialized expert sub-networks, improving performance without activating the whole model
    • Domain Specific AI Model: Specialized AI models trained for particular domains or tasks
    • External or Internet: Connection to external knowledge sources and internet resources
  3. Center Area (Green) – Connection Standard:
    • MCP (Model Context Protocol): A standardized protocol that facilitates communication between user-side components and AI models, labeled as “Standard of Connecting”

Information Flow:

  • Questions flow from the prompt interface on the left to the AI models on the right
  • Answers are generated by the AI models and returned to the user interface
  • The RAG system augments queries with relevant information from the user’s data
  • Semantic Search links components by matching queries to content by meaning rather than by exact keywords
  • All interactions are standardized through the MCP framework
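
The RAG step in this flow can be sketched as "retrieve, then prepend to the prompt". The example below is a simplification under stated assumptions: it ranks documents with word-count cosine similarity as a stand-in for embedding-based semantic search, the sample data is invented, and the call to the AI model (and any MCP transport) is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Augment the user's question with the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical "My Data" entries.
my_data = [
    "Rack A12 draws 42 kW at peak load.",
    "The cafeteria opens at 8 am.",
    "Coolant temperature for loop 3 is 18 C.",
]
print(build_prompt("What is the peak power draw of rack A12?", my_data))
```

The augmented prompt, not the bare question, is what would be sent to the model, which is how the user's own data ends up shaping the answer.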

This architecture demonstrates how personal data and custom tools can be seamlessly integrated with foundation and specialized AI models to create a more personalized, context-aware AI system that delivers more accurate and relevant responses to user queries.

With Claude

Nice Action

This “Nice Action” diagram illustrates how decision-making processes work similarly for both humans and AI:

  1. Dual Structure of All Choices: Every decision inherently consists of elements of certainty and uncertainty.
  2. Certainty Expansion Strategy: The first step “① Expansion ‘Certain’ First” demonstrates the strategy of maximizing the use of already certain information. This establishes a foundation for decision-making based on known facts.
  3. Uncertainty Upgrade: The second step “② Upgrade Possibility to near 100%” represents the process of increasing the probability of uncertain elements to bring them as close as possible to certainty. While complete certainty cannot be achieved for all elements, obtaining sufficiently high probability enhances the reliability of decisions.
  4. Similarity to Machine Learning and AI: This decision-making model is remarkably similar to how modern machine learning and AI function. AI systems also operate based on certain data (learned patterns) and use probabilistic approaches for uncertain elements to derive optimal decisions.
  5. Transition to Action: Once sufficient certainty is established, the final “ACTION” step can be taken to implement the decision.

This diagram provides insight into how human intuitive decision-making and AI’s algorithmic approach fundamentally follow the same principle—maximizing certainty while managing uncertainty to an acceptable level. The “AI, too” notation explicitly emphasizes this similarity.

With Claude

Data Security

This diagram presents three main approaches to securing data systems:

  1. Left Section – “Easy and Perfect”:
    • Features data encryption for secure storage
    • Implements the “3A” security principles (commonly called AAA): Accounting (with Auditing), Authentication, and Authorization
    • Shows server hardware protected by physical security (guard)
    • Represents a straightforward but effective security approach
  2. Middle Section – “More complex but more vulnerable??”:
    • Shows an IP network architecture with:
      • Server IP and service port restrictions
      • TCP/IP layer security
      • Access Control Lists
      • Authorized IP only policy
      • Authorized terminal restrictions
      • Personnel authorization controls
  3. Right Section – “End to End”:
    • Divides security between Private Network and Public Network
    • Includes:
      • Application layer security
      • Packet/Payload analysis
      • Access Permission First principle
      • Authorized Access Agent Tool restrictions
      • “Perfect Personnel Data/Network” security approach
      • Unspecified Access concerns (shown with question mark)
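
The "authorized IP only" policy and the service port restrictions from the middle section can be sketched as a simple allowlist check. The networks and port below are hypothetical examples; a real deployment would enforce these rules in firewalls or network ACLs rather than in application code.

```python
import ipaddress

# Hypothetical allowlist: one office subnet and one single admin terminal.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("192.168.10.5/32"),
]
ALLOWED_PORTS = {443}  # hypothetical: only the HTTPS service port is exposed


def is_allowed(source_ip: str, dest_port: int) -> bool:
    """Permit a connection only from an authorized IP to an authorized port."""
    addr = ipaddress.ip_address(source_ip)
    return dest_port in ALLOWED_PORTS and any(addr in net for net in ALLOWED_NETWORKS)


print(is_allowed("10.0.0.17", 443))   # True: authorized subnet and port
print(is_allowed("10.0.1.17", 443))   # False: address outside the allowlist
print(is_allowed("10.0.0.17", 22))    # False: port not exposed
```

Every rule added to such a list is one more thing to maintain and audit, which is the trade-off behind the diagram's "more complex but more vulnerable??" question.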

The diagram illustrates the evolution of data security approaches from simpler encryption and authentication methods to more complex network security architectures, and finally to comprehensive end-to-end security solutions. It also questions whether more complex systems might actually introduce more vulnerabilities, suggesting that complexity does not always mean better security.

With Claude