Elice Cloud provides high‑performance AI infrastructure that enables large enterprise customers to validate their environments at real research scale. Among its offerings, the B200‑based high‑performance GPU clusters deliver an optimized environment for large‑scale model training and inference, engineered to reliably handle massive computational workloads.
LG AI Research, one of only five teams selected for Korea’s independent AI foundation model initiative, ran both its EXAONE 4.0 32B model and leading global models on this infrastructure. The goal was to confirm whether Elice Cloud could fully meet the demanding requirements of a top‑tier research organization. Rather than a simple benchmark, this project faithfully replicated LG AI Research’s real‑world research environment, making it a meaningful end‑to‑end validation of large‑scale model operations.
Validation Objectives and Approach
Elice Cloud partnered with LG AI Research to rigorously evaluate inference performance and operational stability in a next‑generation model research environment. The core goals were to verify that Elice Cloud could reliably handle large‑scale models and that the infrastructure could scale to future research and production services without operational bottlenecks. To this end, the team designed test scenarios that closely mirrored real‑world research conditions, combining factors such as long input sequence lengths, multiple precision modes, and different framework configurations to objectively assess the reliability of the platform as a whole.
Because research infrastructure cannot be judged by model performance alone, the team also examined potential issues across multiple stages of operation: VM creation and reconfiguration flows, internal and external network request handling, disk I/O stability, and InfiniBand configuration, among others. In large‑scale model research environments, not only inference latency and throughput but also the smoothness of the entire operational workflow is critical.
Global Cloud Provider’s H200 Cluster vs. Elice Cloud’s B200 Cluster
The project ran for just under three weeks, from October 2 to 19, 2025. Tests were conducted on a 128‑GPU NVIDIA B200 cluster, using a 128‑GPU H200 cluster from a global cloud provider as the primary point of comparison. LG AI Research validated both performance and runtime stability by running representative LLMs, including EXAONE, on vLLM‑ and SGLang‑based stacks. A 128‑GPU B200 cluster is a scale that even major research labs rarely get to use in practice, so the very act of operating real models at this magnitude served as a strong indicator of Elice Cloud's ability to run and manage large‑scale GPU infrastructure.
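For context on what a vLLM‑based stack looks like in practice, the sketch below loads a 32B‑class model for tensor‑parallel inference with vLLM's offline Python API. The Hugging Face model ID, GPU count, precision, and context length here are illustrative assumptions, not the exact configuration used in this project.

```python
# Minimal sketch: tensor-parallel inference with vLLM's offline API.
# The model ID and parallelism settings are illustrative assumptions,
# not the project's actual configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="LGAI-EXAONE/EXAONE-4.0-32B",  # assumed Hugging Face model ID
    tensor_parallel_size=8,              # shard the 32B model across 8 GPUs on one node
    dtype="bfloat16",                    # one of several precision modes a test might cover
    max_model_len=32768,                 # allow long input sequences
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of tensor parallelism."], params)
print(outputs[0].outputs[0].text)
```

The same model could equally be served behind an HTTP endpoint (vLLM and SGLang both support this), which is the mode the latency measurements below assume.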
The team also varied input and output lengths by model to analyze how TTFT (time to first token) and throughput changed under different conditions. This ensured that the platform was not just fast in a few narrow benchmark scenarios but could deliver consistent performance across the diverse input patterns encountered in real‑world workloads. In large‑scale research environments, earning trust requires demonstrating that the system operates reliably across a wide range of conditions, not merely posting peak numbers under ideal settings.
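A measurement of this kind can be reproduced against any OpenAI‑compatible endpoint, which both vLLM and SGLang can expose. The sketch below streams a completion and records when the first token arrives (TTFT) plus an approximate decode throughput; the endpoint URL and served model name are placeholders, and counting stream chunks only approximates token counts.

```python
# Sketch of a TTFT / throughput probe against an OpenAI-compatible endpoint,
# as served by vLLM or SGLang. URL and model name are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="exaone-4.0-32b",  # placeholder served-model name
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # TTFT: first generated text arrives
        chunks += 1
end = time.perf_counter()

print(f"TTFT: {first_token_at - start:.3f} s")
# Chunks approximate tokens; decode throughput excludes the prefill wait.
print(f"~{chunks / (end - first_token_at):.1f} tokens/s (decode)")
```

Sweeping the prompt length and max_tokens in a harness like this is what surfaces the long‑sequence behavior the team was checking for.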
Up to 2.5× Higher Performance
Across the test suite, the B200 environment delivered roughly 1.5× to 2.5× higher performance compared to the H200 baseline. For EXAONE 4.0 32B on vLLM, TTFT was about 1.5× to 1.9× faster, while throughput was approximately 1.9× to 2.4× higher. Performance remained stable even with long input sequences, and the system maintained consistent processing speed as model sizes increased. For LLM Model A (SGLang), TTFT improved by about 1.6× and throughput by around 1.7×. For LLM Model B (vLLM), throughput improved by more than 2.5×, highlighting the architectural advantages of the B200 for inference workloads.
From a stability perspective, GPU temperatures stayed mostly below 60°C, and no performance degradation or interruptions were observed due to thermal issues. This is a critical evaluation factor for research organizations that must run large models continuously over long periods. The fact that stability remained similar across different frameworks also underscores that the environment can be used consistently across diverse operational setups.
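Thermal behavior like this is straightforward to monitor continuously. The sketch below, assuming the nvidia-ml-py (pynvml) bindings are installed, polls each GPU's temperature and utilization and flags anything at or above the 60°C figure reported here; it is an illustrative watchdog, not the project's actual monitoring stack.

```python
# Minimal sketch of a GPU thermal watchdog using NVML (pip install nvidia-ml-py).
# The 60 C alert threshold mirrors the figure observed in this project.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        for i, h in enumerate(handles):
            temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
            util = pynvml.nvmlDeviceGetUtilizationRates(h).gpu
            if temp >= 60:
                print(f"[warn] GPU {i}: {temp} C at {util}% utilization")
        time.sleep(10)  # poll every 10 seconds during long-running jobs
finally:
    pynvml.nvmlShutdown()
```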
Validation Results from a Research Infrastructure Operations Perspective
From an operational standpoint, the team evaluated the network, storage, and VM lifecycle together. During large‑scale data uploads and model execution, the environment delivered the required level of network performance and stability, and on the internal network experiments ran without major issues even under heavy request loads. On external network connections, a few requests were dropped; these were set aside for separate analysis, since they may stem from a mix of factors such as differences between cloud providers' environments or characteristics of the network path.
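The kind of internal load testing described here can be approximated with a simple concurrency probe. The sketch below, assuming the httpx library and a placeholder internal endpoint URL, fires a burst of parallel requests and counts failures; it illustrates the idea rather than the project's actual tooling.

```python
# Sketch of a concurrent-load probe: fire N requests at an internal endpoint
# and count failures. URL and concurrency level are illustrative assumptions.
import asyncio
import httpx

URL = "http://inference.internal:8000/v1/models"  # placeholder internal endpoint
CONCURRENCY = 200

async def probe(client: httpx.AsyncClient) -> bool:
    try:
        r = await client.get(URL, timeout=10.0)
        return r.status_code == 200
    except httpx.HTTPError:
        return False  # dropped or failed request

async def main() -> None:
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(probe(client) for _ in range(CONCURRENCY)))
    print(f"{sum(results)}/{CONCURRENCY} requests succeeded")

asyncio.run(main())
```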
Disk performance on NVMe volumes met the read/write throughput required for large‑scale model inference. During the initial infrastructure setup, some parameter tuning was needed for the container runtime and InfiniBand networking, but once full‑scale experiments began, neither disk bottlenecks nor network latency caused significant operational issues. The team documented several improvement points for real‑world operations, such as VM recreation flows, storage and network interface re‑assignment, and console session time limits, while also observing that overall operational efficiency improved over time, including boot times that were shorter than in the initial state.
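As a rough illustration of how such a throughput requirement might be sanity‑checked, the sketch below measures sequential write speed on an assumed NVMe mount path. A dedicated tool such as fio gives far more rigorous numbers; this is only a quick probe.

```python
# Rough sequential-write throughput probe for an NVMe volume.
# Path and sizes are illustrative; use fio for rigorous benchmarking,
# since the OS page cache can inflate naive read measurements.
import os
import time

PATH = "/mnt/nvme/throughput_probe.bin"   # assumed NVMe mount point
BLOCK = b"\0" * (64 * 1024 * 1024)        # 64 MiB per write
BLOCKS = 16                               # 1 GiB total

start = time.perf_counter()
with open(PATH, "wb") as f:
    for _ in range(BLOCKS):
        f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())                  # force data to the device
elapsed = time.perf_counter() - start

gib = BLOCKS * len(BLOCK) / 2**30
print(f"sequential write: {gib / elapsed:.2f} GiB/s")
os.remove(PATH)
```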
Elice Cloud’s B200 Cluster, Proven Stable for Enterprise‑Grade Research Organizations
This project went beyond simply measuring large‑model inference performance; it demonstrated that Elice Cloud can meet the scale and conditions required by enterprise research organizations in real production‑like environments. Together with LG AI Research, the team achieved consistent results across a variety of models, precision modes, and framework configurations, and confirmed that performance remained stable even under realistic workloads such as long sequence inputs.
In particular, the fact that a 128‑GPU B200 cluster operated stably under real large‑model workloads shows that Elice Cloud’s infrastructure is more than sufficient as a foundation for both research and production services. The infrastructure’s credibility is further strengthened by the fact that it has been directly validated by a large enterprise customer. Through this project, Elice Cloud once again proved its ability to operate large‑scale model research environments and will continue to deliver reliable, high‑performance AI infrastructure to leading research organizations and enterprises across industries.