Accelerating RL Research with Elice Cloud: A Case Study from the University of Minnesota

Hansol Park

5/23/2025

University of Minnesota


How Zae Myung Kim, a Ph.D. student at the University of Minnesota, accelerated RL training by 3× using Elice Cloud’s H100 infrastructure—meeting his NeurIPS deadline on time.


“Switching to H100s on Elice Cloud gave us a 2–3× training speedup and literally made the difference between meeting our paper deadline or not.” — Zae Myung Kim, Ph.D. student, University of Minnesota

About the Researcher

Zae Myung Kim is a fourth-year Ph.D. student in the Minnesota NLP group at the University of Minnesota Twin Cities, advised by Prof. Dongyeop Kang. His research explores how to embed more structure and intentionality into the training of large language models (LLMs). He focuses on a “meta-scaffolding” approach: folding elements such as discourse structure, dataset metadata, and feedback into the training loop to produce models that are more stable, coherent, and interpretable, while reducing the burden of data, compute, and brittle prompt engineering.


Research Topic and Collaboration with Elice

In his recent work, Kim introduced a method called Meta Policy Optimization (MPO), developed in collaboration with Elice. MPO addresses two major pain points in RL-based LLM training: reward hacking and the need for manual prompt-tuning of reward models. The solution involves a meta-reward model that dynamically updates the reward model’s prompts during training. This makes the reward signal more adaptive, harder to exploit, and less reliant on manual tuning—offering a flexible and scalable alignment strategy for tasks like essay scoring or math proof evaluation.
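The core idea can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation: `reward_model`, `meta_reward_model`, the rubric-as-comma-separated-string, and the length-based scoring are all hypothetical stand-ins for the LLM-based components described in the paper.

```python
def reward_model(response: str, rubric: str) -> float:
    """Stub reward model: scores a response against the current rubric.

    Toy scoring: reward length, penalize tokens the rubric flags.
    """
    score = float(len(response.split()))
    for banned in rubric.split(","):
        if banned and banned in response:
            score -= 5
    return score

def meta_reward_model(rubric: str, transcripts: list[str]) -> str:
    """Stub meta-reward model: tightens the rubric when it spots an exploit.

    If the policy repeats a token to inflate its length-based reward,
    that token gets added to the rubric (i.e., the reward prompt adapts).
    """
    for t in transcripts:
        words = t.split()
        for w in set(words):
            if words.count(w) > 3 and w not in rubric:
                rubric = rubric + "," + w  # flag the exploited token
    return rubric

def train_mpo(policy_outputs: list[list[str]], rubric: str = "") -> list[float]:
    """Mean reward per step; the rubric is refreshed between steps."""
    history = []
    for step_outputs in policy_outputs:
        rewards = [reward_model(r, rubric) for r in step_outputs]
        history.append(sum(rewards) / len(rewards))
        rubric = meta_reward_model(rubric, step_outputs)  # adapt the prompt
    return history
```

The key property the sketch demonstrates: a reward-hacked output that scores well at step 1 is penalized at step 2, because the meta-level rewrote the reward model's evaluation prompt in between.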


Infrastructure Challenges Before Elice

Prior to using Elice Cloud, Kim conducted experiments on nodes with 8×A100 GPUs. While sufficient in the past, these setups became bottlenecks in online RL scenarios, where the reward is generated by another LLM. The limited memory required heavy gradient accumulation, which prolonged training time and made experimentation slow and rigid.
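To see why accumulation hurts wall-clock time, consider the arithmetic below. The numbers are hypothetical, not taken from Kim's actual runs: when the per-device micro-batch is small, every optimizer update requires many sequential forward/backward passes.

```python
# Hypothetical numbers: limited GPU memory forces a small micro-batch,
# so one optimizer update needs many sequential passes.
effective_batch = 256  # target batch for one policy update (assumed)
micro_batch = 8        # what fits in memory alongside the reward LLM (assumed)
accum_steps = effective_batch // micro_batch
print(accum_steps)  # 32 sequential passes per update
```

Larger-memory GPUs shrink `accum_steps` directly, which is one reason the H100 move below paid off.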


Why Elice Cloud?

Several factors made Elice Cloud the right choice:

  • Instant access to H100 GPUs without waitlists
  • Smooth and intuitive UX for launching Jupyter, VS Code, SSH, and managing environments
  • Pre-installed environments, saving hours of setup time
  • Fast and responsive support — custom kernel and storage requests were handled within 1 hour
  • Affordable pricing for high-performance compute

Performance Gains with G-NHHS-320 (4× H100)

Using Elice Cloud’s H100 setup delivered a 2–3× training speedup compared to 8×A100s, thanks to FP8 acceleration and improved memory bandwidth. Inference speed, critical for Kim’s online RL setup, improved dramatically—enabling prompt schedule tuning and PPO hyperparameter adjustment in just one afternoon, rather than over a weekend.

Time and Cost Savings

To meet a paper deadline, Kim needed to run five full MPO experiments and several ablations. On A100s, each run took roughly 12 GPU-days. On H100s, the same experiments were completed in 4 GPU-days—a decisive improvement that ensured timely submission and reduced compute usage.
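Counting only the five full runs (ablations aside), the totals work out as follows:

```python
# GPU-day totals for the five full MPO experiments quoted above.
runs = 5
a100_days_per_run = 12
h100_days_per_run = 4
print(runs * a100_days_per_run)                 # 60 GPU-days on A100s
print(runs * h100_days_per_run)                 # 20 GPU-days on H100s
print(a100_days_per_run / h100_days_per_run)    # 3.0x effective speedup
```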


Why Elice Works for Academia

“Elice actually has H100s available when others don’t. And students can run real experiments without needing DevOps support. Whether it’s for a paper or a class project, it lowers the barrier to getting started.”


A Memorable Collaboration Moment

“At one point, I hit a CUDA driver issue and contacted support. Someone from the Elice team jumped into a shared tmux session, patched it live, and had me back up and running in under 10 minutes. That level of responsiveness really stuck with me. I’ve been recommending Elice to my peers in the U.S., and if our joint paper is accepted, I think it’ll be a great case study.”


Final Thoughts

“If you want to focus on research instead of managing infrastructure, Elice Cloud is a solid choice. It’s fast, frictionless, and gives you direct access to state-of-the-art GPUs—no DevOps bottlenecks required.”

Train faster, experiment more, and meet your deadlines—with Elice Cloud.

👉 Get started

Reference
Meta Policy Optimization for Adaptive Reward Modeling
Zae Myung Kim, Chanwoo Park, Vipul Raheja, Suin Kim, Dongyeop Kang
Submitted to NeurIPS 2025
arXiv:2504.20157
