Download and test a Hugging Face model

Goal

By the end of this tutorial you will have:

Set up a Python environment on an ECI VM
Understood the Hugging Face Hub download and cache layout
Run text-classification inference with the Transformers library

A CPU VM is fine for this exercise

You don't need a GPU; a CPU instance works too. Large models will simply take longer.

Step 1: Create the VM

Field	Value
Instance type	GPU or CPU instance
Image	Ubuntu 22.04
Public IP	Create new (required for internet access to download models)

Step 2: Install the environment

pip install transformers torch accelerate
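Before downloading a model, it can help to confirm that the three packages actually installed. The sketch below uses only the standard library; the helper name installed_versions is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The three packages installed in this step.
for name, ver in installed_versions(["transformers", "torch", "accelerate"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, rerun the pip command before moving on.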

Step 3: Download a model and run inference

# test_model.py
from transformers import pipeline

# Model is downloaded automatically into ~/.cache/huggingface/hub/
classifier = pipeline(
    "text-classification",
    model="snunlp/KR-FinBert-SC"
)

# Inference
texts = [
    "The KOSPI rose 2% today.",
    "Corporate earnings fell well short of expectations.",
    "A new product launch date was announced."
]

for text in texts:
    result = classifier(text)[0]
    print(f"{text[:30]}... → {result['label']} ({result['score']:.2f})")

python3 test_model.py
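The loop in test_model.py calls the pipeline once per sentence. A pipeline also accepts a list of texts and returns one result dict per input, which is often more convenient. The sketch below takes the classifier as a parameter so it can be tried without downloading anything; classify_all and fake_classifier are hypothetical names, and the label the stand-in returns is made up for illustration, not the model's actual label set.

```python
def classify_all(classifier, texts):
    """Run the classifier on the whole list and pair each text with its result."""
    results = classifier(texts)  # one {"label": ..., "score": ...} dict per input
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


def fake_classifier(texts):
    """Stand-in with the same return shape as a text-classification pipeline."""
    return [{"label": "neutral", "score": 0.5} for _ in texts]


for text, label, score in classify_all(fake_classifier, ["first", "second"]):
    print(f"{text} -> {label} ({score:.2f})")
```

In test_model.py, classify_all(classifier, texts) could stand in for the for loop, with the real pipeline object passed as the first argument.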

Step 4: Cache management

Models are cached under ~/.cache/huggingface/hub/. Running the same model again loads from cache without downloading.

# Cached models
ls ~/.cache/huggingface/hub/

# Cache size
du -sh ~/.cache/huggingface/
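The same information can be gathered per model from Python. Each model lives in a folder named models--<org>--<name>, with the real files in blobs/ and symlinks to them under snapshots/. The sketch below walks that layout with the standard library only; cached_repos is a hypothetical helper name, and the name conversion assumes the model name itself contains no "--".

```python
from pathlib import Path


def cached_repos(hub_dir):
    """Return {repo_id: size_in_bytes} for model folders in a Hub cache directory."""
    hub_dir = Path(hub_dir)
    if not hub_dir.is_dir():
        return {}
    sizes = {}
    for repo_dir in sorted(hub_dir.glob("models--*")):
        if not repo_dir.is_dir():
            continue
        # Folder names look like models--<org>--<name>; turn that back into <org>/<name>.
        repo_id = repo_dir.name[len("models--"):].replace("--", "/")
        # Snapshot entries are symlinks into blobs/, so skip links to avoid double counting.
        sizes[repo_id] = sum(
            p.stat().st_size
            for p in repo_dir.rglob("*")
            if p.is_file() and not p.is_symlink()
        )
    return sizes


hub = Path.home() / ".cache" / "huggingface" / "hub"
for repo_id, size in cached_repos(hub).items():
    print(f"{repo_id}: {size / 1e6:.1f} MB")
```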

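If your block storage is filling up, move the cache to a larger volume by setting HF_HOME before you run the script. Models already cached are not moved automatically; new downloads go under $HF_HOME/hub/: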

export HF_HOME=/data/huggingface_cache
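To see where downloads will land for a given setting, the lookup order can be sketched in a few lines. This is a simplified mirror of the documented behavior, not the library's own code: it assumes HF_HUB_CACHE takes priority over HF_HOME and ignores XDG_CACHE_HOME. The function name hub_cache_dir is hypothetical.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None):
    """Sketch of where the Hub cache lands for a given set of environment variables."""
    env = os.environ if env is None else env
    # HF_HUB_CACHE, if set, names the hub cache folder directly.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # Otherwise the cache is the hub/ folder under HF_HOME (default ~/.cache/huggingface).
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


print(hub_cache_dir({"HF_HOME": "/data/huggingface_cache"}))
print(hub_cache_dir({}))
```

Calling hub_cache_dir() with no argument reads the current shell environment, which is a quick way to check that the export took effect.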

Step 5: GPU acceleration

On a GPU instance, pass device=0 (the first GPU) or device="cuda" to pipeline:

classifier = pipeline(
    "text-classification",
    model="snunlp/KR-FinBert-SC",
    device=0  # use GPU
)
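Hard-coding device=0 fails on a CPU-only VM. A small helper can choose the device at runtime so the same script runs on both instance types. This is a minimal sketch: pick_device is a hypothetical name, and it relies on pipeline accepting device=-1 to mean CPU.

```python
def pick_device():
    """Return 0 (first GPU) when CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1  # torch is missing, so fall back to CPU
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("Using GPU 0" if device == 0 else "Using CPU")
```

Passing device=pick_device() to pipeline in place of device=0 keeps the script portable between CPU and GPU instances.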

Next steps

Deploy an API server with FastAPI: wrap the model as an API server
Run LLM inference: run a large language model
