Download and test a Hugging Face model
Goal
By the end of this tutorial you will have:
- Set up a Python environment on an ECI VM
- Understood the Hugging Face Hub download and cache layout
- Run text-classification inference with the Transformers library
A CPU VM is fine for this exercise
You don't need a GPU; a CPU instance works too. Large models will simply take longer.
Step 1: Create the VM
| Field | Value |
|---|---|
| Instance type | GPU or CPU instance |
| Image | Ubuntu 22.04 |
| Public IP | Create new (internet access is needed to download models) |
Step 2: Install the environment
pip install transformers torch accelerate
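Before moving on, it can help to confirm that all three packages are installed for the interpreter you plan to use. The sketch below relies only on the standard library; `installed_versions` is an illustrative helper name, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the status of the three packages installed in this step
for name, ver in installed_versions(["transformers", "torch", "accelerate"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, check that `pip` and `python3` point at the same environment.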
Step 3: Download a model and run inference
# test_model.py
from transformers import pipeline
# The model is downloaded automatically into ~/.cache/huggingface/hub/ on first use
classifier = pipeline(
    "text-classification",
    model="snunlp/KR-FinBert-SC",
)
# Run inference on a few sample sentences
texts = [
    "The KOSPI rose 2% today.",
    "Corporate earnings fell well short of expectations.",
    "A new product launch date was announced.",
]
for text in texts:
    # The pipeline returns a list of dicts; take the top prediction for this text
    result = classifier(text)[0]
    print(f"{text[:30]}... → {result['label']} ({result['score']:.2f})")
python3 test_model.py
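The pipeline also accepts a list of strings and returns one result dict per input, so the sentences can be classified in a single call. A minimal sketch, assuming the same model as above; `format_results` and `classify_batch` are hypothetical helper names, and `transformers` is imported lazily so the formatting helper has no dependencies.

```python
def format_results(texts, results, width=30):
    """Pair each input text with its predicted label and score."""
    lines = []
    for text, result in zip(texts, results):
        lines.append(f"{text[:width]}... → {result['label']} ({result['score']:.2f})")
    return lines


def classify_batch(texts, model_id="snunlp/KR-FinBert-SC"):
    """Classify all texts in one pipeline call (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: only needed when this runs

    classifier = pipeline("text-classification", model=model_id)
    # A list input yields a list of {"label": ..., "score": ...} dicts, one per text
    return classifier(texts)
```

Calling `print("\n".join(format_results(texts, classify_batch(texts))))` should print one line per sentence in the same format as `test_model.py`.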
Step 4: Cache management
Models are cached under ~/.cache/huggingface/hub/. Running the same model again loads from cache without downloading.
# Cached models
ls ~/.cache/huggingface/hub/
# Cache size
du -sh ~/.cache/huggingface/
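Inside the hub directory, each cached model sits in a folder named `models--<org>--<name>` containing `blobs/` (the actual files), `refs/`, and `snapshots/` (per-revision symlinks into `blobs/`). The sketch below walks that layout with the standard library to report disk usage per model. It assumes the layout used by current `huggingface_hub` releases, and `list_cached_models` is an illustrative helper, not a library function.

```python
import os
from pathlib import Path


def list_cached_models(hub_dir):
    """Return {repo_id: bytes on disk} for every models--* folder in hub_dir."""
    sizes = {}
    hub_dir = Path(hub_dir).expanduser()
    if not hub_dir.is_dir():
        return sizes
    for repo_dir in sorted(hub_dir.glob("models--*")):
        # "models--snunlp--KR-FinBert-SC" -> "snunlp/KR-FinBert-SC"
        repo_id = repo_dir.name[len("models--"):].replace("--", "/")
        total = 0
        for root, _dirs, files in os.walk(repo_dir):
            for name in files:
                path = Path(root) / name
                # Snapshot entries are symlinks into blobs/; skip them to avoid double counting
                if not path.is_symlink():
                    total += path.stat().st_size
        sizes[repo_id] = total
    return sizes


for repo_id, size in list_cached_models("~/.cache/huggingface/hub").items():
    print(f"{repo_id}: {size / 1e6:.1f} MB")
```

Deleting a model's `models--...` folder frees its space; the next run simply downloads it again.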
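If your block storage is filling up, point the cache at a larger volume by setting HF_HOME before you run Python. New downloads then go to $HF_HOME/hub; files already in the old cache are not moved automatically: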
export HF_HOME=/data/huggingface_cache
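To check that the variable took effect and that the target volume has room, the following sketch resolves the hub cache directory from the environment and prints the free space there. It assumes the documented precedence (`HF_HUB_CACHE`, then `$HF_HOME/hub`, then the default under `~/.cache` or `XDG_CACHE_HOME`); `resolve_hub_cache` is a hypothetical helper, not a library function.

```python
import os
import shutil
from pathlib import Path


def resolve_hub_cache(env=None):
    """Return the directory where model downloads are expected to land."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    cache_root = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(cache_root).expanduser() / "huggingface" / "hub"


cache_dir = resolve_hub_cache()
print(f"Hub cache: {cache_dir}")

# disk_usage needs an existing path, so walk up to the nearest existing parent
probe = cache_dir
while not probe.exists():
    probe = probe.parent
free_gb = shutil.disk_usage(probe).free / 1e9
print(f"Free space on that volume: {free_gb:.1f} GB")
```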
Step 5: GPU acceleration
On a GPU instance, set device=0 or device="cuda":
classifier = pipeline(
    "text-classification",
    model="snunlp/KR-FinBert-SC",
    device=0,  # use the first GPU
)
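If the same script should run on both CPU and GPU instances, the device can be chosen at runtime instead of hard-coded. A small sketch, assuming the pipeline's integer convention where `-1` means CPU and `0` means the first CUDA device; `pick_device` is an illustrative helper name.

```python
def pick_device():
    """Return 0 (first CUDA GPU) when one is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without torch there is no GPU backend to use
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("Using GPU 0" if device == 0 else "Using CPU")
```

Passing `device=pick_device()` to `pipeline(...)` then works unchanged on either instance type.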
Next steps
- Deploy an API server with FastAPI: wrap the model as an API server
- Run LLM inference: run a large language model