Deploy a Hugging Face model with FastAPI
Goal
By the end of this tutorial you will have:
- Wrapped a Hugging Face model in a FastAPI server
- Received predictions via HTTP POST
- Run the server in the background and accessed it from outside
Step 1: Prepare the environment
pip install fastapi uvicorn transformers torch accelerate
Step 2: API server code
# server.py
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline
import torch

app = FastAPI()

# Initialize the model once at startup so every request reuses it.
# device=0 selects the first GPU; device=-1 runs on the CPU.
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline(
    "text-classification",
    model="snunlp/KR-FinBert-SC",
    device=device,
)


class TextRequest(BaseModel):
    text: str


class PredictionResponse(BaseModel):
    label: str
    score: float


@app.post("/predict", response_model=PredictionResponse)
def predict(req: TextRequest):
    # The pipeline returns a list with one result dict per input string.
    result = classifier(req.text)[0]
    return PredictionResponse(label=result["label"], score=result["score"])


@app.get("/health")
def health():
    return {"status": "ok"}
Step 3: Run the server
# Foreground (for testing)
uvicorn server:app --host 0.0.0.0 --port 8000
# Background
nohup uvicorn server:app --host 0.0.0.0 --port 8000 >> server.log 2>&1 &
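Because the model is loaded when server.py is imported, the background command returns before the server can answer requests. The following is a minimal sketch of a readiness check that polls the /health endpoint from Step 2. The function names health_ok and wait_until, and the timeout values, are illustrative assumptions and not part of the original tutorial.

```python
import http.client
import json
import time
import urllib.request


def health_ok(base_url, timeout_s=5.0):
    """Return True if GET {base_url}/health answers with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            data = json.loads(resp.read())
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def wait_until(check, timeout_s=120.0, interval_s=2.0):
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

On the machine running the server, a call such as wait_until(lambda: health_ok("http://127.0.0.1:8000")) should return True once the model has finished loading, and False if the server never comes up within the timeout. If it returns False, server.log is the first place to look.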
Step 4: Firewall and test
On the virtual network's detail page, open Firewall rules and add a rule that allows inbound TCP traffic on port 8000 (see Firewall for details). Changes take effect within one minute.
# Health check
curl http://<PUBLIC_IP>:8000/health
# Inference request
curl -X POST http://<PUBLIC_IP>:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "The stock price jumped sharply today."}'
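If the curl commands above hang or fail, it helps to know whether the port is blocked or the server is simply not listening. The sketch below classifies a raw TCP connection attempt; the helper name probe_port is hypothetical. As a rule of thumb (not a guarantee), a timeout usually points at a firewall rule that is missing or not yet active, while a refused connection usually means the host is reachable but nothing is listening on that port.

```python
import socket


def probe_port(host, port, timeout_s=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but no process accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within timeout_s; packets are likely being dropped.
        return "timeout"
    except OSError:
        # Anything else, for example a DNS failure or an unreachable network.
        return "error"
```

Calling probe_port("<PUBLIC_IP>", 8000) from your own machine, with the placeholder replaced by the instance's address, gives a quick first diagnosis before you dig into server.log or the firewall rules.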
Python client:
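import requests

response = requests.post(
    "http://<PUBLIC_IP>:8000/predict",
    json={"text": "The stock price jumped sharply today."},
    timeout=30,  # avoid hanging indefinitely if the port is unreachable
)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())
# Example output (label and score depend on the model):
# {'label': 'positive', 'score': 0.98}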
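For machines where installing requests is not convenient, the same call can be made with the standard library. This is a sketch under the assumption that the server exposes exactly the TextRequest and PredictionResponse shapes from Step 2. The helper names build_predict_request, parse_prediction, and predict are illustrative.

```python
import json
import urllib.request


def build_predict_request(base_url, text):
    """Build the POST /predict request with a JSON body matching TextRequest."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body):
    """Decode a response body and check it has the PredictionResponse fields."""
    data = json.loads(body)
    if not isinstance(data, dict) or not {"label", "score"}.issubset(data):
        raise ValueError(f"unexpected response shape: {data!r}")
    return data["label"], float(data["score"])


def predict(base_url, text, timeout_s=30.0):
    """Send one inference request; urllib raises HTTPError on 4xx/5xx responses."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return parse_prediction(resp.read())
```

A call such as predict("http://<PUBLIC_IP>:8000", "The stock price jumped sharply today.") returns a (label, score) tuple. Keeping request building and response parsing in separate functions makes both easy to check without a running server.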
Auto-start (systemd)
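To have the server start automatically on reboot, register it as a systemd service. The unit below assumes that server.py is in your home directory and that uvicorn is installed for /usr/bin/python3; adjust WorkingDirectory and ExecStart if you use a different directory or a virtual environment.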
USER_NAME=$(whoami)
sudo tee /etc/systemd/system/ml-api.service >/dev/null <<EOF
[Unit]
Description=ML API Server
After=network.target
[Service]
User=${USER_NAME}
WorkingDirectory=/home/${USER_NAME}
ExecStart=/usr/bin/python3 -m uvicorn server:app --host 0.0.0.0 --port 8000
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now ml-api
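The unit file above hard-codes /usr/bin/python3 and the home directory. If the packages from Step 1 were installed into a virtual environment, or server.py lives elsewhere, those paths need to change. The sketch below renders the same unit text with explicit values; the names UNIT_TEMPLATE, render_unit, and render_unit_for_current_env are hypothetical helpers, not part of systemd or of the original tutorial.

```python
import getpass
import sys
from pathlib import Path

# Same unit as above, with the user, directory, and interpreter as placeholders.
UNIT_TEMPLATE = """\
[Unit]
Description=ML API Server
After=network.target

[Service]
User={user}
WorkingDirectory={workdir}
ExecStart={python} -m uvicorn server:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
"""


def render_unit(user, workdir, python):
    """Fill the unit template with an explicit user, directory, and interpreter."""
    return UNIT_TEMPLATE.format(user=user, workdir=workdir, python=python)


def render_unit_for_current_env():
    """Use the current user, the current directory, and the running interpreter."""
    return render_unit(getpass.getuser(), Path.cwd(), sys.executable)
```

Run from the directory that contains server.py, using the interpreter that has uvicorn installed, print(render_unit_for_current_env()) produces text that can be saved as /etc/systemd/system/ml-api.service in place of the heredoc above.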
Next steps
- Hugging Face model test: verify behavior across different models
- Firewall configuration: open the API port externally