
Parallel file system (PFS)

Overview

The parallel file system (PFS) is high-performance shared storage that multiple VMs can read from and write to at the same time. It is served over VAST NFS and is used to share checkpoints and logs in multi-node distributed training.

PFS vs object storage

PFS is a POSIX filesystem, so existing code that reads and writes files works on it unchanged. Object storage, by contrast, has to be accessed through API calls.
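
As an illustration of the POSIX point, ordinary tools such as mkdir and cp work on a mounted PFS. The sketch below assumes the PFS is mounted at /mnt/pfs (the path used in Step 2). The checkpoints/<run> layout and the save_checkpoint helper are hypothetical names chosen for this example, not part of the product.

```shell
# PFS_ROOT defaults to the mount point from Step 2; override it for a local dry run.
PFS_ROOT="${PFS_ROOT:-/mnt/pfs}"

save_checkpoint() {
  # $1: local checkpoint file, $2: run name (hypothetical naming scheme)
  src="$1"
  run="$2"
  mkdir -p "$PFS_ROOT/checkpoints/$run"
  cp "$src" "$PFS_ROOT/checkpoints/$run/"
}

# Example call, once the PFS is mounted:
#   save_checkpoint ./step-1000.pt run-001
```

Every member VM that mounts the same PFS sees the copied file under the same path.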


Prerequisites

  1. On the Parallel File System page in the left menu, click Create Parallel File System to create a PFS.
  2. On the PFS detail page, click Add VM to register VMs as members. Two options are supported: Add from cluster or Add individual VM.
  3. Note the following values from the PFS detail page's general info:

     Item                         Description                              Example
     VIP addresses                PFS server access IP range (vip_pools)   10.121.255.247–10.121.255.254
     Virtual function IP (VF IP)  Dedicated access IP for each member VM   10.121.0.1/16, 10.121.0.2/16

Member VMs have restrictions on delete and edit

A VM that is a member of a PFS cannot be deleted directly, and some of its fields cannot be edited (you'll see parallelFileSystemMemberExist). To delete the VM, first remove it from the PFS members.


Step 1: Check the vastnfs driver status

Run the command below to check that the vastnfs driver is working.

vastnfs-ctl status

If the output looks like the example below, skip to Step 2.

version: 4.5.5-vastdata-OFED-internal-26.01-1.0.0
kernel modules: sunrpc
services: rpcbind.socket rpcbind
rpc_pipefs: /run/rpc_pipefs
If the output instead looks like the example below
patched version not running

Reinstall the driver, then proceed to Step 2: Mount.
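
The check above can be scripted. The sketch below classifies the text printed by vastnfs-ctl status using only the two outputs shown in this step. It treats any output containing "patched version not running" as unhealthy and any output containing a "version:" line as healthy, which is an assumption about the output format rather than a documented contract. The vastnfs_status_ok helper is a hypothetical name.

```shell
# Classify the text printed by `vastnfs-ctl status`.
# Returns 0 when the driver looks healthy, 1 otherwise.
vastnfs_status_ok() {
  case "$1" in
    *"patched version not running"*) return 1 ;;
    *"version:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only call the real command if it is installed on this VM.
if command -v vastnfs-ctl >/dev/null 2>&1; then
  if vastnfs_status_ok "$(vastnfs-ctl status 2>&1)"; then
    echo "vastnfs driver OK, continue with Step 2"
  else
    echo "vastnfs driver not running, reinstall it first" >&2
  fi
fi
```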


Step 2: Mount

Before running the commands below, work out the What, nconnect, and remoteports values from the VIP addresses shown on the parallel file system detail page.

  • For What=, use the first IP in the VIP range.
  • For nconnect=, use the number of VIP addresses in the range (8 in the example).
  • For remoteports=, use the full VIP address range.
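
The three rules above can be sketched as a small helper that derives the Options= string from the first and last VIP. This is a minimal sketch: ip_to_int and pfs_mount_options are hypothetical helper names, and it assumes the VIP range is contiguous, so the address count is last minus first plus one.

```shell
ip_to_int() {
  # Split the dotted quad on "." and fold the four octets into one integer.
  IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
  echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

pfs_mount_options() {
  # $1: first VIP, $2: last VIP (both from the PFS detail page)
  first="$1"
  last="$2"
  count=$(( $(ip_to_int "$last") - $(ip_to_int "$first") + 1 ))
  echo "vers=3,nconnect=${count},noatime,remoteports=${first}-${last},_netdev"
}

echo "What=10.121.255.247:/"
echo "Options=$(pfs_mount_options 10.121.255.247 10.121.255.254)"
```

With the example range, the Options= line printed here matches the one in the unit file that follows.
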
sudo tee /etc/systemd/system/mnt-pfs.mount >/dev/null <<'EOF'
[Unit]
Description=Mount VAST NFS PFS
After=network-online.target
Wants=network-online.target

[Mount]
What=10.121.255.247:/
Where=/mnt/pfs
Type=nfs
Options=vers=3,nconnect=8,noatime,remoteports=10.121.255.247-10.121.255.254,_netdev

[Install]
WantedBy=multi-user.target
EOF

Then create the mount directory and activate the unit.

sudo mkdir -p /mnt/pfs
sudo systemctl daemon-reload
sudo systemctl enable --now mnt-pfs.mount
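
systemd requires a mount unit's file name to match its Where= path, which is why /mnt/pfs pairs with mnt-pfs.mount. If you mount somewhere else, the sketch below derives the unit name with systemd-escape, which ships with systemd. The unit_name_for wrapper is a hypothetical helper name.

```shell
unit_name_for() {
  # -p treats the argument as a filesystem path; --suffix appends the unit type.
  systemd-escape -p --suffix=mount "$1"
}

# Only run where systemd-escape is available.
if command -v systemd-escape >/dev/null 2>&1; then
  unit_name_for /mnt/pfs
fi
```

For /mnt/pfs this prints mnt-pfs.mount, the file name used above.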

Verify the mount

df -h | grep pfs
# /mnt/pfs in the output means it's mounted
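
For use in scripts, the same check can be written so that it returns an exit code instead of relying on reading df output. The sketch below looks for an entry in /proc/mounts whose mount point matches and whose filesystem type starts with nfs. That the PFS appears with an nfs type is an assumption based on Type=nfs in the unit file, and pfs_is_mounted is a hypothetical helper name.

```shell
pfs_is_mounted() {
  # $1: mount point (default /mnt/pfs), $2: mounts table (default /proc/mounts)
  mp="${1:-/mnt/pfs}"
  table="${2:-/proc/mounts}"
  # Field 2 is the mount point, field 3 the filesystem type.
  awk -v mp="$mp" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$table"
}

if pfs_is_mounted /mnt/pfs; then
  echo "PFS is mounted at /mnt/pfs"
else
  echo "PFS is not mounted at /mnt/pfs" >&2
fi
```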

Next steps