Connect a GPU VM
No Kubernetes required. Connect any single GPU (or CPU) server and CaseDesk will install the runtime, pull the model, and give you an API endpoint.
Supported providers: AWS EC2, Hetzner, Lambda Labs, DigitalOcean, Google Cloud VMs, Azure VMs, or any server with a public IP.
What you need before you start
- A server with a public IP address and SSH access
- Docker installed, or a user with permission to install it (CaseDesk will install Docker if it is absent)
- If using GPU: NVIDIA drivers installed on the server
- An SSH private key for the server (RSA or Ed25519)
:::tip CPU-only servers are supported Ollama runs on CPU. Smaller models like Llama 3.2 1B and Qwen2.5 0.5B run acceptably without a GPU. Position GPU as recommended, not required. :::
Recommended: create a dedicated SSH key
Rather than using your main server key, generate a key pair specifically for CaseDesk:
ssh-keygen -t ed25519 -C "casedesk-byoc" -f ~/.ssh/casedesk_byoc
Add the public key to your server's ~/.ssh/authorized_keys, then upload the private key to CaseDesk. You can delete it from CaseDesk at any time from the VM settings page once your deployment is running.
Connect the VM
- In CaseDesk, go to Connect VM → SSH Bootstrap
- Enter a label for the server (e.g.
hetzner-gpu-1) - Enter the public IP address
- Paste or upload your SSH private key
- Set the username (default:
root; for AWS EC2 useubuntuorec2-user) - Click Test connection — CaseDesk will verify SSH access before saving
- Once the connection test passes, click Save
What CaseDesk does on your server
CaseDesk SSHes into your server once to bootstrap:
- Checks if Docker is installed; installs it if absent
- Runs
docker run -d --gpus all -p 11434:11434 ollama/ollama(falls back to CPU if no GPU detected) - Pulls the selected model:
docker exec ollama ollama pull <model-tag> - Registers the endpoint and proxies it through your CaseDesk dashboard URL
After bootstrap, the endpoint is live. CaseDesk does not maintain a persistent SSH session — it reconnects only when you trigger a new deployment.
Security notes
- Your SSH private key is encrypted with AES-256-GCM before being written to the database
- The encryption key is stored in AWS Secrets Manager, not in application config or environment variables
- Every access to the key is audit-logged
- You can delete the key at any time from the VM settings page — your running deployment continues to operate without it
- Ollama binds to
127.0.0.1:11434inside the container and is not exposed publicly — traffic reaches it only via the CaseDesk proxy
Non-root users
If your server requires a non-root user (e.g. ubuntu on AWS EC2), that user must be in the docker group:
sudo usermod -aG docker ubuntu
Log out and back in for the group change to take effect, then test the connection from CaseDesk.
Troubleshooting
Connection test fails — verify the IP is correct, the server is reachable on port 22, and the private key matches the public key on the server.
Deployment stays pending — SSH into your server and check docker logs ollama. The model pull can take several minutes depending on model size and server bandwidth.
GPU not detected — verify NVIDIA drivers are installed (nvidia-smi) and the NVIDIA Container Toolkit is set up (nvidia-ctk). CaseDesk will fall back to CPU automatically if no GPU is found.