Pricing

Scale your AI infrastructure on-demand or run on dedicated GPUs. Choose the plan that fits your team's computing needs.

On-Demand
$3.60 per GPU per hour
High-Performance Cluster
8xH100 GPUs  •  80GB memory each (SXM5)  •  3.2 Tb/s InfiniBand connectivity
Zero code changes required
Multi-node training support
High-bandwidth networking
Cross-cloud compatibility
Priority queuing system
Reserved
Custom pricing
All On-Demand features
Dedicated GPU allocation
Advanced monitoring
Cluster utilization insights
Enterprise SLA
"From local cluster to training across 64 H100s in under an hour - the fastest, most seamless GPU cloud setup we've experienced after using SLURM, Azure, GCP, Mosaic, Foundry, Together, and Lambda."
Features

Feature | On-Demand | Reserved (Book a demo)
Base Price | $3.60/GPU hour + cloud costs | Starting at $50,000/year
Billing Period | Usage-based | Annual contract

Infrastructure

GPU Resource | 8xH100 (80 GB SXM5) + 3.2 Tb/s InfiniBand; Blackwell (contact sales) | All NVIDIA Data Center GPUs
Multi-node Training | Yes | Yes
High-bandwidth Networking | Yes | Yes
Cross-cloud Compatibility | Yes | Yes
Dedicated GPU Allocation | No | Yes

Management

Dashboard Access | Yes | Yes
Queue Management | Yes | Yes
Team Access Controls | Yes | Yes
Automated Job Failure Recovery | Yes | Yes
GPU Health Monitoring | Yes | Yes

Support

Setup Time | 20 minutes | 2-3 days
Customer Support | 24x7 always-on support | 24x7 always-on support
SLA | 99.5% uptime SLA | 99.5% uptime SLA
FAQ

Frequently Asked Questions.

How do I submit jobs with Trainy?

Submitting jobs on Trainy's platform is done via a simple YAML file that works across clouds. You just enter your existing torchrun (or equivalent) launch command and our platform handles the rest. Read our docs for more details.
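As a rough illustration, a job spec along these lines wraps an existing torchrun command. The field names here are hypothetical, not Trainy's actual schema; see the docs for the real format:

```yaml
# Hypothetical job spec - field names are illustrative only.
name: llama-finetune
num_nodes: 4
gpus_per_node: 8
image: nvcr.io/nvidia/pytorch:24.05-py3
run: |
  torchrun --nnodes=4 --nproc_per_node=8 \
    train.py --config configs/llama.yaml
```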

Is Trainy a Cloud Provider?

No. For most of our customers, we help pick the cloud provider offering that makes the most sense for their specific use case, then assist with hardware validation to ensure they are getting the promised performance. If you already have a reserved GPU cluster, our solution can be deployed in the cloud or on-prem. For startups, we can help you go from cloud credits to a functional multi-node training setup with high-bandwidth networking in under 20 minutes.

Should my AI team access GPUs via On-Demand or Reserved?

Most Trainy customers use a hybrid of on-demand and reserved clusters. For inference servers and dev boxes, it generally makes sense for an AI team to keep a couple of annually reserved GPU instances. For large-scale training workloads, on-demand lets you burst to larger scale at lower cost. Because AI work is bursty by nature, teams use on-demand to reduce GPU spend.

Kubernetes seems too complicated. Why do I need software to manage my GPUs?

Kubernetes gives AI teams higher ROI on the same pool of compute. All top-tier AI research teams (OpenAI, Meta, etc.) have similar systems in place. With automated scheduling and cleanup of queued workloads, AI engineers never have to worry about GPU availability or compatibility. Decision makers, in turn, get improved visibility into their team's cluster usage and control over it, and can make informed purchasing decisions.

What are the benefits of Trainy over a tool like Slurm?

Trainy offers all of the resource-sharing and scheduling benefits of Slurm, and much more. Teams get better workload isolation via containerization, integrated observability, and improved robustness with comprehensive health monitoring.

How does Trainy cut GPU costs?

The first step to reducing GPU spend is cutting idle time. If you have a reserved cluster, this means having a fault-tolerant scheduler in place. A scheduler allows your team to maintain a workload queue and keep your GPUs busy 24/7, while fault tolerance ensures that GPU failures do not require manual restarts. New and restarted workloads are placed on healthy nodes, even if they fail in the middle of the night. Once idle time has been minimized, the second step is to look at your workload efficiency. The advanced performance metrics visible in Trainy's platform make it easy to determine how well a workload has been optimized.
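The queue-plus-fault-tolerance idea can be sketched in a few lines. This is a minimal toy loop, not Trainy's scheduler: jobs drain from a queue onto healthy nodes, a failed node is cordoned, and its job is automatically requeued instead of waiting for a human.

```python
from collections import deque

def run_queue(jobs, nodes, run_job, max_retries=3):
    """Drain a job queue across healthy nodes, retrying failures.

    `run_job(job, node)` returns True on success. On failure the node
    is cordoned and the job is requeued onto the remaining healthy
    nodes, so a 3 a.m. GPU failure needs no manual restart.
    """
    queue = deque(jobs)
    healthy = list(nodes)
    retries = {job: 0 for job in jobs}
    completed = []
    while queue and healthy:
        job = queue.popleft()
        node = healthy[0]
        if run_job(job, node):
            completed.append(job)
        else:
            healthy.remove(node)        # cordon the failed node
            retries[job] += 1
            if retries[job] <= max_retries:
                queue.append(job)       # requeue automatically
    return completed

# One flaky node: job "a" fails on "bad", is requeued, then finishes on "good".
done = run_queue(["a", "b"], ["bad", "good"], lambda job, node: node == "good")
```

A real scheduler adds health probes, priorities, and gang scheduling for multi-node jobs, but the control flow is the same.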

How do I connect data sources to my GPU cluster with Trainy’s platform?

Most Trainy customers stream data into their GPU cluster from an object store such as Cloudflare R2. Longer term, we are looking at distributed file system integrations, but these are not available today.
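The streaming pattern itself is simple: read the object body in fixed-size chunks so memory stays flat regardless of object size. The sketch below uses an in-memory buffer as a stand-in for a real client; with an S3-compatible store like R2 the `body` would typically be the streaming body of a `get_object` response (e.g. from boto3 pointed at your R2 endpoint).

```python
import io

def stream_records(body, record_size):
    """Yield fixed-size records from a streaming, file-like object body.

    Works with any object exposing read(n); with a real S3-compatible
    client this would be the response's streaming body.
    """
    while True:
        chunk = body.read(record_size)
        if not chunk:
            break
        yield chunk

# Stand-in for an object fetched from R2: ten 4-byte records.
body = io.BytesIO(b"".join(i.to_bytes(4, "big") for i in range(10)))
records = list(stream_records(body, 4))
```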

Can I use Trainy to manage multi-cloud environments?

We can give your team access to multiple K8s clusters corresponding to different clouds, but jobs are submitted to one cluster at a time.

What is the best time to start working with Trainy?

The earlier, the better. While your company is exploring gen AI applications, on-demand clusters are a cost-effective way to run large-scale experiments. When the time comes to choose a cloud provider, we work with you to navigate provider offerings and ensure you are getting maximum performance.

Ready to scale your AI training? Get enterprise-grade GPU infrastructure up and running in 20 minutes.