Published on July 26, 2024 | GPU Infrastructure for the Modern AI Team | Tags: Cluster-Management, Logging, Cluster, GPUs | Take Control of your GPU Infrastructure with Trainy's lightweight management layer.
Published on July 11, 2024 | Automatic GPU Node Health and Pod Scheduling | Tags: Cluster, Cluster-Management, GPUs, Logging, Metrics | Automatically isolate faulty nodes and schedule only on healthy ones.
Published on June 15, 2024 | Zero Instrumentation GPU Metrics with DCGM on Trainy: Konduktor | Tags: Training, Clusters, GPUs | Monitor your GPU clusters with metrics.
Published on June 7, 2024 | GPU Fabric Preflight Checks for MultiNode Training | Tags: Training | Run these benchmarks before training on multiple machines.
Published on November 21, 2023 | Function Calling with Gorilla LLM | Tags: Serving, Function-Calling | Host your own function calling LLM.