Public cloud — AI & Machine Learning
Build models from scratch on the AI Platform. Call ready-made models through AI Endpoints. Or rent the raw GPU. Same operator, same SLA, same audit trail — whichever altitude you pick.
H200, H100, L40S, and L4 — current generation, hosted on intSignal infrastructure.
Build from scratch, call an API, or rent the GPU. Pick what fits.
Encryption at rest and in flight on every layer of the stack.
On-call rotation with incident response and post-incident reviews.
Three ways in
Most AI buyers fit one of three shapes. Pick the tier that matches where your team actually is — and slide between them as the project changes.
TIER 01
YOU HAVE A DATA SCIENCE TEAM
End-to-end workspace for building models from scratch. Notebooks, training jobs, deployment, inference — all under one operator, with versioned artifacts and reproducible runs.
TIER 02
YOU JUST NEED THE MODEL
Pre-built model APIs for embeddings, generation, classification, and translation. Pick a model from the catalog, call the endpoint, ship to production. No training, no GPU management.
TIER 03
YOU WANT THE RAW HARDWARE
Dedicated NVIDIA GPUs on intSignal infrastructure. H200 and H100 for large model training and inference. L40S for mixed AI and visual workloads. L4 for cost-efficient inference.
Tier 01 · AI Platform
Standard frameworks. Standard storage. Standard tracking. The platform handles the things that bog data science teams down — environment setup, GPU scheduling, artifact versioning, model serving — so you can focus on the model itself.
EXPLORE · PREPARE · PROTOTYPE
Managed JupyterLab environments with pre-built images for PyTorch, TensorFlow, and JAX. GPU-backed when you need them, CPU-only when you don't. Persistent volumes mean your work survives session restarts.
python · notebook.ipynb
# Standard imports. GPU available if attached.
import torch
import pandas as pd
from intsignal import storage
# Pull training data from intSignal storage
df = storage.read_parquet("s3://datasets/orders-2026.parquet")
print(f"Loaded {len(df):,} rows")
# Check what GPU we're on
if torch.cuda.is_available():
    gpu = torch.cuda.get_device_name(0)
    print(f"GPU: {gpu}")
    # → GPU: NVIDIA H100 80GB HBM3
# Prototype model on a subsample
sample = df.sample(10_000, random_state=42)
# ... build / train / iterate ...
SCHEDULED JOBS · MULTI-GPU · CHECKPOINTED
Submit training jobs that get scheduled on real GPUs without you managing the cluster. Distributed training across multiple GPUs or nodes for large models. Automatic checkpointing so a preempted job doesn't lose hours of progress.
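The checkpoint-and-resume behavior described above can be sketched in a framework-agnostic way. This is a minimal illustration, not the platform's implementation: the function names, the JSON checkpoint format, and the `preempt_at` parameter (used only to simulate a preemption) are all assumptions. In a real PyTorch job the saved state would be the model and optimizer state dicts rather than a small dictionary.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    # Write to a temp file, then rename atomically, so a preemption
    # in the middle of a write never leaves a truncated checkpoint.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    # No checkpoint yet means this is a fresh run starting at step 0.
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, checkpoint_every=100, preempt_at=None):
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if preempt_at is not None and step == preempt_at:
            return state  # simulated preemption: progress since last save is lost
        # Stand-in for one real optimization step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "ckpt.json")
    train(ckpt, total_steps=1000, preempt_at=250)  # first attempt is preempted
    resumed_from = load_checkpoint(ckpt)["step"]   # last durable checkpoint
    final = train(ckpt, total_steps=1000)          # second attempt resumes from it
```

With a checkpoint every 100 steps, a preemption at step 250 costs 50 steps of work rather than the whole run. The checkpoint interval trades write overhead against the amount of work that can be lost.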
yaml · harden.yml
# CIS hardening role applied to all production nodes
- name: Apply CIS baseline
  hosts: production
  become: yes
  vars:
    cis_level: "2"
    audit_log_retention: "90d"
  roles:
    - role: intsignal.cis_hardening
      tags: [hardening, compliance]
    - role: intsignal.osquery
      tags: [endpoint, monitoring]
    - role: intsignal.log_forwarder
      vars:
        target: logs.intsignal.io
      tags: [logging]
  handlers:
    - name: restart auditd
      service: { name: auditd, state: restarted }
CONTAINERIZED · VERSIONED · ROLLBACK-READY
Package your trained model as a container and deploy it to an inference endpoint. Built-in versioning means rollbacks are one command. Traffic-splitting lets you canary new model versions against production safely.
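To make the traffic-splitting idea concrete, here is a minimal sketch of weighted routing between two model versions. It is an illustration under stated assumptions, not a description of how the platform routes requests: `pick_version`, the `WEIGHTS` table, and the use of a request ID hash are all hypothetical. Hashing the request ID (rather than drawing a random number) keeps the choice deterministic, so retries of the same request land on the same version.

```python
import hashlib

# Hypothetical 90/10 canary split; percentages must sum to 100.
WEIGHTS = [("v4.1", 90), ("v4.2", 10)]


def pick_version(request_id, weights=WEIGHTS):
    # Map the request ID to a stable bucket in [0, 100).
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    # Walk the cumulative weights until the bucket falls inside a range.
    cumulative = 0
    for version, weight in weights:
        cumulative += weight
        if bucket < cumulative:
            return version
    raise ValueError("weights must sum to 100")


# Route a batch of synthetic request IDs and measure the canary share.
routed = [pick_version(f"req-{i}") for i in range(10_000)]
canary_share = routed.count("v4.2") / len(routed)
```

Promoting the canary is then a matter of shifting weight from the old version to the new one, and rolling back is the reverse.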
yaml · deploy.yaml
# Deploy a model version with canary traffic split
apiVersion: ai.intsignal.io/v1
kind: ModelDeployment
metadata:
  name: order-classifier
spec:
  model:
    source: s3://models/classifier/v4.2/
    runtime: pytorch:2.4-cuda12
  resources:
    gpu: l40s
    count: 1
  autoscale:
    min: 1
    max: 8
    target_qps: 50
  traffic:
    - version: v4.1
      weight: 90
    - version: v4.2
      weight: 10  # canary
SERVE · SCALE · OBSERVE
Production-grade inference serving with the runtime tuned to the model — Triton for general PyTorch/TensorFlow, vLLM for LLMs, ONNX Runtime where it fits. Latency and throughput are observable. Failures are paged, not ignored.
python · client.py
# Call your deployed model from any client
import os
import requests
# Token read from the environment; the variable name is an assumption
TOKEN = os.environ["INTSIGNAL_KEY"]
resp = requests.post(
    "https://inference.intsignal.io/order-classifier/v4.2/predict",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "inputs": [
            {"text": "shipping address is in Quebec"},
            {"text": "customer requested gift wrap"},
        ]
    },
    timeout=30,
)
resp.raise_for_status()  # fail on 4xx/5xx instead of parsing an error body
result = resp.json()
# {
#   "predictions": [
#     {"label": "international", "confidence": 0.94},
#     {"label": "preference_note", "confidence": 0.87}
#   ],
#   "latency_ms": 32,
#   "model_version": "v4.2"
# }
EPHEMERAL · MULTI-NODE · TEAR-DOWN
Spin up an entire GPU cluster for a specific workload, run it, and tear it down. Useful for one-off large training runs, hyperparameter sweeps, or research bursts where standing capacity doesn't make sense.
shell · cluster-up.sh
# Spin up a 32-GPU H100 cluster for a 6-hour sweep
$ intsignal-ai cluster create \
    --name hyperparam-sweep-may \
    --gpu-type h100 \
    --node-count 4 \
    --gpus-per-node 8 \
    --scheduler slurm \
    --shared-storage 10TB \
    --ttl 6h
→ cluster: cl-9b2c4f provisioning
→ ready: ~3 min
→ ssh: cl-9b2c4f.clusters.intsignal.io
→ teardown: 2026-05-21 14:00 UTC (auto)
# Run your sweep via SLURM
$ ssh cl-9b2c4f.clusters.intsignal.io
$ sbatch sweep.slurm
→ submitted 64 jobs across 32 GPUs
Tier 02 · AI API Services
AI Endpoints is a catalog of pre-built model APIs — embeddings, classification, generation, translation, vision, speech. Pick the model, get an API key, ship it. Three response patterns for three latency profiles.
CATALOG
browse · select · subscribe
Curated catalog of open-source and licensed models for common tasks. Each model lists its size, license, recommended use cases, and the GPU it runs on. Pin a version so updates don't break your application.
BASE
standard request/response
Synchronous request/response for most use cases. Predictable latency, fair-share scheduling, supports all models in the catalog. The default unless you specifically need fast or batch.
FAST
low-latency · reserved capacity
Same models, but with dedicated capacity and priority scheduling. Pick this for interactive use cases — search-as-you-type, voice assistants, anything where p99 latency matters more than per-call cost.
BATCH
offline · throughput-optimized
Submit a list of jobs, get back a results file. Costs less per call because we batch and run when capacity is available. Best for nightly enrichment, bulk classification, or backfill jobs.
OpenAI-compatible request shape so existing client libraries work without modification. Switch models by changing one parameter.
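As a minimal sketch of that request shape in Python, using only the standard library: the helper `build_chat_request` is hypothetical, and the host and model names are the ones used in this page's curl example. The request is built but not sent, so the shape can be inspected without an API key.

```python
import json
import urllib.request

# Host taken from this page's curl example.
BASE_URL = "https://endpoints.intsignal.io/v1"


def build_chat_request(model, prompt, api_key):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


prompt = "Summarize this in 3 bullets..."
req_a = build_chat_request("llama-3.3-70b-instruct", prompt, "test-key")
req_b = build_chat_request("mixtral-8x22b-instruct", prompt, "test-key")
# To send: urllib.request.urlopen(req_a, timeout=30)
```

The two requests differ only in the `model` field of the JSON body, which is what switching models by changing one parameter amounts to.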
# Same shape, switch models freely
$ curl https://endpoints.intsignal.io/v1/chat/completions \
    -H "Authorization: Bearer $INTSIGNAL_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama-3.3-70b-instruct",
      "messages": [
        {"role": "user", "content": "Summarize this in 3 bullets..."}
      ]
    }'
# → response in ~400ms (Fast API)
# → switch model: "mixtral-8x22b-instruct", same shape
# → switch to embed: /v1/embeddings, same auth
Tier 03 · GPU Cloud Infrastructure
Four current-generation NVIDIA GPU models on intSignal infrastructure. We handle the host, the network, the storage, the driver stack, the firmware. You handle your code.
Hardening and operating practices aligned to the frameworks your assessors recognize. intSignal is not the certified entity for most of these — we deliver the controls and evidence that make your audit possible. Where required, we partner with FedRAMP-authorized providers for federal scoping.
HARDENING
Engine hardening with documented exceptions.
SOC 2
Controls and evidence cadence ready for audit.
ISO
Cloud-services control narratives.
HIPAA
Encryption, access, audit; BAA via partner.
FEDERAL
Authorized hyperscaler regions integrated.
DATACENTER
Hosting facility carries its own attestations.
FAQ
If yours isn't here, ask in the consultation — we'd rather flag the awkward bits early than discover them in production.
If you have a data science team building custom models, start with the AI Platform. If you just need to call an existing model (chat, embeddings, vision, etc.), use AI Endpoints. If you want raw GPU access and bring your own software stack, use GPU Cloud. Most customers use two — for example, GPU Cloud for training runs plus AI Endpoints for the parts where a generic model is good enough.
For large LLM training (70B+) and long-context inference, H200. For most production LLM workloads and serious AI training, H100. For mixed AI and visual workloads at smaller scale (7B–13B models, diffusion, vision), L40S. For cost-efficient inference and video pipelines, L4. We'll size it with you during onboarding, because a wrong-fit GPU wastes budget whether it is too large or too small.
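A rough way to see why model size drives the GPU choice is to compare the memory needed for the weights with the memory on each card. The sketch below is back-of-the-envelope arithmetic, not a sizing tool: the function names and the 1.2 overhead factor are assumptions, and it ignores KV cache, activations, and optimizer state, which dominate for training and long-context inference. The per-GPU memory figures are NVIDIA's published capacities for these cards.

```python
import math

# Memory per GPU in GB (H100 figure is the 80 GB variant).
GPU_MEMORY_GB = {"h200": 141, "h100": 80, "l40s": 48, "l4": 24}


def weights_gb(params_billion, bytes_per_param=2):
    # 1 billion parameters at 2 bytes each (FP16/BF16) is about 2 GB.
    return params_billion * bytes_per_param


def gpus_needed(params_billion, gpu, bytes_per_param=2, overhead=1.2):
    # overhead is an assumed fudge factor for runtime buffers, not a measured value.
    required_gb = weights_gb(params_billion, bytes_per_param) * overhead
    return math.ceil(required_gb / GPU_MEMORY_GB[gpu])


# A 70B model in 16-bit precision needs about 140 GB for weights alone,
# so it cannot fit on a single 80 GB card.
h100_count_70b = gpus_needed(70, "h100")
h200_count_70b = gpus_needed(70, "h200")
l40s_count_13b = gpus_needed(13, "l40s")
```

Under these assumptions a 13B model fits on one L40S, while a 70B model has to be split across several H100s or H200s, which matches the tiering described above.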
Yes — but that's actually AI Deploy on the AI Platform, not the Endpoints catalog. If you have a model you've trained or fine-tuned, deploy it via AI Deploy and call it from your application. The Endpoints catalog is for pre-curated models we operate on shared infrastructure.
Training data stays in your storage. We don't train against customer data, we don't pool data across customers, and we don't share data with model providers. The encryption-at-rest and access-control story is the same as the storage and database services.
Yes. AI Training supports multi-GPU and multi-node configurations with NVLink/NVSwitch on H100/H200 nodes. PyTorch DDP, FSDP, DeepSpeed, and Megatron-LM are all supported. For very large runs, Cluster on Demand gives you an ephemeral SLURM or Kubernetes cluster sized to the workload.
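In data-parallel training such as PyTorch DDP, each GPU process (rank) works on its own slice of the dataset. The sketch below shows one common sharding strategy in plain Python: pad the index list so it divides evenly, then give each rank every `world_size`-th index. It mirrors the approach PyTorch's `DistributedSampler` takes when shuffling is disabled; the function name `shard_indices` is hypothetical, and the sketch assumes the dataset has at least as many samples as there are ranks.

```python
import math


def shard_indices(dataset_size, world_size, rank):
    # Every rank must see the same number of samples, so round up
    # and pad by repeating indices from the start of the dataset.
    per_rank = math.ceil(dataset_size / world_size)
    total = per_rank * world_size
    indices = list(range(dataset_size))
    indices += indices[: total - dataset_size]
    # Strided slice: rank r takes positions r, r + world_size, ...
    return indices[rank:total:world_size]


# 10 samples across 4 GPUs: each rank gets 3 samples, 2 are repeated.
shards = [shard_indices(10, 4, rank) for rank in range(4)]
```

Equal shard sizes matter because the ranks synchronize gradients every step, and a rank that runs out of data early would stall the others.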
GPU Cloud is billed per GPU-hour. AI Platform jobs are billed by the GPU resources they consume. AI Endpoints are billed per request or per token depending on the model. Reserved capacity and committed-use discounts are available for predictable workloads. We'll model the cost with you against your actual workload shape before you commit.
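Per-GPU-hour billing reduces to simple arithmetic, sketched below. The function names are hypothetical and the rate is a caller-supplied placeholder, not a real price. The example numbers reuse the cluster from the Cluster on Demand sample on this page: 4 nodes with 8 GPUs each, running for 6 hours.

```python
def gpu_hours(nodes, gpus_per_node, hours):
    # Billable GPU-hours: every GPU is counted for the full duration.
    return nodes * gpus_per_node * hours


def estimate_cost(nodes, gpus_per_node, hours, rate_per_gpu_hour):
    # rate_per_gpu_hour is a placeholder supplied by the caller.
    return gpu_hours(nodes, gpus_per_node, hours) * rate_per_gpu_hour


sweep_gpu_hours = gpu_hours(nodes=4, gpus_per_node=8, hours=6)
```

The 6-hour sweep on 32 GPUs comes to 192 GPU-hours, so the total scales linearly with whatever per-GPU-hour rate applies.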
PyTorch, TensorFlow, JAX, and HuggingFace Transformers are first-class with maintained container images. For inference, Triton Inference Server, vLLM, ONNX Runtime, and TensorRT-LLM are supported. If you're using something we don't maintain, you can bring your own container image — we operate the GPU and the host, you operate the runtime.
Tell us what you're trying to build — fine-tuning, inference at scale, an application powered by a pre-trained model, or pure research. We'll propose the right tier and walk you through the cost model.