
Public cloud — AI & Machine Learning

AI infrastructure, three ways in.

Build models from scratch on the AI Platform. Call ready-made models through AI Endpoints. Or rent the raw GPU. Same operator, same SLA, same audit trail — whichever altitude you pick.


NVIDIA GPUs

H200, H100, L40S, and L4 — current generation, hosted on intSignal infrastructure.

Three altitudes

Build from scratch, call an API, or rent the GPU. Pick what fits.

Encrypted by default

At rest and in flight on every layer of the stack.

24/7 operations

On-call rotation with incident response and post-incident reviews.

Three ways in

Build, call, or rent. You decide the altitude.

Most AI buyers fit one of three shapes. Pick the tier that matches where your team actually is — and slide between them as the project changes.

TIER 01

YOU HAVE A DATA SCIENCE TEAM

AI Platform

End-to-end workspace for building models from scratch. Notebooks, training jobs, deployment, inference — all under one operator, with versioned artifacts and reproducible runs.

  • AI Notebooks
  • AI Training
  • AI Deploy
  • Inference Solutions
  • Cluster on Demand

TIER 02

YOU JUST NEED THE MODEL

AI Endpoints

Pre-built model APIs for embeddings, generation, classification, and translation. Pick a model from the catalog, call the endpoint, ship to production. No training, no GPU management.

  • AI API Services
  • Model Catalogue
  • Base · Fast · Batch APIs

TIER 03

YOU WANT THE RAW HARDWARE

GPU Cloud

Dedicated NVIDIA GPUs on intSignal infrastructure. H200 and H100 for large model training and inference. L40S for mixed AI and visual workloads. L4 for cost-efficient inference.

  • NVIDIA H200
  • NVIDIA H100
  • NVIDIA L40S
  • NVIDIA L4

Tier 01 · AI Platform

The whole lifecycle, under one operator.

Standard frameworks. Standard storage. Standard tracking. The platform handles the things that bog data science teams down — environment setup, GPU scheduling, artifact versioning, model serving — so you can focus on the model itself.

AI Notebooks

EXPLORE · PREPARE · PROTOTYPE

Managed JupyterLab environments with pre-built images for PyTorch, TensorFlow, and JAX. GPU-backed when you need them, CPU-only when you don't. Persistent volumes mean your work survives session restarts.

  • JupyterLab with pre-built ML images
  • Attach GPU on demand, detach when idle
  • Persistent storage per notebook environment
  • Direct access to object storage and managed databases
  • Shared environments for team collaboration
python · notebook.ipynb

# Standard imports. GPU available if attached.
import torch
import pandas as pd
from intsignal import storage

# Pull training data from intSignal storage
df = storage.read_parquet("s3://datasets/orders-2026.parquet")
print(f"Loaded {len(df):,} rows")

# Check what GPU we're on
if torch.cuda.is_available():
    gpu = torch.cuda.get_device_name(0)
    print(f"GPU: {gpu}")
# → GPU: NVIDIA H100 80GB HBM3

# Prototype model on a subsample
sample = df.sample(10_000, random_state=42)
# ... build / train / iterate ...

AI Training

SCHEDULED JOBS · MULTI-GPU · CHECKPOINTED

Submit training jobs that get scheduled on real GPUs without you managing the cluster. Distributed training across multiple GPUs or nodes for large models. Automatic checkpointing so a preempted job doesn't lose hours of progress.

  • Submit jobs via CLI, SDK, or notebook
  • Single-GPU, multi-GPU, or multi-node configurations
  • Automatic checkpointing to object storage
  • Experiment tracking with metrics and artifacts
  • Spot-priced training for cost-flexible workloads
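Automatic checkpointing comes down to a save-and-resume loop: persist state every N steps, and on restart pick up from the last complete checkpoint. The sketch below shows the idea in plain Python under stated assumptions. A local directory stands in for object storage, a running sum stands in for model weights, and the names (`save_checkpoint`, `load_checkpoint`, `train`, `stop_at`) are hypothetical, not the AI Training SDK.

```python
import json
import os
import tempfile


def save_checkpoint(ckpt_dir: str, step: int, state: dict) -> None:
    """Write the checkpoint atomically: temp file first, then rename."""
    os.makedirs(ckpt_dir, exist_ok=True)
    final_path = os.path.join(ckpt_dir, "latest.json")
    fd, tmp_path = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    # os.replace is atomic on one filesystem, so a preemption mid-write
    # never leaves a half-written latest.json behind.
    os.replace(tmp_path, final_path)


def load_checkpoint(ckpt_dir: str) -> tuple:
    """Return (step, state) from the last checkpoint, or a fresh start."""
    path = os.path.join(ckpt_dir, "latest.json")
    if not os.path.exists(path):
        return 0, {"loss_sum": 0.0}
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def train(ckpt_dir: str, total_steps: int, checkpoint_every: int, stop_at=None) -> int:
    """Run or resume a toy training loop; stop_at simulates a preemption."""
    step, state = load_checkpoint(ckpt_dir)
    while step < total_steps:
        if stop_at is not None and step == stop_at:
            return step  # simulated preemption: the process dies here
        state["loss_sum"] += 1.0 / (step + 1)  # stand-in for one training step
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(ckpt_dir, step, state)
    save_checkpoint(ckpt_dir, step, state)
    return step
```

With `checkpoint_every=100`, a job preempted at step 250 resumes from step 200, so only 50 steps are repeated instead of the whole run.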

AI Deploy

CONTAINERIZED · VERSIONED · ROLLBACK-READY

Package your trained model as a container and deploy it to an inference endpoint. Built-in versioning means rollbacks are one command. Traffic-splitting lets you canary new model versions against production safely.

  • Containerized model packaging
  • Version pinning with rollback to any prior version
  • Canary deployments with traffic splitting
  • Auto-scaling based on request volume
  • Built-in monitoring of latency and errors
yaml · deploy.yaml

# Deploy a model version with canary traffic split
apiVersion: ai.intsignal.io/v1
kind: ModelDeployment
metadata:
  name: order-classifier
spec:
  model:
    source: s3://models/classifier/v4.2/
    runtime: pytorch:2.4-cuda12
  resources:
    gpu: l40s
    count: 1
  autoscale:
    min: 1
    max: 8
    target_qps: 50
  traffic:
    - version: v4.1
      weight: 90
    - version: v4.2
      weight: 10   # canary
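The `traffic` and `autoscale` fields above can be read as two small rules. The sketch below is one plausible reading, not intSignal's controller: it assumes weights are relative shares, routes each request by hashing a stable request ID so the same caller keeps hitting the same version, and sizes replicas as `ceil(observed_qps / target_qps)` clamped to `[min, max]`. `pick_version` and `desired_replicas` are hypothetical names.

```python
import hashlib
import math


def pick_version(request_id: str, traffic: list[dict]) -> str:
    """Route a request to a model version by weighted, deterministic hashing.

    The same request_id always lands on the same version, which keeps a
    caller's experience stable while the canary runs.
    """
    total = sum(t["weight"] for t in traffic)
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total  # bucket in [0, total)
    cumulative = 0
    for t in traffic:
        cumulative += t["weight"]
        if bucket < cumulative:
            return t["version"]
    raise AssertionError("unreachable: weights exhausted")


def desired_replicas(observed_qps: float, target_qps: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed so each serves at most target_qps, within bounds."""
    needed = math.ceil(observed_qps / target_qps) if observed_qps > 0 else 0
    return max(min_replicas, min(max_replicas, needed))


# Mirrors the traffic block in deploy.yaml.
traffic = [{"version": "v4.1", "weight": 90}, {"version": "v4.2", "weight": 10}]
```

With the 90/10 split, roughly one request in ten reaches v4.2. At 120 requests per second with `target_qps: 50`, `min: 1`, and `max: 8`, the sketch asks for 3 replicas.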

Inference Solutions

SERVE · SCALE · OBSERVE

Production-grade inference serving with the runtime tuned to the model — Triton for general PyTorch/TensorFlow, vLLM for LLMs, ONNX Runtime where it fits. Latency and throughput are observable. Failures are paged, not ignored.

  • Triton, vLLM, ONNX Runtime — pick what fits
  • Dynamic batching for higher throughput
  • p50/p95/p99 latency dashboards out of the box
  • Cold-start optimization for elastic workloads
  • Audit log of every request for compliance reviews
python · client.py

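# Call your deployed model from any client
import os

import requests

TOKEN = os.environ["INTSIGNAL_KEY"]  # API token, kept out of source code

resp = requests.post(
    "https://inference.intsignal.io/order-classifier/v4.2/predict",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "inputs": [
            {"text": "shipping address is in Quebec"},
            {"text": "customer requested gift wrap"},
        ]
    },
    timeout=30,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body

result = resp.json()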
# {
#   "predictions": [
#     {"label": "international", "confidence": 0.94},
#     {"label": "preference_note", "confidence": 0.87}
#   ],
#   "latency_ms": 32,
#   "model_version": "v4.2"
# }
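Dynamic batching trades a small, bounded wait for larger GPU batches. Below is a minimal offline simulation of the common policy: a batch is dispatched when it reaches `max_batch_size` or when its oldest request has waited `max_delay_ms`. The function and parameter names are illustrative; this sketches the idea, not the scheduler inside Triton, vLLM, or any other runtime.

```python
def form_batches(arrivals_ms: list[float], max_batch_size: int,
                 max_delay_ms: float) -> list[list[int]]:
    """Group request indices (sorted by arrival time) into batches.

    A batch closes when it is full, or when the next request arrives after
    the oldest queued request's deadline. A real server flushes on a timer
    at the deadline; the batch contents come out the same.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    deadline = 0.0
    for i, t in enumerate(arrivals_ms):
        if current and t > deadline:
            batches.append(current)  # oldest request timed out: flush
            current = []
        if not current:
            deadline = t + max_delay_ms  # first request starts the clock
        current.append(i)
        if len(current) == max_batch_size:
            batches.append(current)  # batch is full: flush immediately
            current = []
    if current:
        batches.append(current)
    return batches
```

For arrivals at 0, 1, 2, 3, 20, 21, and 50 ms with `max_batch_size=4` and `max_delay_ms=5`, the first four requests fill a batch, the pair at 20 and 21 ms is flushed when the 5 ms window expires, and the request at 50 ms goes out alone.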

Cluster on Demand

EPHEMERAL · MULTI-NODE · TEAR-DOWN

Spin up an entire GPU cluster for a specific workload, run it, and tear it down. Useful for one-off large training runs, hyperparameter sweeps, or research bursts where standing capacity doesn't make sense.

  • Provision a multi-node GPU cluster in minutes
  • SLURM or Kubernetes scheduler — your choice
  • Shared high-throughput storage attached automatically
  • Tear-down releases all resources cleanly
  • Billed by the hour for the actual cluster lifetime
shell · cluster-up.sh

# Spin up a 32-GPU H100 cluster for a 6-hour sweep
$ intsignal-ai cluster create \
    --name hyperparam-sweep-may \
    --gpu-type h100 \
    --node-count 4 \
    --gpus-per-node 8 \
    --scheduler slurm \
    --shared-storage 10TB \
    --ttl 6h

→ cluster: cl-9b2c4f provisioning
→ ready: ~3 min
→ ssh: cl-9b2c4f.clusters.intsignal.io
→ teardown: 2026-05-21 14:00 UTC (auto)

# Run your sweep via SLURM
$ ssh cl-9b2c4f.clusters.intsignal.io
$ sbatch sweep.slurm

→ submitted 64 jobs across 32 GPUs
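Because the cluster is billed for its lifetime, sizing the run above is simple arithmetic: 4 nodes × 8 GPUs × 6 hours = 192 GPU-hours, and 64 jobs on 32 GPUs run in 2 waves if each job holds one GPU. A sketch of that arithmetic, assuming partial hours round up and one GPU per job; both are assumptions for illustration, not published billing or scheduling rules. Multiply the GPU-hours by your contracted rate to get a cost figure.

```python
import math


def gpu_hours(node_count: int, gpus_per_node: int, lifetime_hours: float) -> int:
    """GPU-hours billed for a cluster, assuming partial hours round up."""
    return node_count * gpus_per_node * math.ceil(lifetime_hours)


def job_waves(total_jobs: int, total_gpus: int) -> int:
    """Sequential waves needed if each job occupies one GPU until it ends."""
    return math.ceil(total_jobs / total_gpus)


print(gpu_hours(node_count=4, gpus_per_node=8, lifetime_hours=6))  # 192
print(job_waves(total_jobs=64, total_gpus=32))                     # 2
```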

Tier 02 · AI API Services

Don't train a model. Call one.

AI Endpoints is a catalog of pre-built model APIs — embeddings, classification, generation, translation, vision, speech. Pick the model, get an API key, ship it. Three response patterns for three latency profiles.

CATALOG

AI Endpoints Model Catalogue

browse · select · subscribe

Curated catalog of open-source and licensed models for common tasks. Each model lists its size, license, recommended use cases, and the GPU it runs on. Pin a version so updates don't break your application.

  • Model types: LLM · embed · vision · ASR
  • Versioning: pinned per consumer
  • Licensing: OSS + commercial

BASE

AI Endpoints Base API

standard request/response

Synchronous request/response for most use cases. Predictable latency, fair-share scheduling, supports all models in the catalog. The default unless you specifically need fast or batch.

  • Pattern: sync request
  • Use for: most workloads
  • Quota: per-key rate limit
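A per-key rate limit is easiest to respect with a client-side token bucket: tokens refill at a steady rate up to a burst capacity, and each request spends one. A minimal sketch, assuming the quota can be expressed as requests per second plus a burst allowance; the actual limit values are not published here, so `rate` and `capacity` are hypothetical parameters you would set from your key's quota.

```python
import time


class TokenBucket:
    """Client-side token bucket: refills at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock      # injectable for testing
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False instead of blocking."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Call `try_acquire()` before each request and back off when it returns `False`, rather than letting the service reject the call.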

FAST

AI Endpoints Fast API

low-latency · reserved capacity

Same models, but with dedicated capacity and priority scheduling. Pick this for interactive use cases — search-as-you-type, voice assistants, anything where p99 latency matters more than per-call cost.

  • Pattern: sync · prioritized
  • Use for: interactive UX
  • Capacity: reserved

BATCH

AI Endpoints Batch API

offline · throughput-optimized

Submit a list of jobs, get back a results file. Costs less per call because we batch and run when capacity is available. Best for nightly enrichment, bulk classification, or backfill jobs.

  • Pattern: async · file in/out
  • Use for: offline jobs
  • SLA: completion windows
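File-in/file-out batch jobs commonly use JSON Lines: one request per line, each tagged with an ID so results can be joined back to inputs. A sketch of preparing such a file and indexing the results, assuming a JSONL schema with a `custom_id` field and an OpenAI-style request `body`. Those field names are hypothetical, since the Batch API's file schema is not specified on this page.

```python
import json


def build_batch_lines(texts: list[str], model: str) -> str:
    """Serialize one chat request per input text as JSON Lines."""
    lines = []
    for i, text in enumerate(texts):
        record = {
            "custom_id": f"row-{i}",  # hypothetical field used to join results
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": f"Classify: {text}"}],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


def index_results(results_jsonl: str) -> dict:
    """Map custom_id to its result record, skipping blank lines."""
    out = {}
    for line in results_jsonl.splitlines():
        if line.strip():
            rec = json.loads(line)
            out[rec["custom_id"]] = rec
    return out
```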

One auth header, every model.

OpenAI-compatible request shape, so existing client libraries work once you point them at the intSignal base URL and key. Switch models by changing one parameter.

# Same shape, switch models freely
$ curl https://endpoints.intsignal.io/v1/chat/completions \
    -H "Authorization: Bearer $INTSIGNAL_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama-3.3-70b-instruct",
      "messages": [
        {"role": "user", "content": "Summarize this in 3 bullets..."}
      ]
    }'

# → response in ~400ms (Fast API)
# → switch model: "mixtral-8x22b-instruct" — same shape
# → switch to embed: /v1/embeddings — same auth
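The same request from Python, using only the standard library. The sketch reuses the URL and model name from the curl example and builds the `Request` without sending it, so it can be inspected offline. Parsing the reply as `choices[0].message.content` is an assumption based on the OpenAI-compatible claim, not a documented response schema.

```python
import json
import urllib.request

BASE_URL = "https://endpoints.intsignal.io/v1"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,  # the only field that changes when switching models
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (assumed shape)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

In an application you would call `send(build_chat_request("mixtral-8x22b-instruct", "...", os.environ["INTSIGNAL_KEY"]))`; only the model string differs between models.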

Tier 03 · GPU Cloud Infrastructure

Pick your card. We run the rest.

Four current-generation NVIDIA GPUs on intSignal infrastructure. We handle the host, the network, the storage, the driver stack, the firmware. You handle your code.

| Spec | NVIDIA H200 | NVIDIA H100 | NVIDIA L40S | NVIDIA L4 |
|---|---|---|---|---|
| Architecture | Hopper | Hopper | Ada Lovelace | Ada Lovelace |
| GPU memory | 141 GB HBM3e | 80 GB HBM3 | 48 GB GDDR6 ECC | 24 GB GDDR6 |
| Memory bandwidth | ~4.8 TB/s | ~3.35 TB/s | ~864 GB/s | ~300 GB/s |
| Interconnect | NVLink + NVSwitch | NVLink + NVSwitch | PCIe Gen4 | PCIe Gen4 |
| MIG support | Yes (up to 7 instances) | Yes (up to 7 instances) | No | No |
| Best for | 100B+ parameter LLM training and inference. Long-context generation. Memory-bound workloads that outgrow the H100's 80 GB. | Production LLM training and inference up to ~70B parameters. The current default for serious AI workloads. | Mixed AI and visual workloads: 7B–13B LLMs, diffusion models, vision transformers, multimodal inference, rendering. | Cost-efficient inference for small to mid-size models. Video transcoding, encoding, and content generation pipelines. |

Specs reflect NVIDIA's published reference figures. Actual provisioned configurations may vary by node type and region — we'll walk through what fits your workload in the consultation.

What we operate, every day.

Hardening and operating practices aligned to the frameworks your assessors recognize. intSignal is not the certified entity for most of these — we deliver the controls and evidence that make your audit possible. Where required, we partner with FedRAMP-authorized providers for federal scoping.

HARDENING

CIS Benchmark

Engine hardening with documented exceptions.

SOC 2

Aligned to Type II

Controls and evidence cadence ready for audit.

ISO

Aligned to 27001 / 27017

Cloud-services control narratives.

HIPAA

HIPAA-compliant ops

Encryption, access, audit; BAA via partner.

FEDERAL

FedRAMP via partner

Authorized hyperscaler regions integrated.

DATACENTER

Compliant facility

Hosting facility carries its own attestations.

FAQ

Questions data and AI teams ask before signing.

If yours isn't here, ask in the consultation. We'd rather flag the awkward bits early than discover them in production.

If you have a data science team building custom models, start with the AI Platform. If you just need to call an existing model (chat, embeddings, vision, etc.), use AI Endpoints. If you want raw GPU access and bring your own software stack, use GPU Cloud. Most customers use two: for example, GPU Cloud for training runs plus AI Endpoints for the parts where a generic model is good enough.

For large LLM training (beyond ~70B parameters) and long-context inference, H200. For most production LLM work and serious AI training, H100. For mixed AI and visual workloads at smaller scale (7B–13B models, diffusion, vision), L40S. For cost-efficient inference and video pipelines, L4. We'll size it with you during onboarding; a wrong-fit GPU wastes budget either way.

Yes, but that's actually AI Deploy on the AI Platform, not the Endpoints catalog. If you have a model you've trained or fine-tuned, deploy it via AI Deploy and call it from your application. The Endpoints catalog is for pre-curated models we operate on shared infrastructure.

Training data stays in your storage. We don't train against customer data, we don't pool data across customers, and we don't share data with model providers. Encryption at rest and access control work the same way as for our storage and database services.

Yes. AI Training supports multi-GPU and multi-node configurations with NVLink/NVSwitch on H100/H200 nodes. PyTorch DDP, FSDP, DeepSpeed, and Megatron-LM are all supported. For very large runs, Cluster on Demand gives you an ephemeral SLURM or Kubernetes cluster sized to the workload.
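The data-parallel frameworks named above, PyTorch DDP for instance, share one core step: each GPU computes gradients on its own shard of the batch, then an all-reduce averages them so every replica applies the same update. A framework-free sketch of that arithmetic, assuming equal-sized shards and a toy linear model `y_hat = w * x` with mean squared error; real DDP runs the all-reduce across GPUs (typically over NCCL) and overlaps it with the backward pass.

```python
def local_gradient(w: float, xs: list[float], ys: list[float]) -> float:
    """d/dw of mean squared error for y_hat = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(grads: list[float]) -> float:
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    return sum(grads) / len(grads)


def data_parallel_step(w: float, xs: list[float], ys: list[float],
                       num_workers: int, lr: float) -> float:
    """One synchronous SGD step; assumes len(xs) divides evenly by num_workers."""
    shard = len(xs) // num_workers
    grads = [
        local_gradient(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return w - lr * all_reduce_mean(grads)
```

With equal shards, the averaged gradient equals the full-batch gradient, which is why data-parallel training matches single-GPU training at the same global batch size.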

GPU Cloud is billed per GPU-hour. AI Platform jobs are billed by the GPU resources they consume. AI Endpoints are billed per request or per token depending on the model. Reserved capacity and committed-use discounts are available for predictable workloads. We'll model the cost with you against your actual workload shape before you commit.

PyTorch, TensorFlow, JAX, and Hugging Face Transformers are first-class, with maintained container images. For inference, Triton Inference Server, vLLM, ONNX Runtime, and TensorRT-LLM are supported. If you're using something we don't maintain, you can bring your own container image: we operate the GPU and the host, you operate the runtime.

Stop fighting with GPUs. Start shipping models.

Tell us what you're trying to build: fine-tuning, inference at scale, an application powered by a pre-trained model, or pure research. We'll propose the right tier and walk you through the cost model.
