Analytics

The data flow

From the event that happened to the chart someone reads.

Analytics isn't one product — it's five stages, and every stage has gotchas. We operate all five so you debug your data, not your data pipeline.

STAGE 01

Ingest

Pull data in from applications, hyperscaler services, change-data-capture, and external systems.

Kafka Connect
Logs ingest

STAGE 02

Stream

Buffer and route events with durability and per-partition ordering guarantees. Replicate across clusters and regions.

Managed Kafka
Kafka MirrorMaker

STAGE 03

Store

Index for search, store columnar for OLAP, tier logs by retention. Pick the engine that fits the query.

OpenSearch
ClickHouse
Logs Data Platform

STAGE 04

Query

Full-text search, sub-second OLAP, log search across retention tiers. Standard interfaces, no lock-in.

OpenSearch APIs
ClickHouse SQL

STAGE 05

Visualize

Managed dashboards across all data sources. Embed in your app or hand them to stakeholders.

Managed Dashboards
Grafana / OpenSearch Dashboards

Sub-platform 01 · Event Streaming

Kafka, operated as a service.

Managed Apache Kafka with the surrounding ecosystem: Connect for ingest and egress, MirrorMaker for cross-cluster replication, schema registry, and TLS/SASL everywhere. Same operating model whether your producers live on intSignal or in a hyperscaler.

EVENT BUS

Managed Kafka

Multi-broker cluster with rack awareness, replication factor tuned per topic, and schema registry. Producers and consumers connect with standard Kafka clients.

Topology: 3+ broker cluster
Auth: SASL · mTLS
Registry: Avro · JSON · Protobuf

java · producer.java

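// Standard Kafka client over TLS + SASL/SCRAM.
// KAFKA_USER / KAFKA_PASSWORD are placeholder env names; Order is your Avro-generated type.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "kafka.intsignal.io:9093");
props.put("security.protocol", "SASL_SSL");
props.put("sasl.mechanism", "SCRAM-SHA-512");
props.put("sasl.jaas.config",
    "org.apache.kafka.common.security.scram.ScramLoginModule required "
    + "username=\"" + System.getenv("KAFKA_USER") + "\" "
    + "password=\"" + System.getenv("KAFKA_PASSWORD") + "\";");
props.put("acks", "all");  // wait for every in-sync replica, not just the leader
props.put("key.serializer",   StringSerializer.class.getName());
props.put("value.serializer", KafkaAvroSerializer.class.getName());
props.put("schema.registry.url", "https://sr.intsignal.io");

try (var producer = new KafkaProducer<String, Order>(props)) {
  producer.send(new ProducerRecord<>("orders", order.id(), order));
}
// → acknowledged by all in-sync replicas; schema checked against the registry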
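
The producer above keys each record by order ID. As a rough illustration of why that keeps events for one order in sequence: records with the same key land on the same partition, and Kafka orders records within a partition. The sketch below uses only the JDK and a stand-in hash (Kafka's own partitioner hashes the serialized key with murmur2), and the partition count of 6 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyedPartitioning {
    // Stand-in for Kafka's keyed partitioning. Kafka hashes the serialized key
    // with murmur2; Arrays.hashCode is used here only to avoid dependencies.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6; // hypothetical partition count for the "orders" topic
        for (String orderId : new String[] {"order-1001", "order-1002", "order-1001"}) {
            System.out.println(orderId + " -> partition " + partitionFor(orderId, partitions));
        }
    }
}
```

Because the mapping depends on the partition count, adding partitions to a topic can move a key to a different partition, so per-key ordering holds only while the count stays fixed.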

INGEST · EGRESS

Kafka Connect

Pull data from databases (PostgreSQL CDC, MySQL binlog), object storage, hyperscaler queues, or external APIs. Push to sinks like ClickHouse, OpenSearch, or S3. We run connectors, version them, and monitor them.

Connectors: source + sink
Distribution: distributed mode
Monitoring: per-connector

json · postgres-source.json

// PostgreSQL CDC → Kafka topic (Debezium)
{
  "name": "orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.intsignal.io",
    "database.dbname":   "orders",
    "plugin.name":        "pgoutput",
    "slot.name":          "kafka_orders",
    "table.include.list": "public.orders,public.line_items",
    "topic.prefix":       "orders"
  }
}

// → orders.public.orders, orders.public.line_items
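
The topic names in the last line follow Debezium's PostgreSQL naming scheme, `<topic.prefix>.<schema>.<table>`. A minimal JDK-only sketch of that mapping is below; it assumes every `table.include.list` entry is a literal `schema.table` name (Debezium also accepts regular expressions there, which this sketch does not expand), and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class DebeziumTopicNames {
    // Builds <topic.prefix>.<schema>.<table> for each literal entry in the list.
    static List<String> topicsFor(String topicPrefix, String tableIncludeList) {
        List<String> topics = new ArrayList<>();
        for (String entry : tableIncludeList.split(",")) {
            String table = entry.trim();
            if (!table.isEmpty()) {
                topics.add(topicPrefix + "." + table);
            }
        }
        return topics;
    }

    public static void main(String[] args) {
        // Values taken from the connector config above.
        System.out.println(topicsFor("orders", "public.orders,public.line_items"));
        // prints [orders.public.orders, orders.public.line_items]
    }
}
```

Downstream sink connectors and consumers can use the same rule to derive their subscriptions from the source config.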

REPLICATION

Kafka MirrorMaker

Replicate topics across clusters, regions, or clouds. Disaster-recovery clusters, multi-region active-active, or migration from an existing Kafka deployment — same tool, different shapes.

Mode: MM2 · active-active
Direction: uni- or bidirectional
Consistency: at-least-once

properties · mm2.properties

# Replicate from on-prem Kafka to intSignal
clusters = onprem, intsignal

onprem.bootstrap.servers    = kafka.onprem.lan:9093
intsignal.bootstrap.servers = kafka.intsignal.io:9093

# Topics to replicate, regex supported
onprem->intsignal.enabled = true
onprem->intsignal.topics  = orders.*, payments.*, audit.*

# Translate consumer-group offsets across clusters
sync.group.offsets.enabled = true
emit.checkpoints.enabled   = true

# → ready for failover or migration
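
One detail worth knowing before a failover: under MirrorMaker 2's default replication policy, replicated topics appear on the target cluster prefixed with the source cluster alias, so `orders.created` on `onprem` becomes `onprem.orders.created` on `intsignal`. The sketch below shows that rule with the JDK only. It assumes the default policy and that each `topics` entry is applied as a full-match Java regex; the class and method names are illustrative.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

class MirrorTopicNames {
    // Returns the topic name on the target cluster, or empty when no
    // configured pattern selects the source topic.
    static Optional<String> remoteTopic(String sourceAlias, List<String> topicPatterns, String topic) {
        for (String pattern : topicPatterns) {
            if (Pattern.matches(pattern.trim(), topic)) {
                // Default policy: <source alias> + "." + <original topic name>.
                return Optional.of(sourceAlias + "." + topic);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("orders.*", "payments.*", "audit.*");
        System.out.println(remoteTopic("onprem", patterns, "orders.created"));
        System.out.println(remoteTopic("onprem", patterns, "inventory.stock"));
    }
}
```

Note that `orders.*` is a regex, so it also selects a topic named `orders_archive`; writing `orders\..*` restricts the match to dotted names. If consumers must see identical topic names on both clusters, MirrorMaker 2 can instead be configured with an identity replication policy.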

Sub-platform 02 · Observability & Analytics

Store the data. Query it. Show it.

OpenSearch for search and log workloads. ClickHouse for sub-second analytical queries on huge tables. A logs data platform that knows about retention tiers and access policy. Managed dashboards on top of all of it.

SEARCH · INDEX

OpenSearch

Full-text search, log analytics, and aggregations. Managed cluster with hot/warm/cold node tiers, index lifecycle policies, and snapshot-based backups. API-compatible with Elasticsearch 7.10-era clients.

Versions: 2.x · 3.x
Tiers: hot · warm · cold
Backup: snapshot to object storage

http · search.sh

# Error counts per service over the last hour
# (assumes a "level" field with a keyword sub-field)
$ curl -X POST "https://os.intsignal.io/logs-*/_search" \
    -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "bool": { "filter": [
      { "range": { "@timestamp": { "gte": "now-1h" } } },
      { "term":  { "level.keyword": "error" } }
    ] }
  },
  "aggs": {
    "errors_by_service": {
      "terms": { "field": "service.keyword", "size": 10 },
      "aggs": {
        "per_minute": {
          "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" }
        }
      }
    }
  }
}'

# → top 10 services by error count, bucketed per minute, last hour

COLUMNAR OLAP

ClickHouse

Sub-second analytical queries over billions of rows. Distributed tables, materialized views, and TTL-driven retention. Standard SQL — your analysts and BI tools already speak it.

Versions: 24.x LTS · 25.x
Engine: MergeTree family
Backup: BACKUP TO disk + object storage

sql · events.sql

-- Real-time aggregation over a billion rows
SELECT
  toStartOfMinute(event_time) AS minute,
  country,
  uniqExact(user_id)          AS users,
  sum(amount)                 AS revenue
FROM events_distributed
WHERE event_time >= now() - INTERVAL 1 HOUR
  AND event_type = 'purchase'
GROUP BY minute, country
ORDER BY minute DESC, revenue DESC
LIMIT 100;

-- → reads only the last hour of data when event_time is part of the partition or sorting key

LOG PLATFORM

Logs Data Platform

End-to-end log management: ingest with Fluent Bit or Vector, route into OpenSearch or object storage based on retention class, search across tiers, and apply access policies per tenant or per source.

Ingest: Fluent Bit · Vector · syslog
Tiers: hot · warm · cold · frozen
Retention: configurable per source

conf · fluentbit.conf

# Ship app logs to intSignal logs platform
[INPUT]
  Name              tail
  Path              /var/log/app/*.log
  Tag               app.*
  Parser            json

[FILTER]
  Name              modify
  Match             *
  Add               environment prod
  Add               cluster prod-aws

[OUTPUT]
  Name              http
  Match             *
  Host              logs.intsignal.io
  Port              443
  TLS               on
  Header            Authorization Bearer ${LOGS_TOKEN}

VISUALIZATION

Managed Dashboards

Grafana and OpenSearch Dashboards, hosted and operated. Data sources pre-wired to OpenSearch, ClickHouse, Prometheus, and your logs platform. SSO integration and per-team folder permissions.

Tools: Grafana · OS Dashboards
Auth: SSO · OIDC · SAML
Embedding: via signed URLs

yaml · dashboard.yaml

# Dashboards as code — version-controlled in Git
apiVersion: grafana.com/v1
kind: Dashboard
metadata:
  name: api-latency-overview
  folder: platform-team
spec:
  datasource:
    - name: "clickhouse-prod"
      type: "grafana-clickhouse-datasource"
  panels:
    - title: "p99 latency by endpoint"
      type: "timeseries"
      query: |
        SELECT toStartOfMinute(ts) AS t,
               endpoint,
               quantile(0.99)(latency_ms) AS p99_ms
        FROM api_logs
        WHERE ts >= now() - INTERVAL 1 HOUR
        GROUP BY t, endpoint
        ORDER BY t

What people build

Common shapes of an analytics workload.

If your project looks like one of these, the right stack is mostly already chosen. We'll walk you through it in the consultation.

REAL-TIME PRODUCT ANALYTICS

User events → dashboards in minutes

App emits clickstream events. Kafka ingests them. ClickHouse stores them columnar. Grafana shows live conversion funnels and cohort metrics that update as users click.

Kafka · ClickHouse · Grafana

CENTRALIZED OBSERVABILITY

Logs and metrics across the fleet

Fluent Bit ships logs from every cluster. OpenSearch indexes them with retention tiers. Dashboards expose error rates, p99 latencies, and saturation across services.

Logs Platform · OpenSearch · Dashboards

CDC INTO ANALYTICS

Operational DB → analytical store

Debezium, running on Kafka Connect, captures PostgreSQL row changes. ClickHouse materializes them into wide analytical tables. Reports run on a copy that's seconds behind production.

Kafka Connect · Kafka · ClickHouse

SECURITY EVENT PIPELINE

Audit logs → search + retention

Audit and access events stream into Kafka. They route to hot OpenSearch indices for active investigation and to cold storage for long-tail retention required by compliance.

Kafka · OpenSearch · Logs Platform

MULTI-CLOUD DR

Kafka replicated across clouds for failover

MirrorMaker replicates topics between a primary cluster on AWS and a DR cluster on intSignal. Consumer offsets are translated so failover is a configuration change, not a code change.

Kafka · MirrorMaker 2

EMBEDDED CUSTOMER ANALYTICS

Dashboards inside your product

ClickHouse stores customer event data with per-tenant isolation. Grafana panels are embedded into your app via signed URLs. Customers see their own data, scoped automatically.

ClickHouse · Grafana · SSO
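
To make the signed-URL idea concrete, here is a minimal JDK-only sketch of one common approach: the embedding app signs the tenant ID and an expiry time with HMAC-SHA256, and the dashboard side recomputes the signature before serving the panel. The URL layout, parameter names (`tenant`, `exp`, `sig`), and the `dash.example.invalid` host are all assumptions for illustration, not the platform's actual format, and tenant IDs are assumed to be URL-safe.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SignedEmbedUrl {
    // HMAC-SHA256 over the payload, encoded as URL-safe Base64 without padding.
    static String sign(byte[] secret, String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] sig = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Hypothetical URL layout: <base>?tenant=<id>&exp=<epoch seconds>&sig=<signature>
    static String embedUrl(byte[] secret, String baseUrl, String tenant, long expiresEpochSec) {
        String payload = "tenant=" + tenant + "&exp=" + expiresEpochSec;
        return baseUrl + "?" + payload + "&sig=" + sign(secret, payload);
    }

    // Rejects expired links, then compares signatures in constant time.
    static boolean verify(byte[] secret, String tenant, long expiresEpochSec, String sig, long nowEpochSec) {
        if (nowEpochSec > expiresEpochSec) {
            return false;
        }
        String expected = sign(secret, "tenant=" + tenant + "&exp=" + expiresEpochSec);
        return MessageDigest.isEqual(
            expected.getBytes(StandardCharsets.UTF_8),
            sig.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret".getBytes(StandardCharsets.UTF_8); // placeholder key
        long expires = System.currentTimeMillis() / 1000 + 300;         // valid for 5 minutes
        System.out.println(embedUrl(secret, "https://dash.example.invalid/embed/panel-1", "acme", expires));
    }
}
```

Because the tenant ID is inside the signed payload, a customer cannot swap in another tenant's ID without invalidating the signature, which is what lets the scoping happen automatically.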

FAQ

Questions data teams ask before signing.

If yours isn't here, ask in the consultation — we'd rather flag the awkward bits early than discover them in production.

Where does the platform actually run?

On intSignal infrastructure, in our hosting facility. But the data doesn't have to live there — Kafka Connect and MirrorMaker pull from and push to your hyperscaler workloads, on-premises systems, or external SaaS, so we're rarely the only place your data sits.

Can you ingest from our existing AWS/Azure/GCP sources?

Yes. Kafka Connect has source connectors for Kinesis and Google Pub/Sub, plus database CDC and object-storage watchers. Kafka-compatible sources such as AWS MSK and Azure Event Hubs (via its Kafka API) can be replicated with MirrorMaker, which also handles full cluster replication when you have an existing Kafka deployment.

OpenSearch vs ClickHouse — which one should I use?

OpenSearch is best for full-text search, log analytics, and ad-hoc exploration of semi-structured data. ClickHouse is best for high-cardinality, aggregation-heavy analytical SQL over very large tables. Many customers run both, and we'll help you pick during the consultation. Running both adds no operational overhead on your side, since we operate both.

Is OpenSearch compatible with our Elasticsearch tooling?

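OpenSearch was forked from Elasticsearch 7.10 and remains API-compatible with that line. Clients, Beats, Logstash, and integrations built for Elasticsearch 7.10 generally continue to work without changes. Some later 7.x client releases check the server's product identity and may need to be pinned to an earlier version or replaced with the OpenSearch clients. If you're on Elasticsearch 8 with X-Pack features specifically, we'll review the migration path with you.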

How is data isolated for multi-tenant analytics?

Several layers. Kafka topics support ACLs per principal. OpenSearch document-level security and ClickHouse row policies enforce tenant scoping at query time. Dashboards apply additional scope via signed URLs or SSO group membership. We design the isolation model with you during onboarding.

What's the retention story for logs?

The logs platform supports hot / warm / cold / frozen tiers with automatic transitions. Hot for active investigation (full search, fast), cold and frozen for compliance retention (searchable, slower, cheaper). You set the policy per log source; we operate the transitions.
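
As a rough sketch of what a per-source policy amounts to, the snippet below maps a log record's age to a tier using four age thresholds. It uses only the JDK; the class name, the threshold values (7, 30, 90, and 365 days), and the `EXPIRED` state after the frozen tier are assumptions for illustration, not platform defaults.

```java
import java.time.Duration;

class RetentionTiers {
    enum Tier { HOT, WARM, COLD, FROZEN, EXPIRED }

    // Upper age bound (exclusive) for each tier; values are supplied per log source.
    final Duration hotUntil, warmUntil, coldUntil, frozenUntil;

    RetentionTiers(Duration hotUntil, Duration warmUntil, Duration coldUntil, Duration frozenUntil) {
        this.hotUntil = hotUntil;
        this.warmUntil = warmUntil;
        this.coldUntil = coldUntil;
        this.frozenUntil = frozenUntil;
    }

    // Picks the first tier whose upper bound the record's age has not reached.
    Tier tierFor(Duration age) {
        if (age.compareTo(hotUntil) < 0) return Tier.HOT;
        if (age.compareTo(warmUntil) < 0) return Tier.WARM;
        if (age.compareTo(coldUntil) < 0) return Tier.COLD;
        if (age.compareTo(frozenUntil) < 0) return Tier.FROZEN;
        return Tier.EXPIRED;
    }

    public static void main(String[] args) {
        // Hypothetical policy for an audit-log source.
        RetentionTiers audit = new RetentionTiers(
            Duration.ofDays(7), Duration.ofDays(30), Duration.ofDays(90), Duration.ofDays(365));
        System.out.println(audit.tierFor(Duration.ofDays(45))); // prints COLD
    }
}
```

A chattier, lower-value source would use the same shape with shorter thresholds, which is the sense in which the policy is set per log source.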

Can we use our own dashboards instead?

Yes. The managed dashboards are an option, not a requirement. OpenSearch, ClickHouse, and the logs platform all expose standard APIs. Bring Metabase, Superset, Tableau, Power BI, Looker, or anything else that speaks HTTP or SQL.

Stream, store, query, visualize. All managed.

Two platforms, one operator

Connect anywhere

Encrypted by default

Audit-ready