Under the Hood

Technical Specs

How CARN detects missing persons from the air using tri-model AI fusion, agentic tool orchestration, and SAR doctrine.

AI Detection Pipeline

CARN runs a tri-model fusion pipeline on every video frame. A COCO-pretrained YOLOv8m base model detects all 9 SAR-relevant object classes, while two specialist models — RGB (fine-tuned on VisDrone) and thermal (fine-tuned on BIRDSAI) — boost person recall from aerial and infrared perspectives. Overlapping person detections are merged via NMS deduplication (IoU > 0.5 keeps the higher confidence), while non-person SAR classes pass through from the COCO model. In SAR, a missed detection can cost a life, so we optimize for recall over precision with a 0.15 confidence threshold.
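A minimal sketch of the fusion step, assuming the ultralytics API; the specialist checkpoint filenames are hypothetical, and the class ids are the standard COCO indices for the 9 SAR classes:

```python
from ultralytics import YOLO

CONF_THRESHOLD = 0.15                          # recall over precision
SAR_CLASSES = [0, 1, 2, 5, 7, 8, 24, 26, 28]   # person, bicycle, car, bus,
                                               # truck, boat, backpack,
                                               # handbag, suitcase
PERSON = 0

base = YOLO("yolov8m.pt")                  # COCO-pretrained base model
rgb = YOLO("carn_rgb_visdrone.pt")         # hypothetical specialist weights
thermal = YOLO("carn_thermal_birdsai.pt")  # hypothetical specialist weights

def collect_detections(frame):
    # All three models score the same frame; half=True enables FP16 on CUDA.
    results = [
        base(frame, conf=CONF_THRESHOLD, classes=SAR_CLASSES, half=True)[0],
        rgb(frame, conf=CONF_THRESHOLD, classes=[PERSON], half=True)[0],
        thermal(frame, conf=CONF_THRESHOLD, classes=[PERSON], half=True)[0],
    ]
    person_boxes, other_boxes = [], []
    for i, result in enumerate(results):
        for box in result.boxes:
            entry = (*box.xyxy[0].tolist(), float(box.conf), int(box.cls))
            if int(box.cls) == PERSON:
                person_boxes.append(entry)   # merged by the NMS dedup below
            elif i == 0:
                other_boxes.append(entry)    # COCO classes pass through
    return person_boxes, other_boxes
```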

Video Frame Captured

Base64 JPEG from live drone feed, webcam, or uploaded video

COCO YOLOv8m

80-class base model filtered to 9 SAR classes

Person, Car, Boat, Truck, Bus, Bicycle, Backpack, Handbag, Suitcase

RGB Specialist

YOLOv8s fine-tuned on VisDrone aerial imagery (42K annotations)

Person class only

Thermal Specialist

YOLOv8s fine-tuned on BIRDSAI infrared data (34K annotations)

Person class only

All models run in parallel with FP16 half-precision

NMS Deduplication & Merge

Specialist person boxes merged with COCO person boxes. Overlapping detections (IoU > 0.5) keep the higher confidence score. Non-person classes pass through directly.
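A minimal sketch of that greedy merge, operating on (x1, y1, x2, y2, conf, ...) tuples like those collected in the fusion sketch above:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2, ...) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def merge_person_boxes(boxes, iou_thresh=0.5):
    # Greedy NMS: keep the highest-confidence box in each overlapping cluster.
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```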

SAHI Sliced Inference

640px tiles with 20% overlap detect small/distant persons that full-frame inference misses
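A minimal sketch using the sahi package, with the slice size and overlap quoted above; the frame path is a placeholder:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="yolov8m.pt",
    confidence_threshold=0.15,
    device="cuda:0",
)

result = get_sliced_prediction(
    "frame.jpg",                 # placeholder frame path
    model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,    # 20% overlap between tiles
    overlap_width_ratio=0.2,
)
for obj in result.object_prediction_list:
    print(obj.category.name, round(obj.score.value, 3), obj.bbox.to_xyxy())
```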

ByteTrack Object Tracking

Persistent IDs across frames so the same person/vehicle is not re-counted
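A minimal sketch of de-duplicated counting via ultralytics' built-in ByteTrack tracker; the video source is hypothetical:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8m.pt")
seen_ids = set()

cap = cv2.VideoCapture("mission_feed.mp4")   # hypothetical frame source
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model.track(frame, persist=True, tracker="bytetrack.yaml",
                         conf=0.15, verbose=False)[0]
    if result.boxes.id is not None:
        for track_id in result.boxes.id.int().tolist():
            seen_ids.add(track_id)           # each person/vehicle counted once
print(f"unique objects this mission: {len(seen_ids)}")
```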

Client-Side Class Filtering

Operators toggle which SAR classes to display in real-time

Video Overlay

Bounding boxes with class labels and confidence scores on the live feed

Map Markers

GPS-tagged detections on MapLibre with 4-tier confidence visualization

Alert Pipeline

3-tier alerts (Critical/High/Medium) with push notifications and audio cues

0.15
Confidence Threshold
0.45
NMS IoU Threshold
640px
SAHI Slice Size
1920px
Inference Resolution

9 SAR Detection Classes

Person
Car
Truck
Bus
Boat
Bicycle
Backpack
Handbag
Suitcase

Training Results

RGB Model

0.520
Recall
Improved in v2
0.707
Precision
Improved in v2
0.593
mAP50
Improved in v2
0.233
mAP50-95
Improved in v2

Retrained on NVIDIA Tesla V100 (Tensorix.ai) with expanded VisDrone data. Precision +25% and recall +21% over the v1 baseline. Dense aerial crowds remain challenging, but the COCO base model compensates via tri-model fusion.

Thermal Model

0.899
Recall
Target exceeded
0.956
Precision
Target exceeded
0.947
mAP50
Target exceeded
0.560
mAP50-95
Target exceeded

Exceeds all targets. BIRDSAI thermal aerial data matches the SAR use case exactly — drone-mounted infrared over wilderness terrain.

Training Datasets

VisDrone-DET

CC BY-NC-SA 3.0

Tianjin University

Aerial RGB images from drones over 14 Chinese cities at varying altitudes

2,000 images / 42,241 annotations

BIRDSAI Conservation Drones

CDLA Permissive 1.0

LILA BC

Thermal infrared footage from conservation drones in African protected areas

14,395 images / 34,384 annotations

Detection & Alert Tiers

Every detection is assigned a confidence tier that determines the alert severity. SAR doctrine demands recall over precision — we never discard a detection, even at low confidence.

CONFIRMED

Blocking modal alert with urgent alarm. Requires immediate confirm/reject from operator.

> 0.8

ALERT

Persistent banner notification with chime. High priority for review.

0.5 - 0.8

NOTICE

Toast notification, auto-dismiss. Worth investigating if in priority search area.

0.3 - 0.5

SCAN

Logged for post-mission review. Background noise level, but SAR doctrine says never discard.

< 0.3
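A minimal sketch of the tier mapping implied by these thresholds; the function itself is illustrative:

```python
def alert_tier(confidence: float) -> str:
    # Map a detection confidence to its alert tier. SAR doctrine: never
    # discard a detection, so even the lowest band is logged, not dropped.
    if confidence > 0.8:
        return "CONFIRMED"   # blocking modal + urgent alarm
    if confidence >= 0.5:
        return "ALERT"       # persistent banner + chime
    if confidence >= 0.3:
        return "NOTICE"      # auto-dismissing toast
    return "SCAN"            # logged for post-mission review
```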

CARN Intelligence

CARN Intelligence is an agentic AI command centre powered by Claude Sonnet 4.5 via the Anthropic SDK. A multi-turn tool-use loop (max 3 iterations) lets Claude orchestrate 7 tools — querying the live database, planning missions, generating flight paths, reviewing detections, and producing operational briefings — all through natural language, with real database operations rather than simulated responses.
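A minimal sketch of that agentic loop using the Anthropic Python SDK; the tool schema and execute_tool dispatch shown here are illustrative stand-ins for CARN's 7 real tool definitions:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "generate_briefing",        # one of the 7 CARN tools
    "description": "Produce a full operational sitrep with recommendations.",
    "input_schema": {"type": "object", "properties": {}},
}]

def execute_tool(name: str, args: dict) -> str:
    # Dispatches to real database operations in production; stubbed here.
    return "briefing: 2 active cases, 1 mission in progress"

def run_agent(messages: list, max_iterations: int = 3):
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response                  # final answer, loop ends early
        messages.append({"role": "assistant", "content": response.content})
        results = [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": execute_tool(block.name, block.input),
        } for block in response.content if block.type == "tool_use"]
        messages.append({"role": "user", "content": results})
    return response
```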

Tool-Use Architecture

  • Claude Sonnet 4.5 with 7 tool definitions (Anthropic SDK)
  • Multi-turn agentic loop (max 3 iterations per query)
  • Server-Sent Events (SSE) streaming with ReadableStream
  • Real-time tool execution transparency (timing + status)
  • Operational context injection: cases, missions, detections, roster
  • Conversation persistence with 30-min TTL, 20-message window

Rich Output & Actions

  • MapLibre GL maps embedded in chat (search areas, flight paths, markers)
  • Actionable buttons: Approve, Launch, Complete, Abort — real PATCH ops
  • Auto-briefing on dashboard load with formatted operational summary
  • Real-time new case notifications with 15s polling + map + quick actions
  • Markdown rendering: tables, headers, code blocks via react-markdown
  • Thinking steps UI: animated spinner with execution timing
Tool | Operation
plan_mission | ISRID search area + boustrophedon flight path
update_mission_status | Validate status transition chain
show_cases_on_map | Query active cases with LKP coordinates
show_detections_on_map | Filter by confidence + mission
generate_briefing | Full operational sitrep with recommendations
confirm_detection | Human-in-loop detection review
show_mission_on_map | Render search area + flight path overlay

SSE Streaming Protocol

The Intelligence API streams events in real-time as tools execute. The frontend consumes these via fetch() and response.body.getReader(), rendering each step as an animated row with status, description, and execution time.

thinking

Claude is processing the query

tool_start

Tool execution begins

tool_result

Tool completed with summary and duration

response

Final message with text, mapData, actions

done

Stream complete, connection closed

error

Error with message, stream terminates
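A minimal sketch of the wire format, shown as a FastAPI endpoint purely for illustration; the production endpoint and payload fields may differ:

```python
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def sse(event: str, data: dict) -> str:
    # One Server-Sent Event: a named event line plus a JSON data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

@app.get("/api/intelligence/stream")   # hypothetical route
async def stream():
    async def events():
        yield sse("thinking", {"status": "processing query"})
        yield sse("tool_start", {"tool": "generate_briefing"})
        yield sse("tool_result", {"tool": "generate_briefing",
                                  "summary": "2 active cases",
                                  "duration_ms": 412})
        yield sse("response", {"text": "...", "mapData": None, "actions": []})
        yield sse("done", {})
    return StreamingResponse(events(), media_type="text/event-stream")
```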

System Architecture

1

DJI Drone

Streams dual RGB + thermal video via RTMP to the CARN server. Enterprise (Matrice 30T) or consumer (Mini 3 Pro) hardware.

2

MediaMTX Relay

Receives RTMP on port 1935 and remuxes to HLS for low-latency browser playback

3

Live Dashboard

Next.js 16 + React 19 dashboard with HLS stream and real-time frame extraction for inference

4

GPU Inference (Tri-Model Fusion)

COCO base + RGB + thermal specialists running with SAHI slicing, FP16, NMS dedup, and ByteTrack tracking

SQLite + Drizzle ORM

Detection persistence with GPS coords, WAL mode

Socket.IO

Real-time broadcast to all connected clients

Web Push + ntfy

PWA and mobile push notifications

5

CARN Intelligence

Claude Sonnet 4.5 with 7 tool definitions for mission planning, status management, detection review, and operational briefings via SSE streaming

6

3-Tier Alert System

Critical (>80%), High (60-80%), Medium (40-60%) with blocking modals, banners, toasts, and audio cues

7

Human-in-the-Loop

Operator confirms or rejects detections. Confirmed locations dispatched to field teams via push notifications.
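A minimal sketch of the persistence step (4). Production uses Drizzle ORM with better-sqlite3 on the Node side; this Python sqlite3 sketch only illustrates WAL mode and a GPS-tagged detection row, with a hypothetical schema:

```python
import sqlite3

db = sqlite3.connect("carn.db")
db.execute("PRAGMA journal_mode=WAL")   # readers never block the writer
db.execute("""
    CREATE TABLE IF NOT EXISTS detections (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        cls TEXT NOT NULL,
        confidence REAL NOT NULL,
        lat REAL, lon REAL,
        detected_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
db.execute(
    "INSERT INTO detections (cls, confidence, lat, lon) VALUES (?, ?, ?, ?)",
    ("person", 0.87, 49.2827, -123.1207),
)
db.commit()
```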

Performance

End-to-end latency from frame capture to rescue team notification. Every second matters in SAR — the pipeline is optimized for speed at every stage.

~200ms
Frame Inference (FP16)
< 1s
Detection to Database
< 2s
Socket.IO Broadcast
< 5s
Push Notification
76,625
Training Annotations
16,395
Training Images
7
AI Tool Definitions

Tech Stack

Web Dashboard

  • Next.js 16 (App Router) + React 19 + TypeScript
  • MapLibre GL JS + react-map-gl (open-source maps)
  • Socket.IO for real-time detection broadcast
  • NextAuth.js v5 with JWT + RBAC (4 roles)
  • Drizzle ORM + SQLite (WAL mode, better-sqlite3)
  • Zustand stores, shadcn/ui, Tailwind CSS 4
  • PWA with service worker and offline caching

ML / Inference

  • FastAPI async inference server
  • YOLOv8 tri-model fusion (COCO + RGB + thermal)
  • SAHI sliced inference (640px tiles, 20% overlap)
  • CUDA 12.6 + PyTorch 2.6 + FP16 half-precision
  • ByteTrack persistent object tracking
  • 76K annotations across VisDrone + BIRDSAI datasets

Drone & Streaming

  • DJI hardware (enterprise or consumer)
  • MediaMTX relay: RTMP ingest to HLS output
  • Caddy reverse proxy with auto-TLS
  • EC2 deployment with PM2 process management
  • Web Push (VAPID) + ntfy.sh notifications

AI Intelligence

  • Claude Sonnet 4.5 via Anthropic SDK (server-side)
  • 7 tool definitions with multi-turn agentic loop
  • SSE streaming with real-time execution transparency
  • react-markdown + remark-gfm for rich responses
  • Conversation memory (30-min TTL, 20-msg window)

Hackathon Demo Setup

For the Claude Code Hackathon (Feb 10-16, 2026), consumer FPV hardware was used to demonstrate the system. In production, CARN supports enterprise drones (DJI Matrice 30T, M300 RTK) with thermal imaging, RTMP streaming, and 40+ minute flight times.

The same AI detection pipeline works seamlessly with both consumer and enterprise hardware.

1

DJI Mini 3 Pro / DJI Avata

Consumer drone with FPV video transmission to goggles

2

DJI FPV Goggles 3 + HDMI Capture

Goggles output via USB-C to HDMI adapter, then HDMI capture card to laptop

3

OBS Studio Virtual Camera

Captures from capture card and creates virtual webcam source for CARN

4

CARN Dashboard Camera Selector

Selects OBS Virtual Camera from device dropdown, extracts frames for inference

5

GPU Inference Server

Runs tri-model fusion on live frames with real-time bounding box overlay
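A minimal, hypothetical sketch of the demo capture path: grab frames from the OBS virtual camera with OpenCV and post base64 JPEGs to the inference server. The device index and endpoint URL are assumptions:

```python
import base64
import cv2
import requests

cap = cv2.VideoCapture(1)   # OBS Virtual Camera; device index varies
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        continue
    payload = {"image": base64.b64encode(jpeg.tobytes()).decode("ascii")}
    # Hypothetical inference endpoint; CARN's real route may differ.
    requests.post("http://localhost:8000/detect", json=payload, timeout=5)
```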