Under the Hood

Technical Specs

How CARN detects missing persons from the air using tri-model AI fusion, agentic tool orchestration, and SAR doctrine.

AI Detection Pipeline

CARN runs a tri-model fusion pipeline on every video frame. A COCO-pretrained YOLOv8m base model detects all 9 SAR-relevant object classes, while two specialist models — RGB (fine-tuned on VisDrone) and thermal (fine-tuned on BIRDSAI) — boost person recall from aerial and infrared perspectives. Overlapping person detections are merged via NMS deduplication (IoU > 0.5 keeps the higher confidence), while non-person SAR classes pass through from the COCO model. In SAR, a missed detection can cost a life, so we optimize for recall over precision with a 0.15 confidence threshold.
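A minimal sketch of the fusion step, assuming the ultralytics API; the specialist checkpoint filenames are hypothetical, and the class ids are the standard COCO indices for the 9 SAR classes:

```python
from ultralytics import YOLO

CONF_THRESHOLD = 0.15                          # recall over precision
SAR_CLASSES = [0, 1, 2, 5, 7, 8, 24, 26, 28]   # person, bicycle, car, bus,
                                               # truck, boat, backpack,
                                               # handbag, suitcase
PERSON = 0

base = YOLO("yolov8m.pt")                  # COCO-pretrained base model
rgb = YOLO("carn_rgb_visdrone.pt")         # hypothetical specialist weights
thermal = YOLO("carn_thermal_birdsai.pt")  # hypothetical specialist weights

def collect_detections(frame):
    # All three models score the same frame; half=True enables FP16 on CUDA.
    results = [
        base(frame, conf=CONF_THRESHOLD, classes=SAR_CLASSES, half=True)[0],
        rgb(frame, conf=CONF_THRESHOLD, classes=[PERSON], half=True)[0],
        thermal(frame, conf=CONF_THRESHOLD, classes=[PERSON], half=True)[0],
    ]
    person_boxes, other_boxes = [], []
    for i, result in enumerate(results):
        for box in result.boxes:
            entry = (*box.xyxy[0].tolist(), float(box.conf), int(box.cls))
            if int(box.cls) == PERSON:
                person_boxes.append(entry)   # merged by the NMS dedup below
            elif i == 0:
                other_boxes.append(entry)    # COCO classes pass through
    return person_boxes, other_boxes
```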

Video Frame Captured

Base64 JPEG from live drone feed, webcam, or uploaded video

COCO YOLOv8m

80-class base model filtered to 9 SAR classes

Person, Car, Boat, Truck, Bus, Bicycle, Backpack, Handbag, Suitcase

RGB Specialist

YOLOv8s fine-tuned on VisDrone aerial imagery (42K annotations)

Person class only

Thermal Specialist

YOLOv8s fine-tuned on BIRDSAI infrared data (34K annotations)

Person class only

All models run in parallel with FP16 half-precision

NMS Deduplication & Merge

Specialist person boxes merged with COCO person boxes. Overlapping detections (IoU > 0.5) keep the higher confidence score. Non-person classes pass through directly.
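A minimal sketch of that greedy merge, operating on (x1, y1, x2, y2, conf, ...) tuples like those collected in the fusion sketch above:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2, ...) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def merge_person_boxes(boxes, iou_thresh=0.5):
    # Greedy NMS: keep the highest-confidence box in each overlapping cluster.
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```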

SAHI Sliced Inference

640px tiles with 20% overlap detect small/distant persons that full-frame inference misses
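A minimal sketch using the sahi package, with the slice size and overlap quoted above; the frame path is a placeholder:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="yolov8m.pt",
    confidence_threshold=0.15,
    device="cuda:0",
)

result = get_sliced_prediction(
    "frame.jpg",                 # placeholder frame path
    model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,    # 20% overlap between tiles
    overlap_width_ratio=0.2,
)
for obj in result.object_prediction_list:
    print(obj.category.name, round(obj.score.value, 3), obj.bbox.to_xyxy())
```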

ByteTrack Object Tracking

Persistent IDs across frames so the same person/vehicle is not re-counted
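A minimal sketch of de-duplicated counting via ultralytics' built-in ByteTrack tracker; the video source is hypothetical:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8m.pt")
seen_ids = set()

cap = cv2.VideoCapture("mission_feed.mp4")   # hypothetical frame source
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model.track(frame, persist=True, tracker="bytetrack.yaml",
                         conf=0.15, verbose=False)[0]
    if result.boxes.id is not None:
        for track_id in result.boxes.id.int().tolist():
            seen_ids.add(track_id)           # each person/vehicle counted once
print(f"unique objects this mission: {len(seen_ids)}")
```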

Client-Side Class Filtering

Operators toggle which SAR classes to display in real-time

Video Overlay

Bounding boxes with class labels and confidence scores on the live feed

Map Markers

GPS-tagged detections on MapLibre with 4-tier confidence visualization

Alert Pipeline

3-tier alerts (Critical/High/Medium) with push notifications and audio cues

0.15
Confidence Threshold
0.45
NMS IoU Threshold
640px
SAHI Slice Size
1920px
Inference Resolution

9 SAR Detection Classes

Person
Car
Truck
Bus
Boat
Bicycle
Backpack
Handbag
Suitcase

Training Results

RGB Model

0.520
Recall
Improved in v2
0.707
Precision
Improved in v2
0.593
mAP50
Improved in v2
0.233
mAP50-95
Improved in v2

Retrained on NVIDIA Tesla V100 (Tensorix.ai) with expanded VisDrone data. Precision +25% and recall +21% over the v1 baseline. Dense aerial crowds remain challenging, but the COCO base model compensates via tri-model fusion.

Thermal Model

0.899
Recall
Target exceeded
0.956
Precision
Target exceeded
0.947
mAP50
Target exceeded
0.560
mAP50-95
Target exceeded

Exceeds all targets. BIRDSAI thermal aerial data matches the SAR use case exactly — drone-mounted infrared over wilderness terrain.

Training Datasets

VisDrone-DET

CC BY-NC-SA 3.0

Tianjin University

Aerial RGB images from drones over 14 Chinese cities at varying altitudes

2,000 images / 42,241 annotations

BIRDSAI Conservation Drones

CDLA Permissive 1.0

LILA BC

Thermal infrared footage from conservation drones in African protected areas

14,395 images / 34,384 annotations

Detection & Alert Tiers

Every detection is assigned a confidence tier that determines the alert severity. SAR doctrine demands recall over precision — we never discard a detection, even at low confidence.

CONFIRMED

Blocking modal alert with urgent alarm. Requires immediate confirm/reject from operator.

> 0.8

ALERT

Persistent banner notification with chime. High priority for review.

0.5 - 0.8

NOTICE

Toast notification, auto-dismiss. Worth investigating if in priority search area.

0.3 - 0.5

SCAN

Logged for post-mission review. Background noise level, but SAR doctrine says never discard.

< 0.3
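A minimal sketch of the tier mapping implied by these thresholds; the function itself is illustrative:

```python
def alert_tier(confidence: float) -> str:
    # Map a detection confidence to its alert tier. SAR doctrine: never
    # discard a detection, so even the lowest band is logged, not dropped.
    if confidence > 0.8:
        return "CONFIRMED"   # blocking modal + urgent alarm
    if confidence >= 0.5:
        return "ALERT"       # persistent banner + chime
    if confidence >= 0.3:
        return "NOTICE"      # auto-dismissing toast
    return "SCAN"            # logged for post-mission review
```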

CARN Intelligence

CARN Intelligence is an agentic AI command centre powered by Claude Sonnet 4.5 via the Anthropic SDK. A multi-turn tool-use loop (max 3 iterations) lets Claude orchestrate 7 tools — querying the live database, planning missions, generating flight paths, reviewing detections, and producing operational briefings — all through natural language, with real database operations rather than simulated responses.
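A minimal sketch of that agentic loop using the Anthropic Python SDK; the tool schema and execute_tool dispatch shown here are illustrative stand-ins for CARN's 7 real tool definitions:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "generate_briefing",        # one of the 7 CARN tools
    "description": "Produce a full operational sitrep with recommendations.",
    "input_schema": {"type": "object", "properties": {}},
}]

def execute_tool(name: str, args: dict) -> str:
    # Dispatches to real database operations in production; stubbed here.
    return "briefing: 2 active cases, 1 mission in progress"

def run_agent(messages: list, max_iterations: int = 3):
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response                  # final answer, loop ends early
        messages.append({"role": "assistant", "content": response.content})
        results = [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": execute_tool(block.name, block.input),
        } for block in response.content if block.type == "tool_use"]
        messages.append({"role": "user", "content": results})
    return response
```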

Tool-Use Architecture

  • Claude Sonnet 4.5 with 7 tool definitions (Anthropic SDK)
  • Multi-turn agentic loop (max 3 iterations per query)
  • Server-Sent Events (SSE) streaming with ReadableStream
  • Real-time tool execution transparency (timing + status)
  • Operational context injection: cases, missions, detections, roster
  • Conversation persistence with 30-min TTL, 20-message window

Rich Output & Actions

  • MapLibre GL maps embedded in chat (search areas, flight paths, markers)
  • Actionable buttons: Approve, Launch, Complete, Abort — real PATCH ops
  • Auto-briefing on dashboard load with formatted operational summary
  • Real-time new case notifications with 15s polling + map + quick actions
  • Markdown rendering: tables, headers, code blocks via react-markdown
  • Thinking steps UI: animated spinner with execution timing
Tool | Operation
plan_mission | ISRID search area + boustrophedon flight path
update_mission_status | Validate status transition chain
show_cases_on_map | Query active cases with LKP coordinates
show_detections_on_map | Filter by confidence + mission
generate_briefing | Full operational sitrep with recommendations
confirm_detection | Human-in-loop detection review
show_mission_on_map | Render search area + flight path overlay

SSE Streaming Protocol

The Intelligence API streams events in real-time as tools execute. The frontend consumes these via fetch() and response.body.getReader(), rendering each step as an animated row with status, description, and execution time.

thinking

Claude is processing the query

tool_start

Tool execution begins

tool_result

Tool completed with summary and duration

response

Final message with text, mapData, actions

done

Stream complete, connection closed

error

Error with message, stream terminates
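A minimal sketch of the wire format, shown as a FastAPI endpoint purely for illustration; the production endpoint and payload fields may differ:

```python
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def sse(event: str, data: dict) -> str:
    # One Server-Sent Event: a named event line plus a JSON data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

@app.get("/api/intelligence/stream")   # hypothetical route
async def stream():
    async def events():
        yield sse("thinking", {"status": "processing query"})
        yield sse("tool_start", {"tool": "generate_briefing"})
        yield sse("tool_result", {"tool": "generate_briefing",
                                  "summary": "2 active cases",
                                  "duration_ms": 412})
        yield sse("response", {"text": "...", "mapData": None, "actions": []})
        yield sse("done", {})
    return StreamingResponse(events(), media_type="text/event-stream")
```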

System Architecture

1

DJI Drone

Streams dual RGB + thermal video via RTMP to the CARN server. Enterprise (Matrice 30T) or consumer (Mini 3 Pro) hardware.

2

MediaMTX Relay

Receives RTMP on port 1935 and remuxes to HLS for low-latency browser playback

3

Live Dashboard

Next.js 16 + React 19 dashboard with HLS stream and real-time frame extraction for inference

4

GPU Inference (Tri-Model Fusion)

COCO base + RGB + thermal specialists running with SAHI slicing, FP16, NMS dedup, and ByteTrack tracking

SQLite + Drizzle ORM

Detection persistence with GPS coords, WAL mode

Socket.IO

Real-time broadcast to all connected clients

Web Push + ntfy

PWA and mobile push notifications

5

CARN Intelligence

Claude Sonnet 4.5 with 7 tool definitions for mission planning, status management, detection review, and operational briefings via SSE streaming

6

3-Tier Alert System

Critical (>80%), High (60-80%), Medium (40-60%) with blocking modals, banners, toasts, and audio cues

7

Human-in-the-Loop

Operator confirms or rejects detections. Confirmed locations dispatched to field teams via push notifications.
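A minimal sketch of the persistence step (4). Production uses Drizzle ORM with better-sqlite3 on the Node side; this Python sqlite3 sketch only illustrates WAL mode and a GPS-tagged detection row, with a hypothetical schema:

```python
import sqlite3

db = sqlite3.connect("carn.db")
db.execute("PRAGMA journal_mode=WAL")   # readers never block the writer
db.execute("""
    CREATE TABLE IF NOT EXISTS detections (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        cls TEXT NOT NULL,
        confidence REAL NOT NULL,
        lat REAL, lon REAL,
        detected_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
db.execute(
    "INSERT INTO detections (cls, confidence, lat, lon) VALUES (?, ?, ?, ?)",
    ("person", 0.87, 49.2827, -123.1207),
)
db.commit()
```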

Performance

End-to-end latency from frame capture to rescue team notification. Every second matters in SAR — the pipeline is optimized for speed at every stage.

~200ms
Frame Inference (FP16)
< 1s
Detection to Database
< 2s
Socket.IO Broadcast
< 5s
Push Notification
76,625
Training Annotations
16,395
Training Images
7
AI Tool Definitions

Tech Stack

Web Dashboard

  • Next.js 16 (App Router) + React 19 + TypeScript
  • MapLibre GL JS + react-map-gl (open-source maps)
  • Socket.IO for real-time detection broadcast
  • NextAuth.js v5 with JWT + RBAC (4 roles)
  • Drizzle ORM + SQLite (WAL mode, better-sqlite3)
  • Zustand stores, shadcn/ui, Tailwind CSS 4
  • PWA with service worker and offline caching

ML / Inference

  • FastAPI async inference server
  • YOLOv8 tri-model fusion (COCO + RGB + thermal)
  • SAHI sliced inference (640px tiles, 20% overlap)
  • CUDA 12.6 + PyTorch 2.6 + FP16 half-precision
  • ByteTrack persistent object tracking
  • 76K annotations across VisDrone + BIRDSAI datasets

Drone & Streaming

  • DJI hardware (enterprise or consumer)
  • MediaMTX relay: RTMP ingest to HLS output
  • Caddy reverse proxy with auto-TLS
  • EC2 deployment with PM2 process management
  • Web Push (VAPID) + ntfy.sh notifications

AI Intelligence

  • Claude Sonnet 4.5 via Anthropic SDK (server-side)
  • 7 tool definitions with multi-turn agentic loop
  • SSE streaming with real-time execution transparency
  • react-markdown + remark-gfm for rich responses
  • Conversation memory (30-min TTL, 20-msg window)

Hackathon Demo Setup

For the Claude Code Hackathon (Feb 10-16, 2026), consumer FPV hardware was used to demonstrate the system. In production, CARN supports enterprise drones (DJI Matrice 30T, M300 RTK) with thermal imaging, RTMP streaming, and 40+ minute flight times.

The same AI detection pipeline works seamlessly with both consumer and enterprise hardware.

1

DJI Mini 3 Pro / DJI Avata

Consumer drone with FPV video transmission to goggles

2

DJI FPV Goggles 3 + HDMI Capture

Goggles output via USB-C to HDMI adapter, then HDMI capture card to laptop

3

OBS Studio Virtual Camera

Captures from capture card and creates virtual webcam source for CARN

4

CARN Dashboard Camera Selector

Selects OBS Virtual Camera from device dropdown, extracts frames for inference

5

GPU Inference Server

Runs tri-model fusion on live frames with real-time bounding box overlay
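A minimal, hypothetical sketch of the demo capture path: grab frames from the OBS virtual camera with OpenCV and post base64 JPEGs to the inference server. The device index and endpoint URL are assumptions:

```python
import base64
import cv2
import requests

cap = cv2.VideoCapture(1)   # OBS Virtual Camera; device index varies
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        continue
    payload = {"image": base64.b64encode(jpeg.tobytes()).decode("ascii")}
    # Hypothetical inference endpoint; CARN's real route may differ.
    requests.post("http://localhost:8000/detect", json=payload, timeout=5)
```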