FlowState AI, 2025 to Present

Enterprise Video Intelligence & Agentic Search

Backend and core platform for ingesting, understanding, and searching 10,000+ hours of enterprise video.

Founding Engineer (Employee #1)

Video indexed: 10,000+ hrs
Role: Eng #1
Stage: Early enterprise
Python, FastAPI, gRPC, Milvus, VLMs / RAG, Temporal, Kubernetes

As the founding engineer at FlowState AI, I'm building, from the ground up, the platform that turns raw enterprise video into something searchable and actionable.

Problem

Organizations sit on enormous archives of video (recordings, operations footage, inspections) that are effectively unsearchable. Long-form video is expensive to process, hard to index, and harder still to query the way people actually think ("find the moment where…").

The deeper problem is that video is not just another file type. It is temporal, multimodal, and evidence-heavy. A useful system needs to know what happened, when it happened, why a moment was retrieved, and where a human can verify it.
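To make those four requirements concrete, here is a minimal sketch of an evidence record that carries the what, when, why, and where of a retrieved moment. It is an illustration only: the `Moment` class, its field names, and the `dock-cam-03` video id are hypothetical and do not describe the FlowState schema.

```python
from dataclasses import dataclass


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


@dataclass(frozen=True)
class Moment:
    """Hypothetical evidence record; every field name is an assumption."""
    video_id: str     # where: which source video a human should open
    start_s: float    # when: start of the moment, in seconds from video start
    end_s: float      # when: end of the moment, in seconds from video start
    description: str  # what: text description of what happened
    score: float      # why: similarity score that caused the retrieval

    def citation(self) -> str:
        """Build a human-checkable pointer back into the source video."""
        return f"{self.video_id} @ {format_timestamp(self.start_s)}: {self.description}"


# Made-up example values, chosen to resemble a timestamped search result.
moment = Moment("dock-cam-03", 3789.0, 3801.0, "Dock activity", 0.87)
print(moment.citation())  # dock-cam-03 @ 01:03:09: Dock activity
```

Keeping the score and the time span on the same record is what lets a reviewer see both why a moment was surfaced and exactly where to verify it.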

Enterprise video intelligence

Turning video archives into searchable organizational memory

The diagram follows the system from raw footage to timestamped evidence: video chunks produce described moments, the text of each moment is embedded into a vector database, and an answerer grounds its responses in the original video.

[Diagram: archive → ingest → index → retrieve + answer → evidence]
Archive: 10,000+ hours of enterprise footage
Ingestion: chunk / queue / store
Described moments: events, actions, entities, context (chunk → many moments)
Vector DB: moment embeddings; query embedding → top-k moments
Answerer results: 00:14:22 Restricted zone; 01:03:09 Dock activity; 02:41:36 Anomaly review
Indexing path: chunk → text moments → embeddings stored in vector DB
Query path: query embedding → similar moments → answerer model
Raw video to cited answer
VLMs + RAG, Temporal pipelines, Enterprise scale

Platform

FastAPI, gRPC, Temporal, Kubernetes

Retrieval

Milvus, embeddings, multimodal RAG

Product

Search, analytics, anomaly workflows

Approach

I designed and built the backend and core platform for turning long-form enterprise video into searchable organizational memory. That means scalable ingestion, storage, retrieval, and orchestration, plus an agentic search layer that can reason across time instead of only matching keywords.

At a simplified level, the system turns each video into chunks, and each chunk can produce multiple text moments: events, entities, actions, and other details described by a VLM. Those moment descriptions are embedded and stored in a vector database. When a user asks a question, the query is embedded too, matched against the most semantically similar moment embeddings, and passed with retrieved evidence into an answerer model that can respond with grounded timestamps.
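The flow described above can be sketched in a few lines of plain Python. This is a toy under loud assumptions, not the production design: a bag-of-words counter stands in for the embedding model, an in-memory list stands in for the vector database, the moment descriptions are invented, and the VLM and answerer calls are left out entirely. All function names are hypothetical.

```python
import math
from collections import Counter


def chunk_spans(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Split a video of duration_s seconds into consecutive fixed-length spans."""
    spans = []
    start = 0.0
    while start < duration_s:
        spans.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s
    return spans


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented moment descriptions, as a VLM might produce them for some chunks.
moments = [
    (862.0, "person enters restricted zone near gate"),
    (3789.0, "forklift unloading pallets at the dock"),
    (9696.0, "conveyor belt stops unexpectedly"),
]

# In-memory stand-in for the vector database: (start_s, description, embedding).
index = [(start, text, embed(text)) for start, text in moments]


def search(query: str, k: int = 2) -> list[tuple[float, float, str]]:
    """Return the top-k hits as (score, start_s, description), best first."""
    q = embed(query)
    scored = [(cosine(q, vec), start, text) for start, text, vec in index]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return scored[:k]


hits = search("who entered the restricted zone", k=1)
print(hits[0][1], hits[0][2])  # start offset and description of the best match
```

In a real deployment the retrieved hits, with their timestamps, would be handed to the answerer model as evidence; that step is omitted here because it depends on model and prompt details the source does not give.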

The work sits at the boundary between ML systems and product infrastructure: vision-language models for long-form understanding, vector retrieval, real-time processing, anomaly detection, enterprise analytics, and the backend services that make those capabilities reliable enough for real deployments.

What I Built

As employee #1, my role spans architecture, implementation, and early product execution. I work across the platform layer that ingests and indexes video, the retrieval systems that surface relevant moments, and the agent workflows that turn natural-language questions into grounded answers with inspectable evidence.
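One small piece of making evidence inspectable can be illustrated with a sketch: when several retrieved moments from the same video sit close together in time, collapsing them into a single span gives a reviewer one place to look instead of several. The function below is a generic interval merge written for illustration; the `merge_spans` name, the `gap_s` parameter, and the sample values are assumptions, and nothing here is taken from the FlowState codebase.

```python
def merge_spans(
    spans: list[tuple[float, float]], gap_s: float = 5.0
) -> list[tuple[float, float]]:
    """Merge (start_s, end_s) spans that overlap or lie within gap_s seconds."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= gap_s:
            # Close enough to the previous span: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Made-up retrieval hits from a single video, in arbitrary order.
hit_spans = [(862.0, 870.0), (868.0, 880.0), (3789.0, 3801.0), (883.0, 890.0)]
evidence = merge_spans(hit_spans)
print(evidence)  # [(862.0, 890.0), (3789.0, 3801.0)]
```

The gap threshold is a tuning choice: a larger value yields fewer, longer clips to review, while a smaller value keeps each cited span tight around the retrieved moment.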

Impact

The platform powers early enterprise deployments across 10,000+ hours of video content. The goal is to make video feel less like passive storage and more like an active interface: something teams can search, investigate, monitor, and reason over at scale.

Beyond the code, I lead engineering across architecture, product, and early team building as the company's first engineer.

Note: kept intentionally high-level. Specifics are omitted for confidentiality.