Enterprise Video Intelligence & Agentic Search
Backend and core platform for ingesting, understanding, and searching more than 10,000 hours of enterprise video.
Founding Engineer (Employee #1)
- Video indexed: 10,000+ hrs
- Role: Eng #1
- Stage: Early enterprise
As the founding engineer at FlowState AI, I'm building, from the ground up, the platform that turns raw enterprise video into something searchable and actionable.
Problem
Organizations sit on enormous archives of video (recordings, operations footage, inspections) that are effectively unsearchable. Long-form video is expensive to process, hard to index, and harder still to query in the way people actually think ("find the moment where…").
The deeper problem is that video is not just another file type. It is temporal, multimodal, and evidence-heavy. A useful system needs to know what happened, when it happened, why a moment was retrieved, and where a human can verify it.
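To make those four requirements concrete, here is a minimal sketch of what a single retrieved result might carry. It is an illustration only, not the production schema: the `MomentEvidence` name, its fields, and the example URI are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MomentEvidence:
    """Hypothetical record for one retrieved moment (illustrative, not the real schema)."""
    description: str  # what happened
    start_s: float    # when it happened, in seconds from the start of the video
    end_s: float
    score: float      # why it was retrieved: similarity between query and moment
    video_uri: str    # where a human can open the footage and verify it

    def timestamp(self) -> str:
        """Format the start offset as HH:MM:SS for display next to an answer."""
        hours, rem = divmod(int(self.start_s), 3600)
        minutes, seconds = divmod(rem, 60)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Made-up example values, used only to show the shape of the record.
evidence = MomentEvidence(
    description="forklift enters loading bay",
    start_s=3725.0,
    end_s=3740.0,
    score=0.87,
    video_uri="s3://example-bucket/site-a/cam-3.mp4",
)
print(evidence.timestamp(), evidence.description)
```

Keeping the score and the source location on the same record as the description is one way to make every answer inspectable rather than just plausible.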
Enterprise video intelligence
Turning video archives into searchable organizational memory
The system runs from raw footage to timestamped evidence: video chunks produce described moments, those moment texts are embedded into a vector database, and an answerer grounds responses in the original video.
- Platform: FastAPI, gRPC, Temporal, Kubernetes
- Retrieval: Milvus, embeddings, multimodal RAG
- Product: Search, analytics, anomaly workflows
Approach
I designed and built the backend and core platform for turning long-form enterprise video into searchable organizational memory. That means scalable ingestion, storage, retrieval, and orchestration, plus an agentic search layer that can reason across time instead of only matching keywords.
At a simplified level, the system splits each video into chunks, and each chunk can produce multiple text moments: events, entities, actions, and other details described by a vision-language model (VLM). Those moment descriptions are embedded and stored in a vector database. When a user asks a question, the query is embedded too and matched against the moment embeddings; the most semantically similar moments are passed as retrieved evidence to an answerer model, which responds with grounded timestamps.
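The flow described above can be sketched in a few dozen lines. This is a toy under stated assumptions, not the production code: `embed` is a word-count stand-in for a real embedding model, `MomentIndex` is an in-memory stand-in for the vector database, the moment descriptions are hard-coded where a VLM would generate them, and every name here is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Moment:
    video_id: str
    start_s: float
    end_s: float
    text: str  # description a VLM would produce for this chunk


def chunk_spans(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Split a video of the given duration into fixed-length (start, end) spans."""
    spans, start = [], 0.0
    while start < duration_s:
        spans.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s
    return spans


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MomentIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[Counter, Moment]] = []

    def add(self, moment: Moment) -> None:
        self._rows.append((embed(moment.text), moment))

    def search(self, query: str, top_k: int = 2) -> list[tuple[float, Moment]]:
        """Return the top_k moments most similar to the query, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), m) for vec, m in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Hard-coded descriptions standing in for VLM output, one per 30-second chunk.
descriptions = [
    "a worker inspects a conveyor belt",
    "a forklift enters the loading bay",
    "two workers talk near the exit door",
]
index = MomentIndex()
for (start, end), text in zip(chunk_spans(90.0), descriptions):
    index.add(Moment("cam-3", start, end, text))

hits = index.search("when does the forklift enter the loading bay")
# The retrieved moments, with their timestamps, are what an answerer model
# would receive as evidence; the model call itself is omitted here.
context = "\n".join(f"[{m.video_id} {m.start_s:.0f}s-{m.end_s:.0f}s] {m.text}" for _, m in hits)
print(context)
```

Because each hit keeps its `start_s` and `end_s`, the answerer can cite where in the footage its claim comes from instead of paraphrasing without a source.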
The work sits at the boundary between ML systems and product infrastructure: vision-language models for long-form understanding, vector retrieval, real-time processing, anomaly detection, enterprise analytics, and the backend services that make those capabilities reliable enough for real deployments.
What I Built
As employee #1, my role spans architecture, implementation, and early product execution. I work across the platform layer that ingests and indexes video, the retrieval systems that surface relevant moments, and the agent workflows that turn natural-language questions into grounded answers with inspectable evidence.
Impact
The platform powers early enterprise deployments across 10,000+ hours of video content. The goal is to make video feel less like passive storage and more like an active interface: something teams can search, investigate, monitor, and reason over at scale.
Beyond the code, I lead engineering across architecture, product, and early team building as the company's first engineer.
Note: kept intentionally high-level. Specifics are omitted for confidentiality.