Openstream.ai · Summer 2024

Multimodal Personality Detection

A multimodal fusion model that scores the five OCEAN personality traits from a live video feed.

Machine Learning Intern

Models trained: 100+
Traits scored: 5 (OCEAN)
Python · PyTorch · Multimodal fusion · Audio / video

At Openstream.ai (a multimodal, plan-based conversational AI platform), I worked on understanding people, not just their words, from audio and video together.

Problem

Conversational systems are richer when they can read tone and presence, not only transcripts. That means fusing audio and video that are temporally aligned but carry very different signals, and turning them into a stable, interpretable output.

The hard part is that personality is not visible in one frame or one word. The signal is distributed across voice dynamics, facial presence, timing, and context. The model had to handle noisy real-time inputs while still producing a compact set of scores people could understand.

Multimodal fusion

Reading personality from synchronized audio and video

Figure: Audio + video to OCEAN (5 traits, 100+ model variants, real-time feed). The diagram walks through how live signal becomes embeddings, how temporal aggregation stabilizes those features, and how fusion produces interpretable OCEAN scores.

Stages: live input (video frames, audio waveform); encoders (visual: face / frame embeddings; audio: speech embeddings); temporal aggregation (a temporal window that smooths aligned features); fusion (shared vector); OCEAN vector (openness, conscientiousness, extraversion, agreeableness, neuroticism).

Model exploration

The work was partly about finding which signals were worth trusting.

I explored audio and visual backbones, then trained many variants around the fusion and aggregation strategy to make the final prediction more stable.

Approach

I designed a multimodal fusion mechanism for temporally aligned audio and video inputs. The pipeline encoded audio and visual streams separately, aggregated features over time, then fused the modalities before a final regression layer produced personality scores.
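
The shape of that pipeline can be sketched in PyTorch. This is a minimal, generic stand-in, not the proprietary model: the class name `LateFusionOcean`, the linear "encoders", mean pooling over time, concatenation as the fusion step, the feature sizes, and the sigmoid output range are all assumptions chosen only to keep the example small and runnable.

```python
import torch
from torch import nn

OCEAN = ("openness", "conscientiousness", "extraversion",
         "agreeableness", "neuroticism")


class LateFusionOcean(nn.Module):
    """Hypothetical sketch: encode each modality, pool over time, fuse, regress."""

    def __init__(self, audio_dim=40, video_dim=512, hidden=128):
        super().__init__()
        # Linear layers stand in for real audio / visual backbones (assumed sizes).
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Fusion over the concatenated, time-pooled embeddings (one of many options).
        self.fusion = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        # Regression head: one output per OCEAN trait.
        self.head = nn.Linear(hidden, len(OCEAN))

    def forward(self, audio, video):
        # audio: (batch, time, audio_dim); video: (batch, time, video_dim)
        a = self.audio_enc(audio).mean(dim=1)  # temporal aggregation by mean pooling
        v = self.video_enc(video).mean(dim=1)
        fused = self.fusion(torch.cat([a, v], dim=-1))
        # Sigmoid bounds each score between 0 and 1 (an assumed output range).
        return torch.sigmoid(self.head(fused))


model = LateFusionOcean()
audio = torch.randn(2, 50, 40)    # 2 clips, 50 aligned time steps of audio features
video = torch.randn(2, 50, 512)   # matching 50 steps of per-frame visual features
scores = model(audio, video)      # shape: (2, 5), one row of OCEAN scores per clip
```

Pooling each modality before fusing keeps the two streams independent until late in the network, which makes it easy to swap a backbone or an aggregation strategy without touching the rest of the model.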

I trained 100+ model variants while exploring different audio and visual backbones, aggregation strategies, and fusion setups. The output was a vector across the five OCEAN traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism.
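
One common way to organize this kind of exploration is a configuration grid. The sketch below is illustrative only: `VariantConfig`, the placeholder backbone names, and the listed aggregation and fusion options are hypothetical, and the resulting count is not meant to reproduce the 100+ figure.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VariantConfig:
    """One hypothetical training variant."""
    audio_backbone: str
    visual_backbone: str
    aggregation: str
    fusion: str


# Placeholder option lists; the real choices are not described in this write-up.
AUDIO_BACKBONES = ("audio_a", "audio_b", "audio_c")
VISUAL_BACKBONES = ("visual_a", "visual_b", "visual_c")
AGGREGATIONS = ("mean", "max", "attention")
FUSIONS = ("concat", "gated", "cross_attention")

# Cartesian product of all options: every combination becomes one variant.
variants = [
    VariantConfig(*combo)
    for combo in product(AUDIO_BACKBONES, VISUAL_BACKBONES, AGGREGATIONS, FUSIONS)
]

print(len(variants))  # 81
```

Freezing the dataclass makes each configuration hashable, so variants can be used as dictionary keys when recording results.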

Result

The result was a prototype that could score personality traits from a user's real-time video feed, giving the conversational AI platform a richer signal than transcript-only analysis.
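
For a live feed, per-window predictions are often smoothed before being shown. The following is a minimal sketch of that idea using a rolling mean; the class name `RollingOceanScores` and the window size are assumptions, and the prototype's actual smoothing, if any, is not described here.

```python
from collections import deque

OCEAN = ("openness", "conscientiousness", "extraversion",
         "agreeableness", "neuroticism")


class RollingOceanScores:
    """Hypothetical helper: average the most recent per-window score vectors."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest entry automatically.
        self.buffer = deque(maxlen=window)

    def update(self, scores):
        if len(scores) != len(OCEAN):
            raise ValueError("expected one score per OCEAN trait")
        self.buffer.append(tuple(scores))
        n = len(self.buffer)
        # Mean of each trait over the buffered windows.
        return {
            trait: sum(s[i] for s in self.buffer) / n
            for i, trait in enumerate(OCEAN)
        }


smoother = RollingOceanScores(window=2)
smoother.update([0.0] * 5)
smoothed = smoother.update([1.0] * 5)  # each trait averages to 0.5
```

A larger window gives steadier scores at the cost of reacting more slowly to changes in the feed.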

Note: kept high-level because the underlying algorithm is proprietary.