AI Platform

Verbal Quiz Case Study

Architecting a low-latency, real-time platform for verbal examinations and voice evaluation. Built voice-to-text pipelines and automated scoring frameworks designed to respond in under 1.5 seconds.

Client: Verbal Quiz

Services: Platform

Stack: React, FastAPI

Status: Active

Traditional browser-based examinations rely on typed answers or multiple-choice interfaces. Verbal Quiz replaces these with fully oral evaluations. The primary engineering hurdle was keeping the full loop of voice transcription, semantic analysis, and follow-up audio questions under 1.5 seconds, the latency threshold for natural conversation.

GCAN resolved this with an asynchronous voice processing core built on FastAPI WebSockets. Incoming audio streams are chunked and dispatched to specialized inference endpoints, which produce transcripts concurrently, while downstream LLM rubrics prepare the grading criteria and text-to-speech payloads are returned in parallel.
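The overlap between transcription and rubric preparation described above can be sketched with plain asyncio. This is a minimal illustration, not the platform's actual code: `transcribe_chunk`, `prepare_rubric`, and `process_utterance` are hypothetical names, and the stubs simulate inference endpoints with short sleeps.

```python
import asyncio


async def transcribe_chunk(chunk: bytes) -> str:
    """Hypothetical ASR stub: pretends each chunk decodes to text."""
    await asyncio.sleep(0.01)  # simulate network and inference latency
    return chunk.decode("utf-8")


async def prepare_rubric(question_id: str) -> list[str]:
    """Hypothetical rubric stub: returns grading criteria for a question."""
    await asyncio.sleep(0.01)
    return ["mentions sunlight", "mentions plants"]


async def process_utterance(chunks: list[bytes], question_id: str) -> tuple[str, list[str]]:
    # Start rubric preparation first so it overlaps with transcription.
    rubric_task = asyncio.create_task(prepare_rubric(question_id))
    # Transcribe all chunks concurrently; gather preserves input order.
    parts = await asyncio.gather(*(transcribe_chunk(c) for c in chunks))
    rubric = await rubric_task
    return " ".join(parts), rubric


transcript, rubric = asyncio.run(
    process_utterance([b"plants use", b"sunlight"], "q1")
)
```

Because the rubric task is created before the transcription calls are awaited, both run during the same wait, so the total time is closer to the slowest stage than to the sum of the stages.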

Audio Streaming

Compresses raw voice input in the browser with the Opus codec before transport.

FastAPI Sockets

Maintains persistent bidirectional connections for immediate response delivery.
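A persistent socket session of this kind might look roughly like the handler below. It is a sketch under stated assumptions: the handler is written against the `accept()`, `receive_bytes()`, and `send_text()` methods that FastAPI's WebSocket object provides, the empty-frame end-of-utterance marker is an assumed convention, and `fake_transcribe`, `handle_session`, and `FakeSocket` are hypothetical names. An in-memory stand-in replaces the real connection so the example runs without a server.

```python
import asyncio


async def fake_transcribe(audio: bytes) -> str:
    """Hypothetical ASR stub: reports how many bytes it received."""
    return f"<{len(audio)} bytes of speech>"


async def handle_session(websocket) -> None:
    """Serve one utterance over an open connection.

    `websocket` is any object exposing accept(), receive_bytes()
    and send_text(), as FastAPI's WebSocket class does.
    """
    await websocket.accept()
    buffer = bytearray()
    while True:
        frame = await websocket.receive_bytes()
        if not frame:  # assumed convention: empty frame ends the utterance
            break
        buffer.extend(frame)
    await websocket.send_text(await fake_transcribe(bytes(buffer)))


# With FastAPI installed, the handler could be mounted like this
# (the route path is a placeholder):
#   @app.websocket("/ws/exam")
#   async def exam(ws: WebSocket):
#       await handle_session(ws)


class FakeSocket:
    """In-memory stand-in so the sketch runs without a server."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.sent = []

    async def accept(self):
        pass

    async def receive_bytes(self):
        return self.frames.pop(0)

    async def send_text(self, text):
        self.sent.append(text)


sock = FakeSocket([b"\x01\x02", b"\x03", b""])
asyncio.run(handle_session(sock))
```

Keeping the handler independent of the framework object makes it easy to exercise in tests, at the cost of not handling disconnect exceptions, which a production handler would need.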

Evaluation Core

NLP Analytics

Pronunciation

Grammar Coherence

Semantic Match

Speed Verification

Socket Handshake

Audio frame streams are packetized and routed.

Whisper ASR Inference

The Whisper speech-recognition model transcribes the audio into text.

LLM Semantic Review

Embeddings match the content of the transcribed answer against the rubric criteria.

Text-to-Speech Generation

The synthesized question audio is returned to the page.
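One common way to compare an answer against rubric criteria in the LLM Semantic Review step is cosine similarity between embedding vectors. The sketch below assumes that approach. The source does not state which similarity measure or embedding model is used, so the toy two-dimensional vectors and the `semantic_match` averaging rule are illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def semantic_match(answer_vec: list[float], rubric_vecs: list[list[float]]) -> float:
    """Assumed scoring rule: mean similarity over criteria, clamped at 0."""
    if not rubric_vecs:
        return 0.0
    scores = [max(0.0, cosine_similarity(answer_vec, r)) for r in rubric_vecs]
    return sum(scores) / len(scores)


# Toy embeddings: the answer matches the first criterion and not the second.
answer = [1.0, 0.0]
rubric = [[1.0, 0.0], [0.0, 1.0]]
score = semantic_match(answer, rubric)  # 0.5
```

In a real system the vectors would come from an embedding model and have hundreds of dimensions, but the comparison logic would be the same.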

Speech-to-Text (ASR)

NLP Grading Analytics

Text-to-Speech Synthesis
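Per-stage timings like these can be collected with a small timer and checked against the 1.5-second threshold named in the case study. The sketch below is an assumption about how such measurement could be done, not the platform's instrumentation: `StageTimer` and the stage names are hypothetical, and short sleeps stand in for the real work.

```python
import time
from contextlib import contextmanager

LATENCY_BUDGET_S = 1.5  # conversational threshold from the case study


class StageTimer:
    """Records wall-clock duration of named pipeline stages."""

    def __init__(self) -> None:
        self.timings_ms: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            self.timings_ms[name] = (time.perf_counter() - start) * 1000.0

    def total_ms(self) -> float:
        return sum(self.timings_ms.values())

    def within_budget(self) -> bool:
        return self.total_ms() <= LATENCY_BUDGET_S * 1000.0


timer = StageTimer()
with timer.stage("asr"):
    time.sleep(0.01)  # stand-in for speech-to-text
with timer.stage("grading"):
    time.sleep(0.01)  # stand-in for NLP grading
with timer.stage("tts"):
    time.sleep(0.01)  # stand-in for text-to-speech
```

Summing the stage times is a conservative upper bound: when stages overlap, as the parallel design intends, the observed end-to-end latency is lower than the sum.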

Platform Processing Flow

🎙️

Audio Capture

Captures user speech inputs through standard Web Audio API integrations.

⚡

WS Streaming

Streams WebM/Opus data packets immediately over secure FastAPI WebSockets.

🤖

Inference Grid

Generates transcripts while computing vector-search embeddings concurrently.

🔊

Pre-Synthesized Output

Compiles responses and returns natural-sounding audio prompts to the page.