Research Foundation · Wang et al., Emory / InitialView (2023)

Advanced Context Management

Long-form interviews require models to remember what was said minutes ago while staying responsive to new information. The InterviewBot research outlines sliding windows, attention weighting, topic stores, and drift detection—tools we study when shaping our approach to extended conversations.

Architectural components

Sliding window memory

Extends the context window to 228 tokens with adaptive overlap, ensuring recent utterances remain salient without exceeding model limits.
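A minimal sketch of how such a window might behave, assuming a token budget and overlap value chosen for illustration (the whitespace tokenizer stands in for a real model tokenizer):

```python
# Hypothetical sketch of a token-budgeted sliding window over dialogue
# turns. MAX_TOKENS mirrors the figure above; OVERLAP_TOKENS is an
# illustrative assumption, not a value from the paper.
MAX_TOKENS = 228
OVERLAP_TOKENS = 32

def count_tokens(text: str) -> int:
    # Naive whitespace tokenization stands in for a model tokenizer.
    return len(text.split())

def sliding_window(turns: list[str], max_tokens: int = MAX_TOKENS,
                   overlap_tokens: int = OVERLAP_TOKENS) -> list[str]:
    """Keep the most recent turns that fit the budget, always retaining
    at least overlap_tokens of carried-over context."""
    window: list[str] = []
    total = 0
    # Walk backwards from the newest turn until the budget is spent.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if total + cost > max_tokens and total >= overlap_tokens:
            break
        window.append(turn)
        total += cost
    return list(reversed(window))
```

The newest turns are always kept; older turns fall away only once the budget and minimum overlap are satisfied.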

Context attention

Weights the previous five exchanges using recency and topical similarity scores to inform each response.
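One way to sketch this weighting, assuming an exponential recency decay and word-overlap similarity as stand-ins for the paper's actual scoring (the alpha/beta blend is an assumption):

```python
import math

def jaccard(a: str, b: str) -> float:
    # Word-overlap similarity stands in for embedding similarity.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def attention_weights(history: list[str], query: str, k: int = 5,
                      alpha: float = 0.6, beta: float = 0.4) -> list[float]:
    """Score the previous k exchanges by a blend of recency decay and
    topical similarity, normalized to sum to 1."""
    recent = history[-k:]
    scores = []
    for i, turn in enumerate(recent):
        recency = math.exp(-(len(recent) - 1 - i) * 0.5)  # newest ~ 1.0
        similarity = jaccard(turn, query)
        scores.append(alpha * recency + beta * similarity)
    total = sum(scores)
    return [s / total for s in scores] if total else scores
```

Recent exchanges receive higher weight by default; a strong topical match can lift an older exchange back into relevance.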

Topic store

Tracks 8-16 key topics across the session, preventing repetition and enabling intentional returns to unfinished threads.

Drift detection

Monitors semantic deviation in real time and signals human handoff when confidence drops below threshold.
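A minimal sketch of threshold-based drift detection, assuming bag-of-words cosine similarity as a stand-in for embedding similarity (the threshold value is an assumption):

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    # Bag-of-words cosine stands in for embedding similarity.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect_drift(topic: str, responses: list[str],
                 threshold: float = 0.2) -> bool:
    """Return True (signal human handoff) when the average similarity
    of recent responses to the active topic falls below threshold."""
    if not responses:
        return False
    avg = sum(cosine(topic, r) for r in responses) / len(responses)
    return avg < threshold
```

Averaging over several responses, rather than reacting to a single turn, keeps one tangential remark from triggering a handoff.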

Reference metrics from Wang et al. (2023)

Metric                  Baseline    Published result
Topic repetition        30.0%       6.7%
Interview completion    13.3%       46.7%
Off-topic responses     20.0%       10.0%
Diarization accuracy    72.7%       93.6%

Modeling takeaways

Challenges highlighted in the research

Agents risk repeating questions when context windows overflow or lose topical state.

Limited memory can truncate candidate rationale, reducing downstream scoring quality.

Topic drift breaks interview structure and erodes perceived fairness.

How we respond during modeling

We tune sliding-window policies and utterance weighting to prioritize recent, high-signal turns.

Candidate rationales feed structured memory slots so follow-ups retain the right level of detail.

Drift detectors route ambiguous sessions for review, keeping humans in the loop when research flags risk.

Implementation details

Hybrid memory

Combines vector recall for key facts with chronological buffers for dialogue tone and commitments.

Quality telemetry

Captures turn-level latency, interruption rate, and acknowledgement frequency for QA dashboards.
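A minimal sketch of how these turn-level signals could be aggregated for a dashboard (the field names are assumptions, not our production schema):

```python
from dataclasses import dataclass

# Hypothetical per-turn telemetry record.
@dataclass
class Turn:
    latency_ms: float
    interrupted: bool
    acknowledged: bool

def summarize(turns: list[Turn]) -> dict[str, float]:
    """Roll per-turn signals up into session-level QA metrics."""
    n = len(turns)
    if n == 0:
        return {"avg_latency_ms": 0.0, "interruption_rate": 0.0,
                "acknowledgement_rate": 0.0}
    return {
        "avg_latency_ms": sum(t.latency_ms for t in turns) / n,
        "interruption_rate": sum(t.interrupted for t in turns) / n,
        "acknowledgement_rate": sum(t.acknowledged for t in turns) / n,
    }
```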

Domain tuning

Industry-specific topic ontologies keep conversations focused on required competencies.
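A toy sketch of how a topic ontology can gate relevance, assuming a keyword-based check and invented example entries (a real ontology would be far richer and matched semantically):

```python
# Hypothetical industry ontology mapping a domain to its required
# competencies. Entries are illustrative only.
ONTOLOGY = {
    "software": ["data structures", "system design", "testing"],
    "finance": ["valuation", "risk", "regulation"],
}

def on_topic(domain: str, utterance: str) -> bool:
    """True when the utterance touches a required competency."""
    text = utterance.lower()
    return any(c in text for c in ONTOLOGY.get(domain, []))
```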

Resources