Decode, Detect, Diagnose LLMs: User Behavior Changes & Performance Drift


Join this hands-on workshop to learn data-centric approaches to diagnose issues in LLM-powered applications using text quality metrics such as sentence structure, vocabulary choice, toxicity, and sentiment.

Large language models (LLMs) rarely provide consistent responses to the same prompts over time. This can be caused by changes in your model's performance, but it can also result from changes in user behavior. Combining text quality metrics can help you pinpoint and mitigate these issues without expensive ground truth labeling.

This workshop will cover:

  • Different types of data drift common to LLM applications
  • Sentiment, toxicity, and vocabulary metrics useful for text applications
  • Combining text quality metrics to measure change in user behavior and model performance
  • Translating changes in text quality into actionable mitigation techniques

What you’ll need:

Who should attend:

Anyone interested in building applications with LLMs, AI observability, model monitoring, MLOps, and DataOps! This workshop is designed to be approachable for most skill levels. Familiarity with machine learning and Python will be useful, but it is not required to attend.

About the Speaker:

Bernease Herman is a Sr. Data Scientist at WhyLabs, where she builds model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her ongoing academic research focuses on evaluation metrics for machine learning and LLMs. Bernease serves as faculty for the University of Washington Master's Program in Data Science and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor's degree in mathematics and statistics from the University of Michigan.

About WhyLabs:

WhyLabs is an AI observability platform that prevents data & model performance degradation by allowing you to monitor your data and machine learning models in production.

Do you want to connect with the community, learn about WhyLabs, or get project support? Join the WhyLabs + Robust & Responsible AI community Slack:
Date & Time

Feb. 13, 2024, 1 p.m. to 2 p.m.



Learn More & Register