- Evaluating LLM-generated responses for trustworthiness.
- Using Cleanlab TLM to score and flag untrustworthy responses.
- Leveraging Phoenix for tracing and visualizing response evaluations.
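Flagging untrustworthy responses comes down to comparing each trustworthiness score against a cutoff. The sketch below shows that step in isolation. It assumes scores lie in [0, 1] with higher meaning more trustworthy. The 0.5 threshold, the `flag_untrustworthy` helper, and the response ids are hypothetical illustrations, not values or names taken from Cleanlab or Phoenix.

```python
# Minimal sketch: turn trustworthiness scores into flags.
# ASSUMPTION: scores lie in [0, 1], higher = more trustworthy.
# ASSUMPTION: 0.5 is an illustrative cutoff, not a Cleanlab/Phoenix default.
from typing import Dict, List


def flag_untrustworthy(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the ids of responses whose score falls below the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    # Sort the ids so the output order is deterministic.
    return sorted(rid for rid, score in scores.items() if score < threshold)


# Hypothetical scores keyed by response id.
example = {"resp-1": 0.92, "resp-2": 0.31, "resp-3": 0.48, "resp-4": 0.75}
print(flag_untrustworthy(example))  # ['resp-2', 'resp-3']
```

The threshold is a tuning knob. Raising it flags more responses for review, and lowering it flags fewer.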
Key implementation steps for generating evals with TLM
- Install dependencies, set up API keys, obtain LLM responses, and trace them in Phoenix
- Download the trace dataset
- Prepare the data from the trace dataset
- Set up TLM and evaluate each prompt-response pair
- Upload the evals to Phoenix
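The middle steps (prepare the data, evaluate each pair, build evals to upload) can be sketched without network access as follows. This is a hypothetical outline, not the notebook's code. It assumes each trace row is a dict keyed by Phoenix-style span columns (`context.span_id`, `attributes.input.value`, `attributes.output.value`). It also assumes that `score_fn(prompt, response)` returns a trustworthiness score in [0, 1]. In the real workflow that function would wrap Cleanlab TLM's trustworthiness scoring. Check the TLM docs for the exact call. Here a toy `fake_score` stands in for it.

```python
# Sketch of "prep data -> evaluate each pair -> build eval rows".
# ASSUMPTIONS (hypothetical, not the notebook's exact code):
#   * trace rows use Phoenix-style span column names (see constants below);
#   * score_fn(prompt, response) returns a score in [0, 1]; in practice it
#     would wrap Cleanlab TLM, here a toy scorer is used instead.
from typing import Callable, Dict, List

SPAN_ID = "context.span_id"
INPUT = "attributes.input.value"
OUTPUT = "attributes.output.value"


def prep_pairs(trace_rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only spans that have both a prompt and a response."""
    pairs = []
    for row in trace_rows:
        prompt, response = row.get(INPUT), row.get(OUTPUT)
        if prompt and response:
            pairs.append({"span_id": row[SPAN_ID], "prompt": prompt, "response": response})
    return pairs


def evaluate_pairs(
    pairs: List[Dict[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff
) -> List[Dict[str, object]]:
    """Score each prompt-response pair and attach a label."""
    evals = []
    for pair in pairs:
        score = float(score_fn(pair["prompt"], pair["response"]))
        label = "trustworthy" if score >= threshold else "untrustworthy"
        evals.append({"span_id": pair["span_id"], "score": score, "label": label})
    return evals


def fake_score(prompt: str, response: str) -> float:
    """Toy stand-in for TLM so the sketch runs offline (hypothetical)."""
    return 0.9 if "Paris" in response else 0.2


traces = [
    {SPAN_ID: "a1", INPUT: "Capital of France?", OUTPUT: "Paris"},
    {SPAN_ID: "b2", INPUT: "Capital of France?", OUTPUT: "Lyon"},
    {SPAN_ID: "c3", INPUT: "Capital of Spain?", OUTPUT: ""},  # no response: skipped
]
for row in evaluate_pairs(prep_pairs(traces), fake_score):
    print(row)
```

Each eval row carries a span id, a numeric score, and a label. Rows in this shape are what Phoenix span evaluations are typically built from. One approach is to load them into a DataFrame indexed by span id and log them with Phoenix's evaluation-logging client (for example `SpanEvaluations` with `log_evaluations`). Verify the exact call against the Phoenix documentation.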

