Running Experiments
Phoenix supports two workflows for experiments: a UI-driven flow in the Playground and a programmatic SDK flow.
Run Experiments in the UI
Configure prompts and evaluators in the Playground and compare results.
Run Experiments with the SDK
Run experiments programmatically with tasks and evaluators in code.
SDK Experiment Steps
Upload a Dataset
Load your test cases into Phoenix to use as inputs for experiments.
Create a Task
Define the function or workflow you want to evaluate against your dataset.
Configure Evaluators
Set up the scoring criteria to assess your task outputs.
Run an Experiment
Execute your task across all dataset examples and collect evaluation results.
Use Repetitions
Run tasks multiple times to measure variance and consistency.
Dataset Splits
Run experiments on specific subsets of your dataset.
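The steps above can be sketched as a single loop: filter to a split, run the task on each example (repeating if requested), and score every output with each evaluator. This is a minimal, self-contained illustration of that shape, not the Phoenix SDK itself; the `Example` class and `run_experiment` function here are hypothetical stand-ins for the types and entry points the SDK provides.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a dataset example; the real SDK has its own type.
@dataclass
class Example:
    input: dict
    expected: dict
    metadata: dict = field(default_factory=dict)

def run_experiment(examples, task, evaluators, repetitions=1, split=None):
    """Sketch of an experiment run: optionally restrict to a dataset split,
    invoke the task on each example (possibly several times to measure
    variance), and score each output with every evaluator."""
    results = []
    for example in examples:
        if split and example.metadata.get("split") != split:
            continue  # dataset splits: only run on the chosen subset
        for rep in range(repetitions):  # repetitions surface nondeterminism
            output = task(example.input)
            scores = {name: ev(output, example.expected)
                      for name, ev in evaluators.items()}
            results.append({"input": example.input, "output": output,
                            "rep": rep, "scores": scores})
    return results

# Toy task and evaluator to exercise the loop.
task = lambda inp: {"answer": inp["question"].upper()}
exact_match = lambda out, exp: 1.0 if out == exp else 0.0

examples = [
    Example({"question": "hi"}, {"answer": "HI"}, {"split": "test"}),
    Example({"question": "yo"}, {"answer": "no"}, {"split": "train"}),
]
results = run_experiment(examples, task, {"exact_match": exact_match},
                         repetitions=2, split="test")
```

With `split="test"` only the first example runs, twice, and both repetitions score 1.0 on exact match.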
Using Evaluators
LLM Evaluators
Use LLM-as-a-judge to assess quality, correctness, and other criteria.
Code Evaluators
Use built-in heuristic evaluators like exact match, JSON distance, and regex matching.
Custom Evaluators
Build your own evaluation logic with custom prompts or code.
Dataset Evaluators
Attach evaluators to datasets for automatic scoring during experiments.
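The heuristic evaluators mentioned above can each be expressed in a few lines. These are illustrative sketches of the ideas only, not Phoenix's implementations; in particular, the `json_distance` scoring rule here (fraction of disagreeing top-level keys) is an assumption chosen for clarity.

```python
import json
import re

def exact_match(output: str, expected: str) -> float:
    """1.0 when the output equals the expected string exactly, else 0.0."""
    return 1.0 if output == expected else 0.0

def matches_regex(output: str, pattern: str) -> float:
    """1.0 when the output contains a match for the pattern, else 0.0."""
    return 1.0 if re.search(pattern, output) else 0.0

def json_distance(output: str, expected: str) -> float:
    """A toy JSON distance: 0.0 for identical parsed values, 1.0 when the
    output is not valid JSON, otherwise the fraction of top-level keys on
    which two JSON objects disagree."""
    try:
        out = json.loads(output)
    except json.JSONDecodeError:
        return 1.0  # unparsable output is maximally distant
    exp = json.loads(expected)
    if out == exp:
        return 0.0
    if isinstance(out, dict) and isinstance(exp, dict):
        keys = set(out) | set(exp)
        mismatched = sum(1 for k in keys if out.get(k) != exp.get(k))
        return mismatched / len(keys) if keys else 0.0
    return 1.0  # differing non-object values: maximally distant
```

A custom evaluator is just a function with this same shape: take the task output (and optionally the expected value), return a score.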

