AI Benchmarking Suite

Plotline AI Benchmarks

We evaluate AI models for customer engagement, rigorously testing them on quality, speed, and reliability to power our AI-native platform.

Available Benchmarks

🎨
Live

MAGE

Media Asset Generation Evaluation

Evaluating AI image generation models for marketing media assets across text integration, dimension modification, creation, and object manipulation.

3 Models
20 Tasks
✍️
Coming Soon

PAGE

Personalized Ad-copy Generation Evaluation

Benchmarking LLMs for personalized messaging, push notification copy, and in-app content generation.

🛡️
Coming Soon

GAGE

Guardrail Adherence & Governance Evaluation

Measuring AI safety, guardrail compliance, and governance for experience protection and decisioning.

🎨 MAGE

Media Asset Generation Evaluation

Evaluating AI image generation for marketing media assets


Championship Standings

F1-style scoring: 25 pts for 1st, 18 pts for 2nd, 15 pts for 3rd
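
As a rough illustration of how the scoring rule above could turn per-task rankings into a standings table, here is a minimal Python sketch. It is not the benchmark's actual implementation. The function name `championship_standings`, the input shape, the model names, and the tie-break rule (more 1st places, then 2nd, then 3rd) are all assumptions made for this example. Only the point values 25, 18, and 15 come from the page.

```python
from collections import defaultdict

# Points per finishing position as stated on the page: 1st, 2nd, 3rd.
POINTS = (25, 18, 15)


def championship_standings(task_rankings):
    """Aggregate per-task rankings into standings rows.

    task_rankings: list of lists, each naming models from 1st place
    onward for one task (hypothetical input shape). Positions past
    3rd score nothing, an assumption, since only three values are given.
    Returns tuples of (pos, model, points, wins, seconds, thirds).
    """
    rows = defaultdict(lambda: {"points": 0, "finishes": [0, 0, 0]})
    for ranking in task_rankings:
        for place, model in enumerate(ranking[:len(POINTS)]):
            rows[model]["points"] += POINTS[place]
            rows[model]["finishes"][place] += 1

    # Order by points, then by 1st/2nd/3rd counts (assumed tie-break).
    ordered = sorted(
        rows.items(),
        key=lambda item: (item[1]["points"], item[1]["finishes"]),
        reverse=True,
    )
    return [
        (pos, model, row["points"], *row["finishes"])
        for pos, (model, row) in enumerate(ordered, start=1)
    ]


# Hypothetical rankings for three tasks; model names are placeholders.
example = [
    ["model_a", "model_b", "model_c"],
    ["model_b", "model_a", "model_c"],
    ["model_a", "model_c", "model_b"],
]
for row in championship_standings(example):
    print(row)
```

With these placeholder rankings, model_a collects 25 + 18 + 25 = 68 points, model_b 58, and model_c 48, so the rows come out in that order.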

Pos | Model | Points | 🥇 Wins | 🥈 2nd | 🥉 3rd

Points Distribution

⚡ Runtime Performance

Generation speed comparison across all models

Average Generation Time (seconds)

Average Runtime by Category

Runtime per Task

Success Rate

Performance by Category

See how each model performs across different task categories

Task-by-Task Results

Detailed breakdown of each evaluation task