Introducing PhaseLLM Evals, your prompt engineering copilot
Over the past year, we’ve worked with dozens of startups and researchers to help launch LLM-powered products and services. Every single one of them struggles with prompt optimization and with understanding the limits and opportunities of their prompts. Common questions include:
- How do I balance temperature settings (and creativity!) against the need for strictly structured responses?
- How do I get my LLM as close as possible to genuine reasoning?
- How do I weigh the cost and latency tradeoffs of GPT-4 versus GPT-3.5?
With PhaseLLM Evals, you can run batch LLM jobs to compare and contrast responses from different LLMs. You can see how consistent responses are across multiple calls, compare models and model settings, and iterate on system prompts and user messages, swapping them out as needed.
Prompt engineering prior to PhaseLLM Evals.
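To make the idea concrete, here is a minimal sketch of the kind of batch job Evals automates. This is not the PhaseLLM Evals API; it simply runs the same prompt several times against different models and temperature settings with the OpenAI Python client and collects the raw responses for side-by-side comparison. The model names, prompt text, and run count are placeholders.

```python
from itertools import product
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = "You are a helpful assistant. Reply with a single JSON object."
USER_MESSAGE = "Summarize this contact: Jane Doe, CTO at Acme, met at PyCon 2023."

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # models to compare (placeholders)
TEMPERATURES = [0.0, 0.7, 1.0]       # settings to compare
RUNS_PER_CELL = 3                    # repeat calls to gauge consistency

results = []
for model, temperature in product(MODELS, TEMPERATURES):
    for run in range(RUNS_PER_CELL):
        response = client.chat.completions.create(
            model=model,
            temperature=temperature,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": USER_MESSAGE},
            ],
        )
        results.append({
            "model": model,
            "temperature": temperature,
            "run": run,
            "output": response.choices[0].message.content,
        })

# Inspect (or later, programmatically score) the grid of responses.
for r in results:
    print(f"{r['model']} @ T={r['temperature']} (run {r['run']}):\n{r['output']}\n")
```

Running, organizing, and comparing this grid of responses by hand is exactly the busywork Evals is designed to take off your plate.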
Tutorial: Building an LLM-Powered CRM
The video tutorial below shows how to use PhaseLLM Evals to build more consistent, higher-quality LLM-powered apps. In this sample project, we’re building an LLM-powered CRM system that helps you choose which contacts to invite to events.
In the video above, you'll learn how to create your first chat object, how to run an experiment, and how to iterate on your LLM requests to ensure more consistent responses.
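The video walks through this workflow in the Evals interface. As a rough, hypothetical sketch of what “iterating for more consistent responses” means in code (again, not PhaseLLM’s API), the loop below reruns a CRM-style prompt with two candidate system prompts and counts how often the model returns a parseable, agreeing invite decision. The prompts, contact data, and scoring rule are made up for illustration.

```python
import json
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CONTACT = "Jane Doe, CTO at Acme Corp, last spoke 3 months ago, interested in AI tooling."

# Two candidate system prompts we want to compare for consistency.
CANDIDATE_PROMPTS = {
    "loose": "You help decide which CRM contacts to invite to events.",
    "strict": (
        "You help decide which CRM contacts to invite to events. "
        'Reply ONLY with JSON of the form {"invite": true|false, "reason": "..."}.'
    ),
}

def decision(system_prompt: str) -> str:
    """Ask the model for an invite decision, normalized to 'yes'/'no'/'unparseable'."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Should we invite this contact to our launch event? {CONTACT}"},
        ],
    )
    text = response.choices[0].message.content
    try:
        return "yes" if json.loads(text)["invite"] else "no"
    except (json.JSONDecodeError, KeyError, TypeError):
        return "unparseable"

for name, prompt in CANDIDATE_PROMPTS.items():
    outcomes = Counter(decision(prompt) for _ in range(5))
    print(f"{name} prompt -> {dict(outcomes)}")  # e.g. {'yes': 4, 'unparseable': 1}
```

A higher share of parseable, agreeing responses is the kind of signal Evals surfaces for you across many prompts, models, and settings at once.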
Product Roadmap and Vision
Today’s PhaseLLM Evals product focuses on batch jobs across prompts, models, and model settings. You can also manage your chat data to build new experiments. This is the earliest version of our product.
We envision a product where you can optimize prompts, build evaluation data sets, and even pull in external data sets to stress-test your prompts and models.
We strongly believe that a collaborative approach, with open access data sets and easy-to-use evaluations, is how safe, reliable, and optimal AI experiences will be built.
Have questions? Feature requests? Need help? Reach out at w [at] phaseai [dot] com.