Evaluations
What is an Evaluation?
Evaluations (evals for short) are statements or examples that judge the quality and performance of your LLM outputs. In other words, evals tell your LLM application whether it is performing the way its human wants it to.
You can evaluate a number of factors, such as relevance, hallucination, or whether the LLM is being mean.
With evals, when you adjust your prompts, or when Empromptu adjusts them automatically, you will know whether that prompt has become more or less performant.
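For example, an eval statement for a retrieval-augmented app might read: "The response answers the user's question using only information from the retrieved documents." (This statement is illustrative, not one of Empromptu's templates.)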
There are 4 ways to define your evaluations in Empromptu:
Standard templates from Empromptu
Manually defined
End User Confirmed
Data generated
Standard Evaluation Templates from Empromptu
If you do not have any evaluation statements (yet), you can use Empromptu's UI to select a template eval set.
Using the UI:
Head to Empromptu's evaluations page
Click add a new evaluation
From the Standard tab, select a template eval set.
In Code:
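A minimal sketch of selecting a template eval set from code, assuming a Python SDK; the `empromptu` package, `Client`, and `add_evaluation(template=...)` names are illustrative assumptions, not confirmed Empromptu API:

```python
# Illustrative sketch only: the empromptu package, Client, and
# add_evaluation(template=...) are assumed names, not confirmed API.
from empromptu import Client

client = Client(api_key="YOUR_API_KEY")

# Attach a standard template eval set, mirroring the Standard tab in the UI.
client.add_evaluation(template="hallucination")
```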
Manually defined
Using the UI:
Head to Empromptu's evaluations page
Click add a new evaluation
On the Custom tab, write your eval statements by defining:
A name
A description of what is to be evaluated
Evaluation criteria (your eval statements)
Expected output (what the output is supposed to look like)
The model you want to use
Temperature
In Code:
Name the task [string]
(Optional) Define an embedding model name
Name the prompt, or pass the prompt itself [prompt_string]
Define what must be true, as a string [prompt_text]
Set a temperature [default is 0]
Define a model
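Putting those parameters together, here is a minimal sketch of a manually defined eval, assuming a Python SDK; every name below (the `empromptu` package, `Client`, `add_evaluation`, and its keyword arguments) is an illustrative assumption rather than confirmed API:

```python
# Illustrative sketch only; all names here are assumed, not confirmed API.
from empromptu import Client

client = Client(api_key="YOUR_API_KEY")

client.add_evaluation(
    task="ticket_summary",                     # name the task [string]
    prompt_string="summarize_ticket_v2",       # name the prompt, or pass the prompt itself
    prompt_text=(                              # what must be true, as a string
        "The summary mentions the customer's issue and the agreed next step."
    ),
    temperature=0,                             # default is 0
    model="gpt-4o",                            # the model you want to use
    embedding_model="text-embedding-3-small",  # optional embedding model name
)
```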
End User Confirmed
If you are using a human-in-the-loop model, or if your end users are evaluating results, you can send those results to Empromptu so that the system improves based on your end users' feedback.
Using the UI:
Head to Empromptu's evaluations page
Click add a new evaluation
On the Feedback tab:
Name your Evals
In Code:
Find your session key from your observability or analytics provider.
After the point where you define your prompt registry, paste the provided line to send your user-confirmed evaluations as the prompt evaluator.
Find where you ingest user-scored evals.
Import Empromptu's prompt registry into that file.
Paste the provided line there (a hedged sketch of this wiring follows this list).
Celebrate!
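Since the lines to paste are not shown here, the following is only a rough sketch of how this wiring might look in Python; `PromptRegistry`, `set_evaluator`, and `record_user_score` are illustrative assumptions, not confirmed Empromptu API:

```python
# Illustrative sketch only; PromptRegistry, set_evaluator, and
# record_user_score are assumed names, not confirmed API.
from empromptu import PromptRegistry

registry = PromptRegistry(api_key="YOUR_API_KEY")

# After the prompt registry is defined: send user-confirmed
# evaluations through the registry as the prompt evaluator.
registry.set_evaluator("user_confirmed")

# Where you ingest user-scored evals: forward each score, keyed by the
# session key from your observability or analytics provider.
def on_user_feedback(session_key: str, score: int) -> None:
    registry.record_user_score(session_key=session_key, score=score)
```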
Data generated
(coming soon)