Evaluations

What is an Evaluation?

Evaluations are examples or statements that judge the quality or performance of your LLM outputs. In other words, evaluations (evals for short) are statements that tell your LLM application whether it is performing the way its human wants it to, for example, "The information extracted is factual and nothing incorrect was kept."

You can evaluate a number of factors, such as relevance, hallucination, or whether the LLM is being mean.

With evals, when you adjust your prompts (or Empromptu adjusts them automatically), you will know whether the new prompt is more or less performant.

There are 4 ways to define your evaluations in Empromptu:

  1. Standard templates from Empromptu

  2. Manually defined

  3. End User Confirmed

  4. Data generated

Standard Evaluation Templates from Empromptu

If you do not have any evaluation statements (yet), you can use Empromptu's UI to select an example set.

Using the UI:

  1. Head to Empromptu's evaluations page

  2. Click add a new evaluation

  3. From the Standard tab, select a template eval set.

In Code:
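The docs do not show a separate API call for pulling a standard template, so the snippet below is only a minimal sketch: it assumes you copy the template's statements into the same evals list that prompt_registry.register_prompt accepts (documented in full under Manually defined below). The eval names and texts here are illustrative placeholders, not Empromptu's actual template contents.

# Minimal sketch (assumption): supply template-style eval statements through the
# same 'evals' list that register_prompt accepts. Names and texts below are
# illustrative placeholders, not Empromptu's actual template contents.
standard_evals = [
    {'eval_name': 'relevance', 'eval_text': 'The output is relevant to the user request.'},
    {'eval_name': 'no_hallucination', 'eval_text': 'The output contains no fabricated information.'},
]

my_task = {
    'prompt_family_name': 'summary_for_langchain',
    'prompts': [
        {'prompt_name': 'prompt_1', 'prompt_text': prompt_text_1, 'model': 'gpt-4o-mini'},
    ],
    'evals': standard_evals,
}
prompt_registry.register_prompt(my_task)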

Manually defined

Using the UI:

  1. Head to Empromptu's evaluations page

  2. Click add a new evaluation

  3. On the Custom tab, write your eval statements by filling in:

    1. A name

    2. A description of what is to be evaluated

    3. The Evaluation Criteria, which are your eval statements

    4. The Expected Output: what the output is supposed to look like

    5. The model you want to use

    6. The temperature

In Code:

  1. Name the task [a string]; optionally, define an embedding model name

  2. Name the prompt or prompts [prompt_name] and provide the prompt text [prompt_text]

  3. Define what must be true as a string; these are your eval statements [eval_text]

  4. Set a temperature [optional; the default is 0]

  5. Define a model

my_task = {
    'prompt_family_name': 'summary_for_langchain',
    'embedding_model': 'small-3',  # OPTIONAL
    'prompts': [
        {
            'prompt_name': 'prompt_1',
            'prompt_text': prompt_text_1,  # e.g. "Do thing X to this text: {{scraped_text}}"
            'temperature': 0.6,  # OPTIONAL
            'model': 'gpt-4o-mini',
        },
        # ... add any additional prompt variants here
    ],
    'evals': [  # The short statements that will be used to grade the prompt's accuracy
        {
            'eval_name': 'extracted_truth',
            'eval_text': 'The information extracted is factual and nothing incorrect was kept.'
        },
        {
            'eval_name': 'extracted_completely',
            'eval_text': 'The information was extracted completely and nothing important is missing.'
        },
    ]
}
prompt_registry.register_prompt(my_task)

End User Confirmed

If you are using a human-in-the-loop model, or if your end users are evaluating results, you can send those results to Empromptu to improve your system based on your end users' feedback.

Using the UI:

  1. Head to Empromptu's evaluations page

  2. Click add a new evaluation

  3. On the Feedback tab:

    1. Name your Evals

In Code:

  1. Find your session key from your observability or analytics provider

  2. After the line where you defined your prompt registry, paste the following lines to send your user-confirmed evaluations as the prompt evaluator.

# Find this line of code where you defined your prompts:
# input_data = {"prompt_text": ModelUtils.random_article()}
# Paste these lines after the line above
my_thread_key = <your_thread_key_UID>
prompt_registry.new_thread(thread_key=my_thread_key)
  3. Find where you ingest your user-scored evals

  4. Import Empromptu's prompt registry into that file

  5. Paste this line

prompt_registry.annotate_data(my_thread_key, {'user_score': <the_user_score>})
  6. Celebrate!
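
Putting the feedback flow together, here is a minimal end-to-end sketch that combines the two calls above. The thread key and user score values are hypothetical placeholders for whatever your analytics provider and UI actually supply; prompt_registry is assumed to be the same registry you defined earlier.

# Minimal sketch of the end-user feedback flow (placeholder values are hypothetical).
# Assumes prompt_registry is already defined and your prompts are registered,
# as shown in the Manually defined section above.
my_thread_key = "session-1234"  # hypothetical session key from your observability/analytics provider
prompt_registry.new_thread(thread_key=my_thread_key)

# ... your application runs and the end user scores the result ...

user_score = 1  # hypothetical: 1 means the user confirmed the output was good
prompt_registry.annotate_data(my_thread_key, {'user_score': user_score})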

Data generated

(coming soon)

