Quick Start Guide
This guide will walk you through running your first language model evaluation using FreeEval. In just a few minutes, you’ll understand how to configure and execute an evaluation pipeline using our JSON-based configuration system.
Evaluation Philosophy
FreeEval is built on a philosophy of reproducibility and transparency. All evaluation parameters are defined in a single JSON configuration file, ensuring that evaluations can be easily shared, reproduced, and compared.
This approach keeps all configuration details in one place, making it simple to:
- Track changes to evaluation setups
- Share exact evaluation conditions with others
- Ensure consistent evaluations across different environments
Understanding the Configuration File
Let’s examine a typical FreeEval configuration file with detailed comments explaining each section:
```json
{
  // Where to save the evaluation results
  "results_output_path": "./result/results-llama-2-7b-chat-hf-arc_challenge.json",

  // Array of evaluation steps to perform
  "steps": [
    {
      // Type of evaluation (multiple-choice in this case)
      "step_type": "simple_multiple_choice",

      // A descriptive name for this evaluation step
      "step_name": "Simple MCP on ARC Challenge",

      // Whether to save the processed dataset
      "save_dataset": true,

      // Configuration for the dataset
      "dataset_config": {
        // Dataset type identifier
        "type": "arc_challenge",

        // Dataset-specific parameters
        "dataset_kwargs": {
          // Random seed for reproducibility
          "seed": 2,

          // Which split of the dataset to use
          "split": "test",

          // Dataset identifier (HuggingFace dataset path)
          "name_or_path": "allenai/ai2_arc",

          // Specific configuration of the dataset
          "config_name": "ARC-Challenge",

          // For few-shot evaluations, which split to draw examples from
          "fewshot_split": "train",

          // Number of examples to use in few-shot prompting
          "fewshot_num": 5
        }
      },

      // Configuration for the model inference
      "inference_config": {
        // Type of model (remote HuggingFace model in this case)
        "type": "remote_hf",

        // Where to save intermediate results
        "output_path": "./result",

        // Model-specific parameters
        "inference_kwargs": {
          // Name of the model
          "model_name": "llama-2-7b-chat-hf",

          // Array of URLs where the model is deployed
          // You can provide multiple endpoints for load balancing
          "base_url": [
            "http://your-tgi-url:port"
          ],

          // Request timeout in seconds
          "timeout": 60,

          // Number of parallel workers for the evaluation
          "num_workers": 4,

          // Rate limiting parameters
          "request_limit": 100000,
          "request_limit_period": 60,

          // Whether to save individual model responses
          "dump_individual_rsp": true,

          // Model generation parameters
          "generation_config": {
            "max_new_tokens": 20
          }
        }
      }
    }
  ]
}
```
Basic Evaluation Workflow
The FreeEval workflow consists of three simple steps:
- Create your JSON configuration file with the specific evaluation setup
- Run the evaluation using the run.py script
- Analyze the results saved to your specified output path
Running Your First Evaluation
Step 1: Create Your Configuration File
Start by copying our example configuration file and customizing it for your needs. Create a file named my_evaluation.json:
{ "results_output_path": "./result/my_first_evaluation.json", "steps": [ { "step_type": "simple_multiple_choice", "step_name": "ARC Challenge Evaluation", "save_dataset": true, "dataset_config": { "type": "arc_challenge", "dataset_kwargs": { "seed": 42, "split": "test", "name_or_path": "allenai/ai2_arc", "config_name": "ARC-Challenge", "fewshot_split": "train", "fewshot_num": 5 } }, "inference_config": { "type": "remote_hf", "output_path": "./result", "inference_kwargs": { "model_name": "llama-2-7b-chat-hf", "base_url": [ "http://localhost:8080" // Replace with your model endpoint ], "timeout": 60, "num_workers": 4, "dump_individual_rsp": true, "generation_config": { "max_new_tokens": 20 } } } } ]}Make sure to replace "http://localhost:8080" with the actual URL where your model is deployed.
Step 2: Run the Evaluation
Execute the evaluation with your configuration file:
```bash
python run.py -c my_evaluation.json
```
Unlike some frameworks, FreeEval doesn’t require setting environment variables for API keys, as all necessary configuration is included directly in the JSON file.
You’ll see progress indicators as the evaluation runs.
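If you prefer to launch runs from a script, for example to queue several configurations back to back, a minimal sketch using the standard library could look like this (the list of config files is illustrative):

```python
# Minimal sketch: run FreeEval once per configuration file, in sequence.
# The file names below are illustrative; point them at your own configs.
import subprocess

config_files = ["my_evaluation.json"]

for cfg in config_files:
    # Equivalent to running: python run.py -c <config> on the command line
    subprocess.run(["python", "run.py", "-c", cfg], check=True)
```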
Step 3: Check the Results
Once the evaluation completes, you can find your results at the path specified in results_output_path:
```bash
cat ./result/my_first_evaluation.json
```
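For anything beyond a quick look, loading the file in Python is usually more convenient. The exact structure of the results depends on the steps you ran, so this sketch simply pretty-prints whatever the file contains:

```python
# Pretty-print the results file. Its schema depends on the evaluation steps
# in your configuration, so this only inspects the raw JSON structure.
import json

with open("./result/my_first_evaluation.json") as f:
    results = json.load(f)

print(json.dumps(results, indent=2))
```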
Exploring Example Configurations
FreeEval comes with several example configurations to help you get started:
```bash
ls config/examples/
```
These example configurations cover various evaluation scenarios and can serve as templates for your own evaluations.
Deploying Your Own Model
If you want to evaluate your own model, you can deploy it using the deploy_model.py script included with FreeEval:
```bash
python deploy_model.py --model meta-llama/Llama-2-7b-chat-hf --gpus 0 --port 8080
```
This deploys the specified model as an HTTP service that can be referenced in your configuration file.
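Before pointing a configuration at the new endpoint, it can help to confirm that the service responds. The sketch below assumes the deployment exposes the HuggingFace text-generation-inference (TGI) /generate API, as hinted by the your-tgi-url placeholder in the earlier example; adjust the URL and payload if your deployment differs.

```python
# Hypothetical smoke test for the deployed endpoint, assuming it speaks the
# text-generation-inference (TGI) /generate API. Adjust if your service differs.
import json
import urllib.request

payload = json.dumps({
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 20},
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:8080/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request, timeout=60) as response:
    print(response.read().decode("utf-8"))
```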
Next Steps
Congratulations! You’ve run your first evaluation with FreeEval. To learn more:
- Explore Core Concepts to understand the evaluation pipeline
- Learn about the supported models and how to configure them
- Discover various evaluation steps for different assessment types
- See how to create custom datasets for specialized evaluations
For a deeper dive into specific features, check out our detailed documentation sections.