Quick Start Guide
This guide will walk you through running your first language model evaluation using FreeEval. In just a few minutes, you’ll understand how to configure and execute an evaluation pipeline using our JSON-based configuration system.
Evaluation Philosophy
FreeEval is built on a philosophy of reproducibility and transparency. All evaluation parameters are defined in a single JSON configuration file, ensuring that evaluations can be easily shared, reproduced, and compared.
This approach keeps all configuration details in one place, making it simple to:
- Track changes to evaluation setups
- Share exact evaluation conditions with others
- Ensure consistent evaluations across different environments
Understanding the Configuration File
Let’s examine a typical FreeEval configuration file with detailed comments explaining each section:
```json
{
  // Where to save the evaluation results
  "results_output_path": "./result/results-llama-2-7b-chat-hf-arc_challenge.json",

  // Array of evaluation steps to perform
  "steps": [
    {
      // Type of evaluation (multiple-choice in this case)
      "step_type": "simple_multiple_choice",

      // A descriptive name for this evaluation step
      "step_name": "Simple MCP on ARC Challenge",

      // Whether to save the processed dataset
      "save_dataset": true,

      // Configuration for the dataset
      "dataset_config": {
        // Dataset type identifier
        "type": "arc_challenge",

        // Dataset-specific parameters
        "dataset_kwargs": {
          // Random seed for reproducibility
          "seed": 2,

          // Which split of the dataset to use
          "split": "test",

          // Dataset identifier (HuggingFace dataset path)
          "name_or_path": "allenai/ai2_arc",

          // Specific configuration of the dataset
          "config_name": "ARC-Challenge",

          // For few-shot evaluations, which split to draw examples from
          "fewshot_split": "train",

          // Number of examples to use in few-shot prompting
          "fewshot_num": 5
        }
      },

      // Configuration for the model inference
      "inference_config": {
        // Type of model (remote HuggingFace model in this case)
        "type": "remote_hf",

        // Where to save intermediate results
        "output_path": "./result",

        // Model-specific parameters
        "inference_kwargs": {
          // Name of the model
          "model_name": "llama-2-7b-chat-hf",

          // Array of URLs where the model is deployed
          // You can provide multiple endpoints for load balancing
          "base_url": [
            "http://your-tgi-url:port"
          ],

          // Request timeout in seconds
          "timeout": 60,

          // Number of parallel workers for the evaluation
          "num_workers": 4,

          // Rate limiting parameters
          "request_limit": 100000,
          "request_limit_period": 60,

          // Whether to save individual model responses
          "dump_individual_rsp": true,

          // Model generation parameters
          "generation_config": {
            "max_new_tokens": 20
          }
        }
      }
    }
  ]
}
```
Basic Evaluation Workflow
The FreeEval workflow consists of three simple steps:
- Create your JSON configuration file with the specific evaluation setup
- Run the evaluation using the run.py script
- Analyze the results saved to your specified output path
Running Your First Evaluation
Step 1: Create Your Configuration File
Start by copying our example configuration file and customizing it for your needs. Create a file named my_evaluation.json:
{ "results_output_path": "./result/my_first_evaluation.json", "steps": [ { "step_type": "simple_multiple_choice", "step_name": "ARC Challenge Evaluation", "save_dataset": true, "dataset_config": { "type": "arc_challenge", "dataset_kwargs": { "seed": 42, "split": "test", "name_or_path": "allenai/ai2_arc", "config_name": "ARC-Challenge", "fewshot_split": "train", "fewshot_num": 5 } }, "inference_config": { "type": "remote_hf", "output_path": "./result", "inference_kwargs": { "model_name": "llama-2-7b-chat-hf", "base_url": [ "http://localhost:8080" // Replace with your model endpoint ], "timeout": 60, "num_workers": 4, "dump_individual_rsp": true, "generation_config": { "max_new_tokens": 20 } } } } ]}Make sure to replace "http://localhost:8080" with the actual URL where your model is deployed.
Step 2: Run the Evaluation
Execute the evaluation with your configuration file:
```bash
python run.py -c my_evaluation.json
```
Unlike some frameworks, FreeEval doesn’t require setting environment variables for API keys, as all necessary configuration is included directly in the JSON file.
You’ll see progress indicators as the evaluation runs.
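If you prefer to launch runs from a script, for example to queue several configurations back to back, a minimal sketch using the standard library could look like this (the list of config files is illustrative):

```python
# Minimal sketch: run FreeEval once per configuration file, in sequence.
# The file names below are illustrative; point them at your own configs.
import subprocess

config_files = ["my_evaluation.json"]

for cfg in config_files:
    # Equivalent to running: python run.py -c <config> on the command line
    subprocess.run(["python", "run.py", "-c", cfg], check=True)
```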
Step 3: Check the Results
Once the evaluation completes, you can find your results at the path specified in results_output_path:
```bash
cat ./result/my_first_evaluation.json
```
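For anything beyond a quick look, loading the file in Python is usually more convenient. The exact structure of the results depends on the steps you ran, so this sketch simply pretty-prints whatever the file contains:

```python
# Pretty-print the results file. Its schema depends on the evaluation steps
# in your configuration, so this only inspects the raw JSON structure.
import json

with open("./result/my_first_evaluation.json") as f:
    results = json.load(f)

print(json.dumps(results, indent=2))
```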
Exploring Example Configurations
FreeEval comes with several example configurations to help you get started:
```bash
ls config/examples/
```
These example configurations cover various evaluation scenarios and can serve as templates for your own evaluations.
Deploying Your Own Model
If you want to evaluate your own model, you can deploy it using the deploy_model.py script included with FreeEval:
```bash
python deploy_model.py --model meta-llama/Llama-2-7b-chat-hf --gpus 0 --port 8080
```
This deploys the specified model as an HTTP service that can be referenced in your configuration file.
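Before pointing a configuration at the new endpoint, it can help to confirm that the service responds. The sketch below assumes the deployment exposes the HuggingFace text-generation-inference (TGI) /generate API, as hinted by the your-tgi-url placeholder in the earlier example; adjust the URL and payload if your deployment differs.

```python
# Hypothetical smoke test for the deployed endpoint, assuming it speaks the
# text-generation-inference (TGI) /generate API. Adjust if your service differs.
import json
import urllib.request

payload = json.dumps({
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 20},
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:8080/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request, timeout=60) as response:
    print(response.read().decode("utf-8"))
```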
Next Steps
Congratulations! You’ve run your first evaluation with FreeEval. To learn more:
- Explore Core Concepts to understand the evaluation pipeline
- Learn about the supported models and how to configure them
- Discover various evaluation steps for different assessment types
- See how to create custom datasets for specialized evaluations
For a deeper dive into specific features, check out our detailed documentation sections.