FreeEval

A modular, extensible framework for trustworthy and efficient automatic evaluation of large language models.

Modular Design

Built from modular components, so evaluation steps are easy to extend and customize.

Trustworthy Results

Incorporates meta-evaluation techniques, such as human evaluation and data contamination detection, to check that results can be trusted.
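
To make the idea of data contamination detection concrete, here is a minimal sketch of one simple heuristic: measuring how many word n-grams of a benchmark sample also appear in a reference corpus. This is an illustration only, not FreeEval's implementation; the function names `ngrams` and `overlap_ratio` and the default `n=3` are assumptions chosen for the example.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(sample, corpus, n=3):
    """Fraction of the sample's n-grams that also occur in the corpus.

    Hypothetical helper: a high ratio hints that the sample may have
    leaked into the corpus, but it is not proof of contamination.
    """
    sample_grams = ngrams(sample, n)
    if not sample_grams:
        # Sample is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(sample_grams & ngrams(corpus, n)) / len(sample_grams)


if __name__ == "__main__":
    question = "the quick brown fox"
    corpus = "yesterday the quick brown fox jumps over the lazy dog"
    print(overlap_ratio(question, corpus))
```

A ratio close to 1 suggests the sample text appears nearly verbatim in the corpus. Real detectors usually work on tokenizer output and much larger corpora, so treat this as a starting point.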

High Performance

Efficient infrastructure supporting multi-node, multi-GPU evaluations at scale.

Comprehensive Support

Works with both open-source and proprietary LLMs through a unified interface.
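
As a rough illustration of what a unified interface can look like, the sketch below puts a local model and a remote API behind the same abstract class, so the scoring code never needs to know which one it is calling. All names here (`Model`, `LocalStubModel`, `RemoteStubModel`, `accuracy`) are hypothetical and are not FreeEval's actual API; both backends are stubs standing in for real inference.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical common interface shared by every model backend."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class LocalStubModel(Model):
    """Stand-in for an open-source model served locally."""

    def __init__(self, answers):
        self.answers = dict(answers)

    def generate(self, prompt):
        # Look up a canned answer; a real backend would run inference.
        return self.answers.get(prompt, "")


class RemoteStubModel(Model):
    """Stand-in for a proprietary model reached through an API."""

    def __init__(self, send):
        # `send` is any callable mapping a prompt to a response string,
        # for example a thin wrapper around an HTTP client.
        self.send = send

    def generate(self, prompt):
        return self.send(prompt)


def accuracy(model, dataset):
    """Exact-match accuracy over (prompt, expected answer) pairs."""
    if not dataset:
        return 0.0
    hits = sum(model.generate(q).strip() == a for q, a in dataset)
    return hits / len(dataset)


if __name__ == "__main__":
    data = [("2+2", "4"), ("3+3", "6")]
    local = LocalStubModel({"2+2": "4"})
    remote = RemoteStubModel(lambda prompt: " 4 ")
    print(accuracy(local, data), accuracy(remote, data))
```

Because `accuracy` depends only on `generate`, adding another backend means writing one more subclass and leaving the evaluation code unchanged.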