[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
FLASK (Fine-grained Language Model Evaluation Based on Alignment Skill Sets) is a task-agnostic evaluation protocol for language models that uses fine-grained, instance-wise skill sets as its metrics. This is the official repository for the FLASK project and provides guidelines for both model-based and human-based evaluation.
The code in this repository covers four functions: model-based evaluation, metadata annotation, domain categorization, and skill set analysis. It integrates with the OpenAI API, so these steps can be run against OpenAI models.
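To make "instance-wise skill sets" concrete, here is a minimal sketch of how per-skill results might be aggregated when each instance is scored on its own subset of skills. It is not the repository's code. The record layout, the skill names, the score values, and the `average_by_skill` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: every evaluated instance carries its own set of skills
# and one score per skill. Skill names and scores are illustrative only.
records = [
    {"id": 1, "scores": {"logical_correctness": 4, "conciseness": 5}},
    {"id": 2, "scores": {"logical_correctness": 2, "factuality": 3}},
    {"id": 3, "scores": {"factuality": 5, "conciseness": 3}},
]


def average_by_skill(records):
    """Average the scores per skill, using only instances annotated with that skill."""
    buckets = defaultdict(list)
    for record in records:
        for skill, score in record["scores"].items():
            buckets[skill].append(score)
    return {skill: mean(values) for skill, values in buckets.items()}


skill_averages = average_by_skill(records)
for skill, value in sorted(skill_averages.items()):
    print(f"{skill}: {value}")
```

Because skills are assigned per instance, each average is taken over a different subset of instances, so the number of instances behind each skill is worth reporting alongside the score.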