[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
FLASK (Fine-grained Language Model Evaluation based on Alignment Skill Sets) is a task-agnostic evaluation protocol for language models that uses fine-grained, instance-wise skill sets as its evaluation metrics. This is the official repository for the FLASK project, providing guidelines for both model-based and human-based evaluation.