S-GRADES is a comprehensive benchmarking platform for automatic essay grading models. It provides evaluation across multiple curated datasets with standardized metrics to advance research in educational assessment and natural language processing.
S-GRADES provides a standardized evaluation framework for automatic essay grading systems
Test your models across all datasets. Only complete submissions appear on the leaderboard for fair comparison.
Performance is aggregated across all datasets using industry-standard metrics: Quadratic Weighted Kappa (QWK), Pearson correlation, and mean absolute error (MAE).
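As a rough sketch, the three aggregate metrics can be computed from scratch with the standard library; the function names and signatures here are illustrative and are not the platform's API.

```python
from collections import Counter

def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """QWK: 1 - (weighted observed disagreement / weighted expected disagreement)."""
    n = max_rating - min_rating + 1
    # Observed agreement matrix over rating categories.
    observed = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    hist_t = Counter(t - min_rating for t in y_true)
    hist_p = Counter(p - min_rating for p in y_pred)
    total = len(y_true)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2          # quadratic disagreement weight
            expected = hist_t[i] * hist_p[j] / total  # expected count under independence
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    sd_t = sum((t - mt) ** 2 for t in y_true) ** 0.5
    sd_p = sum((p - mp) ** 2 for p in y_pred) ** 0.5
    return cov / (sd_t * sd_p)
```

On integer rating scales (as in ASAP-AES), a perfect prediction yields QWK = 1.0, MAE = 0, and Pearson = 1.0; QWK penalizes large rating disagreements quadratically, which is why it is the customary headline metric for essay scoring.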
Designed for rigorous academic evaluation with hidden test labels and comprehensive assessment protocols.
Download all datasets for comprehensive benchmark evaluation
Upload predictions for all datasets to appear on the leaderboard
# ASAP-AES format: essay_id,domain1_score
ASAP-AES_test_0,3.5
ASAP-AES_test_1,4.2

# BEEtlE_2way format: question_id,label
BEEtlE_2way_test_0,1
BEEtlE_2way_test_1,0

# OS_Dataset format: id,score_1
OS_Dataset_q1_test_0,85.5
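A minimal sketch of producing a submission file in the ASAP-AES format shown above, using only the standard `csv` module. The `predictions` mapping and its scores are hypothetical; only the header and id pattern come from the sample.

```python
import csv
import io

# Hypothetical model predictions keyed by example id (id pattern from the sample above).
predictions = {
    "ASAP-AES_test_0": 3.5,
    "ASAP-AES_test_1": 4.2,
}

# Write the submission to an in-memory buffer; swap in open("submission.csv", "w")
# to produce a real file.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["essay_id", "domain1_score"])  # header from the ASAP-AES sample
for essay_id, score in predictions.items():
    writer.writerow([essay_id, score])

print(buf.getvalue())
```

The same pattern applies to the other datasets; only the header row and id prefix change per format.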
For development and debugging only. These submissions do NOT appear on the leaderboard.