S-GRADES is a comprehensive benchmarking platform for automatic essay grading models. It provides evaluation across multiple curated datasets with standardized metrics to advance research in educational assessment and natural language processing.
S-GRADES provides a standardized evaluation framework for automatic essay grading systems
Test your models across all datasets. Only complete submissions appear on the leaderboard for fair comparison.
Performance is aggregated across all datasets using industry-standard metrics: Quadratic Weighted Kappa (QWK), Pearson correlation, and mean absolute error (MAE).
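As a rough sketch, the three aggregate metrics can be computed from scratch with the standard library; the function names and signatures here are illustrative and are not the platform's API.

```python
from collections import Counter

def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """QWK: 1 - (weighted observed disagreement / weighted expected disagreement)."""
    n = max_rating - min_rating + 1
    # Observed agreement matrix over rating categories.
    observed = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    hist_t = Counter(t - min_rating for t in y_true)
    hist_p = Counter(p - min_rating for p in y_pred)
    total = len(y_true)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2          # quadratic disagreement weight
            expected = hist_t[i] * hist_p[j] / total  # expected count under independence
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    sd_t = sum((t - mt) ** 2 for t in y_true) ** 0.5
    sd_p = sum((p - mp) ** 2 for p in y_pred) ** 0.5
    return cov / (sd_t * sd_p)
```

On integer rating scales (as in ASAP-AES), a perfect prediction yields QWK = 1.0, MAE = 0, and Pearson = 1.0; QWK penalizes large rating disagreements quadratically, which is why it is the customary headline metric for essay scoring.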
Designed for rigorous academic evaluation with hidden test labels and comprehensive assessment protocols.
Download all datasets for comprehensive benchmark evaluation
Upload predictions for all datasets to appear on the leaderboard
# ASAP-AES format: essay_id,domain1_score
ASAP-AES_test_0,3.5
ASAP-AES_test_1,4.2

# BEEtlE_2way format: question_id,label
BEEtlE_2way_test_0,1
BEEtlE_2way_test_1,0

# OS_Dataset format: id,score_1
OS_Dataset_q1_test_0,85.5
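A minimal sketch of producing a submission file in the ASAP-AES format shown above, using only the standard `csv` module. The `predictions` mapping and its scores are hypothetical; only the header and id pattern come from the sample.

```python
import csv
import io

# Hypothetical model predictions keyed by example id (id pattern from the sample above).
predictions = {
    "ASAP-AES_test_0": 3.5,
    "ASAP-AES_test_1": 4.2,
}

# Write the submission to an in-memory buffer; swap in open("submission.csv", "w")
# to produce a real file.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["essay_id", "domain1_score"])  # header from the ASAP-AES sample
for essay_id, score in predictions.items():
    writer.writerow([essay_id, score])

print(buf.getvalue())
```

The same pattern applies to the other datasets; only the header row and id prefix change per format.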
For development and debugging only. These submissions do NOT appear on the leaderboard.