S-GRADES

S-GRADES: Studying Generalization of Student Response Assessments in Diverse Evaluative Settings

S-GRADES is a comprehensive benchmarking platform for automatic essay grading models. It provides evaluation across multiple curated datasets with standardized metrics to advance research in educational assessment and natural language processing.

Overview

S-GRADES provides a standardized evaluation framework for automatic essay grading systems.

Complete Benchmark Evaluation

Test your models across all datasets. Only complete submissions appear on the leaderboard for fair comparison.

Standardized Metrics

Performance is aggregated across all datasets using standard metrics: quadratic weighted kappa (QWK), Pearson correlation, and mean absolute error (MAE).
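
As a rough illustration of the three metrics named above, the sketch below computes them with the Python standard library only. The function names and the explicit `min_rating`/`max_rating` arguments are assumptions made for this example; the platform's own scoring code and its aggregation across datasets are not described here and may differ.

```python
from collections import Counter
from math import sqrt


def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """QWK for two equal-length lists of integer ratings in [min_rating, max_rating]."""
    if max_rating <= min_rating:
        raise ValueError("need at least two distinct rating levels")
    n = len(y_true)
    n_ratings = max_rating - min_rating + 1
    observed = Counter(zip(y_true, y_pred))  # joint counts of (true, predicted)
    hist_true = Counter(y_true)              # marginal counts of true ratings
    hist_pred = Counter(y_pred)              # marginal counts of predicted ratings
    num = 0.0
    den = 0.0
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2  # quadratic disagreement weight
            expected = hist_true[i] * hist_pred[j] / n    # count expected by chance
            num += weight * observed[(i, j)]
            den += weight * expected
    if den == 0:
        raise ValueError("QWK is undefined when all ratings are one identical value")
    return 1.0 - num / den


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

QWK expects integer ratings, so continuous predictions would need to be rounded to the rating scale before calling it; Pearson correlation and MAE accept real-valued scores directly.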

Research Focused

Designed for rigorous academic evaluation with hidden test labels and comprehensive assessment protocols.

Datasets

Download all datasets for comprehensive benchmark evaluation

Benchmark Requirements

  • Download all datasets as train/validation/test splits
  • Train your model using train/validation data (includes human scores)
  • Generate predictions on test data (no human scores provided)
  • Submit prediction CSV files for complete evaluation
  • Only complete benchmarks appear on the official leaderboard
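
The data flow in these requirements (fit on labeled splits, predict on unlabeled test rows) can be sketched with a trivial mean-score baseline. This is only an illustration: the function name, the row-as-dictionary layout, and the assumption that training rows carry the same column names as the submission format are all hypothetical.

```python
from statistics import mean


def mean_baseline(train_rows, test_rows, id_key, score_key):
    """Predict the mean training score for every test row.

    train_rows: dicts that include a human score under score_key.
    test_rows:  dicts without a score (test labels are hidden).
    """
    average = mean(float(row[score_key]) for row in train_rows)
    return [(row[id_key], average) for row in test_rows]


# Toy rows standing in for the downloaded splits (values are made up).
train = [
    {"essay_id": "train_a", "domain1_score": "2"},
    {"essay_id": "train_b", "domain1_score": "4"},
]
test = [{"essay_id": "ASAP-AES_test_0"}]

predictions = mean_baseline(train, test, "essay_id", "domain1_score")
```

A real submission would replace the baseline with a trained model, but the shape of the result (one predicted value per test ID) stays the same.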

Submit Complete Benchmark

Upload predictions for all datasets to appear on the leaderboard

Complete Benchmark Submission

  • Download all datasets from the Datasets section above
  • Train your model on the provided train/validation splits
  • Generate predictions on the test splits (unlabeled)
  • Upload CSV files individually using the dropdown selector
  • Track progress as you upload each dataset
  • Submit complete benchmark when all datasets are uploaded
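
Because only complete submissions reach the leaderboard, it can help to check locally that every dataset has a prediction file before uploading. The sketch below assumes a hypothetical convention of one `<dataset>.csv` file per dataset in a single folder; the function name and that naming scheme are illustrative and not part of the platform.

```python
from pathlib import Path


def missing_predictions(required, predictions_dir):
    """Return the required dataset names that have no '<name>.csv' in predictions_dir."""
    present = {path.stem for path in Path(predictions_dir).glob("*.csv")}
    return sorted(name for name in required if name not in present)
```

For example, `missing_predictions(["ASAP-AES", "BEEtlE_2way"], "predictions")` returns an empty list once both files exist in the `predictions` folder.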

Required CSV Format (varies by dataset)

# ASAP-AES format:
essay_id,domain1_score
ASAP-AES_test_0,3.5
ASAP-AES_test_1,4.2

# BEEtlE_2way format:
question_id,label
BEEtlE_2way_test_0,1
BEEtlE_2way_test_1,0

# OS_Dataset_q1 format:
id,score_1
OS_Dataset_q1_test_0,85.5
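
Since the header row differs per dataset, one way to avoid mistakes is to look the columns up by dataset name when writing the file. The sketch below covers only the two formats labeled above; the `HEADERS` table and the `write_predictions` helper are hypothetical names introduced for this example, and other datasets would need their own entries.

```python
import csv
import io

# Column names copied from the format examples above; extend for other datasets.
HEADERS = {
    "ASAP-AES": ("essay_id", "domain1_score"),
    "BEEtlE_2way": ("question_id", "label"),
}


def write_predictions(dataset, rows, out):
    """Write (id, prediction) pairs to the text stream `out` with the dataset's header."""
    id_column, value_column = HEADERS[dataset]
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_column, value_column])
    for row_id, value in rows:
        writer.writerow([row_id, value])


# Usage: write to an in-memory buffer; pass an open file to write to disk instead.
buffer = io.StringIO()
write_predictions(
    "ASAP-AES",
    [("ASAP-AES_test_0", 3.5), ("ASAP-AES_test_1", 4.2)],
    buffer,
)
```

When writing to disk, opening the file with `newline=""` is the usual recommendation for the `csv` module so that line endings are not translated twice.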

Individual Dataset Testing

For development and debugging only. These submissions do NOT appear on the leaderboard.