---
base_model: answerdotai/ModernBERT-base
language:
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- reasoning
- complexity
- education
- regression
- fineweb-edu
---

# Reasoning Complexity Classifier

A ModernBERT-base model fine-tuned to predict the **reasoning complexity** of educational text on a continuous 1–4 scale. Trained on FineWeb-Edu documents labeled by GPT-5-nano via the OpenAI Batch API (~$20 in credits).

## Model Description

This is a regression model (`num_labels=1`, `problem_type="regression"`) that outputs a continuous score. The score can be rounded to the nearest integer to obtain a discrete complexity level. Level 5 (Formal/Abstract reasoning) was excluded from training due to data scarcity; the model's effective range is **1.0–4.0**.
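
A minimal inference sketch, assuming the standard `transformers` regression head. The repo id is a placeholder (the actual checkpoint path is not stated here), and the imports are deferred so the snippet can be imported without `torch` installed:

```python
def predict_complexity(texts, model_id="your-org/reasoning-complexity-classifier"):
    """Return one raw continuous complexity score (~1.0-4.0) per input text.

    The model id above is a placeholder; substitute the actual checkpoint.
    Imports are deferred so this file can be loaded without torch installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    enc = tokenizer(texts, truncation=True, max_length=512,
                    padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits  # shape (batch, 1) -- regression head
    return logits.squeeze(-1).tolist()
```

With `num_labels=1` the single logit is the score itself; no softmax or sigmoid is applied.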

### Complexity Levels

| Level | Name | Description | Example |
|-------|------|-------------|---------|
| 1 | Factual/Declarative | States facts with no reasoning | "The Pacific Ocean covers ~165 million km²." |
| 2 | Single-step reasoning | One inference or comparison | "Because boiling point decreases at altitude, water boils faster in Denver than Miami." |
| 3 | Multi-step reasoning | 2–4 chained logical steps | "Demand rose while supply held fixed → prices rose → consumer spending fell → GDP slowed." |
| 4 | Complex reasoning | 5+ steps, conditionals, competing factors | Medical differential diagnosis with branching conditions and exclusion criteria. |

## Training Details

### Data

- **Source**: [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu), a curated subset of Common Crawl filtered for educational content.
- **Labeling**: ~100,000 documents reservoir-sampled from ~6,000 records per subject category, then labeled with GPT-5-nano via the OpenAI Batch API using structured output (integer 1–5).
- **Splits**: 80% train / 10% validation / 10% test (stratified by integer complexity level).
- **Preprocessing**: Texts truncated to 8,000 characters before labeling; tokenized to 512 tokens during training with dynamic padding.
- **Level 5 exclusion**: Rows labeled as level 5 were excluded from the training set.
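
The per-category sampling step can be sketched with standard reservoir sampling (Algorithm R); the function below is illustrative, not the project's actual labeling code:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

This yields a uniform sample in one pass, which is convenient when streaming a large dataset shard without knowing its length in advance.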

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (CUDA) |
| Loss | MSE |
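
Under the `transformers` Trainer API, the table above maps onto a `TrainingArguments` roughly like this. This is a sketch, not the published training script, and `eval_mae` assumes a `compute_metrics` function that reports MAE:

```python
from transformers import TrainingArguments

# Sketch matching the reported hyperparameters; not the project's actual script.
args = TrainingArguments(
    output_dir="reasoning-complexity",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    bf16=True,                         # AMP on CUDA
    eval_strategy="epoch",             # "evaluation_strategy" on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_mae",  # best checkpoint chosen by validation MAE
    greater_is_better=False,
)
```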

### Training History

| Epoch | Train Loss | Val MAE | Val Acc (rounded) | Val Spearman r |
|-------|-----------|---------|-------------------|----------------|
| 1 | 0.6002 | 0.5190 | 56.98% | 0.7533 |
| **2** | **0.3631** | **0.5040** | **58.43%** | **0.7597** |
| 3 | 0.2040 | 0.5114 | 58.19% | 0.7485 |

The best checkpoint (by validation MAE) was saved at **epoch 2**.

## Evaluation Results

Evaluated on the held-out test set:

| Metric | Value |
|--------|-------|
| MSE | 0.4388 |
| MAE | 0.5063 |
| Rounded accuracy | 58.6% |
| Spearman r | 0.7527 |

**Interpretation**: The model achieves a Spearman correlation of ~0.75 with the gold labels, indicating strong ordinal ranking ability. The MAE of ~0.51 means predictions deviate from the true score by about half a level on average when treated as a continuous signal.
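
These metrics are straightforward to recompute from raw predictions. A sketch using numpy and scipy (the function name and example arrays are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate(preds, labels):
    """Compute MSE, MAE, rounded accuracy, and Spearman r for the regression output."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mse = float(np.mean((preds - labels) ** 2))
    mae = float(np.mean(np.abs(preds - labels)))
    # Discretize as the card suggests: clip to [1, 4], then round.
    rounded = np.round(np.clip(preds, 1, 4))
    rounded_acc = float(np.mean(rounded == labels))
    rho = float(spearmanr(preds, labels)[0])  # index 0 = correlation statistic
    return {"mse": mse, "mae": mae, "rounded_accuracy": rounded_acc, "spearman_r": rho}
```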

### Output Interpretation

| Raw score | Meaning |
|-----------|---------|
| ~1.0 | Factual/Declarative |
| ~2.0 | Single-step reasoning |
| ~3.0 | Multi-step reasoning |
| ~4.0 | Complex reasoning |

Clip and round the raw float output to `[1, 4]` for a discrete level.
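
A one-liner for that mapping (the helper name is illustrative); note that Python's built-in `round` uses banker's rounding at exact .5 boundaries:

```python
def to_level(score: float) -> int:
    """Clip a raw regression score to the effective range, then round to a level in {1, 2, 3, 4}."""
    return round(min(max(score, 1.0), 4.0))
```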

## Architecture

Based on `answerdotai/ModernBERT-base`:

- **Layers**: 22 transformer layers (alternating global and sliding-window attention)
- **Hidden size**: 768
- **Attention heads**: 12
- **Intermediate size**: 1,152
- **Max position embeddings**: 8,192
- **Classifier pooling**: mean
- **Classifier activation**: GELU
|
| | ## Limitations |
| |
|
| | - Labels are silver-standard (GPT-5-nano), not human-annotated; label noise may affect the ~1.5% of ambiguous texts. |
| | - Texts are truncated to 512 tokens; very long documents are judged on their first ~512 tokens only. |
| | - Trained primarily on English educational web text; performance may degrade on other domains or languages. |
| |
|
| | ## Intended Use |
| |
|
| | Designed for data curation pipelines that need to filter or balance training corpora by reasoning complexity β for example, constructing curriculum-ordered datasets for language model training. |
| |
|