Financial Transaction Classifier (Item Level 3)
This model classifies German financial transactions into 42 Item Level 3 categories and reaches 60.2% accuracy on the validation set.
Model Description
- Base Model: bert-base-german-cased
- Task: Multi-class text classification
- Language: German
- Training Data: 2106 financial transaction samples from Nextcloud tables
- Number of Classes: 42
- Final Validation Accuracy: 60.2%
- Final Validation F1-Score: 0.5421
Key Features
- Optimized for Training Data: Only predicts Item_Level_3 categories that appeared in training data
- No External Dependencies: Categories determined directly from training data
- Proper Data Splitting: Prevents data leakage with stratified train/validation split
- Class Balancing: Handles imbalanced classes with computed class weights
- German Text Processing: Specialized preprocessing for German accounting terms
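The card does not state how the class weights or the stratified split were computed. The following is a minimal sketch, assuming the common "balanced" weighting heuristic (weight = n_samples / (n_classes * class_count)) and a per-class shuffle for the split. The function names, the 20% validation fraction, and the toy labels are illustrative assumptions, not the author's code.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx), splitting each class separately."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Classes with a single sample stay entirely in the training set.
        n_val = max(1, round(len(idxs) * val_fraction)) if len(idxs) > 1 else 0
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels (assumed for illustration): one frequent and one rare category.
labels = ["Betriebskosten"] * 8 + ["Abschreibungen"] * 2
weights = balanced_class_weights(labels)
train_idx, val_idx = stratified_split(labels)
```

Rare categories receive larger weights, so misclassifying them costs more in a weighted loss, and every index ends up in exactly one of the two splits.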
Intended Use
This model is designed to automatically classify German financial transactions based on account information ("Konto") and descriptions ("Bezeichnung").
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import json

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("schauflerg/fibu-item-level-3-classifier")
model = AutoModelForSequenceClassification.from_pretrained("schauflerg/fibu-item-level-3-classifier")

# Load label mappings (download label_mappings.json from the repository first)
with open('label_mappings.json', 'r', encoding='utf-8') as f:
    label_mappings = json.load(f)

# Example usage: build the input string in the format the model expects
account_code = 7345
description = "kilometergelder"
text = f"Konto: {account_code} (other_expenses); Bezeichnung: {description.lower()}"

# Tokenize and run inference without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and its softmax probability
predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
predicted_category = label_mappings['id2label'][str(predicted_class_id)]
confidence = torch.softmax(outputs.logits, dim=-1).max().item()
print(f"Predicted: {predicted_category} (confidence: {confidence:.3f})")
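Because the reported average confidence is only about 0.53, it can help to look at several candidate categories rather than just the top one. The sketch below is a hypothetical helper, not part of the repository, that turns raw logits into the k most likely labels using only the standard library. It assumes `id2label` uses string keys, as in the usage code above; the demo labels and logits are made up.

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[str(i)], probs[i]) for i in ranked]


# Made-up logits and a three-label mapping, for illustration only.
demo_id2label = {"0": "Abschreibungen", "1": "Betriebskosten", "2": "Gebühren und Beiträge"}
demo = top_k_predictions([0.2, 2.0, 1.1], demo_id2label, k=2)
for label, prob in demo:
    print(f"{label}: {prob:.3f}")
```

With the real model, the inputs would be `outputs.logits[0].tolist()` and `label_mappings['id2label']`.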
Training Details
- Training Epochs: Up to 15 (with early stopping)
- Batch Size: 32 (with gradient accumulation)
- Learning Rate: 3e-5 with warmup
- Optimizer: AdamW with weight decay (0.1)
- Hardware: GPU
- Mixed Precision: Enabled
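The settings above can be collected into a plain dictionary, as sketched below. The card does not say how the batch size of 32 relates to gradient accumulation or how long the warmup lasts, so the per-device batch size, the accumulation steps, and the warmup ratio are assumptions that treat 32 as the effective batch size. The key names mirror common `transformers.TrainingArguments` parameters, but whether the author used the `Trainer` API is not stated.

```python
# Hyperparameters from the card; values marked ASSUMED are not stated there.
hyperparams = {
    "num_train_epochs": 15,             # upper bound; early stopping may end sooner
    "per_device_train_batch_size": 8,   # ASSUMED
    "gradient_accumulation_steps": 4,   # ASSUMED
    "learning_rate": 3e-5,
    "warmup_ratio": 0.1,                # ASSUMED; the card only says "with warmup"
    "weight_decay": 0.1,                # AdamW weight decay
    "fp16": True,                       # mixed precision
}

# Gradients are accumulated over several small batches before each optimizer step.
effective_batch_size = (
    hyperparams["per_device_train_batch_size"]
    * hyperparams["gradient_accumulation_steps"]
)
print(f"Effective batch size: {effective_batch_size}")
```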
Performance Metrics
| Metric | Value |
|---|---|
| Validation Accuracy | 0.6019 |
| Validation Precision | 0.5358 |
| Validation Recall | 0.6019 |
| Validation F1-Score | 0.5421 |
| Average Confidence | 0.5328 |
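Validation recall matches validation accuracy to four decimals. That is what support-weighted averaging produces, because weighting each class's recall by its share of samples reduces to the overall fraction of correct predictions. The card does not state the averaging mode, so this reading is an assumption. The toy labels below are made up to show the identity.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    return sum((support[c] / n) * (hits[c] / support[c]) for c in support)


# Made-up labels for three classes.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(f"accuracy={acc:.4f}, weighted recall={rec:.4f}")
```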
Data Sources
Training data was collected from 16 Nextcloud tables containing German financial transaction records with:
- Account codes (Konto): 3-4 digit numerical codes
- Descriptions (Bezeichnung): German text descriptions
- Item Level 3 categories: Target classification labels
Input Format
The model expects input in the format: "Konto: {account_code} ({category}); Bezeichnung: {description}"
Where:
- account_code is a 3-4 digit number
- category is determined by account code ranges (assets, liabilities_equity, revenue, etc.)
- description is the German text description (automatically preprocessed)
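The input format can be assembled with a small helper, sketched below. The card does not list the actual account code ranges, so the helper takes them as a parameter. The single range shown is a hypothetical placeholder, chosen only to agree with the usage example (7345 mapped to other_expenses). Lowercasing and trimming stand in for the real preprocessing, which is stored in preprocessing_components.pkl and not described here.

```python
def build_model_input(account_code, description, category_ranges):
    """Format one transaction as 'Konto: ... (...); Bezeichnung: ...'."""
    code = int(account_code)
    if not 100 <= code <= 9999:
        raise ValueError("account code must have 3 or 4 digits")
    for low, high, name in category_ranges:
        if low <= code <= high:
            category = name
            break
    else:
        raise ValueError(f"no category range covers account code {code}")
    # Minimal stand-in for the model's preprocessing: trim and lowercase.
    return f"Konto: {code} ({category}); Bezeichnung: {description.strip().lower()}"


# HYPOTHETICAL range, for illustration only; the real boundaries are not documented.
EXAMPLE_RANGES = [(7000, 7999, "other_expenses")]
text = build_model_input(7345, "Kilometergelder", EXAMPLE_RANGES)
print(text)
```

Passing the ranges in as data keeps the formatting logic separate from the chart of accounts, which would need to come from the training pipeline.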
Categories
The model can predict 42 different Item Level 3 categories including: Abschreibungen, Aufwendungen für bezogene Leistungen, Betriebs- und Geschäftsausstattung, Betriebskosten, Büro- und Verwaltungsaufwand, Erlöse aus Förderzuschüssen, Forderungen aus Lieferungen und Leistungen, Gebühren und Beiträge, Gewinn-/Verlustvortrag, Halb- und Fertigprodukte...
Limitations
- Trained specifically on German financial data
- Performance may vary on data from different domains or time periods
- Requires specific input format for optimal performance
- Only predicts categories that were present in training data
- Model performance depends on the quality and representativeness of training data
Files Included
- pytorch_model.bin / model.safetensors: Model weights
- config.json: Model configuration
- tokenizer.json / vocab.txt: Tokenizer files
- label_mappings.json: Label mappings (id2label, label2id)
- preprocessing_components.pkl: Text preprocessing components
Citation
If you use this model, please cite:
@misc{fibu-classifier-2024,
title={German Financial Transaction Classifier},
author={Your Name},
year={2024},
url={https://huggingface.co/schauflerg/fibu-item-level-3-classifier}
}