Financial Transaction Classifier (Item Level 3)

This model classifies German financial transactions into 42 Item Level 3 categories and reaches 60.2% accuracy on the validation set.

Model Description

  • Base Model: bert-base-german-cased
  • Task: Multi-class text classification
  • Language: German
  • Training Data: 2106 financial transaction samples from Nextcloud tables
  • Number of Classes: 42
  • Final Validation Accuracy: 60.2%
  • Final Validation F1-Score: 0.5421

Key Features

  • Optimized for Training Data: Only predicts Item_Level_3 categories that appeared in training data
  • No External Dependencies: Categories determined directly from training data
  • Proper Data Splitting: Prevents data leakage with stratified train/validation split
  • Class Balancing: Handles imbalanced classes with computed class weights
  • German Text Processing: Specialized preprocessing for German accounting terms
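
The stratified split and class weighting listed above can be illustrated with a small standard-library sketch. This is not the training code for this model: the function names, the 50% validation fraction, and the toy labels are assumptions made for illustration, and the weight formula n_samples / (n_classes * class_count) is the common "balanced" heuristic, which may differ from what was actually used.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) so each class keeps a similar share in both sets."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A class with a single sample stays entirely in the training set.
        n_val = int(round(len(indices) * val_fraction)) if len(indices) > 1 else 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy labels for illustration only (not taken from the real training data).
labels = ["Betriebskosten"] * 8 + ["Abschreibungen"] * 2
train_idx, val_idx = stratified_split(labels, val_fraction=0.5)
weights = balanced_class_weights([labels[i] for i in train_idx])
print(len(train_idx), len(val_idx), weights)
```

Rare classes receive larger weights, so a weighted loss penalizes mistakes on them more heavily than mistakes on frequent classes.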

Intended Use

This model is designed to automatically classify German financial transactions based on account information ("Konto") and descriptions ("Bezeichnung").

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import json

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("schauflerg/fibu-item-level-3-classifier")
model = AutoModelForSequenceClassification.from_pretrained("schauflerg/fibu-item-level-3-classifier")

# Load label mappings (download from repository)
with open('label_mappings.json', 'r', encoding='utf-8') as f:
    label_mappings = json.load(f)

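# Example usage: the label in parentheses is the account-range category
# described under "Input Format" (here: other_expenses)
account_code = 7345
description = "kilometergelder"
text = f"Konto: {account_code} (other_expenses); Bezeichnung: {description.lower()}"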

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
    predicted_category = label_mappings['id2label'][str(predicted_class_id)]
    confidence = torch.softmax(outputs.logits, dim=-1).max().item()

print(f"Predicted: {predicted_category} (confidence: {confidence:.3f})")
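
Since the top prediction is correct on roughly 60% of validation samples, it may help to show several candidates instead of only the argmax. The following is a minimal sketch using only the standard library; top_k_predictions is a hypothetical helper, and the toy logits and three-label mapping are made up for illustration. With the real model, one would presumably pass outputs.logits[0].tolist() and label_mappings['id2label'].

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one example."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keys are strings, matching how the usage example indexes id2label.
    return [(id2label[str(i)], probs[i]) for i in ranked[:k]]


# Toy values for illustration only.
toy_id2label = {"0": "Abschreibungen", "1": "Betriebskosten", "2": "Gebühren und Beiträge"}
for label, prob in top_k_predictions([0.2, 2.0, 1.1], toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```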

Training Details

  • Training Epochs: Up to 15 (with early stopping)
  • Batch Size: 32 (with gradient accumulation)
  • Learning Rate: 3e-5 with warmup
  • Optimizer: AdamW with weight decay (0.1)
  • Hardware: GPU
  • Mixed Precision: Enabled
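
The "up to 15 epochs with early stopping" setting can be pictured with the sketch below. The patience value, the monitored metric, and the validation scores are all assumptions for illustration; the card does not state which criterion was actually used.

```python
def run_with_early_stopping(val_scores, max_epochs=15, patience=3):
    """Return (last_epoch_run, best_epoch, best_score); epochs are 1-based.

    Training stops once the validation score has failed to improve
    for `patience` consecutive epochs, or after `max_epochs`.
    """
    best_score, best_epoch, stale = float("-inf"), 0, 0
    epoch = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return epoch, best_epoch, best_score


# Made-up validation scores, one per epoch (not the real training curve).
scores = [0.40, 0.52, 0.58, 0.57, 0.58, 0.56, 0.60]
print(run_with_early_stopping(scores, patience=3))
```

In this toy run the score at epoch 7 is never reached, because three epochs without improvement have already passed by epoch 6.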

Performance Metrics

Metric                  Value
Validation Accuracy     0.6019
Validation Precision    0.5358
Validation Recall       0.6019
Validation F1-Score     0.5421
Average Confidence      0.5328
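
Recall and accuracy are identical here (0.6019). That would be consistent with support-weighted averaging of the per-class metrics, because support-weighted recall always equals accuracy, but the card does not state the averaging mode, so treat this as an assumption. A small sketch with toy labels shows the identity:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with weights proportional to class support."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    return sum((support[c] / n) * (correct[c] / support[c]) for c in support)


# Toy labels for illustration only.
y_true = ["Betriebskosten"] * 3 + ["Abschreibungen"] * 2 + ["Gebühren und Beiträge"]
y_pred = ["Betriebskosten", "Betriebskosten", "Abschreibungen",
          "Abschreibungen", "Gebühren und Beiträge", "Betriebskosten"]

print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

The support terms cancel inside the sum, which leaves the total number of correct predictions divided by the number of samples.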

Data Sources

Training data was collected from 16 Nextcloud tables containing German financial transaction records with:

  • Account codes (Konto): 3-4 digit numerical codes
  • Descriptions (Bezeichnung): German text descriptions
  • Item Level 3 categories: Target classification labels

Input Format

The model expects input in the format: "Konto: {account_code} ({category}); Bezeichnung: {description}"

Where:

  • account_code is a 3-4 digit number
  • category is determined by account code ranges (assets, liabilities_equity, revenue, etc.)
  • description is the German text description (automatically preprocessed)
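
A helper along the following lines could build the input string. It is a sketch under stated assumptions: format_model_input and category_for_account are hypothetical names, the range table is a placeholder (only the pairing of 7345 with other_expenses comes from the usage example), the "unknown" fallback is invented, and lowercasing stands in for the actual preprocessing, which is not documented here.

```python
# HYPOTHETICAL range table: the boundaries are placeholders, not the real mapping.
EXAMPLE_RANGES = [
    (7000, 7999, "other_expenses"),
]


def category_for_account(account_code, ranges=EXAMPLE_RANGES, default="unknown"):
    """Look up the coarse category for an account code in a caller-supplied table."""
    code = int(account_code)
    for low, high, category in ranges:
        if low <= code <= high:
            return category
    return default  # hypothetical fallback label


def format_model_input(account_code, category, description):
    """Build 'Konto: {account_code} ({category}); Bezeichnung: {description}'."""
    code = str(account_code).strip()
    if not (code.isdigit() and 3 <= len(code) <= 4):
        raise ValueError(f"expected a 3-4 digit account code, got {account_code!r}")
    # Lowercasing mirrors the usage example; the real preprocessing may do more.
    return f"Konto: {code} ({category}); Bezeichnung: {description.strip().lower()}"


text = format_model_input(7345, category_for_account(7345), "Kilometergelder")
print(text)
```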

Categories

The model can predict 42 different Item Level 3 categories including: Abschreibungen, Aufwendungen für bezogene Leistungen, Betriebs- und Geschäftsausstattung, Betriebskosten, Büro- und Verwaltungsaufwand, Erlöse aus Förderzuschüssen, Forderungen aus Lieferungen und Leistungen, Gebühren und Beiträge, Gewinn-/Verlustvortrag, Halb- und Fertigprodukte...

Limitations

  • Trained specifically on German financial data
  • Performance may vary on data from different domains or time periods
  • Requires specific input format for optimal performance
  • Only predicts categories that were present in training data
  • Model performance depends on the quality and representativeness of training data

Files Included

  • pytorch_model.bin / model.safetensors: Model weights
  • config.json: Model configuration
  • tokenizer.json / vocab.txt: Tokenizer files
  • label_mappings.json: Label mappings (id2label, label2id)
  • preprocessing_components.pkl: Text preprocessing components

Citation

If you use this model, please cite:

@misc{fibu-classifier-2024,
  title={German Financial Transaction Classifier},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/schauflerg/fibu-item-level-3-classifier}
}