Financial Transaction Classifier (Item Level 3)

This model classifies German financial transactions into 42 Item Level 3 categories and reaches 60.2% accuracy on the validation set.

Model Description

Base Model: bert-base-german-cased
Task: Multi-class text classification
Language: German
Training Data: 2106 financial transaction samples from Nextcloud tables
Number of Classes: 42
Final Validation Accuracy: 60.2%
Final Validation F1-Score: 0.5421

Key Features

Optimized for Training Data: Only predicts Item_Level_3 categories that appeared in training data
No External Dependencies: Categories determined directly from training data
Proper Data Splitting: Prevents data leakage with stratified train/validation split
Class Balancing: Handles imbalanced classes with computed class weights
German Text Processing: Specialized preprocessing for German accounting terms
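The class-balancing feature above relies on computed class weights, but this card does not state the formula. A minimal sketch, assuming the common "balanced" heuristic (weight = n_samples / (n_classes * class_count)); the function name `balanced_class_weights` and the sample labels are hypothetical:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count).

    Assumption: this is the common "balanced" heuristic, not necessarily
    the formula used to train this model.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical labels: one frequent class and two rarer ones
example_labels = ["travel"] * 6 + ["rent"] * 3 + ["insurance"] * 1
weights = balanced_class_weights(example_labels)
print(weights)
```

Weights like these are typically passed to a weighted cross-entropy loss so that each sample from a rare category contributes more to the gradient.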

Intended Use

This model is designed to automatically classify German financial transactions based on account information ("Konto") and descriptions ("Bezeichnung").

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import json

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("schauflerg/fibu-item-level-3-classifier")
model = AutoModelForSequenceClassification.from_pretrained("schauflerg/fibu-item-level-3-classifier")

# Load label mappings (download label_mappings.json from the model repository first)
with open('label_mappings.json', 'r', encoding='utf-8') as f:
    label_mappings = json.load(f)

# Example usage
account_code = 7345
description = "kilometergelder"
text = f"Konto: {account_code} (other_expenses); Bezeichnung: {description.lower()}"

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

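# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class_id = torch.argmax(probabilities, dim=-1).item()
    # JSON object keys are strings, so the class id is converted before lookup
    predicted_category = label_mappings['id2label'][str(predicted_class_id)]
    confidence = probabilities.max().item()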

print(f"Predicted: {predicted_category} (confidence: {confidence:.3f})")
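Because a single top prediction can be uncertain, it may help to inspect the few most likely categories instead of only the best one. A minimal sketch in plain Python, assuming the logits have first been converted to a list (for example with `outputs.logits[0].tolist()`); `top_k_predictions`, the example logits and the example labels are hypothetical:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    # id2label keys are strings because the mapping is loaded from JSON
    return [(id2label[str(idx)], prob) for idx, prob in ranked[:k]]

# Hypothetical logits and labels, for illustration only
example_logits = [2.0, 0.5, 1.0, -1.0]
example_id2label = {"0": "travel", "1": "rent", "2": "insurance", "3": "fees"}
for label, prob in top_k_predictions(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Showing the runner-up categories gives a reviewer something to choose from when the top probability is low.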

Training Details

Training Epochs: Up to 15 (with early stopping)
Batch Size: 32 (with gradient accumulation)
Learning Rate: 3e-5 with warmup
Optimizer: AdamW with weight decay (0.1)
Hardware: GPU
Mixed Precision: Enabled
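The training details list a learning rate of 3e-5 with warmup but give neither the schedule shape nor the number of warmup steps. A minimal sketch, assuming linear warmup followed by linear decay to zero; the step counts and the function name `linear_warmup_lr` are hypothetical:

```python
def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr=3e-5):
    """Learning rate at a given optimizer step.

    Assumption: linear ramp from 0 to peak_lr over warmup_steps,
    then linear decay to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Hypothetical schedule: 1000 optimizer steps, 100 of them warmup
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps=1000, warmup_steps=100))
```

The ramp keeps early updates small while the newly initialized classification head is still far from a useful solution.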

Performance Metrics

Metric	Value
Validation Accuracy	0.6019
Validation Precision	0.5358
Validation Recall	0.6019
Validation F1-Score	0.5421
Average Confidence	0.5328
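The reported recall equals the reported accuracy (0.6019). Support-weighted averaging of per-class recall always produces that result, so the precision, recall and F1 values are plausibly weighted averages, although this card does not say so. A minimal sketch with hypothetical labels that shows the identity:

```python
from collections import Counter

def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class support."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((support[c] / total) * (correct[c] / support[c]) for c in support)

def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels, for illustration only
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
print(weighted_recall(y_true, y_pred), accuracy(y_true, y_pred))
```

Under this reading, the gap between recall (0.6019) and precision (0.5358) would reflect predictions concentrating on some classes more than their true frequency warrants.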

Data Sources

Training data was collected from 16 Nextcloud tables containing German financial transaction records with:

Account codes (Konto): 3-4 digit numerical codes
Descriptions (Bezeichnung): German text descriptions
Item Level 3 categories: Target classification labels

Input Format

The model expects input in the format: "Konto: {account_code} ({category}); Bezeichnung: {description}"

Where:

account_code is a 3-4 digit number
category is determined by account code ranges (assets, liabilities_equity, revenue, etc.)
description is the German text description (automatically preprocessed)
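The format above can be assembled with a small helper. A minimal sketch: the account-code ranges are not listed in this card, so the category is passed in by the caller rather than derived from the code. `build_input_text` is a hypothetical name, and the lowercasing mirrors the usage example rather than the full preprocessing pipeline:

```python
def build_input_text(account_code, category, description):
    """Format one transaction as the model's expected input string."""
    code = str(account_code).strip()
    if not (code.isdigit() and len(code) in (3, 4)):
        raise ValueError(f"account_code must be a 3-4 digit number, got {account_code!r}")
    # Assumption: trimming and lowercasing approximate the card's preprocessing
    cleaned = description.strip().lower()
    return f"Konto: {code} ({category}); Bezeichnung: {cleaned}"

print(build_input_text(7345, "other_expenses", "Kilometergelder"))
```

Validating the account code up front surfaces malformed rows before they reach the tokenizer, where they would otherwise produce a low-quality prediction without any warning.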

Limitations

Trained specifically on German financial data
Performance may vary on data from different domains or time periods
Requires specific input format for optimal performance
Only predicts categories that were present in training data
Model performance depends on the quality and representativeness of training data

Files Included

pytorch_model.bin / model.safetensors: Model weights
config.json: Model configuration
tokenizer.json / vocab.txt: Tokenizer files
label_mappings.json: Label mappings (id2label, label2id)
preprocessing_components.pkl: Text preprocessing components

Citation

If you use this model, please cite:

@misc{fibu-classifier-2024,
  title={German Financial Transaction Classifier},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/schauflerg/fibu-item-level-3-classifier}
}

Model Size

Parameters: 0.1B
Tensor Type: F16 (Safetensors)
