πŸ” Native Log Translator

Maps heterogeneous cloud and OS logs to a single normalized schema.

Fine-tuned from microsoft/codebert-base with LoRA (PEFT) on a curated dataset
of multi-provider security logs. Trained in FP16 on two T4 GPUs on Kaggle.


πŸš€ Quick Start

import torch
from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
from peft import PeftModel

MODEL_REPO = "Swapnanil09/native-log-translator-qlora"
BASE_MODEL = "microsoft/codebert-base"

tokenizer = RobertaTokenizer.from_pretrained(MODEL_REPO)

# CodeBERT ships as an encoder; enable decoder mode so it can generate.
config = RobertaConfig.from_pretrained(BASE_MODEL)
config.is_decoder = True

base = RobertaForCausalLM.from_pretrained(
    BASE_MODEL,
    config=config,
    ignore_mismatched_sizes=True,  # tolerate head weights that don't match the checkpoint
    torch_dtype=torch.float16,
    device_map="auto",
)
# Load the fine-tuned LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, MODEL_REPO)
model.eval()

def translate_log(log_input):
    # Wrap the raw log in the prompt format the adapter expects.
    prompt = f"<log>{log_input}</log>\n<schema>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Low temperature keeps the schema output near-deterministic.
        out = model.generate(**inputs, max_new_tokens=60, temperature=0.1,
                             do_sample=True, pad_token_id=tokenizer.eos_token_id)
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    # Return only the generated schema, not the echoed prompt.
    return decoded.split("<schema>")[-1].strip()

print(translate_log("AzureSignInLogs | ResultType=0"))
# event_type: authentication_success
# provider: azure
# risk_level: low
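
For higher throughput, several logs can be translated per forward pass. A minimal batched sketch under the same prompt format; the helper name translate_logs and the left-padding choice are additions here, not part of the original example:

def translate_logs(log_inputs):
    # Left-pad so each prompt ends right where generation begins.
    tokenizer.padding_side = "left"
    prompts = [f"<log>{log}</log>\n<schema>" for log in log_inputs]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=60, temperature=0.1,
                             do_sample=True, pad_token_id=tokenizer.pad_token_id)
    decoded = tokenizer.batch_decode(out, skip_special_tokens=True)
    return [d.split("<schema>")[-1].strip() for d in decoded]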

πŸ“‹ Output Schema

Field        Description                  Values
event_type   Normalized event category    e.g. authentication_success, privilege_escalation
provider     Source cloud / OS            azure, aws, gcp, windows, linux, paloalto, cisco, fortinet
risk_level   Severity classification      low, medium, high, critical
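
The model emits the schema as plain key: value lines (see the Quick Start output), so a few lines of parsing turn a translation into a dictionary. A minimal sketch; parse_schema is a name chosen here:

def parse_schema(translated):
    # Split "key: value" lines into a dict, skipping malformed lines.
    event = {}
    for line in translated.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            event[key.strip()] = value.strip()
    return event

event = parse_schema(translate_log("AzureSignInLogs | ResultType=0"))
print(event["risk_level"])  # low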

πŸ“¦ Supported Log Sources

Provider   Log Type
Azure      SignInLogs, Activity, NSGFlowLogs
AWS        CloudTrail
GCP        Audit Logs
Windows    Security Events (4624, 4625, 4688, 4698, 4720, 4732, 1102, ...)
Linux      Syslog (auth, kern)
Network    Palo Alto, Cisco, Fortinet (CommonSecurityLog)
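
The raw lines below sketch how inputs from a few of these sources might look; they are illustrative samples written for this card, not lines from the training set:

samples = [
    "AWSCloudTrail | eventName=ConsoleLogin | errorCode=AccessDenied",
    "WindowsSecurity | EventID=4625 | LogonType=3",
    "LinuxSyslog | sshd[1042]: Failed password for root from 10.0.0.5 port 22",
]
for s in samples:
    print(s, "->", translate_log(s))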

βš™οΈ Training Details

Setting                  Value
Base model               microsoft/codebert-base
Method                   LoRA (PEFT)
LoRA rank                16
LoRA alpha               32
Target modules           query, key, value
Epochs                   15
Batch size               8 per device
Gradient accumulation    4 steps
Learning rate            2e-4
Precision                FP16
Hardware                 Kaggle T4 Γ—2
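
The table maps onto a PEFT setup roughly like the one below. This is a reconstruction from the listed hyperparameters, not the original training script; dataset preparation and the Trainer call are omitted, and output_dir is a placeholder:

from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                      # LoRA rank
    lora_alpha=32,
    target_modules=["query", "key", "value"],  # attention projections
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)  # `base` as loaded in Quick Start

training_args = TrainingArguments(
    output_dir="native-log-translator",  # placeholder
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
)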

πŸ“Œ Intended Use

  • SIEM normalization pipelines (see the sketch after this list)
  • Multi-cloud SOC log ingestion
  • Security event correlation
  • Threat detection preprocessing
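
As a sketch of the first bullet, a minimal ingestion step could wrap translate_log and parse_schema (defined above) and persist normalized events as JSONL; the function name and output path are placeholders:

import json

def normalize_stream(raw_lines, out_path="normalized.jsonl"):
    # Translate each raw log and append the normalized event as one JSON line.
    with open(out_path, "w") as f:
        for raw in raw_lines:
            event = parse_schema(translate_log(raw))
            event["raw"] = raw  # keep the original line for correlation
            f.write(json.dumps(event) + "\n")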

⚠️ Limitations

  • Trained on a small curated dataset; for production use, fine-tune on your own log corpus
  • May not generalize to vendor-specific log formats not seen during training
  • Not a replacement for rule-based parsers in high-stakes pipelines without validation