---
license: cc-by-4.0
language:
- en
- es
- fr
- de
metrics:
- accuracy
- f1
tags:
- misinformation
- engineering
- robustness
- adversarial
datasets:
- Stevebankz/Emc
---

# Model Card for Engineering Misinformation Detection Models

This repository provides a suite of models trained on the **Engineering Misinformation Corpus (EMC)**, introduced in our paper *"The Brittleness of Transformer Feature Fusion: A Comparative Study of Model Robustness in Engineering Misinformation Detection"* (Results in Engineering, 2025).

The models are designed to detect **AI-generated misinformation** in **safety-critical engineering documents**. Each represents a different paradigm: traditional feature-based learning, end-to-end Transformers, and hybrid fusion models.

---

## Model Details

### Models Included
1. **XLM-R (Text Only)**  
   - Transformer-based classifier (XLM-RoBERTa) trained end-to-end on raw text.  
   - Files: `config.json`, `model.safetensors`, `sentencepiece.bpe.model`, `tokenizer_config.json`, `tokenizer.json`.

2. **Simple Fusion (Text + Features, Concatenation)**  
   - Combines XLM-R embeddings with a 12-dimensional engineered feature set via naive concatenation.  
   - Files: `fusion_simple.pt`, `scaler.pkl`.

3. **Gated Fusion (Text + Features, Dynamic Arbitration)**  
   - Combines XLM-R embeddings with engineered features using a gating mechanism to dynamically arbitrate signals.  
   - Files: `fusion_gated.pt`, `scaler.pkl`.

4. **XGBoost + Features (Feature-Only Baseline)**  
   - Lightweight tree-based model trained solely on 12D engineered features.  
   - Files: `xgb_model.json`, `scaler.pkl`.

- **Developed by:** Steve Nwaiwu and collaborators  
- **Model type:** Comparative benchmark (feature-based, Transformer-based, and hybrid fusion approaches)  
- **Languages:** Primarily English, with experimental subsets in ES/FR/DE  
- **License:** CC-BY 4.0  
- **Finetuned from:** `xlm-roberta-base` (for text-based components)

---

## Uses

### Direct Use
- **Detect AI-generated misinformation** in engineering and technical documentation.  
- Benchmark adversarial robustness across modeling paradigms.  
- Educational use for comparing model interpretability and brittleness.  

### Downstream Use
- Adaptation to domain-specific technical corpora (e.g., biomedical, aerospace, safety reports).  
- Fusion-based approaches can be extended with domain-specific structured features.

### Out-of-Scope Use
- Not suitable for **general misinformation detection** (e.g., politics, social media).  
- Not validated for **non-technical or informal text domains**.  
- Should not be used as a standalone safety system without human oversight.

---

## Bias, Risks, and Limitations
- **Bias in training corpus**: While curated for engineering, domain coverage may be uneven across subfields.  
- **Fusion brittleness**: Naive feature fusion models fail under semantic adversarial attacks (e.g., synonym swaps).  
- **Language limits**: Non-English results were confounded by dataset artifacts; multilingual robustness is not yet validated.  

### Recommendations
- Use **XLM-R** when robustness to adversarial perturbations is critical.  
- Use **XGBoost** in resource-constrained environments, but pair with human-in-the-loop oversight.
- Avoid **naive fusion** in safety-critical deployment.  
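
One way to pair a classifier with human oversight is to auto-accept only confident predictions and send the rest to a reviewer. The sketch below is illustrative and not part of the released code: `route_prediction` and its `review_threshold` are hypothetical, and the probability order `[P(real), P(misinformation)]` follows the label mapping used in the Gated Fusion example in this card.

```python
def route_prediction(probs, review_threshold=0.9):
    """Return (label, needs_review) for one class-probability vector.

    `probs` is ordered [P(real), P(misinformation)]. `review_threshold` is a
    hypothetical cut-off that would need tuning on validation data.
    """
    label = max(range(len(probs)), key=lambda i: probs[i])
    needs_review = probs[label] < review_threshold
    return label, needs_review


# Invented probabilities, for illustration only.
for probs in ([0.97, 0.03], [0.55, 0.45], [0.08, 0.92]):
    label, needs_review = route_prediction(probs)
    action = "send to human reviewer" if needs_review else "auto-accept"
    print(probs, "->", label, action)
```

A stricter threshold sends more documents to review, so the value is a trade-off between reviewer workload and the risk of an unreviewed error.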

---

## Training Details

### Training Data
- **Dataset**: Engineering Misinformation Corpus (EMC), which includes both real engineering documents and AI-generated misinformation.
- **Splits**: Train / Validation / Test

### Training Procedure
- **XLM-R models**: fine-tuned with the AdamW optimizer and learning-rate warmup, with a maximum sequence length of 256 tokens.
- **Fusion models**: trained by jointly optimizing the Transformer encoder and the fusion classifier.
- **XGBoost**: trained with grid-searched hyperparameters (tree depth, learning rate, number of estimators).
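
To make "learning-rate warmup" concrete, here is a minimal sketch of one common schedule: a linear ramp-up followed by linear decay to zero. This card only states that warmup was used, so the decay shape and every number below (`BASE_LR`, `WARMUP_STEPS`, `TOTAL_STEPS`) are assumptions chosen for illustration, not the paper's settings.

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear ramp-up, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical values for illustration only (not taken from the paper).
BASE_LR = 2e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 1000

schedule = [
    linear_warmup_lr(s, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
    for s in range(TOTAL_STEPS + 1)
]
print(f"step 0: {schedule[0]:.2e}")
print(f"step {WARMUP_STEPS}: {schedule[WARMUP_STEPS]:.2e}")
print(f"step {TOTAL_STEPS}: {schedule[TOTAL_STEPS]:.2e}")
```

In a PyTorch training loop the same shape is available through `get_linear_schedule_with_warmup` in Hugging Face Transformers, applied to an AdamW optimizer.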

---

## Technical Specifications

### Architectures
- **XLM-RoBERTa**: Transformer encoder (base size)
- **Simple Fusion**: concatenation of the Transformer pooled embedding and the 12D feature vector
- **Gated Fusion**: Transformer pooled embedding combined with a gated 64D projection of the features
- **XGBoost**: gradient-boosted decision trees
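
The Simple Fusion data flow can be sketched with NumPy alone, which makes the tensor shapes easy to inspect. This is a shape-level illustration under stated assumptions, not the released model: the masked mean pooling is borrowed from the Gated Fusion example in this card, the single linear head is assumed, and the inputs and weights are random stand-ins. The hidden size of 768 is that of `xlm-roberta-base`.

```python
import numpy as np

rng = np.random.default_rng(0)

# 768 is the hidden size of xlm-roberta-base; the other sizes are toy values.
BATCH, SEQ_LEN, HIDDEN, N_FEATS, N_LABELS = 2, 8, 768, 12, 2

# Random stand-ins for encoder token states, attention mask, and scaled features.
token_states = rng.normal(size=(BATCH, SEQ_LEN, HIDDEN))
attention_mask = np.array([[1] * 8, [1] * 5 + [0] * 3])
features = rng.normal(size=(BATCH, N_FEATS))

# Masked mean pooling: average only over non-padding tokens.
mask = attention_mask[:, :, None].astype(float)
pooled = (token_states * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-6, None)

# Simple fusion: concatenate text embedding and features, then a linear head.
fused = np.concatenate([pooled, features], axis=1)  # shape (BATCH, HIDDEN + N_FEATS)
W = rng.normal(scale=0.02, size=(HIDDEN + N_FEATS, N_LABELS))  # hypothetical weights
b = np.zeros(N_LABELS)
logits = fused @ W + b

# Numerically stable softmax over the label axis.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
print(fused.shape, probs.shape)
```

Unlike the gated variant, nothing in this layout scales the feature contribution per example: the classifier always sees the raw concatenation.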

### Software

- PyTorch, Hugging Face Transformers, scikit-learn, XGBoost




---

## Evaluation

### Metrics
- Macro F1, Micro F1, and Average Precision (AP).
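
For readers who want to see how these metrics are computed, the following is a small dependency-free sketch. The helper names `f1_scores` and `average_precision` are hypothetical, the labels and scores are invented toy data, and the AP helper assumes binary labels with no tied scores.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was t
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average of per-class F1. Micro: F1 over pooled counts.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


def average_precision(y_true, scores):
    """AP for binary labels: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy labels and scores, invented for illustration (1 = misinformation).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.9, 0.8, 0.4, 0.2]

macro_f1, micro_f1 = f1_scores(y_true, y_pred)
ap = average_precision(y_true, y_score)
print(f"macro F1={macro_f1:.3f}  micro F1={micro_f1:.3f}  AP={ap:.3f}")
```

In practice, scikit-learn (listed under Software) provides `f1_score` with `average="macro"` or `average="micro"` and `average_precision_score` for the same metrics.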

### Adversarial Robustness
Robustness was tested under three families of perturbations:
- **Structural**: typos, casing changes
- **Semantic/cue**: synonym swaps, term masking
- **Feature-targeted**: numeric and unit corruption
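
The perturbation families above can be illustrated with a few string transforms. These are simplified stand-ins rather than the attack implementations used in the paper: the function names, the synonym table, the unit map, and the example sentence are all hypothetical.

```python
import random
import re


def swap_typo(text: str, rng: random.Random) -> str:
    """Structural: swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def flip_case(text: str) -> str:
    """Structural: upper-case the whole string."""
    return text.upper()


def swap_synonyms(text: str, synonyms: dict) -> str:
    """Semantic/cue: replace whole words using a small synonym table."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, synonyms)) + r")\b")
    return pattern.sub(lambda m: synonyms[m.group(1)], text)


def corrupt_units(text: str, unit_map: dict) -> str:
    """Feature-targeted: change the unit after a number, keeping the number."""
    units = "|".join(map(re.escape, unit_map))
    pattern = re.compile(r"(\d+(?:\.\d+)?)\s*(" + units + r")\b")
    return pattern.sub(lambda m: f"{m.group(1)} {unit_map[m.group(2)]}", text)


# Hypothetical tables and sentence, for illustration only.
SYNONYMS = {"shutdown": "power-down", "procedure": "routine"}
UNIT_MAP = {"kPa": "MPa", "mm": "cm"}
text = "Follow the shutdown procedure when pressure exceeds 250 kPa."

rng = random.Random(0)
print(swap_typo(text, rng))
print(flip_case(text))
print(swap_synonyms(text, SYNONYMS))
print(corrupt_units(text, UNIT_MAP))
```

Running each transform over a held-out set and re-scoring the model gives a per-family robustness comparison against the clean baseline.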


---

## How to Get Started with the Models

### Example: Load XGBoost + Features
```python
import joblib
import numpy as np
import xgboost as xgb

# Load the feature scaler and the trained XGBoost model
scaler = joblib.load("scaler.pkl")
model = xgb.XGBClassifier()
model.load_model("xgb_model.json")

# Example 12D feature vector (placeholder values; use your own feature pipeline)
x = np.array([[120, 30, 5, 0.0, 0.0, 0.05, 10, 3, 1, 0, 12.5, 0.33]])
x_scaled = scaler.transform(x)
pred = model.predict(x_scaled)
print("Predicted class:", pred[0])
```

### Example: Load XLM-R (Text Only)
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Replace "your-username/emc-models" with the actual repository ID
tokenizer = AutoTokenizer.from_pretrained("your-username/emc-models", subfolder="xlmr_text_only")
model = AutoModelForSequenceClassification.from_pretrained("your-username/emc-models", subfolder="xlmr_text_only")
model.eval()  # disable dropout for inference

text = "For safety, follow the shutdown procedure."
inputs = tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.softmax(-1))  # class probabilities
```

### Example: Load Gated Fusion
```python
import joblib
import numpy as np
import torch
import torch.nn as nn
from transformers import AutoTokenizer, XLMRobertaModel

# ---- Load tokenizer and scaler ----
# Replace "your-username/emc-models" with the actual repository ID
tokenizer = AutoTokenizer.from_pretrained("your-username/emc-models", subfolder="xlmr_text_only")
scaler = joblib.load("scaler.pkl")

# ---- Define Gated Fusion model (must match training definition) ----
class GatedFusion(nn.Module):
    def __init__(self, model_name: str, n_feats: int = 12, n_labels: int = 2, feat_proj: int = 64):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained(model_name)
        H = self.encoder.config.hidden_size
        # Project the engineered features to feat_proj dimensions
        self.fe_proj = nn.Sequential(nn.Linear(n_feats, feat_proj), nn.ReLU())
        # Scalar gate in (0, 1), computed from the text embedding and raw features
        self.gate    = nn.Sequential(
            nn.Linear(H + n_feats, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid()
        )
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(H + feat_proj, n_labels)

    def forward(self, input_ids, attention_mask, feats):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
        # Masked mean pooling over non-padding tokens
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        alpha = self.gate(torch.cat([pooled, feats], dim=1))
        ef    = self.fe_proj(feats)
        # The gate scales the projected features before concatenation
        fused = torch.cat([pooled, alpha * ef], dim=1)
        return self.classifier(self.dropout(fused))

# ---- Load trained weights ----
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GatedFusion("xlm-roberta-base", n_feats=12, n_labels=2).to(device)
# strict=False tolerates missing or unexpected keys; check the returned
# missing_keys / unexpected_keys if predictions look wrong
model.load_state_dict(torch.load("fusion_gated.pt", map_location=device), strict=False)
model.eval()

# ---- Example input ----
text = "For safety, follow the shutdown procedure."
features = np.array([[120, 30, 5, 0.0, 0.0, 0.05, 10, 3, 1, 0, 12.5, 0.33]])  # replace with your feature extractor
features_scaled = scaler.transform(features)

# ---- Tokenize text ----
enc = tokenizer(text, truncation=True, padding="max_length", max_length=256, return_tensors="pt")
input_ids = enc["input_ids"].to(device)
attn_mask = enc["attention_mask"].to(device)
feats = torch.tensor(features_scaled, dtype=torch.float32).to(device)

# ---- Run model ----
with torch.no_grad():
    logits = model(input_ids, attn_mask, feats)
    probs = torch.softmax(logits, dim=-1).cpu().numpy()[0]

print("Prediction probs:", probs)
print("Predicted class:", probs.argmax())  # 0 = Real, 1 = Misinformation
```


---

## Citation

```bibtex
@article{NWAIWU2025107783,
  title   = {The brittleness of transformer feature fusion: A comparative study of model robustness in engineering misinformation detection},
  author  = {Steve Nwaiwu and Nipat Jongsawat and Anucha Tungkasthan},
  journal = {Results in Engineering},
  volume  = {28},
  pages   = {107783},
  year    = {2025},
  issn    = {2590-1230},
  doi     = {10.1016/j.rineng.2025.107783},
  url     = {https://www.sciencedirect.com/science/article/pii/S2590123025038368}
}
```