GLiNER Multi PII Domains (v2)
GLiNER is a Named Entity Recognition (NER) model that uses a bidirectional transformer encoder (BERT-like) to identify any entity type. It offers a practical alternative to traditional NER models (limited to predefined entities) and Large Language Models (LLMs), which are flexible but resource-intensive.
This model is fine-tuned from E3-JSI/gliner-multi-pii-domains-v1 on the TAB dataset, which consists of English legal-domain text rich in privacy-sensitive entities.
GLiNER Multi PII Domains (v2) recognizes various personally identifiable information (PII) and legal entities in English legal, regulatory, and compliance text, including but not limited to:
person, code, loc, org, dem, datetime, quantity, misc
Usage
Install the GLiNER library:
pip install gliner
Example: Extract Entities from Legal Text
from gliner import GLiNER
# Load the model
model = GLiNER.from_pretrained("aksman18/gliner-multi-pii-domains-v2")
text = """
On March 12, 2024, John Doe, a 45-year-old attorney at Smith & Wesson LLP,
registered property ID PR-45678 in Los Angeles County under policy number INS-90210.
The transaction was recorded under case ID 2024-CIV-789 and regulated by the Data Privacy Act 2020.
"""
labels = [
"person", "datetime", "dem_age", "dem_occupation",
"org_commercial", "id_property", "id_policy", "id_case",
"loc_geopolitical", "id_legislation"
]
# Keep only predictions scoring at or above the confidence threshold
entities = model.predict_entities(text, labels, threshold=0.4)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Expected output:
March 12, 2024 => datetime
John Doe => person
45-year-old => dem_age
attorney => dem_occupation
Smith & Wesson LLP => org_commercial
property ID PR-45678 => id_property
Los Angeles County => loc_geopolitical
policy number INS-90210 => id_policy
2024-CIV-789 => id_case
Data Privacy Act 2020 => id_legislation
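A common next step after extraction is redaction. The sketch below is one possible approach, not part of this model's API: it assumes each entity dictionary carries character offsets under the `start` and `end` keys (as `predict_entities` output normally does) and that spans do not overlap. The `redact` helper and the sample entities are hypothetical and written by hand so the snippet runs without downloading the model.

```python
def redact(text, entities):
    """Replace each entity span in `text` with a [LABEL] placeholder."""
    out = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['label'].upper()}]"
        out = out[:ent["start"]] + placeholder + out[ent["end"]:]
    return out


# Hand-written stand-in for model output (illustrative, not real predictions).
sample = "John Doe filed case 2024-CIV-789."
sample_entities = [
    {"start": 0, "end": 8, "text": "John Doe", "label": "person"},
    {"start": 20, "end": 32, "text": "2024-CIV-789", "label": "id_case"},
]

redacted = redact(sample, sample_entities)
print(redacted)  # [PERSON] filed case [ID_CASE].
```

With the real model, the same helper could be called as `redact(text, entities)` on the variables from the example above.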
Example: Extract Entities from a Contract Clause
from gliner import GLiNER
model = GLiNER.from_pretrained("aksman18/gliner-multi-pii-domains-v2")
text = """
This Agreement is entered into between AlphaTech Ltd., registered under Company Registration ID CR-56789,
and Jane Smith, residing at 42 Baker Street, London. The contract shall be governed by the Data Protection Act 2018
and subject to tax ID GB-9087654.
"""
labels = [
"org_commercial", "person", "loc_address",
"id_registration", "id_legislation", "id_tax"
]
# Keep only predictions scoring at or above the confidence threshold
entities = model.predict_entities(text, labels, threshold=0.4)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Expected output:
AlphaTech Ltd. => org_commercial
CR-56789 => id_registration
Jane Smith => person
42 Baker Street, London => loc_address
Data Protection Act 2018 => id_legislation
GB-9087654 => id_tax
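To see how the `threshold` value changes the result, one option is to predict once at a low threshold and then filter by score afterwards. The sketch below assumes each entity dictionary includes a `score` key, as `predict_entities` output normally does. The `group_by_label` helper is hypothetical, and the sample scores are made up for illustration rather than taken from the model.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, keeping only those at or above min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        # Entities without a score are kept (treated as score 1.0).
        if ent.get("score", 1.0) >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hand-written stand-in for model output; the scores are invented.
sample_entities = [
    {"text": "AlphaTech Ltd.", "label": "org_commercial", "score": 0.91},
    {"text": "Jane Smith", "label": "person", "score": 0.88},
    {"text": "London", "label": "person", "score": 0.35},
]

for threshold in (0.3, 0.5):
    print(threshold, group_by_label(sample_entities, min_score=threshold))
```

In this invented sample, raising the cutoff from 0.3 to 0.5 drops the low-confidence `person` match, which is the kind of precision/recall trade-off the threshold controls.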
Model Details
- Base model: E3-JSI/gliner-multi-pii-domains-v1
- Fine-tuning dataset: TAB (legal-domain English)
- Architecture: GLiNER Transformer Encoder
- Task: Named Entity Recognition (PII and legal entities)
- Language: English
- Domain: Legal / Regulatory / Compliance
- Max sequence length: 384 tokens
- Recommended threshold: 0.3–0.5
- Optimized for: Contracts, filings, legal documents, compliance records
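Because of the 384-token limit, long contracts or filings may need to be split before prediction. The sketch below shows one way to do that, under stated assumptions: it uses whitespace-separated words as a rough proxy for tokens (the model's tokenizer will usually produce more tokens than words, hence the conservative default of 200), and it takes the prediction function as a parameter so it runs without the model. `chunk_by_words`, `predict_long`, and `stub_predict` are hypothetical names, not part of the GLiNER library.

```python
import re


def chunk_by_words(text, max_words=200, overlap=20):
    """Yield (char_offset, chunk_text) windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    step = max_words - overlap
    for i in range(0, len(spans), step):
        window = spans[i:i + max_words]
        start, end = window[0][0], window[-1][1]
        yield start, text[start:end]
        if i + max_words >= len(spans):
            break  # the last window already reaches the end of the text


def predict_long(text, predict, max_words=200, overlap=20):
    """Run `predict` on each chunk and map spans back to document offsets."""
    seen, merged = set(), []
    for offset, chunk in chunk_by_words(text, max_words, overlap):
        for ent in predict(chunk):
            start, end = ent["start"] + offset, ent["end"] + offset
            key = (start, end, ent["label"])
            if key not in seen:  # drop duplicates found in overlapping windows
                seen.add(key)
                merged.append({**ent, "start": start, "end": end})
    return sorted(merged, key=lambda e: e["start"])


def stub_predict(chunk):
    """Stand-in for the model: tags anything shaped like ID-<digits>."""
    return [
        {"start": m.start(), "end": m.end(), "text": m.group(), "label": "id_case"}
        for m in re.finditer(r"ID-\d+", chunk)
    ]


filler = " ".join(f"w{i}" for i in range(10))
doc = f"{filler} ID-1 {filler} ID-2"
found = predict_long(doc, stub_predict, max_words=8, overlap=3)
print([(e["text"], e["start"], e["end"]) for e in found])
```

With the real model, the `predict` argument could be something like `lambda chunk: model.predict_entities(chunk, labels, threshold=0.4)`. The overlap reduces, but does not eliminate, the chance of missing an entity that straddles a chunk boundary.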
Acknowledgements
This model builds on E3-JSI/gliner-multi-pii-domains-v1 and the TAB dataset to improve precision on compliance- and privacy-related entities. It uses the GLiNER architecture for zero-shot, label-agnostic NER.
Special thanks to the GLiNER maintainers and the open-source dataset contributors advancing privacy-aware NLP research.