GLiNER Multi PII Domains (v2)

GLiNER is a Named Entity Recognition (NER) model that uses a bidirectional transformer encoder (BERT-like) to identify any entity type specified at inference time. It offers a practical alternative to traditional NER models, which are limited to predefined entity types, and to Large Language Models (LLMs), which are flexible but resource-intensive.

This model is fine-tuned from E3-JSI/gliner-multi-pii-domains-v1 using the TAB (Text Anonymization Benchmark) dataset, which consists of English legal-domain text rich in privacy-sensitive entities.

GLiNER Multi PII Domains (v2) recognizes a range of personally identifiable information (PII) and legal entity types in English legal, regulatory, and compliance text, including but not limited to:

  • person
  • code
  • loc
  • org
  • dem
  • datetime
  • quantity
  • misc

Usage

Install the GLiNER library:

pip install gliner

Example: Extract Entities from Legal Text

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("aksman18/gliner-multi-pii-domains-v2")

text = """
On March 12, 2024, John Doe, a 45-year-old attorney at Smith & Wesson LLP,
registered property ID PR-45678 in Los Angeles County under policy number INS-90210.
The transaction was recorded under case ID 2024-CIV-789 and regulated by the Data Privacy Act 2020.
"""

labels = [
    "person", "datetime", "dem_age", "dem_occupation",
    "org_commercial", "id_property", "id_policy", "id_case",
    "loc_geopolitical", "id_legislation"
]

entities = model.predict_entities(text, labels, threshold=0.4)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

Expected output:

March 12, 2024 => datetime
John Doe => person
45-year-old => dem_age
attorney => dem_occupation
Smith & Wesson LLP => org_commercial
property ID PR-45678 => id_property
Los Angeles County => loc_geopolitical
policy number INS-90210 => id_policy
2024-CIV-789 => id_case
Data Privacy Act 2020 => id_legislation
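
Example: Redact Detected Entities

A common follow-up step is to mask the detected spans. The sketch below assumes each entity dict carries the "start" and "end" character offsets that predict_entities returns alongside "text", "label", and "score", and that spans do not overlap. The redact helper, the sample sentence, and the scores are illustrative and not part of the GLiNER library or of real model output.

```python
def redact(text, entities):
    """Replace each entity span in `text` with a [LABEL] placeholder."""
    out = text
    # Work from the last span to the first so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + ent["label"].upper() + "]"
        out = out[:ent["start"]] + placeholder + out[ent["end"]:]
    return out


# Hand-written entities in the shape predict_entities returns (scores are made up).
sample = "John Doe lives in Los Angeles County."
sample_entities = [
    {"start": 0, "end": 8, "text": "John Doe",
     "label": "person", "score": 0.95},
    {"start": 18, "end": 36, "text": "Los Angeles County",
     "label": "loc_geopolitical", "score": 0.90},
]

print(redact(sample, sample_entities))
# [PERSON] lives in [LOC_GEOPOLITICAL].
```

With a loaded model, the same helper could be called as redact(text, model.predict_entities(text, labels, threshold=0.4)).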

Example: Extract Entities from a Contract Clause

from gliner import GLiNER

model = GLiNER.from_pretrained("aksman18/gliner-multi-pii-domains-v2")

text = """
This Agreement is entered into between AlphaTech Ltd., registered under Company Registration ID CR-56789,
and Jane Smith, residing at 42 Baker Street, London. The contract shall be governed by the Data Protection Act 2018
and subject to tax ID GB-9087654.
"""

labels = [
    "org_commercial", "person", "loc_address",
    "id_registration", "id_legislation", "id_tax"
]

entities = model.predict_entities(text, labels, threshold=0.4)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

Expected output:

AlphaTech Ltd. => org_commercial
CR-56789 => id_registration
Jane Smith => person
42 Baker Street, London => loc_address
Data Protection Act 2018 => id_legislation
GB-9087654 => id_tax
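
Example: Group Entities by Label

For downstream review it can help to collect the extracted strings per label and drop low-confidence matches. This is a minimal sketch that assumes each entity dict includes a "score" field, as predict_entities returns. The group_by_label helper, its min_score parameter, and the sample scores are hypothetical.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Map each label to the entity texts whose score is at least min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hand-written entities for illustration; the scores are not real model output.
contract_entities = [
    {"text": "AlphaTech Ltd.", "label": "org_commercial", "score": 0.92},
    {"text": "Jane Smith", "label": "person", "score": 0.88},
    {"text": "GB-9087654", "label": "id_tax", "score": 0.35},
]

print(group_by_label(contract_entities, min_score=0.5))
# {'org_commercial': ['AlphaTech Ltd.'], 'person': ['Jane Smith']}
```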

Model Details

  • Base model: E3-JSI/gliner-multi-pii-domains-v1
  • Fine-tuning dataset: TAB (legal-domain English)
  • Architecture: GLiNER Transformer Encoder
  • Task: Named Entity Recognition (PII and legal entities)
  • Language: English
  • Domain: Legal / Regulatory / Compliance
  • Max sequence length: 384 tokens
  • Recommended threshold: 0.3–0.5
  • Optimized for: Contracts, filings, legal documents, compliance records
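
Example: Process Documents Longer than the Sequence Limit

Contracts and filings often exceed the 384-token limit listed above. One possible approach is to split the text into overlapping word windows, run prediction on each window, and shift the returned offsets back into document coordinates. The sketch below is an assumption-laden illustration, not the library's own method: chunk_by_words, predict_long, and the 200-word window are hypothetical, and word counts only approximate token counts. A stand-in predictor is used so the sketch runs without downloading the model.

```python
import re


def chunk_by_words(text, max_words=200, overlap=20):
    """Yield (char_offset, chunk) windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = list(re.finditer(r"\S+", text))
    step = max_words - overlap
    for i in range(0, len(words), step):
        window = words[i:i + max_words]
        start, end = window[0].start(), window[-1].end()
        yield start, text[start:end]
        if i + max_words >= len(words):
            break  # the last window already reached the end of the text


def predict_long(text, labels, predict, threshold=0.4, **chunk_kwargs):
    """Run `predict` on each window and merge results in document offsets."""
    seen, results = set(), []
    for offset, chunk in chunk_by_words(text, **chunk_kwargs):
        for ent in predict(chunk, labels, threshold=threshold):
            start, end = ent["start"] + offset, ent["end"] + offset
            key = (start, end, ent["label"])
            if key in seen:  # same span found again in an overlapping window
                continue
            seen.add(key)
            results.append({**ent, "start": start, "end": end})
    return sorted(results, key=lambda e: e["start"])


def fake_predict(chunk, labels, threshold=0.5):
    """Stand-in for model.predict_entities: tags every 'Alice' as a person."""
    return [
        {"start": m.start(), "end": m.end(), "text": m.group(),
         "label": "person", "score": 1.0}
        for m in re.finditer(r"Alice", chunk)
    ]


long_text = "word " * 6 + "Alice " + "word " * 10
found = predict_long(long_text, ["person"], fake_predict, max_words=8, overlap=2)
print([(e["text"], e["start"], e["end"]) for e in found])
# [('Alice', 30, 35)]
```

With a loaded model, model.predict_entities can be passed as the predict argument. The overlap reduces the chance of cutting an entity at a window boundary, at the cost of some repeated work.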

Acknowledgements

This model builds on E3-JSI/gliner-multi-pii-domains-v1 and is fine-tuned on the TAB dataset to improve precision on privacy and compliance entities. It relies on the GLiNER architecture for zero-shot, label-agnostic NER.

Special thanks to the GLiNER maintainers and the open-source dataset contributors advancing privacy-aware NLP research.
