Model Card for synthEDic

This model performs open-ended event data generation. Given input documents similar to newswire reporting, the model will output JSON-formatted event data similar to that found in ICEWS or GDELT.

Model Details

Model Description

Developed by: Benjamin J. Radford
Finetuned from model: meta-llama/Llama-3.1-8B-Instruct

Getting Started

import os

import torch
from huggingface_hub import login
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

## Set the HF_TOKEN environment variable to your Hugging Face access token.
## The token is required to verify that you have permission
## to load the gated Llama base model.

login(token=os.getenv("HF_TOKEN"))

## The synthEDic adapter (this model).
model_name = "benradford/synthEDic"

## 4-bit NF4 quantization configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

## Load Llama, the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
)

## Load the adapter model (synthEDic)
model = PeftModel.from_pretrained(base_model, model_name)

## Load the proper tokenizer from synthEDic
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='right')

## Initialize a pipeline for inference
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

## Demo pipeline use:
output = pipe([{"role":"user","content":"Bob demanded concessions from Alice."}], max_new_tokens=256, num_beams=1, return_full_text=False)
print(output)
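The main effect of the 4-bit NF4 quantization configured above is to shrink the memory needed for the base model's weights. The rough, weights-only arithmetic below is a sketch that assumes about 8 billion parameters and ignores activations, the KV cache, quantization constants, and any layers left unquantized, so actual memory use will be higher; weight_memory_gb is a hypothetical helper written for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    # bits -> bytes (divide by 8), then bytes -> GB (divide by 1e9)
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # assumption: roughly 8 billion parameters

for label, bits in [("float16", 16), ("4-bit NF4", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
# float16: ~16 GB
# 4-bit NF4: ~4 GB
```

Under these assumptions, 4-bit loading cuts weight memory to roughly a quarter of a float16 load.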

Model Usage

Note: In its current state (v0.4), this model can be very difficult to use, and its outputs require post-processing. Proceed with caution.

The model will perform best if inputs are provided as a list of chat messages in the following JSON-like format:

[{"role":"user","content":<news-like text>},...]
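For several documents, this format can be built programmatically. The sketch below assumes one user message per document; build_conversations is a hypothetical helper, not part of synthEDic, and stripping surrounding whitespace is an illustrative choice.

```python
from typing import Dict, List


def build_conversations(documents: List[str]) -> List[List[Dict[str, str]]]:
    """Wrap each news-like document in its own single-message chat.

    Hypothetical helper: each conversation contains exactly one
    user turn whose content is the document text, trimmed of
    leading and trailing whitespace.
    """
    return [[{"role": "user", "content": doc.strip()}] for doc in documents]


docs = [
    "Bob demanded concessions from Alice.",
    "  Alice rejected the demand.  ",
]
conversations = build_conversations(docs)
print(conversations[1])
# [{'role': 'user', 'content': 'Alice rejected the demand.'}]
```

Each conversation can then be passed to pipe(...) as in the Getting Started example. The transformers text-generation pipeline should also accept the whole list of conversations in one call, returning one result list per conversation, though that batching behavior is worth confirming against your installed version.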

Due to a quirk of finetuning, the model prepends the text "assistant\n\n" to every output.
Some outputs are also not valid JSON because they use single quotation marks or omit commas.

The outputs can be processed into Python objects as shown below. This requires the json_repair package, which can be installed with pip install json_repair.

import json_repair

## pipe() returns a list of dicts; "generated_text" holds the new text.
text = output[0]["generated_text"].replace("assistant\n\n", "")
events = json_repair.loads(text)
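The snippet above can be made somewhat more defensive. The sketch below is one possible approach, not the author's method: it removes the "assistant\n\n" prefix only at the start of the string, tries strict JSON first, then Python-literal syntax via ast.literal_eval (which accepts single-quoted strings but not missing commas or JSON's true/false/null), and only then falls back to json_repair. The field name actor in the example is invented for illustration; the model's actual output schema may differ.

```python
import ast
import json
from typing import Any

PREFIX = "assistant\n\n"  # spurious prefix the model prepends to outputs


def parse_event_output(text: str) -> Any:
    """Parse one synthEDic output string into Python objects."""
    # Remove the prefix only where it occurs: at the very start.
    cleaned = text.lstrip().removeprefix(PREFIX).strip()
    try:
        return json.loads(cleaned)  # well-formed JSON
    except json.JSONDecodeError:
        pass
    try:
        # Accepts single-quoted strings; fails on true/false/null
        # and on missing commas.
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        pass
    # Last resort: json_repair (pip install json_repair), imported
    # lazily so the standard-library paths work without it.
    import json_repair
    return json_repair.loads(cleaned)


print(parse_event_output("assistant\n\n[{'actor': 'Bob'}]"))
# [{'actor': 'Bob'}]
```

Trying the strict parsers first means well-formed outputs are never altered by repair heuristics.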

Current Status

This is version 0.4 of the synthEDic event coding model.
It has been finetuned only on synthetic English training data.
This model is intended for research purposes only and is prone to hallucinating event data attributes.
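Because the model can hallucinate attributes, one inexpensive safeguard is to flag extracted string values that never appear in the source text. The sketch below is a simple heuristic, not part of synthEDic: the field names actor and target are hypothetical, and a case-insensitive substring test will miss paraphrases and normalized values (such as country codes), so flagged fields call for manual review rather than automatic rejection.

```python
from typing import Dict, Iterable, List


def unsupported_fields(
    event: Dict[str, object],
    source_text: str,
    fields: Iterable[str] = ("actor", "target"),  # hypothetical field names
) -> List[str]:
    """Return names of string fields whose values do not occur in source_text.

    Uses a case-insensitive substring check; non-string, empty,
    and missing fields are skipped.
    """
    haystack = source_text.casefold()
    flagged = []
    for name in fields:
        value = event.get(name)
        if isinstance(value, str) and value.strip():
            if value.strip().casefold() not in haystack:
                flagged.append(name)
    return flagged


text = "Bob demanded concessions from Alice."
event = {"actor": "Bob", "target": "Carol"}  # "Carol" is not in the text
print(unsupported_fields(event, text))
# ['target']
```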


Model tree for benradford/synthEDic

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Adapter: benradford/synthEDic (this model)