# Drone Stalker LSTM 0.3
LSTM model for predicting drone trajectories based on bounding box sequences from video footage.
## Model Description
This model predicts future drone positions from past trajectory data. It processes sequences of bounding boxes and outputs predicted future positions, significantly outperforming baseline models on the FRED dataset.

Drone Stalker LSTM 0.3 is an extremely lightweight model with just 2,224 parameters; despite its size, it performs on par with models of up to 300k parameters.
## Architecture
- Model Type: LSTM (Long Short-Term Memory)
- Input Features: kinematics (x_center, y_center, x_velocity, y_velocity)
- Total Parameters: 2,224
- Input Sequence Length: 12 frames (Np=12)
- Output Sequence Length: 12 frames (Nf=12)
- Frame Interval: 33.3ms (30 FPS)
- Image Resolution: 1280x720
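The parameter count is consistent with a single-layer LSTM over the 4 kinematic features with hidden size 16, followed by a linear head emitting all 12 future boxes. The authoritative definition is `model.py` in this repo; the following is a hypothetical reconstruction whose parameter count happens to match exactly:

```python
import torch.nn as nn

# Hypothetical reconstruction; see model.py in this repo for the real class.
# LSTM(4 -> 16):      4*16*(4 + 16) + 2*4*16 = 1,408 parameters
# Linear(16 -> 12*4): 16*48 + 48             =   816 parameters
#                                       total: 2,224 parameters
lstm = nn.LSTM(input_size=4, hidden_size=16, num_layers=1, batch_first=True)
head = nn.Linear(16, 12 * 4)

total = sum(p.numel() for p in lstm.parameters()) + sum(p.numel() for p in head.parameters())
print(total)  # 2224
```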
## Processing Pipeline

- Input: Raw bounding boxes `[x1, y1, x2, y2]` in pixel coordinates
- Feature Extraction: Computes normalized center positions and velocities between consecutive frames (sketched below)
- LSTM Processing: Processes the kinematic feature sequence
- Output: Predicted future bounding boxes (normalized to [0, 1])
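The feature extraction happens inside the model, so callers pass raw pixel-space boxes (see Usage below). As a minimal sketch of the idea, assuming centers are normalized by the image dimensions and velocities are consecutive-frame differences (the actual implementation is in `model.py`):

```python
import torch

def kinematic_features(bboxes, img_w=1280, img_h=720):
    # bboxes: [batch, T, 4] pixel-space boxes [x1, y1, x2, y2]
    # returns: [batch, T, 4] features (x_center, y_center, x_velocity, y_velocity)
    cx = (bboxes[..., 0] + bboxes[..., 2]) / 2 / img_w   # normalized x center
    cy = (bboxes[..., 1] + bboxes[..., 3]) / 2 / img_h   # normalized y center
    centers = torch.stack([cx, cy], dim=-1)              # [batch, T, 2]
    # Velocity as the consecutive-frame difference; first frame gets zero velocity
    vel = torch.diff(centers, dim=1, prepend=centers[:, :1])
    return torch.cat([centers, vel], dim=-1)
```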
## Training Details
- Dataset: uFRED-predict-0.4 (private)
- Epochs: 10
- Learning Rate: 1e-3
- Optimizer: Adam
- Loss Function: Smooth L1 Loss
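Since the dataset is private and the training script is not published, the following is only a minimal sketch of how the listed hyperparameters fit together; `train_loader`, assumed here to yield `(past_bboxes, future_bboxes)` batches with normalized targets, is a placeholder:

```python
import torch
from torch import nn
from model import Model  # model.py in this repo

model = Model(Np=12, Nf=12, hidden_dim=16, num_layers=1, dropout=0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.SmoothL1Loss()

for epoch in range(10):
    for past_bboxes, future_bboxes in train_loader:  # assumed DataLoader
        optimizer.zero_grad()
        preds = model(past_bboxes)             # [batch, 12, 4], normalized
        loss = criterion(preds, future_bboxes)
        loss.backward()
        optimizer.step()
```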
## Performance

Evaluation metrics on the test set:

- Average Displacement Error (ADE): 32.63 px
- Final Displacement Error (FDE): 49.02 px
- Mean Intersection over Union (mIoU): 0.3898
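ADE averages the center-point error over all 12 predicted frames, while FDE measures it at the final frame only. The exact evaluation script is not published, so treat this as a sketch under the assumption that errors are computed over box centers in pixel space:

```python
import torch

def ade_fde(pred, target, img_w=1280, img_h=720):
    # pred, target: [batch, 12, 4] normalized boxes [x1, y1, x2, y2]
    scale = torch.tensor([img_w, img_h], dtype=pred.dtype)
    pred_c = (pred[..., :2] + pred[..., 2:]) / 2 * scale    # centers in pixels
    tgt_c = (target[..., :2] + target[..., 2:]) / 2 * scale
    dist = torch.linalg.norm(pred_c - tgt_c, dim=-1)        # [batch, 12]
    return dist.mean(), dist[:, -1].mean()                  # ADE, FDE
```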
## Usage

```python
import torch

# Option 1: fetch the state dict directly from the Hub
state_dict = torch.hub.load_state_dict_from_url(
    'https://huggingface.co/Ecoaetix/DroneStalker-LSTM-0.3/resolve/main/drone_stalker-0.3.pth'
)

# Option 2: download the checkpoint file manually
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Ecoaetix/DroneStalker-LSTM-0.3",
    filename="drone_stalker-0.3.pth"
)
state_dict = torch.load(model_path)

# Either way, you'll need the Model class (included as model.py in this repo)
from model import Model

model = Model(Np=12, Nf=12, hidden_dim=16, num_layers=1, dropout=0)
model.load_state_dict(state_dict)
model.eval()

# Inference
with torch.no_grad():
    # Input: [batch_size, 12, 4] - 12 past bounding boxes [x1, y1, x2, y2]
    predictions = model(past_bboxes)
    # Output: [batch_size, 12, 4] - 12 future bounding boxes (min-max normalized)
```
## Input Format

The model expects input bounding boxes in pixel coordinates:

- Shape: `[batch_size, 12, 4]`
- Format: `[x1, y1, x2, y2]`, where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner
- Image dimensions: 1280x720 pixels
## Output Format

The model outputs normalized predictions:

- Shape: `[batch_size, 12, 4]`
- Format: `[x1_norm, y1_norm, x2_norm, y2_norm]`, where values are in the range [0, 1]
- Multiply x-coordinates by 1280 and y-coordinates by 720 to get pixel values (see the sketch below)
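Converting predictions back to pixel coordinates is a single broadcasted multiply:

```python
import torch

# predictions: [batch_size, 12, 4] normalized [x1, y1, x2, y2]
scale = torch.tensor([1280.0, 720.0, 1280.0, 720.0])
pixel_boxes = predictions * scale  # boxes in 1280x720 pixel coordinates
```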
## Limitations
- Trained specifically on drone footage at 1280x720 resolution
- Assumes consistent frame rate of 30 FPS
- Best performance on stationary, ground-based tracking scenarios similar to the training data
- Single-object tracking only
## Citation

```bibtex
@misc{DroneStalker-LSTM-0.3,
  author = {Jacob Kenney},
  title = {DroneStalker-LSTM-0.3},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Ecoaetix/DroneStalker-LSTM-0.3}}
}
```
## License
Apache 2.0