Drone Stalker LSTM 0.3

[Demo GIF]

LSTM model for predicting drone trajectories based on bounding box sequences from video footage.

Model Description

This model predicts future drone positions given past trajectory data. It processes sequences of bounding boxes and outputs predicted future positions, significantly outperforming baseline models on the FRED dataset.

Drone Stalker LSTM 0.3 is an extremely lightweight model with just 2,224 parameters. Despite its size, it performs on par with models of up to 300k parameters.

Architecture

  • Model Type: LSTM (Long Short-Term Memory)
  • Input Features: kinematics (x_center, y_center, x_velocity, y_velocity)
  • Total Parameters: 2,224
  • Input Sequence Length: 12 frames (Np=12)
  • Output Sequence Length: 12 frames (Nf=12)
  • Frame Interval: 33.3ms (30 FPS)
  • Image Resolution: 1280x720
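
For reference, below is a minimal sketch of an architecture consistent with these numbers. The layer layout is an assumption (the authoritative definition ships as model.py in this repo), but it reproduces the parameter count exactly: an nn.LSTM(4, 16) contributes 1,408 parameters and an nn.Linear(16, 48) head contributes 816, for 2,224 in total.

import torch
import torch.nn as nn

class Model(nn.Module):
    # Hypothetical reconstruction for illustration; see model.py for the real class.
    def __init__(self, Np=12, Nf=12, hidden_dim=16, num_layers=1, dropout=0):
        super().__init__()
        self.Nf = Nf
        # nn.LSTM(4, 16): 4 * (16*4 + 16*16 + 2*16) = 1,408 parameters
        self.lstm = nn.LSTM(4, hidden_dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        # nn.Linear(16, 12*4): 16*48 + 48 = 816 parameters
        self.head = nn.Linear(hidden_dim, Nf * 4)

    def forward(self, bboxes):
        # bboxes: [batch, Np, 4] raw [x1, y1, x2, y2] in pixels
        feats = extract_kinematics(bboxes)  # sketched under Processing Pipeline below
        _, (h, _) = self.lstm(feats)        # encode the kinematic sequence
        out = self.head(h[-1])              # decode from the final hidden state
        return out.view(-1, self.Nf, 4)     # [batch, Nf, 4], normalized to [0, 1]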

Processing Pipeline

  1. Input: Raw bounding boxes [x1, y1, x2, y2] in pixel coordinates
  2. Feature Extraction: Computes normalized center positions and velocities between consecutive frames
  3. LSTM Processing: Processes kinematic feature sequence
  4. Output: Predicted future bounding boxes (normalized [0, 1])
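
A sketch of the feature extraction in step 2, assuming velocities are frame-to-frame differences of the normalized centers (the padding convention for the first frame is an assumption):

import torch

def extract_kinematics(bboxes, img_w=1280, img_h=720):
    # bboxes: [batch, Np, 4] raw [x1, y1, x2, y2] in pixel coordinates
    x_center = (bboxes[..., 0] + bboxes[..., 2]) / 2 / img_w  # normalized to [0, 1]
    y_center = (bboxes[..., 1] + bboxes[..., 3]) / 2 / img_h
    centers = torch.stack([x_center, y_center], dim=-1)       # [batch, Np, 2]
    # Velocity: displacement between consecutive frames; repeating the first
    # frame as the prepend makes the first velocity zero
    vel = torch.diff(centers, dim=1, prepend=centers[:, :1])  # [batch, Np, 2]
    return torch.cat([centers, vel], dim=-1)                  # [batch, Np, 4]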

Training Details

  • Dataset: uFRED-predict-0.4 (private)
  • Epochs: 10
  • Learning Rate: 1e-3
  • Optimizer: Adam
  • Loss Function: Smooth L1 Loss
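
A minimal training loop matching these settings. Since uFRED-predict-0.4 is private, the tensors below are random stand-ins for (past, future) box pairs, and the batch size is an assumption:

import torch
from torch.utils.data import DataLoader, TensorDataset
from model import Model  # Model class included in this repo

# Random stand-ins: past boxes in pixels, future boxes normalized to [0, 1]
past = torch.rand(256, 12, 4) * torch.tensor([1280.0, 720.0, 1280.0, 720.0])
future = torch.rand(256, 12, 4)
train_loader = DataLoader(TensorDataset(past, future), batch_size=32, shuffle=True)

model = Model(Np=12, Nf=12, hidden_dim=16, num_layers=1, dropout=0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.SmoothL1Loss()

for epoch in range(10):
    for past_bboxes, future_bboxes in train_loader:
        optimizer.zero_grad()
        pred = model(past_bboxes)              # [batch, 12, 4], normalized
        loss = criterion(pred, future_bboxes)
        loss.backward()
        optimizer.step()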

Performance

Evaluation metrics on test set:

  • Average Displacement Error (ADE): 32.63px
  • Final Displacement Error (FDE): 49.02px
  • Mean Intersection over Union (mIoU): 0.3898
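
ADE is the mean center-to-center distance between predicted and ground-truth boxes over all 12 future frames; FDE is the same distance at the final frame. A sketch of both in pixel space (the center-point convention is an assumption):

import torch

def ade_fde(pred, target):
    # pred, target: [batch, 12, 4] bounding boxes in pixel coordinates
    pred_c = (pred[..., :2] + pred[..., 2:]) / 2   # box centers, [batch, 12, 2]
    tgt_c = (target[..., :2] + target[..., 2:]) / 2
    dist = (pred_c - tgt_c).norm(dim=-1)           # per-frame distance, [batch, 12]
    return dist.mean().item(), dist[:, -1].mean().item()  # (ADE, FDE)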

[Performance comparison chart]

Usage

import torch

# Download the state dict directly from the Hub
# (load_state_dict_from_url returns a state dict, not a model)
state_dict = torch.hub.load_state_dict_from_url(
    'https://huggingface.co/Ecoaetix/DroneStalker-LSTM-0.3/resolve/main/drone_stalker-0.3.pth'
)

# Or download and load manually
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Ecoaetix/DroneStalker-LSTM-0.3",
    filename="drone_stalker-0.3.pth"
)

# You'll need the Model class (included as model.py in this repo)
from model import Model

model = Model(Np=12, Nf=12, hidden_dim=16, num_layers=1, dropout=0)
model.load_state_dict(torch.load(model_path, map_location="cpu"))  # or load_state_dict(state_dict)
model.eval()

# Inference
with torch.no_grad():
    # Input: [batch_size, 12, 4] - 12 past bounding boxes [x1, y1, x2, y2] in pixels
    past_bboxes = torch.tensor([[[600.0, 320.0, 640.0, 360.0]] * 12])  # dummy [1, 12, 4] input
    predictions = model(past_bboxes)
    # Output: [batch_size, 12, 4] - 12 future bounding boxes (min-max normalized)

Input Format

The model expects input bounding boxes in pixel coordinates:

  • Shape: [batch_size, 12, 4]
  • Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
  • Image dimensions: 1280x720 pixels
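
For example, stacking 12 consecutive detections into a batch of one (the box values below are placeholders; in practice they come from a detector or tracker):

import torch

# 12 consecutive [x1, y1, x2, y2] detections in pixels (placeholder values)
boxes = [[600.0 + 2 * i, 320.0 + i, 640.0 + 2 * i, 360.0 + i] for i in range(12)]
past_bboxes = torch.tensor(boxes).unsqueeze(0)  # [1, 12, 4]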

Output Format

The model outputs normalized predictions:

  • Shape: [batch_size, 12, 4]
  • Format: [x1_norm, y1_norm, x2_norm, y2_norm] where values are in range [0, 1]
  • Multiply x-coordinates by 1280 and y-coordinates by 720 to get pixel values
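
In code, using the predictions tensor from the Usage example above:

import torch

scale = torch.tensor([1280.0, 720.0, 1280.0, 720.0])
predictions_px = predictions * scale  # [batch, 12, 4] back in pixel coordinates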

Limitations

  • Trained specifically on drone footage at 1280x720 resolution
  • Assumes consistent frame rate of 30 FPS
  • Best performance on stationary, ground-based tracking scenarios similar to training data
  • Single object tracking only

Citation

@misc{DroneStalker-LSTM-0.3,
  author = {Jacob Kenney},
  title = {DroneStalker-LSTM-0.3},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Ecoaetix/DroneStalker-LSTM-0.3}}
}

License

Apache 2.0
