# Drone Stalker LSTM 0.3
LSTM model for predicting drone trajectories based on bounding box sequences from video footage.
## Model Description
This model predicts future drone positions from past trajectory data. It processes sequences of bounding boxes and outputs predicted future positions, significantly outperforming baseline models on the FRED dataset.

Drone Stalker LSTM 0.3 is an extremely lightweight model with just 2,224 parameters; despite its size, it performs on par with models of up to 300k parameters.
## Architecture
- Model Type: LSTM (Long Short-Term Memory)
- Input Features: kinematics (x_center, y_center, x_velocity, y_velocity)
- Total Parameters: 2,224
- Input Sequence Length: 12 frames (Np=12)
- Output Sequence Length: 12 frames (Nf=12)
- Frame Interval: 33.3ms (30 FPS)
- Image Resolution: 1280x720
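The parameter count is consistent with a single-layer LSTM over the 4 kinematic features with hidden size 16, followed by a linear head emitting all 12 future boxes. The authoritative definition is `model.py` in this repo; the following is a hypothetical reconstruction whose parameter count happens to match exactly:

```python
import torch.nn as nn

# Hypothetical reconstruction; see model.py in this repo for the real class.
# LSTM(4 -> 16):      4*16*(4 + 16) + 2*4*16 = 1,408 parameters
# Linear(16 -> 12*4): 16*48 + 48             =   816 parameters
#                                       total: 2,224 parameters
lstm = nn.LSTM(input_size=4, hidden_size=16, num_layers=1, batch_first=True)
head = nn.Linear(16, 12 * 4)

total = sum(p.numel() for p in lstm.parameters()) + sum(p.numel() for p in head.parameters())
print(total)  # 2224
```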
## Processing Pipeline

- Input: Raw bounding boxes `[x1, y1, x2, y2]` in pixel coordinates
- Feature Extraction: Computes normalized center positions and velocities between consecutive frames (sketched below)
- LSTM Processing: Processes the kinematic feature sequence
- Output: Predicted future bounding boxes (normalized to [0, 1])
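The feature extraction happens inside the model, so callers pass raw pixel-space boxes (see Usage below). As a minimal sketch of the idea, assuming centers are normalized by the image dimensions and velocities are consecutive-frame differences (the actual implementation is in `model.py`):

```python
import torch

def kinematic_features(bboxes, img_w=1280, img_h=720):
    # bboxes: [batch, T, 4] pixel-space boxes [x1, y1, x2, y2]
    # returns: [batch, T, 4] features (x_center, y_center, x_velocity, y_velocity)
    cx = (bboxes[..., 0] + bboxes[..., 2]) / 2 / img_w   # normalized x center
    cy = (bboxes[..., 1] + bboxes[..., 3]) / 2 / img_h   # normalized y center
    centers = torch.stack([cx, cy], dim=-1)              # [batch, T, 2]
    # Velocity as the consecutive-frame difference; first frame gets zero velocity
    vel = torch.diff(centers, dim=1, prepend=centers[:, :1])
    return torch.cat([centers, vel], dim=-1)
```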
## Training Details
- Dataset: uFRED-predict-0.4 (private)
- Epochs: 10
- Learning Rate: 1e-3
- Optimizer: Adam
- Loss Function: Smooth L1 Loss
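Since the dataset is private and the training script is not published, the following is only a minimal sketch of how the listed hyperparameters fit together; `train_loader`, assumed here to yield `(past_bboxes, future_bboxes)` batches with normalized targets, is a placeholder:

```python
import torch
from torch import nn
from model import Model  # model.py in this repo

model = Model(Np=12, Nf=12, hidden_dim=16, num_layers=1, dropout=0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.SmoothL1Loss()

for epoch in range(10):
    for past_bboxes, future_bboxes in train_loader:  # assumed DataLoader
        optimizer.zero_grad()
        preds = model(past_bboxes)             # [batch, 12, 4], normalized
        loss = criterion(preds, future_bboxes)
        loss.backward()
        optimizer.step()
```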
## Performance

Evaluation metrics on the test set:

- Average Displacement Error (ADE): 32.63 px
- Final Displacement Error (FDE): 49.02 px
- Mean Intersection over Union (mIoU): 0.3898
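ADE averages the center-point error over all 12 predicted frames, while FDE measures it at the final frame only. The exact evaluation script is not published, so treat this as a sketch under the assumption that errors are computed over box centers in pixel space:

```python
import torch

def ade_fde(pred, target, img_w=1280, img_h=720):
    # pred, target: [batch, 12, 4] normalized boxes [x1, y1, x2, y2]
    scale = torch.tensor([img_w, img_h], dtype=pred.dtype)
    pred_c = (pred[..., :2] + pred[..., 2:]) / 2 * scale    # centers in pixels
    tgt_c = (target[..., :2] + target[..., 2:]) / 2 * scale
    dist = torch.linalg.norm(pred_c - tgt_c, dim=-1)        # [batch, 12]
    return dist.mean(), dist[:, -1].mean()                  # ADE, FDE
```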
## Usage

```python
import torch

# Option 1: fetch the state dict directly from the Hub
state_dict = torch.hub.load_state_dict_from_url(
    'https://huggingface.co/Ecoaetix/DroneStalker-LSTM-0.3/resolve/main/drone_stalker-0.3.pth'
)

# Option 2: download the checkpoint file manually
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Ecoaetix/DroneStalker-LSTM-0.3",
    filename="drone_stalker-0.3.pth"
)
state_dict = torch.load(model_path)

# Either way, you'll need the Model class (included as model.py in this repo)
from model import Model

model = Model(Np=12, Nf=12, hidden_dim=16, num_layers=1, dropout=0)
model.load_state_dict(state_dict)
model.eval()

# Inference
with torch.no_grad():
    # Input: [batch_size, 12, 4] - 12 past bounding boxes [x1, y1, x2, y2]
    predictions = model(past_bboxes)
    # Output: [batch_size, 12, 4] - 12 future bounding boxes (min-max normalized)
```
## Input Format

The model expects input bounding boxes in pixel coordinates:

- Shape: `[batch_size, 12, 4]`
- Format: `[x1, y1, x2, y2]`, where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner
- Image dimensions: 1280x720 pixels
## Output Format

The model outputs normalized predictions:

- Shape: `[batch_size, 12, 4]`
- Format: `[x1_norm, y1_norm, x2_norm, y2_norm]`, where values are in the range [0, 1]
- Multiply x-coordinates by 1280 and y-coordinates by 720 to get pixel values (see the sketch below)
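Converting predictions back to pixel coordinates is a single broadcasted multiply:

```python
import torch

# predictions: [batch_size, 12, 4] normalized [x1, y1, x2, y2]
scale = torch.tensor([1280.0, 720.0, 1280.0, 720.0])
pixel_boxes = predictions * scale  # boxes in 1280x720 pixel coordinates
```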
## Limitations
- Trained specifically on drone footage at 1280x720 resolution
- Assumes consistent frame rate of 30 FPS
- Best performance on stationary, ground-based tracking scenarios similar to the training data
- Single-object tracking only
## Citation

```bibtex
@misc{DroneStalker-LSTM-0.3,
  author = {Jacob Kenney},
  title = {DroneStalker-LSTM-0.3},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Ecoaetix/DroneStalker-LSTM-0.3}}
}
```
## License
Apache 2.0