SmartKNN is a weighted and interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, and normalizes inputs internally, and in the author's benchmarks it achieves higher accuracy and robustness than classical KNN on most datasets, while maintaining a simple scikit-learn-style API.

Model Details

Model Description

SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally. It exposes feature importance for transparency and explainability.
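
In classical KNN every feature contributes equally to the distance; SmartKNN scales each squared coordinate difference by a learned weight. A minimal NumPy sketch of the weighted Euclidean distance (illustrative only, not the library's internal code):

import numpy as np

def weighted_euclidean(x, z, w):
    # w holds one learned importance weight per feature; setting all
    # weights to 1 recovers the classical Euclidean distance.
    diff = x - z
    return np.sqrt(np.sum(w * diff ** 2))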

Developed by: Jashwanth Thatipamula
Model type: Weighted KNN for tabular ML
License: MIT
Language(s): Not language-dependent (numerical tabular ML)
Finetuned from model: Not applicable (original algorithm)

Model Sources

Repository: https://github.com/thatipamula-jashwanth/smart-knn
Paper (DOI): https://doi.org/10.5281/zenodo.17713746
Demo: Coming soon

Uses

Direct Use

• Regression on tabular datasets
• Classification on tabular datasets
• Interpretable ML where feature importance matters
• Real-world ML pipelines with missing values and noisy features

Downstream Use

• Research on distance-metric learning
• Explainable ML baselines
• AutoML components for tabular data

Out-of-Scope Use

• NLP, image or audio modelling
• Deep learning / GPU models
• Raw categorical datasets without encoding

Bias, Risks, and Limitations

• Instance-based prediction can be slower than tree-based models on large datasets
• Low performance on categorical-only datasets without encoding
• Requires storing full training set for inference

Recommendations

Users should numerically encode categorical features before fitting SmartKNN.
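
For example, string-valued columns can be one-hot encoded with pandas before fitting (the file and column names here match the quick-start example below):

import pandas as pd

df = pd.read_csv("data.csv")
# One-hot encode categorical columns so every feature is numeric.
X = pd.get_dummies(df.drop("target", axis=1))
y = df["target"]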

How to Get Started with the Model

pip install smart-knn

import pandas as pd
from smart_knn import SmartKNN

# Load a dataset and split features from the target column.
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Fit with 5 neighbours, then predict for a single sample.
model = SmartKNN(k=5)
model.fit(X, y)
sample = X.iloc[0]
pred = model.predict(sample)
print(pred)

Training Details

Training Data

SmartKNN is not pretrained and does not ship with training data; users train it on their own datasets.

Preprocessing

Performed automatically:
• Normalization
• NaN / Inf cleaning
• Median imputation
• Outlier clipping
• Feature filtering via learned weights
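
Conceptually, the pipeline resembles the following sketch (the function name, step order, and clipping bounds are assumptions for illustration, not SmartKNN's actual internals):

import numpy as np
import pandas as pd

def preprocess(X: pd.DataFrame) -> pd.DataFrame:
    # Turn infinities into NaN so they are imputed like other gaps.
    X = X.replace([np.inf, -np.inf], np.nan)
    # Median imputation per column.
    X = X.fillna(X.median())
    # Clip outliers to the 1st/99th percentiles (bounds are illustrative).
    X = X.clip(X.quantile(0.01), X.quantile(0.99), axis=1)
    # Normalize to zero mean and unit variance.
    return (X - X.mean()) / X.std()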

Training Hyperparameters

• k = number of neighbours
• weight_threshold = drop features whose learned importance falls below this value
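
For instance, reusing the X and y from the quick-start example above (the threshold value 0.05 is purely illustrative):

from smart_knn import SmartKNN

# Use 7 neighbours and drop features whose learned weight is below 0.05.
model = SmartKNN(k=7, weight_threshold=0.05)
model.fit(X, y)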

Evaluation

Testing Data

Evaluated on 35 public tabular regression datasets and 20 public tabular classification datasets.

Metrics

Regression: R², MSE
Classification: Accuracy
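
These can be computed with scikit-learn on a held-out split (a minimal sketch; X_test and y_test come from the user's own split, and it assumes predict accepts a batch of rows):

from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

preds = model.predict(X_test)

# For a regression target:
print("R2: ", r2_score(y_test, preds))
print("MSE:", mean_squared_error(y_test, preds))

# For a classification target:
# print("Accuracy:", accuracy_score(y_test, preds))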

Results

• Regression: SmartKNN outperformed classical KNN on more than 90% of datasets
• Classification: SmartKNN beat classical KNN on 60% of datasets

Summary

SmartKNN delivers higher accuracy, greater robustness to noise, and better interpretability than classical KNN while preserving its simplicity.

Environmental Impact

SmartKNN requires no GPU and has minimal energy usage.

Hardware Type: CPU
Hours used: Minimal
Carbon Emitted: Negligible

Technical Specifications

Model Architecture and Objective

• Instance-based learner
• Weighted Euclidean distance metric
• Learned feature weights (combining MSE-based, mutual-information (MI), and Random Forest importance signals)
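
One plausible way to blend such signals into per-feature weights, shown here with only the mutual-information and Random Forest components (an illustrative scikit-learn sketch, not SmartKNN's actual weighting scheme):

from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

def learn_weights(X, y):
    # Mutual information between each feature and the target.
    mi = mutual_info_regression(X, y)
    # Impurity-based importances from a Random Forest.
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    # Combine the two normalized signals and rescale to sum to 1.
    w = mi / (mi.sum() + 1e-12) + rf.feature_importances_
    return w / w.sum()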

Compute Infrastructure

• Runs efficiently on CPU systems
• Implemented using NumPy

Citation

@software{smartknn2025,
  author    = {Jashwanth Thatipamula},
  title     = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17713746},
  url       = {https://doi.org/10.5281/zenodo.17713746}
}

Model Card Authors

Jashwanth Thatipamula

Model Card Contact

Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn
