SmartKNN is a weighted, interpretable extension of classical K-Nearest Neighbours (KNN) designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, and normalizes inputs internally, achieving higher accuracy and robustness than classical KNN on most benchmark datasets while maintaining a simple scikit-learn-style API.
Model Details
Model Description
SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally. It exposes feature importance for transparency and explainability.
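For intuition, here is a minimal sketch of a weighted Euclidean distance as used for neighbour selection; the function name and the weights vector are illustrative stand-ins for SmartKNN's learned importances, not the library's actual internals.

import numpy as np

def weighted_euclidean(x, q, weights):
    # d(x, q) = sqrt(sum_j w_j * (x_j - q_j)^2), where w_j is the learned
    # importance of feature j; larger weights make a feature count more.
    diff = np.asarray(x, dtype=float) - np.asarray(q, dtype=float)
    return np.sqrt(np.sum(weights * diff ** 2))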
Developed by: Jashwanth Thatipamula
Model type: Weighted KNN for tabular ML
License: MIT
Language(s): Not language-dependent (numerical tabular ML)
Finetuned from model: Not applicable (original algorithm)
Model Sources
Repository: https://github.com/thatipamula-jashwanth/smart-knn
Paper (DOI): https://doi.org/10.5281/zenodo.17713746
Demo: Coming soon
Uses
Direct Use
• Regression on tabular datasets
• Classification on tabular datasets
• Interpretable ML where feature importance matters
• Real-world ML pipelines with missing values and noisy features
Downstream Use
• Research on distance-metric learning
• Explainable ML baselines
• AutoML components for tabular data
Out-of-Scope Use
• NLP, image or audio modelling
• Deep learning / GPU models
• Raw categorical datasets without encoding
Bias, Risks, and Limitations
• Instance-based prediction can be slower than tree-based models on large datasets
• Low performance on categorical-only datasets without encoding
• Requires storing full training set for inference
Recommendations
Users should numerically encode categorical features before fitting SmartKNN, e.g. via one-hot encoding (see the example below).
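For example, a quick way to one-hot encode categorical columns with pandas before fitting (the file and column names follow the card's own getting-started example and are placeholders):

import pandas as pd
from smart_knn import SmartKNN

df = pd.read_csv("data.csv")
X = pd.get_dummies(df.drop("target", axis=1))  # categorical columns -> numeric dummies
y = df["target"]

model = SmartKNN(k=5)
model.fit(X, y)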
How to Get Started with the Model
pip install smart-knn
import pandas as pd
from smart_knn import SmartKNN

df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

model = SmartKNN(k=5)
model.fit(X, y)

sample = X.iloc[[0]]  # double brackets keep the single row two-dimensional
pred = model.predict(sample)
print(pred)
Training Details
Training Data
SmartKNN is not pretrained and does not ship with training data; users train it on their own dataset.
Preprocessing
Performed automatically (sketched below):
• Normalization
• NaN / Inf cleaning
• Median imputation
• Outlier clipping
• Feature filtering via learned weights
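The following is an illustrative sketch of these steps, assuming median imputation, percentile-based clipping, and z-score normalization; SmartKNN's actual internal implementation may differ (see the repository).

import numpy as np

def preprocess(X):
    X = np.asarray(X, dtype=float).copy()
    X[~np.isfinite(X)] = np.nan                  # NaN / Inf cleaning
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]                    # median imputation
    lo, hi = np.percentile(X, [1, 99], axis=0)
    X = np.clip(X, lo, hi)                       # outlier clipping (1st-99th percentile)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    return (X - mu) / sd                         # normalization (z-score)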
Training Hyperparameters
• k — number of neighbours used for prediction
• weight_threshold — features whose learned importance falls below this value are dropped
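A hypothetical instantiation using both hyperparameters; the keyword names mirror the list above, so check the repository for the exact constructor signature.

from smart_knn import SmartKNN

# 7 neighbours; drop features whose learned weight falls below 0.01
model = SmartKNN(k=7, weight_threshold=0.01)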
Evaluation
Testing Data
SmartKNN was evaluated on 55 public tabular datasets: 35 regression and 20 classification.
Metrics
Regression: R², MSE
Classification: Accuracy
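These metrics can be computed with scikit-learn; the toy arrays below are illustrative only.

import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

y_true = np.array([3.0, 1.5, 2.2])
y_pred = np.array([2.8, 1.7, 2.0])
print(r2_score(y_true, y_pred))             # regression: R² (higher is better)
print(mean_squared_error(y_true, y_pred))   # regression: MSE (lower is better)

yc_true = [0, 1, 1, 0]
yc_pred = [0, 1, 0, 0]
print(accuracy_score(yc_true, yc_pred))     # classification: accuracy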
Results
• Regression: SmartKNN outperformed classical KNN on more than 90% of the datasets
• Classification: SmartKNN beat classical KNN on 60% of the datasets
Summary
SmartKNN delivers higher accuracy, greater robustness to noise, and better interpretability than classical KNN while preserving its simplicity.
Environmental Impact
SmartKNN requires no GPU and has minimal energy usage.
Hardware Type: CPU
Hours used: Minimal
Carbon Emitted: Negligible
Technical Specifications
Model Architecture and Objective
• Instance-based learner
• Weighted Euclidean distance metric
• Learned feature weights combining MSE-based, mutual-information (MI), and Random Forest importance scores (see the sketch below)
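As a rough illustration of how a weight vector could be assembled from the three signals named above; this is a hedged sketch for a regression target, not SmartKNN's actual code, and the combination rule (a simple average of normalized scores) is an assumption.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression

def learn_feature_weights(X, y):
    n_features = X.shape[1]

    # MSE score: inverse error of a univariate linear fit per feature
    mse_scores = np.empty(n_features)
    for j in range(n_features):
        xj = X[:, [j]]
        pred = LinearRegression().fit(xj, y).predict(xj)
        mse_scores[j] = 1.0 / (np.mean((y - pred) ** 2) + 1e-12)

    # MI score: mutual information between each feature and the target
    mi_scores = mutual_info_regression(X, y)

    # RF score: Random Forest impurity-based feature importances
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    rf_scores = rf.feature_importances_

    def normalize(s):
        return s / (s.sum() + 1e-12)

    # Average the three normalized score vectors into one weight per feature
    return (normalize(mse_scores) + normalize(mi_scores) + normalize(rf_scores)) / 3.0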
Compute Infrastructure
• Runs efficiently on CPU systems
• Implemented using NumPy
Citation
@software{smartknn2025,
  author    = {Jashwanth Thatipamula},
  title     = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17713746},
  url       = {https://doi.org/10.5281/zenodo.17713746}
}
Model Card Authors
Jashwanth Thatipamula
Model Card Contact
Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn