arxiv:2511.11683

Stratified Knowledge-Density Super-Network for Scalable Vision Transformers

Published on Nov 12
Authors:

Abstract

Transforming a pre-trained Vision Transformer into a stratified knowledge-density super-network, using Weighted PCA for Attention Contraction (WPAC) and Progressive Importance-Aware Dropout (PIAD), enables knowledge concentration and efficient adjustment of model size.

AI-generated summary

Training and deploying multiple vision transformer (ViT) models for different resource constraints is costly and inefficient. To address this, we propose transforming a pre-trained ViT into a stratified knowledge-density super-network, where knowledge is hierarchically organized across weights. This enables flexible extraction of sub-networks that retain maximal knowledge for varying model sizes. We introduce Weighted PCA for Attention Contraction (WPAC), which concentrates knowledge into a compact set of critical weights. WPAC applies token-wise weighted principal component analysis to intermediate features and injects the resulting transformation and inverse matrices into adjacent layers, preserving the original network function while enhancing knowledge compactness. To further promote stratified knowledge organization, we propose Progressive Importance-Aware Dropout (PIAD). PIAD progressively evaluates the importance of weight groups, updates an importance-aware dropout list, and trains the super-network under this dropout regime to promote knowledge stratification. Experiments demonstrate that WPAC outperforms existing pruning criteria in knowledge concentration, and the combination with PIAD offers a strong alternative to state-of-the-art model compression and model expansion methods.
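The two mechanisms described above can be illustrated in code. The PyTorch sketch below is a minimal, hedged illustration rather than the paper's implementation: the helper names (`weighted_pca`, `inject_rotation`, `update_dropout_list`, `importance_aware_dropout_masks`), the choice of per-token weights, the L2-norm importance proxy, and which linear layers count as "adjacent" are all assumptions made here for clarity; the paper's exact criteria and layer placement may differ.

```python
import torch
import torch.nn as nn


def weighted_pca(features: torch.Tensor, token_weights: torch.Tensor) -> torch.Tensor:
    """Token-wise weighted PCA of intermediate features.

    features:      (num_tokens, dim) activations collected at a layer boundary.
    token_weights: (num_tokens,) non-negative per-token weights (assumed here,
                   e.g. attention mass or token norm).
    Returns an orthogonal matrix P of shape (dim, dim) whose columns are
    principal directions, ordered by decreasing weighted variance.
    """
    w = token_weights / token_weights.sum()
    mean = (w[:, None] * features).sum(dim=0, keepdim=True)
    centered = features - mean
    cov = centered.T @ (w[:, None] * centered)       # weighted covariance, (dim, dim)
    eigvals, eigvecs = torch.linalg.eigh(cov)        # eigenvalues in ascending order
    order = torch.argsort(eigvals, descending=True)
    return eigvecs[:, order]


def inject_rotation(prev_linear: nn.Linear, next_linear: nn.Linear, P: torch.Tensor) -> None:
    """Fold P into the layer producing the features and its inverse P^T into the
    layer consuming them. Because P is orthogonal, the composed function is
    unchanged, but features are now expressed in the principal-component basis,
    so variance (and hence "knowledge") concentrates in the leading coordinates."""
    with torch.no_grad():
        # prev: y = x @ W^T + b  ->  want rotated output y @ P, so W' = P^T W, b' = P^T b
        prev_linear.weight.copy_(P.T @ prev_linear.weight)
        if prev_linear.bias is not None:
            prev_linear.bias.copy_(P.T @ prev_linear.bias)
        # next: z = y @ V^T + c; with rotated input y' = y P, use V' = V P so that
        # y' @ V'^T = y P P^T V^T = y V^T  (function preserved)
        next_linear.weight.copy_(next_linear.weight @ P)


def update_dropout_list(weight_groups: list[torch.Tensor], keep_ratio: float) -> list[int]:
    """Rank weight groups by a simple importance proxy (L2 norm here; the paper's
    criterion may differ) and mark the lowest-ranked groups as candidates for
    importance-aware dropout."""
    scores = torch.stack([g.detach().norm() for g in weight_groups])
    k = max(1, int(keep_ratio * len(weight_groups)))
    keep = set(torch.topk(scores, k).indices.tolist())
    return [i for i in range(len(weight_groups)) if i not in keep]


def importance_aware_dropout_masks(
    weight_groups: list[torch.Tensor], dropout_list: list[int], p: float
) -> list[torch.Tensor]:
    """For one training step, drop low-importance groups with probability p so the
    retained (important) groups are forced to carry the knowledge."""
    return [
        torch.zeros_like(g) if (i in dropout_list and torch.rand(()).item() < p)
        else torch.ones_like(g)
        for i, g in enumerate(weight_groups)
    ]
```

The key property the sketch relies on is that P is orthogonal, so inserting P and P^T on the two sides of a layer boundary cancels exactly and the pre-trained network's outputs are preserved; only the internal basis changes. Once features live in this basis, sub-networks of different sizes could in principle be extracted by keeping only the leading dimensions or the highest-importance weight groups, which is the scalability the abstract describes.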
