Phani Srikanth

binga

https://www.kaggle.com/phanisrikanth

AI & ML interests

Classical ML, NLP, LLMs, Security for ML

Recent Activity

liked a dataset 21 days ago

nvidia/Nemotron-PII

liked a dataset 21 days ago

nvidia/Nemotron-VLM-Dataset-v2

liked a model 25 days ago

moonshotai/Kimi-K2-Instruct-0905

upvoted 2 collections about 2 months ago

GLiNER-PII

PII detection models developed in collaboration with Wordcab • 5 items • Updated Sep 24 • 21

NeMo Curator - Classifier Models

Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 11 items • Updated about 6 hours ago • 24

upvoted 2 collections 2 months ago

Granite Docling

5 items • Updated 1 day ago • 58

Tiny Language Model Datasets

Collection of Synthetic Datasets that can be used in pretraining of any Tiny Language Model • 14 items • Updated Sep 21 • 29

upvoted an article 2 months ago

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

Article • Sep 11 • 163

upvoted a paper 2 months ago

GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface

Paper • 2507.18546 • Published Jul 24 • 25

upvoted a collection 4 months ago

SauerkrautLM-Multilingual-(Reason)-ColBERT

SauerkrautLM ColBERT is a suite of Late-Interaction retrieval models built with PyLate’s ColBERT architecture and tuned for seven European languages. • 7 items • Updated Aug 3 • 18

upvoted 2 articles 4 months ago

Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face

Article • Jul 29 • 198

The 4 Things Qwen-3’s Chat Template Teaches Us

Article • Apr 30 • 76

upvoted 2 collections 4 months ago

EuroBERT

Scaling Multilingual Encoders for European Languages • 4 items • Updated Mar 10 • 13

OpenReasoning-Nemotron

Collection of models for OpenReasoning-Nemotron which are trained on 5M reasoning traces for Math, Code and Science. • 6 items • Updated about 6 hours ago • 44

upvoted a paper 4 months ago

Apple Intelligence Foundation Language Models

Paper • 2407.21075 • Published Jul 29, 2024 • 5

upvoted an article 4 months ago

FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages

Article • Jul 8 • 32

upvoted a collection 10 months ago

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 151

upvoted an article 10 months ago

Finally, a Replacement for BERT: Introducing ModernBERT

Article • Dec 19, 2024 • 707

upvoted 2 papers 10 months ago

MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

Paper • 2501.08828 • Published Jan 15 • 31

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 104

upvoted an article about 1 year ago

How to build a custom text classifier without days of human labeling

Article • Oct 17, 2024 • 56

upvoted a paper about 1 year ago

Unifying Multimodal Retrieval via Document Screenshot Embedding

Paper • 2406.11251 • Published Jun 17, 2024 • 10

upvoted an article over 1 year ago

Docmatix - a huge dataset for Document Visual Question Answering

Article • Jul 18, 2024 • 78