GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16×...
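The core of fuzzy deduplication is MinHash/LSH. The sketch below uses the `datasketch` library to flag near-duplicate documents; the shingle size, permutation count, and similarity threshold are illustrative choices, not the tool's defaults.

```python
from datasketch import MinHash, MinHashLSH

def shingles(text, k=5):
    """Character k-gram shingles as the set representation of a document."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def minhash(text, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for s in shingles(text):
        m.update(s.encode("utf-8"))
    return m

# LSH index: documents whose estimated Jaccard similarity exceeds the
# threshold share a bucket and are treated as near-duplicates.
lsh = MinHashLSH(threshold=0.8, num_perm=128)
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumped over the lazy dog",
    "c": "an entirely different sentence about training data",
}
kept = []
for doc_id, text in docs.items():
    m = minhash(text)
    if lsh.query(m):          # near-duplicate of an already-kept document
        continue
    lsh.insert(doc_id, m)
    kept.append(doc_id)
print(kept)                   # e.g. ['a', 'c']
```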

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for...
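The heart of a compact GPT implementation is the causal self-attention block. A minimal PyTorch sketch, with GPT-2-small-like dimensions assumed for illustration:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention, the core block of a GPT-style decoder."""
    def __init__(self, n_embd=768, n_head=12, block_size=1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

y = CausalSelfAttention()(torch.randn(2, 64, 768))
print(y.shape)  # torch.Size([2, 64, 768])
```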

Write Python code in n8n Code nodes. Use when writing Python in n8n, using _input/_json/_node syntax, working with...
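A hedged sketch of what a Code-node body might look like, assuming n8n's Python runtime injects `_input` and that each item exposes its payload via `.json`; the field names (`price`, `quantity`, `total`) are purely illustrative, and the snippet only runs inside the node's sandbox, not as a standalone script.

```python
# Body of an n8n Code node (Python). `_input` is provided by n8n at runtime.
items = _input.all()                          # items from the previous node
out = []
for item in items:
    data = item.json                          # the item's JSON payload
    total = data["price"] * data["quantity"]  # illustrative fields
    out.append({"json": {"price": data["price"],
                         "quantity": data["quantity"],
                         "total": total}})
return out
```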

Molecular featurization for ML (100+ featurizers). ECFP, MACCS, descriptors, pretrained models (ChemBERTa), convert...
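A minimal sketch of ECFP featurization done directly with RDKit; radius 2 and 2048 bits are common defaults for illustration, not requirements of the tool.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def ecfp_features(smiles, radius=2, n_bits=2048):
    """SMILES -> Morgan (ECFP-style) fingerprint as a numpy array, or None if unparseable."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

X = np.stack([ecfp_features(s) for s in ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]])
print(X.shape)  # (3, 2048)
```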

Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with...
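The routing idea behind MoE layers is top-k gating: each token is sent to its k highest-scoring experts and the outputs are combined with the renormalized gate weights. The toy PyTorch sketch below illustrates this; it is not DeepSpeed's or HuggingFace's API, and the dimensions are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer with top-k token routing (illustrative only)."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        topk_scores = topk_scores / topk_scores.sum(dim=-1, keepdim=True)  # renormalize gates
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
        return out

y = TopKMoE()(torch.randn(16, 512))
print(y.shape)  # torch.Size([16, 512])
```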

Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing...
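Wanda scores each weight by |W| times the L2 norm of its input activation and prunes the lowest-scoring weights within each output row. A simplified sketch for a single linear layer, with unstructured 50% sparsity assumed:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def wanda_prune_layer(layer: nn.Linear, calib_inputs: torch.Tensor, sparsity: float = 0.5):
    """Prune a linear layer in place using the Wanda score |W| * ||x||_2.

    calib_inputs: (n_samples, in_features) calibration activations feeding this layer.
    """
    act_norm = calib_inputs.norm(p=2, dim=0)                  # per-input-feature L2 norm
    score = layer.weight.abs() * act_norm.unsqueeze(0)        # (out_features, in_features)
    # Rank weights within each output row and zero the lowest-scoring fraction.
    k = int(layer.weight.shape[1] * sparsity)
    if k > 0:
        prune_idx = score.argsort(dim=1)[:, :k]               # weakest weights per row
        mask = torch.ones_like(layer.weight, dtype=torch.bool)
        mask.scatter_(1, prune_idx, False)
        layer.weight.mul_(mask)

layer = nn.Linear(256, 128)
wanda_prune_layer(layer, torch.randn(1024, 256), sparsity=0.5)
print((layer.weight == 0).float().mean())  # ~0.5
```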

Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating...
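Conceptually, the simplest merge is a weighted average of parameter tensors; mergekit drives this (and richer methods such as SLERP and TIES) from a YAML config, so the sketch below only illustrates the underlying idea with toy state dicts.

```python
import torch

def linear_merge(state_dicts, weights):
    """Weighted average of parameter tensors from models with identical architectures."""
    assert abs(sum(weights) - 1.0) < 1e-6
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Illustrative usage with two toy state dicts of the same shape.
sd_a = {"fc.weight": torch.randn(4, 4), "fc.bias": torch.randn(4)}
sd_b = {"fc.weight": torch.randn(4, 4), "fc.bias": torch.randn(4)}
merged = linear_merge([sd_a, sd_b], weights=[0.6, 0.4])
print(merged["fc.weight"].shape)  # torch.Size([4, 4])
```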

Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models,...
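A hedged sketch assuming a Modal-style API (`modal.App`, `@app.function`, `@app.local_entrypoint`); the app name, image packages, and GPU type are illustrative and may differ across SDK versions.

```python
import modal

app = modal.App("inference-demo")
image = modal.Image.debian_slim().pip_install("torch")

@app.function(image=image, gpu="A10G")
def square_on_gpu(x: float) -> float:
    # Runs inside a remote serverless container with a GPU attached.
    import torch
    t = torch.tensor(x, device="cuda")
    return float(t * t)

@app.local_entrypoint()
def main():
    # Invoked locally (e.g. via `modal run`); the call executes remotely.
    print(square_on_gpu.remote(3.0))
```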

Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments...
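A minimal sketch assuming MLflow as the tracking backend; the run name, parameters, and metric are illustrative, and registering the logged model under a versioned name additionally requires a registry-capable tracking server.

```python
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 200, "max_depth": 6}
    model = RandomForestRegressor(**params, random_state=0).fit(X_train, y_train)
    mlflow.log_params(params)
    mlflow.log_metric("mse", mean_squared_error(y_test, model.predict(X_test)))
    # Log the fitted model as an artifact; a registry-backed server can then version it.
    mlflow.sklearn.log_model(model, "model")
```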

Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts,...
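A minimal sketch of Lipinski plus Veber checks computed directly with RDKit; the cutoffs are the standard published values.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_lipinski_veber(smiles):
    """Rule-of-five plus Veber criteria; returns (bool, dict of computed properties)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False, {}
    props = {
        "MW": Descriptors.MolWt(mol),
        "LogP": Descriptors.MolLogP(mol),
        "HBD": Lipinski.NumHDonors(mol),
        "HBA": Lipinski.NumHAcceptors(mol),
        "RotB": Descriptors.NumRotatableBonds(mol),
        "TPSA": Descriptors.TPSA(mol),
    }
    lipinski_ok = (props["MW"] <= 500 and props["LogP"] <= 5
                   and props["HBD"] <= 5 and props["HBA"] <= 10)
    veber_ok = props["RotB"] <= 10 and props["TPSA"] <= 140
    return lipinski_ok and veber_ok, props

ok, props = passes_lipinski_veber("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(ok, props)
```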