Install this specific skill from the multi-skill repository:
npx skills add ramidamolis-alt/agent-skills-workflows --skill "nlp-master"
# Description
Advanced NLP skill - Text classification, NER, sentiment analysis, document summarization, embeddings, and multi-language support. Use for any natural language processing task.
# SKILL.md
name: nlp-master
description: Advanced NLP skill - Text classification, NER, sentiment analysis, document summarization, embeddings, and multi-language support. Use for any natural language processing task.
triggers: ["nlp", "text", "language", "sentiment", "classify", "summarize", "translate", "ภาษา", "วิเคราะห์ข้อความ"]
📝 NLP Master Skill
Expert in Natural Language Processing with modern transformer architectures and production-ready patterns.
Capability Matrix
capabilities:
text_processing:
- tokenization: "Word, subword, character-level"
- preprocessing: "Cleaning, normalization, lemmatization"
- embeddings: "Word2Vec, FastText, BERT, OpenAI"
understanding:
- classification: "Binary, multi-class, multi-label"
- ner: "Named Entity Recognition"
- sentiment: "Positive, negative, neutral, aspect-based"
- intent: "Intent detection for chatbots"
generation:
- summarization: "Extractive, abstractive"
- translation: "Multi-language support"
- qa: "Question answering"
- chat: "Conversational AI patterns"
advanced:
- semantic_search: "Vector similarity"
- topic_modeling: "LDA, BERTopic"
- relation_extraction: "Knowledge graphs"
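The text_processing entries above (cleaning, normalization, lemmatization) are not illustrated elsewhere in this skill, so here is a minimal sketch using NLTK as one possible toolkit (an assumption; the "punkt" and "wordnet" NLTK data packages must be downloaded first):

```python
import re

import nltk
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def preprocess(text):
    """Lowercase, strip punctuation, tokenize, and lemmatize a string."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())           # cleaning + normalization
    tokens = nltk.word_tokenize(text)                          # word-level tokenization
    return [lemmatizer.lemmatize(token) for token in tokens]   # lemmatization
```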
Text Classification Patterns
Binary Classification
from transformers import pipeline
# Quick classification with Hugging Face
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
def classify_sentiment(text):
result = classifier(text)[0]
return {
"label": result["label"],
"confidence": result["score"]
}
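A hypothetical call (exact scores depend on the model version):

```python
classify_sentiment("I absolutely love this library!")
# -> {"label": "POSITIVE", "confidence": 0.99}  (illustrative output)
```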
Multi-Class Classification
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
def create_text_classifier():
"""
Traditional ML text classification pipeline
"""
return Pipeline([
('tfidf', TfidfVectorizer(
max_features=10000,
ngram_range=(1, 2),
stop_words='english'
)),
('clf', LogisticRegression(
max_iter=1000,
class_weight='balanced'
))
])
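Hypothetical usage of this pipeline; the texts and labels below are illustrative, not part of the skill:

```python
texts = [
    "great product, works perfectly",
    "terrible support, never again",
    "delivery was okay, nothing special",
]
labels = ["positive", "negative", "neutral"]

clf = create_text_classifier()
clf.fit(texts, labels)                              # fit TF-IDF + logistic regression
print(clf.predict(["great support, works well"]))   # likely ["positive"]
```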
Using MCP for Classification Research
import asyncio

async def research_classification_approach(problem):
"""
Research best classification approach with MCP
"""
results = await asyncio.gather(
mcp_Context7_query_docs(
libraryId="/huggingface/transformers",
query="text classification best practices"
),
mcp_Brave_brave_web_search(
"text classification state of the art 2026"
),
mcp_Memory_search_nodes("text classification patterns")
)
return await mcp_UltraThink_ultrathink(
thought=f"""
Analyzing classification approaches:
- Problem: {problem}
- Documentation: {results[0]}
- Latest research: {results[1]}
- Past patterns: {results[2]}
Recommendations:
1. ...
""",
total_thoughts=15
)
Named Entity Recognition (NER)
Standard NER Pipeline
from transformers import pipeline
# Pre-trained NER
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
def extract_entities(text):
entities = ner(text)
return [
{
"text": e["word"],
"type": e["entity_group"],
"confidence": e["score"],
"start": e["start"],
"end": e["end"]
}
for e in entities
]
# Example output:
# [
# {"text": "Google", "type": "ORG", "confidence": 0.99},
# {"text": "San Francisco", "type": "LOC", "confidence": 0.98}
# ]
Custom NER Training
custom_ner:
data_format:
- format: "CoNLL"
example: |
  John B-PER
  works O
  at O
  Google B-ORG
  . O
training:
base_model: "bert-base-uncased"
fine_tuning_steps:
- load_pretrained
- add_token_classification_head
- train_on_custom_data
- evaluate_f1_score
entity_types:
common:
- PER: "Person names"
- ORG: "Organizations"
- LOC: "Locations"
- DATE: "Dates and times"
- MONEY: "Currency values"
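A small hypothetical helper that reads the CoNLL-style format shown above into (tokens, tags) pairs a token-classification trainer can consume:

```python
def read_conll(path):
    """Parse 'token TAG' lines; blank lines separate sentences."""
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # blank line ends the current sentence
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            token, tag = line.split()[:2]     # e.g. "Google B-ORG"
            tokens.append(token)
            tags.append(tag)
    if tokens:
        sentences.append((tokens, tags))
    return sentences
```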
Sentiment Analysis
Basic Sentiment
from transformers import pipeline
sentiment = pipeline("sentiment-analysis")
def analyze_sentiment(text):
result = sentiment(text)[0]
return {
"sentiment": result["label"],
"confidence": result["score"]
}
Aspect-Based Sentiment
async def aspect_sentiment(text, aspects):
"""
Analyze sentiment for specific aspects
"""
return await mcp_UltraThink_ultrathink(
thought=f"""
Analyzing aspect-based sentiment:
Text: "{text}"
Aspects to analyze: {aspects}
For each aspect:
1. Find relevant mentions
2. Determine sentiment (positive/negative/neutral)
3. Provide confidence score
4. Quote supporting text
""",
total_thoughts=10
)
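For a local, non-MCP alternative, here is a hedged sketch using zero-shot classification; the hypothesis phrasing is an illustrative choice, not a fixed API:

```python
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def aspect_sentiment_local(text, aspects):
    """Score each aspect against positive/negative/not-mentioned hypotheses."""
    results = {}
    for aspect in aspects:
        labels = [
            f"the {aspect} is good",
            f"the {aspect} is bad",
            f"the {aspect} is not mentioned",
        ]
        out = zero_shot(text, candidate_labels=labels)
        results[aspect] = {"label": out["labels"][0], "score": out["scores"][0]}
    return results
```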
Thai Sentiment (Multi-language)
from transformers import pipeline
# Thai language model (WangchanBERTa); for reliable sentiment scores, use a checkpoint fine-tuned for Thai sentiment or fine-tune this base model on labeled data
thai_sentiment = pipeline(
"sentiment-analysis",
model="airesearch/wangchanberta-base-att-spm-uncased"
)
def analyze_thai_sentiment(text):
result = thai_sentiment(text)[0]
return {
"sentiment": result["label"],
"confidence": result["score"]
}
Document Summarization
Extractive Summarization
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
def extractive_summary(text, sentence_count=3):
"""
Extract key sentences from document
"""
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = LsaSummarizer()
summary = summarizer(parser.document, sentence_count)
return " ".join(str(s) for s in summary)
Abstractive Summarization
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
def abstractive_summary(text, max_length=130, min_length=30):
"""
Generate new summary text
"""
result = summarizer(
text,
max_length=max_length,
min_length=min_length,
do_sample=False
)
return result[0]["summary_text"]
With UltraThink for Long Documents
async def smart_summarize(document, target_length="medium"):
"""
Intelligent summarization using UltraThink
"""
return await mcp_UltraThink_ultrathink(
thought=f"""
Summarizing document:
Document length: {len(document.split())} words
Target length: {target_length}
Steps:
1. Identify main themes
2. Extract key arguments
3. Note important details
4. Synthesize coherent summary
Summary:
""",
total_thoughts=20
)
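As a plain-Python fallback for documents that exceed the model's input limit, a chunk-then-recombine sketch built on abstractive_summary() from above (the chunk size is an illustrative assumption):

```python
def hierarchical_summary(document, chunk_words=500):
    """Summarize each chunk, then summarize the combined chunk summaries."""
    words = document.split()
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    partial = [abstractive_summary(chunk) for chunk in chunks]
    combined = " ".join(partial)
    return abstractive_summary(combined) if len(chunks) > 1 else combined
```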
Text Embeddings
Embedding Patterns
embedding_models:
sentence_transformers:
model: "all-MiniLM-L6-v2"
dimension: 384
use_case: "Semantic search, clustering"
speed: "Fast"
openai:
model: "text-embedding-3-large"
dimension: 3072
use_case: "High quality embeddings"
speed: "API call"
e5:
model: "intfloat/e5-large-v2"
dimension: 1024
use_case: "Retrieval, similarity"
speed: "Medium"
Semantic Search Implementation
from sentence_transformers import SentenceTransformer
import numpy as np
# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')
def create_embeddings(texts):
"""Create embeddings for a list of texts"""
return model.encode(texts)
def semantic_search(query, documents, top_k=5):
"""
Find most similar documents to query
"""
# Embed query and documents
query_embedding = model.encode([query])[0]
doc_embeddings = model.encode(documents)
# Calculate cosine similarity
similarities = np.dot(doc_embeddings, query_embedding) / (
np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
# Get top-k results
top_indices = np.argsort(similarities)[-top_k:][::-1]
return [
{"document": documents[i], "score": similarities[i]}
for i in top_indices
]
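Hypothetical usage; the documents and the expected ranking are illustrative:

```python
docs = [
    "The cat sat on the mat.",
    "Stock markets fell sharply today.",
    "A feline was resting on a rug.",
]
results = semantic_search("Where is the cat?", docs, top_k=2)
# The two cat-related sentences should rank highest
```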
Multi-Language Support
Language Detection
from langdetect import detect, detect_langs
def detect_language(text):
"""Detect language of text"""
lang = detect(text)
probabilities = detect_langs(text)
return {
"language": lang,
"probabilities": [
{"lang": p.lang, "prob": p.prob}
for p in probabilities
]
}
Translation Patterns
from transformers import pipeline

def translate(text, source_lang="en", target_lang="th"):
    """Translate text between languages (a Helsinki-NLP/opus-mt checkpoint must exist for the pair)."""
    # Build the model name from the language codes, e.g. Helsinki-NLP/opus-mt-en-th
    translator = pipeline(
        "translation",
        model=f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    )
    result = translator(text)
    return result[0]["translation_text"]
Multilingual Models
multilingual_models:
mbert:
name: "bert-base-multilingual-cased"
languages: 104
use_case: "NER, classification across languages"
xlm_roberta:
name: "xlm-roberta-large"
languages: 100
use_case: "Cross-lingual transfer"
mt5:
name: "google/mt5-base"
languages: 101
use_case: "Multilingual text generation"
MCP Integration
Research NLP Techniques
import asyncio

async def research_nlp_solution(problem):
"""
Research NLP solution using all relevant MCPs
"""
results = await asyncio.gather(
# Documentation
mcp_Context7_query_docs(
libraryId="/huggingface/transformers",
query=problem
),
# Web research
mcp_Brave_brave_web_search(f"NLP {problem} 2026"),
mcp_DuckDuckGo_iask_search(problem, mode="academic"),
# Past solutions
mcp_Memory_search_nodes(f"NLP {problem}"),
# Deep notebooks
mcp_NotebookLM_search_notebooks("NLP")
)
return await mcp_UltraThink_ultrathink(
thought=f"""
Synthesizing NLP research:
- Problem: {problem}
- Documentation: {results[0][:1000]}
- Latest: {results[1][:1000]}
- Academic: {results[2][:1000]}
- Past patterns: {results[3]}
Recommended approach: ...
""",
total_thoughts=25
)
Quick Reference
Common Tasks
| Task | Quick Solution | Production Solution |
|---|---|---|
| Sentiment | `pipeline("sentiment-analysis")` | Fine-tuned BERT |
| NER | `pipeline("ner")` | Custom entity types |
| Summarize | `pipeline("summarization")` | Multi-stage pipeline |
| Classify | TF-IDF + LogReg | Fine-tuned transformer |
| Search | BM25 | Semantic embeddings |
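A sketch of the BM25 "quick solution" from the table, assuming the third-party rank_bm25 package (any BM25 implementation works):

```python
from rank_bm25 import BM25Okapi

def bm25_search(query, documents, top_k=5):
    """Rank documents against the query with BM25 over whitespace tokens."""
    tokenized_docs = [doc.lower().split() for doc in documents]
    bm25 = BM25Okapi(tokenized_docs)
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```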
Model Selection Guide
model_decision_tree:
has_labeled_data:
yes:
data_size:
small: "Fine-tune small transformer"
medium: "Fine-tune BERT/RoBERTa"
large: "Train from scratch or larger model"
no:
use: "Zero-shot with large LLM or UltraThink"
latency_requirement:
strict: "DistilBERT, TinyBERT"
moderate: "BERT-base"
relaxed: "BERT-large, GPT"
Related Skills
- ml-pipeline: General ML training
- omega-agent: Complex NLP workflows
- knowledge-graph: Entity relationship mapping
- prompt-master: LLM prompt optimization
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.