ramidamolis-alt

nlp-master

# Install this skill:
npx skills add ramidamolis-alt/agent-skills-workflows --skill "nlp-master"

Install a specific skill from a multi-skill repository.

# Description

Advanced NLP skill - Text classification, NER, sentiment analysis, document summarization, embeddings, and multi-language support. Use for any natural language processing task.

# SKILL.md


---
name: nlp-master
description: Advanced NLP skill - Text classification, NER, sentiment analysis, document summarization, embeddings, and multi-language support. Use for any natural language processing task.
triggers: ["nlp", "text", "language", "sentiment", "classify", "summarize", "translate", "ภาษา", "วิเคราะห์ข้อความ"]
---


📝 NLP Master Skill

Expert in Natural Language Processing with modern transformer architectures and production-ready patterns.


Capability Matrix

capabilities:
  text_processing:
    - tokenization: "Word, subword, character-level"
    - preprocessing: "Cleaning, normalization, lemmatization"
    - embeddings: "Word2Vec, FastText, BERT, OpenAI"

  understanding:
    - classification: "Binary, multi-class, multi-label"
    - ner: "Named Entity Recognition"
    - sentiment: "Positive, negative, neutral, aspect-based"
    - intent: "Intent detection for chatbots"

  generation:
    - summarization: "Extractive, abstractive"
    - translation: "Multi-language support"
    - qa: "Question answering"
    - chat: "Conversational AI patterns"

  advanced:
    - semantic_search: "Vector similarity"
    - topic_modeling: "LDA, BERTopic"
    - relation_extraction: "Knowledge graphs"

Text Classification Patterns

Binary Classification

from transformers import pipeline

# Quick classification with Hugging Face
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

def classify_sentiment(text):
    result = classifier(text)[0]
    return {
        "label": result["label"],
        "confidence": result["score"]
    }

Multi-Class Classification

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def create_text_classifier():
    """
    Traditional ML text classification pipeline
    """
    return Pipeline([
        ('tfidf', TfidfVectorizer(
            max_features=10000,
            ngram_range=(1, 2),
            stop_words='english'
        )),
        ('clf', LogisticRegression(
            max_iter=1000,
            class_weight='balanced'
        ))
    ])
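
A minimal training-and-evaluation sketch for the pipeline above; the texts and labels arguments are assumed to come from your own labeled dataset and are not part of the original skill.

# Sketch: train and evaluate create_text_classifier() on your own data.
# `texts` and `labels` are assumed inputs from a labeled dataset.
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def train_and_evaluate(texts, labels):
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42
    )
    clf = create_text_classifier()
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))
    return clf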

Using MCP for Classification Research

import asyncio

async def research_classification_approach(problem):
    """
    Research best classification approach with MCP
    """
    results = await asyncio.gather(
        mcp_Context7_query_docs(
            libraryId="/huggingface/transformers",
            query="text classification best practices"
        ),
        mcp_Brave_brave_web_search(
            "text classification state of the art 2026"
        ),
        mcp_Memory_search_nodes("text classification patterns")
    )

    return await mcp_UltraThink_ultrathink(
        thought=f"""
        Analyzing classification approaches:
        - Problem: {problem}
        - Documentation: {results[0]}
        - Latest research: {results[1]}
        - Past patterns: {results[2]}

        Recommendations:
        1. ...
        """,
        total_thoughts=15
    )

Named Entity Recognition (NER)

Standard NER Pipeline

from transformers import pipeline

# Pre-trained NER
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

def extract_entities(text):
    entities = ner(text)
    return [
        {
            "text": e["word"],
            "type": e["entity_group"],
            "confidence": e["score"],
            "start": e["start"],
            "end": e["end"]
        }
        for e in entities
    ]

# Example output:
# [
#   {"text": "Google", "type": "ORG", "confidence": 0.99},
#   {"text": "San Francisco", "type": "LOC", "confidence": 0.98}
# ]

Custom NER Training

custom_ner:
  data_format:
    - format: "CoNLL"
      example: |
        John B-PER
        works O
        at O
        Google B-ORG
        . O

  training:
    base_model: "bert-base-uncased"
    fine_tuning_steps:
      - load_pretrained
      - add_token_classification_head
      - train_on_custom_data
      - evaluate_f1_score

  entity_types:
    common:
      - PER: "Person names"
      - ORG: "Organizations"
      - LOC: "Locations"
      - DATE: "Dates and times"
      - MONEY: "Currency values"
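
A minimal sketch of the fine-tuning steps listed above, using the Hugging Face Trainer API. The label set is illustrative, and the train/eval datasets are assumed to already be tokenized with labels aligned to sub-tokens.

# Sketch of the fine-tuning steps above (assumes pre-tokenized datasets
# with sub-token-aligned "labels"; label list is illustrative).
from transformers import (
    AutoModelForTokenClassification,
    TrainingArguments,
    Trainer,
)

def fine_tune_ner(train_dataset, eval_dataset, label_list):
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(label_list)
    )
    args = TrainingArguments(
        output_dir="custom-ner",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()
    # Use a seqeval-based compute_metrics for entity-level F1 in practice
    return trainer.evaluate()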

Sentiment Analysis

Basic Sentiment

from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

def analyze_sentiment(text):
    result = sentiment(text)[0]
    return {
        "sentiment": result["label"],
        "confidence": result["score"]
    }

Aspect-Based Sentiment

async def aspect_sentiment(text, aspects):
    """
    Analyze sentiment for specific aspects
    """
    return await mcp_UltraThink_ultrathink(
        thought=f"""
        Analyzing aspect-based sentiment:

        Text: "{text}"
        Aspects to analyze: {aspects}

        For each aspect:
        1. Find relevant mentions
        2. Determine sentiment (positive/negative/neutral)
        3. Provide confidence score
        4. Quote supporting text
        """,
        total_thoughts=10
    )
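
A lighter-weight alternative sketch that reuses the standard sentiment pipeline: split the text into sentences, keep those mentioning each aspect, and score them. This is a rough heuristic that assumes aspect terms appear literally in the text.

# Heuristic aspect sentiment: score only sentences that mention the aspect.
# Assumes aspect terms appear literally in the text (rough sketch).
from transformers import pipeline

_sentiment = pipeline("sentiment-analysis")

def simple_aspect_sentiment(text, aspects):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    results = {}
    for aspect in aspects:
        mentions = [s for s in sentences if aspect.lower() in s.lower()]
        predictions = _sentiment(mentions) if mentions else []
        results[aspect] = [
            {"sentence": m, "sentiment": p["label"], "confidence": p["score"]}
            for m, p in zip(mentions, predictions)
        ]
    return results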

Thai Sentiment (Multi-language)

from transformers import pipeline

# Thai model (WangchanBERTa base checkpoint; fine-tune it on labeled
# Thai sentiment data before relying on its predictions)
thai_sentiment = pipeline(
    "sentiment-analysis",
    model="airesearch/wangchanberta-base-att-spm-uncased"
)

def analyze_thai_sentiment(text):
    result = thai_sentiment(text)[0]
    return {
        "sentiment": result["label"],
        "confidence": result["score"]
    }

Document Summarization

Extractive Summarization

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

def extractive_summary(text, sentence_count=3):
    """
    Extract key sentences from document
    """
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    summarizer = LsaSummarizer()

    summary = summarizer(parser.document, sentence_count)
    return " ".join(str(s) for s in summary)

Abstractive Summarization

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def abstractive_summary(text, max_length=130, min_length=30):
    """
    Generate new summary text
    """
    result = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False
    )
    return result[0]["summary_text"]

With UltraThink for Long Documents

async def smart_summarize(document, target_length="medium"):
    """
    Intelligent summarization using UltraThink
    """
    return await mcp_UltraThink_ultrathink(
        thought=f"""
        Summarizing document:

        Document length: {len(document.split())} words
        Target length: {target_length}

        Steps:
        1. Identify main themes
        2. Extract key arguments
        3. Note important details
        4. Synthesize coherent summary

        Summary:
        """,
        total_thoughts=20
    )
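
For inputs longer than the summarization model's context window, a common multi-stage pattern is map-reduce summarization: summarize fixed-size chunks, then summarize the concatenated chunk summaries. The sketch below reuses abstractive_summary from above; the 500-word chunk size is an illustrative assumption, not a tuned value.

# Map-reduce summarization sketch for long documents.
# Reuses abstractive_summary() defined above; chunk size is illustrative.
def summarize_long_document(text, chunk_words=500):
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    # Map: summarize each chunk independently
    chunk_summaries = [abstractive_summary(chunk) for chunk in chunks]
    # Reduce: summarize the concatenation of chunk summaries
    combined = " ".join(chunk_summaries)
    return abstractive_summary(combined) if len(chunks) > 1 else combined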

Text Embeddings

Embedding Patterns

embedding_models:
  sentence_transformers:
    model: "all-MiniLM-L6-v2"
    dimension: 384
    use_case: "Semantic search, clustering"
    speed: "Fast"

  openai:
    model: "text-embedding-3-large"
    dimension: 3072
    use_case: "High quality embeddings"
    speed: "API call"

  e5:
    model: "intfloat/e5-large-v2"
    dimension: 1024
    use_case: "Retrieval, similarity"
    speed: "Medium"
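
For the OpenAI entry in the table above, a minimal sketch (assumes the openai Python package v1+ is installed and OPENAI_API_KEY is set in the environment):

# Sketch for the OpenAI embeddings row above; assumes the openai package
# (v1+) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def openai_embeddings(texts, model="text-embedding-3-large"):
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]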

Semantic Search Implementation

from sentence_transformers import SentenceTransformer
import numpy as np

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

def create_embeddings(texts):
    """Create embeddings for a list of texts"""
    return model.encode(texts)

def semantic_search(query, documents, top_k=5):
    """
    Find most similar documents to query
    """
    # Embed query and documents
    query_embedding = model.encode([query])[0]
    doc_embeddings = model.encode(documents)

    # Calculate cosine similarity
    similarities = np.dot(doc_embeddings, query_embedding) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )

    # Get top-k results
    top_indices = np.argsort(similarities)[-top_k:][::-1]

    return [
        {"document": documents[i], "score": similarities[i]}
        for i in top_indices
    ]

Multi-Language Support

Language Detection

from langdetect import detect, detect_langs

def detect_language(text):
    """Detect language of text"""
    lang = detect(text)
    probabilities = detect_langs(text)
    return {
        "language": lang,
        "probabilities": [
            {"lang": p.lang, "prob": p.prob}
            for p in probabilities
        ]
    }

Translation Patterns

from transformers import pipeline

def translate(text, source_lang="en", target_lang="th"):
    """Translate text between languages.

    Helsinki-NLP opus-mt checkpoints are language-pair specific, so the
    model name is built from the source/target codes; cache the pipeline
    when translating many texts.
    """
    translator = pipeline(
        "translation",
        model=f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    )
    result = translator(text)
    return result[0]["translation_text"]

Multilingual Models

multilingual_models:
  mbert:
    name: "bert-base-multilingual-cased"
    languages: 104
    use_case: "NER, classification across languages"

  xlm_roberta:
    name: "xlm-roberta-large"
    languages: 100
    use_case: "Cross-lingual transfer"

  mt5:
    name: "google/mt5-base"
    languages: 101
    use_case: "Multilingual text generation"
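
A cross-lingual usage sketch for the XLM-RoBERTa entry above, via zero-shot classification. The checkpoint name (an XNLI-fine-tuned XLM-RoBERTa) is an assumption; swap in whichever multilingual NLI model you prefer.

# Cross-lingual zero-shot classification sketch; the checkpoint choice
# (an XNLI-fine-tuned XLM-RoBERTa) is an assumption.
from transformers import pipeline

zero_shot = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli"
)

def classify_any_language(text, candidate_labels):
    result = zero_shot(text, candidate_labels)
    return dict(zip(result["labels"], result["scores"]))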

MCP Integration

Research NLP Techniques

import asyncio

async def research_nlp_solution(problem):
    """
    Research NLP solution using all relevant MCPs
    """
    results = await asyncio.gather(
        # Documentation
        mcp_Context7_query_docs(
            libraryId="/huggingface/transformers",
            query=problem
        ),

        # Web research
        mcp_Brave_brave_web_search(f"NLP {problem} 2026"),
        mcp_DuckDuckGo_iask_search(problem, mode="academic"),

        # Past solutions
        mcp_Memory_search_nodes(f"NLP {problem}"),

        # Deep notebooks
        mcp_NotebookLM_search_notebooks("NLP")
    )

    return await mcp_UltraThink_ultrathink(
        thought=f"""
        Synthesizing NLP research:
        - Problem: {problem}
        - Documentation: {results[0][:1000]}
        - Latest: {results[1][:1000]}
        - Academic: {results[2][:1000]}
        - Past patterns: {results[3]}

        Recommended approach: ...
        """,
        total_thoughts=25
    )

Quick Reference

Common Tasks

| Task      | Quick Solution                  | Production Solution    |
|-----------|---------------------------------|------------------------|
| Sentiment | pipeline("sentiment-analysis")  | Fine-tuned BERT        |
| NER       | pipeline("ner")                 | Custom entity types    |
| Summarize | pipeline("summarization")       | Multi-stage pipeline   |
| Classify  | TF-IDF + LogReg                 | Fine-tuned transformer |
| Search    | BM25                            | Semantic embeddings    |
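
The BM25 "quick solution" from the table, sketched with the rank_bm25 package; whitespace tokenization is a simplifying assumption.

# BM25 quick-solution sketch using rank_bm25; whitespace tokenization
# is a simplifying assumption.
from rank_bm25 import BM25Okapi

def bm25_search(query, documents, top_k=5):
    tokenized_docs = [doc.lower().split() for doc in documents]
    bm25 = BM25Okapi(tokenized_docs)
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [
        {"document": documents[i], "score": scores[i]}
        for i in ranked[:top_k]
    ]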

Model Selection Guide

model_decision_tree:
  has_labeled_data:
    yes:
      data_size:
        small: "Fine-tune small transformer"
        medium: "Fine-tune BERT/RoBERTa"
        large: "Train from scratch or larger model"
    no:
      use: "Zero-shot with large LLM or UltraThink"

  latency_requirement:
    strict: "DistilBERT, TinyBERT"
    moderate: "BERT-base"
    relaxed: "BERT-large, GPT"

Related Skills

  • ml-pipeline: General ML training
  • omega-agent: Complex NLP workflows
  • knowledge-graph: Entity relationship mapping
  • prompt-master: LLM prompt optimization

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.