# pandas-expert

by minminminminchew
# Install this skill

```shell
npx skills add minminminminchew/my-claude-skills
```

Or install this specific skill directly:

```shell
npx add-skill https://github.com/minminminminchew/my-claude-skills
```

# Description

Expert guidance for pandas data manipulation and analysis in Python. Use when working with DataFrames, Series, data cleaning, transformation, aggregation, merging/joining datasets, time series analysis, or any tabular data operations. Triggers on pandas imports, DataFrame operations, CSV/Excel file processing, data wrangling tasks, or questions about efficient pandas patterns.

# SKILL.md

```yaml
---
name: pandas-expert
description: Expert guidance for pandas data manipulation and analysis in Python. Use when working with DataFrames, Series, data cleaning, transformation, aggregation, merging/joining datasets, time series analysis, or any tabular data operations. Triggers on pandas imports, DataFrame operations, CSV/Excel file processing, data wrangling tasks, or questions about efficient pandas patterns.
---
```

## Pandas Expert

Expert guidance for pandas data manipulation and analysis.

### Quick Start

```python
import pandas as pd

# Read data
df = pd.read_csv('data.csv')
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# Basic inspection
df.head()
df.info()
df.describe()
```

### Core Operations

#### Selection & Filtering

```python
# Column selection
df['col']            # Single column (Series)
df[['col1', 'col2']] # Multiple columns (DataFrame)

# Row filtering
df[df['col'] > 10]
df.query('col > 10 and other_col == "value"')
df.loc[df['col'].isin(['a', 'b'])]

# loc vs iloc
df.loc[row_labels, col_labels]  # Label-based
df.iloc[row_idx, col_idx]       # Integer position-based
```
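The `loc` vs `iloc` distinction is easiest to see on a non-default index. A minimal sketch with a made-up DataFrame (the index labels and column are invented for illustration):

```python
import pandas as pd

# String index, so label-based and position-based access differ visibly
df = pd.DataFrame({'score': [10, 20, 30]}, index=['a', 'b', 'c'])

label_val = df.loc['b', 'score']  # label-based: the row labeled 'b'
pos_val = df.iloc[1, 0]           # position-based: the second row, first column
# Here both point at the same cell (20); after sorting or slicing they may not
```

On a default `RangeIndex` the two look interchangeable, which is why bugs tend to appear only after rows are reordered or filtered.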

#### Data Cleaning

```python
# Missing values
df.isna().sum()
df.dropna(subset=['col'])
df.fillna({'col': 0, 'other': 'unknown'})

# Duplicates
df.drop_duplicates(subset=['key_col'])

# Type conversion
df['col'] = df['col'].astype('int64')
df['date'] = pd.to_datetime(df['date'])
```
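One caveat worth knowing: `astype('int64')` raises if the column contains missing values, because NumPy integers cannot hold NaN. pandas' nullable `Int64` extension dtype (capital I) handles this. A small sketch with invented data:

```python
import pandas as pd

# A column with a missing value: plain 'int64' would raise here,
# so convert to the nullable 'Int64' extension dtype instead
s = pd.Series([1.0, 2.0, None])
nullable = s.astype('Int64')  # values become <1, 2, pd.NA>
```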

#### Transformation

```python
# Apply functions
df['new'] = df['col'].apply(lambda x: x * 2)
df['new'] = df.apply(lambda row: row['a'] + row['b'], axis=1)

# String operations
df['col'].str.lower()
df['col'].str.contains('pattern', regex=True)
df['col'].str.split('_', expand=True)

# Mapping values
df['col'].map({'old': 'new'})
df['col'].replace({'old': 'new'})
```
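`map` and `replace` differ on values absent from the mapping: `map` turns unmatched values into NaN, while `replace` leaves them untouched. A quick illustration with toy data:

```python
import pandas as pd

s = pd.Series(['old', 'keep'])
mapped = s.map({'old': 'new'})        # 'keep' is not in the dict -> NaN
replaced = s.replace({'old': 'new'})  # 'keep' is not in the dict -> unchanged
```

Use `replace` for partial substitutions and `map` when every value is expected to have a mapping (unmatched NaNs then act as a sanity check).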

#### Aggregation

```python
# GroupBy
df.groupby('category')['value'].sum()
df.groupby(['cat1', 'cat2']).agg({'val1': 'sum', 'val2': 'mean'})

# Pivot tables
pd.pivot_table(df, values='val', index='row', columns='col', aggfunc='sum')

# Crosstab
pd.crosstab(df['a'], df['b'], margins=True)
```
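For readable output column names, `agg` also accepts the named-aggregation keyword form (`new_name=(column, func)`). A sketch with invented columns:

```python
import pandas as pd

df = pd.DataFrame({'cat': ['x', 'x', 'y'], 'val': [1, 2, 3]})

# Output columns are named 'total' and 'avg' instead of 'sum'/'mean'
out = df.groupby('cat').agg(total=('val', 'sum'), avg=('val', 'mean'))
```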

#### Merging & Joining

```python
# Merge (SQL-style joins)
pd.merge(df1, df2, on='key')
pd.merge(df1, df2, left_on='id', right_on='user_id', how='left')

# Concatenate
pd.concat([df1, df2], ignore_index=True)  # Vertical stack
pd.concat([df1, df2], axis=1)             # Horizontal stack
```
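When debugging joins, `merge` accepts `indicator=True`, which adds a `_merge` column recording where each row came from. A small sketch with made-up key columns:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2]})
right = pd.DataFrame({'key': [2, 3]})

# _merge is 'left_only', 'both', or 'right_only' per row
merged = pd.merge(left, right, on='key', how='outer', indicator=True)
```

Counting `_merge` values (`merged['_merge'].value_counts()`) is a cheap way to spot keys that silently failed to match.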

### Performance Tips

1. **Use vectorized operations** - avoid Python loops; use built-in pandas/NumPy methods
2. **Filter early** - reduce DataFrame size before complex operations
3. **Use appropriate dtypes** - `category` for low-cardinality strings; `int32` instead of `int64` where the value range allows
4. **Chunk large files** - `pd.read_csv('file.csv', chunksize=10000)`
5. **Use `query()` for complex filters** - often faster than chained boolean indexing on large DataFrames
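To make tips 1 and 3 concrete, a small sketch comparing the memory footprint of `object` vs `category` dtype for a low-cardinality column, plus a vectorized computation (the data is invented; exact byte counts vary by platform and pandas version):

```python
import pandas as pd

# Tip 3: low-cardinality strings stored as 'category' use far less memory
s = pd.Series(['low', 'high'] * 10_000)
obj_bytes = s.memory_usage(deep=True)
cat_bytes = s.astype('category').memory_usage(deep=True)

# Tip 1: vectorized arithmetic, no Python-level loop over rows
df = pd.DataFrame({'a': range(1000)})
df['b'] = df['a'] * 2
```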

### Common Patterns

#### Read multiple files

```python
from pathlib import Path

import pandas as pd

dfs = [pd.read_csv(f) for f in Path('data/').glob('*.csv')]
combined = pd.concat(dfs, ignore_index=True)
```

#### Safe column renaming

```python
df.columns = df.columns.str.lower().str.replace(' ', '_')
```

#### Memory-efficient reading

```python
df = pd.read_csv('large.csv', usecols=['col1', 'col2'], dtype={'col1': 'int32'})
```


# Supported AI Coding Agents

This skill follows the SKILL.md standard and works with all major AI coding agents that support it.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.