Install this specific skill from the multi-skill repository: `npx skills add Mindrally/skills --skill "pandas-best-practices"`
# Description
Best practices for Pandas data manipulation, analysis, and DataFrame operations in Python
# SKILL.md

---
name: pandas-best-practices
description: Best practices for Pandas data manipulation, analysis, and DataFrame operations in Python
---
# Pandas Best Practices
Expert guidelines for Pandas development, focusing on data manipulation, analysis, and efficient DataFrame operations.
## Code Style and Structure
- Write concise, technical responses with accurate Python examples
- Prioritize reproducibility in data analysis workflows
- Use functional programming; avoid unnecessary classes
- Prefer vectorized operations over explicit loops
- Use descriptive variable names reflecting data content
- Follow PEP 8 style guidelines
## DataFrame Creation and I/O

- Use `pd.read_csv()`, `pd.read_excel()`, `pd.read_json()` with appropriate parameters
- Specify the `dtype` parameter to ensure correct data types on load
- Use `parse_dates` for automatic datetime parsing
- Set `index_col` when the data has a natural index column
- Use `chunksize` for reading large files incrementally, as sketched below
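
A minimal sketch of these I/O options, assuming a hypothetical `sales.csv` with `store_id`, `region`, `order_date`, `order_id`, and `revenue` columns:

```python
import pandas as pd

# Hypothetical file and column names, for illustration only.
df = pd.read_csv(
    "sales.csv",
    dtype={"store_id": "int32", "region": "category"},  # correct dtypes on load
    parse_dates=["order_date"],                          # automatic datetime parsing
    index_col="order_id",                                # natural index column
)

# For files too large for memory, read in chunks and aggregate incrementally.
total_revenue = sum(
    chunk["revenue"].sum()
    for chunk in pd.read_csv("sales.csv", usecols=["revenue"], chunksize=100_000)
)
```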
## Data Selection

- Use `.loc[]` for label-based indexing
- Use `.iloc[]` for integer position-based indexing
- Avoid chained indexing (e.g., `df['col'][0]`); use `.loc` or `.iloc` instead
- Use boolean indexing for conditional selection: `df[df['col'] > value]`
- Use the `.query()` method for complex filtering conditions (see the sketch below)
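
A short sketch of the selection idioms above on a toy frame (the column and index names are made up):

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.0, 19.5, 31.2]},
    index=["a", "b", "c"],
)

first_temp = df.loc["a", "temp_c"]      # label-based: row "a", column "temp_c"
first_row = df.iloc[0]                  # position-based: first row
warm = df[df["temp_c"] > 10]            # boolean indexing
warm_short = df.query("temp_c > 10 and city != 'Lima'")  # complex filtering

# Avoid chained indexing like df["temp_c"][0]; use df.loc["a", "temp_c"] instead.
```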
## Method Chaining

- Prefer method chaining for data transformations when possible
- Use `.pipe()` for applying custom functions in a chain
- Chain operations like `.assign()`, `.query()`, `.groupby()`, `.agg()`
- Keep chains readable by breaking across multiple lines, as in the example below
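
One way a readable chain might look; `add_margin`, the column names, and the 0.2 rate are all illustrative:

```python
import pandas as pd

def add_margin(frame: pd.DataFrame, rate: float) -> pd.DataFrame:
    """Custom step applied via .pipe()."""
    return frame.assign(margin=frame["revenue"] * rate)

df = pd.DataFrame({
    "region": ["north", "south", "north"],
    "revenue": [100.0, 250.0, 175.0],
})

summary = (
    df
    .query("revenue > 120")              # filter first
    .pipe(add_margin, rate=0.2)          # custom function in the chain
    .groupby("region")
    .agg(total_margin=("margin", "sum")) # named aggregation
)
```

Breaking the chain one method per line, wrapped in parentheses, keeps each transformation step visible and diff-friendly.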
## Data Cleaning and Validation

### Missing Data

- Check for missing data with `.isna()` and `.info()`
- Handle missing data appropriately: `.fillna()`, `.dropna()`, or imputation (example below)
- Use `pd.NA` for nullable integer and boolean types
- Document decisions about missing data handling
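
A quick sketch of these checks, using invented `score` and `flag` columns; the median imputation is just one possible choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [1.0, np.nan, 3.0],
    "flag": pd.array([True, None, False], dtype="boolean"),  # nullable dtype uses pd.NA
})

df.info()                  # non-null counts per column
missing = df.isna().sum()  # missing values per column

filled = df.fillna({"score": df["score"].median()})  # impute the median
trimmed = df.dropna(subset=["score"])                # or drop rows missing "score"
```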
### Data Quality Checks

- Implement data quality checks at the beginning of analysis (illustrated below)
- Validate data types with `.dtypes` and convert as needed
- Check for duplicates with `.duplicated()` and handle appropriately
- Use `.describe()` for a quick statistical overview
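
A hypothetical up-front check, wrapped in a helper for reuse; the `quality_report` name and the toy frame are assumptions, not part of any standard API:

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> None:
    """Hypothetical first-pass checks; adapt to your data."""
    print(df.dtypes)                   # verify types before analysis
    print(df.describe(include="all"))  # quick statistical overview
    dupes = df.duplicated().sum()
    if dupes:
        print(f"warning: {dupes} duplicated rows")

df = pd.DataFrame({"id": [1, 2, 2], "value": [10, 20, 20]})
quality_report(df)
df = df.drop_duplicates()
```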
### Type Conversion

- Use `.astype()` for explicit type conversion
- Use `pd.to_datetime()` for date parsing
- Use `pd.to_numeric()` with `errors='coerce'` for safe numeric conversion
- Use categorical data types for low-cardinality string columns (example below)
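
The conversions above in one place, on a small invented frame:

```python
import pandas as pd

df = pd.DataFrame({
    "when": ["2024-01-05", "2024-02-11"],
    "amount": ["12.5", "not a number"],
    "status": ["open", "closed"],
})

df["when"] = pd.to_datetime(df["when"])                      # date parsing
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # bad values become NaN
df["status"] = df["status"].astype("category")               # low-cardinality strings
```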
## Grouping and Aggregation

### GroupBy Operations

- Use `.groupby()` for efficient aggregation operations
- Specify aggregation functions with `.agg()` for multiple operations
- Use named aggregation for clearer output column names
- Consider `.transform()` for broadcasting results back to the original shape (see the sketch below)
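
A brief sketch of named aggregation and `.transform()` broadcasting, with made-up `team` and `points` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b"],
    "points": [3, 5, 7],
})

# Named aggregation gives clear output column names.
summary = df.groupby("team").agg(
    total_points=("points", "sum"),
    mean_points=("points", "mean"),
)

# .transform() broadcasts the group result back to the original shape.
df["team_total"] = df.groupby("team")["points"].transform("sum")
```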
### Pivot Tables and Reshaping

- Use `.pivot_table()` for multi-dimensional aggregation
- Use `.melt()` to convert wide to long format
- Use `.pivot()` to convert long to wide format (example below)
- Use `.stack()` and `.unstack()` for hierarchical index manipulation
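
A round trip between wide and long layouts, using an invented `store`/`q1`/`q2` frame:

```python
import pandas as pd

wide = pd.DataFrame({
    "store": ["x", "y"],
    "q1": [10, 20],
    "q2": [30, 40],
})

# Wide -> long with .melt()
tidy = wide.melt(id_vars="store", var_name="quarter", value_name="sales")

# Long -> wide again with .pivot()
back = tidy.pivot(index="store", columns="quarter", values="sales")

# Multi-dimensional aggregation with .pivot_table()
table = tidy.pivot_table(index="store", columns="quarter",
                         values="sales", aggfunc="sum")
```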
## Performance Optimization

### Memory Efficiency

- Use categorical data types for low-cardinality strings
- Downcast numeric types when appropriate
- Use `pd.eval()` and `.eval()` for large expression evaluation (example below)
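
A small sketch of these memory techniques; the column names and sizes are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["NO", "PE", "NO", "IN"] * 1000,
    "count": range(4000),
})

df["country"] = df["country"].astype("category")             # low-cardinality strings
df["count"] = pd.to_numeric(df["count"], downcast="integer") # smaller integer dtype

print(df.memory_usage(deep=True))  # verify the savings

# .eval() evaluates the expression without building intermediate arrays.
df = df.eval("doubled = count * 2")
```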
### Computation Speed

- Use vectorized operations instead of `.apply()` with row-wise functions
- Prefer built-in aggregation functions over custom ones
- Use `.values` or `.to_numpy()` for NumPy operations when faster (see the sketch below)
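
A sketch contrasting a row-wise `.apply()` with its vectorized equivalent (the anti-pattern is left commented out):

```python
import numpy as np
import pandas as pd

n = 1_000_000
df = pd.DataFrame({
    "price": np.random.rand(n),
    "qty": np.random.randint(1, 10, n),
})

# Slow: row-wise .apply() invokes Python once per row.
# df["total"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Fast: vectorized arithmetic on whole columns.
df["total"] = df["price"] * df["qty"]

# Drop to NumPy when a raw array operation is faster.
totals = df["total"].to_numpy()
grand_total = totals.sum()
```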
### Avoiding Common Pitfalls

- Avoid iterating with `.iterrows()`; use vectorized operations
- Don't modify DataFrames while iterating
- Be aware of `SettingWithCopyWarning`; use `.copy()` when needed
- Avoid growing DataFrames row by row; collect rows in a list and create the frame once (shown below)
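
Both pitfalls and their fixes in miniature; the loop body and threshold are illustrative:

```python
import pandas as pd

# Anti-pattern: appending to a DataFrame row by row reallocates every time.
# Collect rows in a list and build the frame once instead.
rows = []
for i in range(5):
    rows.append({"step": i, "value": i * i})
df = pd.DataFrame(rows)

# Take an explicit copy when deriving a sub-frame you intend to mutate;
# this avoids SettingWithCopyWarning.
subset = df[df["value"] > 4].copy()
subset["value"] = subset["value"] + 1
```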
## Time Series Operations

- Use a `DatetimeIndex` for time series data
- Leverage `.resample()` for time-based aggregation
- Use `.shift()` and `.diff()` for lag operations
- Use `.rolling()` and `.expanding()` for window calculations (illustrated below)
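
A compact sketch of these time series tools on a synthetic daily series:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=90, freq="D")
ts = pd.DataFrame({"demand": range(90)}, index=idx)  # DatetimeIndex

monthly = ts.resample("MS").sum()       # time-based aggregation (month start)
ts["prev_day"] = ts["demand"].shift(1)  # lag by one observation
ts["change"] = ts["demand"].diff()      # day-over-day difference
ts["weekly_avg"] = ts["demand"].rolling(window=7).mean()  # rolling window
ts["running_max"] = ts["demand"].expanding().max()        # expanding window
```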
## Merging and Joining

- Use `.merge()` for SQL-style joins
- Specify the `how` parameter: `'inner'`, `'outer'`, `'left'`, `'right'`
- Use the `validate` parameter to check join cardinality
- Use `pd.concat()` for stacking DataFrames (see the sketch below)
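
A sketch of a validated join followed by a concat; all frames and keys are invented:

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [50, 75, 20]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ada", "Lin"]})

# SQL-style left join; validate="many_to_one" raises MergeError
# if customers ever contains duplicate keys.
joined = orders.merge(customers, on="customer_id",
                      how="left", validate="many_to_one")

# Stack frames with the same columns on top of each other.
more_orders = pd.DataFrame({"customer_id": [1], "amount": [5], "name": ["Ada"]})
all_orders = pd.concat([joined, more_orders], ignore_index=True)
```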
## Key Conventions

- Import as `import pandas as pd`
- Use `snake_case` for column names when possible
- Document data sources and transformations
- Keep notebooks reproducible with a clear cell execution order
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.