jackspace

dnanexus-integration

8
2
# Install this skill:
npx skills add jackspace/ClaudeSkillz --skill "dnanexus-integration"

Install specific skill from multi-skill repository

# Description

DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution.

# SKILL.md


name: dnanexus-integration
description: "DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution."


DNAnexus Integration

Overview

DNAnexus is a cloud platform for biomedical data analysis and genomics. Build and deploy apps/applets, manage data objects, run workflows, and use the dxpy Python SDK for genomics pipeline development and execution.

When to Use This Skill

This skill should be used when:
- Creating, building, or modifying DNAnexus apps/applets
- Uploading, downloading, searching, or organizing files and records
- Running analyses, monitoring jobs, creating workflows
- Writing scripts using dxpy to interact with the platform
- Setting up dxapp.json, managing dependencies, using Docker
- Processing FASTQ, BAM, VCF, or other bioinformatics files
- Managing projects, permissions, or platform resources

Core Capabilities

The skill is organized into five main areas, each with detailed reference documentation:

1. App Development

Purpose: Create executable programs (apps/applets) that run on the DNAnexus platform.

Key Operations:
- Generate app skeleton with dx-app-wizard
- Write Python or Bash apps with proper entry points
- Handle input/output data objects
- Deploy with dx build or dx build --app
- Test apps on the platform

Common Use Cases:
- Bioinformatics pipelines (alignment, variant calling)
- Data processing workflows
- Quality control and filtering
- Format conversion tools

Reference: See references/app-development.md for:
- Complete app structure and patterns
- Python entry point decorators
- Input/output handling with dxpy
- Development best practices
- Common issues and solutions

2. Data Operations

Purpose: Manage files, records, and other data objects on the platform.

Key Operations:
- Upload/download files with dxpy.upload_local_file() and dxpy.download_dxfile()
- Create and manage records with metadata
- Search for data objects by name, properties, or type
- Clone data between projects
- Manage project folders and permissions

Common Use Cases:
- Uploading sequencing data (FASTQ files)
- Organizing analysis results
- Searching for specific samples or experiments
- Backing up data across projects
- Managing reference genomes and annotations

Reference: See references/data-operations.md for:
- Complete file and record operations
- Data object lifecycle (open/closed states)
- Search and discovery patterns
- Project management
- Batch operations

3. Job Execution

Purpose: Run analyses, monitor execution, and orchestrate workflows.

Key Operations:
- Launch jobs with applet.run() or app.run()
- Monitor job status and logs
- Create subjobs for parallel processing
- Build and run multi-step workflows
- Chain jobs with output references

Common Use Cases:
- Running genomics analyses on sequencing data
- Parallel processing of multiple samples
- Multi-step analysis pipelines
- Monitoring long-running computations
- Debugging failed jobs

Reference: See references/job-execution.md for:
- Complete job lifecycle and states
- Workflow creation and orchestration
- Parallel execution patterns
- Job monitoring and debugging
- Resource management

4. Python SDK (dxpy)

Purpose: Programmatic access to DNAnexus platform through Python.

Key Operations:
- Work with data object handlers (DXFile, DXRecord, DXApplet, etc.)
- Use high-level functions for common tasks
- Make direct API calls for advanced operations
- Create links and references between objects
- Search and discover platform resources

Common Use Cases:
- Automation scripts for data management
- Custom analysis pipelines
- Batch processing workflows
- Integration with external tools
- Data migration and organization

Reference: See references/python-sdk.md for:
- Complete dxpy class reference
- High-level utility functions
- API method documentation
- Error handling patterns
- Common code patterns

5. Configuration and Dependencies

Purpose: Configure app metadata and manage dependencies.

Key Operations:
- Write dxapp.json with inputs, outputs, and run specs
- Install system packages (execDepends)
- Bundle custom tools and resources
- Use assets for shared dependencies
- Integrate Docker containers
- Configure instance types and timeouts

Common Use Cases:
- Defining app input/output specifications
- Installing bioinformatics tools (samtools, bwa, etc.)
- Managing Python package dependencies
- Using Docker images for complex environments
- Selecting computational resources

Reference: See references/configuration.md for:
- Complete dxapp.json specification
- Dependency management strategies
- Docker integration patterns
- Regional and resource configuration
- Example configurations

Quick Start Examples

Upload and Analyze Data

import dxpy

# Upload input file
input_file = dxpy.upload_local_file("sample.fastq", project="project-xxxx")

# Run analysis
job = dxpy.DXApplet("applet-xxxx").run({
    "reads": dxpy.dxlink(input_file.get_id())
})

# Wait for completion
job.wait_on_done()

# Download results
output_id = job.describe()["output"]["aligned_reads"]["$dnanexus_link"]
dxpy.download_dxfile(output_id, "aligned.bam")

Search and Download Files

import dxpy

# Find BAM files from a specific experiment
files = dxpy.find_data_objects(
    classname="file",
    name="*.bam",
    properties={"experiment": "exp001"},
    project="project-xxxx"
)

# Download each file
for file_result in files:
    file_obj = dxpy.DXFile(file_result["id"])
    filename = file_obj.describe()["name"]
    dxpy.download_dxfile(file_result["id"], filename)

Create Simple App

# src/my-app.py
import dxpy
import subprocess

@dxpy.entry_point('main')
def main(input_file, quality_threshold=30):
    # Download input
    dxpy.download_dxfile(input_file["$dnanexus_link"], "input.fastq")

    # Process
    subprocess.check_call([
        "quality_filter",
        "--input", "input.fastq",
        "--output", "filtered.fastq",
        "--threshold", str(quality_threshold)
    ])

    # Upload output
    output_file = dxpy.upload_local_file("filtered.fastq")

    return {
        "filtered_reads": dxpy.dxlink(output_file)
    }

dxpy.run()

Workflow Decision Tree

When working with DNAnexus, follow this decision tree:

  1. Need to create a new executable?
  2. Yes β†’ Use App Development (references/app-development.md)
  3. No β†’ Continue to step 2

  4. Need to manage files or data?

  5. Yes β†’ Use Data Operations (references/data-operations.md)
  6. No β†’ Continue to step 3

  7. Need to run an analysis or workflow?

  8. Yes β†’ Use Job Execution (references/job-execution.md)
  9. No β†’ Continue to step 4

  10. Writing Python scripts for automation?

  11. Yes β†’ Use Python SDK (references/python-sdk.md)
  12. No β†’ Continue to step 5

  13. Configuring app settings or dependencies?

  14. Yes β†’ Use Configuration (references/configuration.md)

Often you'll need multiple capabilities together (e.g., app development + configuration, or data operations + job execution).

Installation and Authentication

Install dxpy

pip install dxpy

Login to DNAnexus

dx login

This authenticates your session and sets up access to projects and data.

Verify Installation

dx --version
dx whoami

Common Patterns

Pattern 1: Batch Processing

Process multiple files with the same analysis:

# Find all FASTQ files
files = dxpy.find_data_objects(
    classname="file",
    name="*.fastq",
    project="project-xxxx"
)

# Launch parallel jobs
jobs = []
for file_result in files:
    job = dxpy.DXApplet("applet-xxxx").run({
        "input": dxpy.dxlink(file_result["id"])
    })
    jobs.append(job)

# Wait for all completions
for job in jobs:
    job.wait_on_done()

Pattern 2: Multi-Step Pipeline

Chain multiple analyses together:

# Step 1: Quality control
qc_job = qc_applet.run({"reads": input_file})

# Step 2: Alignment (uses QC output)
align_job = align_applet.run({
    "reads": qc_job.get_output_ref("filtered_reads")
})

# Step 3: Variant calling (uses alignment output)
variant_job = variant_applet.run({
    "bam": align_job.get_output_ref("aligned_bam")
})

Pattern 3: Data Organization

Organize analysis results systematically:

# Create organized folder structure
dxpy.api.project_new_folder(
    "project-xxxx",
    {"folder": "/experiments/exp001/results", "parents": True}
)

# Upload with metadata
result_file = dxpy.upload_local_file(
    "results.txt",
    project="project-xxxx",
    folder="/experiments/exp001/results",
    properties={
        "experiment": "exp001",
        "sample": "sample1",
        "analysis_date": "2025-10-20"
    },
    tags=["validated", "published"]
)

Best Practices

  1. Error Handling: Always wrap API calls in try-except blocks
  2. Resource Management: Choose appropriate instance types for workloads
  3. Data Organization: Use consistent folder structures and metadata
  4. Cost Optimization: Archive old data, use appropriate storage classes
  5. Documentation: Include clear descriptions in dxapp.json
  6. Testing: Test apps with various input types before production use
  7. Version Control: Use semantic versioning for apps
  8. Security: Never hardcode credentials in source code
  9. Logging: Include informative log messages for debugging
  10. Cleanup: Remove temporary files and failed jobs

Resources

This skill includes detailed reference documentation:

references/

  • app-development.md - Complete guide to building and deploying apps/applets
  • data-operations.md - File management, records, search, and project operations
  • job-execution.md - Running jobs, workflows, monitoring, and parallel processing
  • python-sdk.md - Comprehensive dxpy library reference with all classes and functions
  • configuration.md - dxapp.json specification and dependency management

Load these references when you need detailed information about specific operations or when working on complex tasks.

Getting Help

  • Official documentation: https://documentation.dnanexus.com/
  • API reference: http://autodoc.dnanexus.com/
  • GitHub repository: https://github.com/dnanexus/dx-toolkit
  • Support: [email protected]

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.