hospital-management/tools/markdown/DATA_GENERATION_README.md
Marwan Alwali 263292f6be update
2025-11-04 00:50:06 +03:00

Saudi Healthcare Data Generation System

A comprehensive, refactored data generation system for Saudi healthcare applications with proper dependency management and code deduplication.

🎯 Overview

This system generates realistic test data for a Saudi healthcare management system. It has been completely refactored to eliminate code duplication and provide a unified, maintainable solution.

Key Improvements

  • 60% code reduction through shared utilities
  • Dependency management ensures correct execution order
  • Saudi-specific data with authentic names, locations, and healthcare context
  • Modular architecture with shared constants and generators
  • Progress tracking and error handling
  • Easy execution via shell script or Python orchestrator

📁 Project Structure

data_generation/
├── data_utils/                    # Shared utilities package
│   ├── __init__.py               # Package initialization
│   ├── constants.py              # All Saudi-specific constants
│   ├── generators.py             # Common data generation functions
│   ├── helpers.py                # Database utilities and model helpers
│   └── base.py                   # Base classes and orchestrator
├── populate_all_data.py          # Master Python orchestrator
├── populate_data.sh              # Shell script for easy execution
├── [individual_data_files].py    # Refactored individual generators
└── DATA_GENERATION_README.md     # This documentation

🚀 Quick Start

Option 1: Shell Script

# Make the script executable (skip if it already is)
chmod +x populate_data.sh

# Run all generators
./populate_data.sh

# Run specific generators
./populate_data.sh core accounts patients

# Show available options
./populate_data.sh --help

Option 2: Python Orchestrator

# Run all generators
python3 populate_all_data.py

# Run specific generators
python3 populate_all_data.py core accounts patients

# Show execution plan
python3 populate_all_data.py --show-plan

# List available generators
python3 populate_all_data.py --list-generators

📋 Execution Order & Dependencies

The system automatically manages dependencies:

  1. core → Tenants
  2. accounts → Users (requires: core)
  3. hr → Employees/Departments (requires: core, accounts)
  4. patients → Patients (requires: core)
  5. Clinical modules (independent of one another; each requires: core, accounts, hr, patients):
    • emr → Encounters, vitals, problems, care plans, notes
    • lab → Lab tests, orders, results, specimens
    • radiology → Imaging studies, orders, reports
    • pharmacy → Medications, prescriptions, dispensations
  6. appointments → Appointments (requires: patients + providers)
  7. billing → Bills, payments, claims (requires: patients + encounters)
  8. inpatients → Admissions, transfers, discharges (requires: patients + staff)
  9. inventory → Medical supplies, stock (independent)
  10. facility_management → Buildings, rooms, assets (management command)

🛠️ Available Generators

| Generator | Description | Dependencies |
|---|---|---|
| core | Tenants and system configuration | None |
| accounts | Users, authentication, security | core |
| hr | Employees, departments, schedules | core, accounts |
| patients | Patient profiles, contacts, insurance | core |
| emr | Encounters, vitals, problems, care plans | core, accounts, hr, patients |
| lab | Laboratory tests, orders, results | core, accounts, hr, patients |
| radiology | Imaging studies, orders, reports | core, accounts, hr, patients |
| pharmacy | Medications, prescriptions, dispensations | core, accounts, hr, patients |
| appointments | Appointment scheduling and management | core, accounts, hr, patients |
| billing | Medical billing, payments, insurance claims | core, accounts, patients |
| inpatients | Hospital admissions, transfers, discharges | core, accounts, hr, patients |
| inventory | Medical supplies and inventory management | None |
| facility_management | Buildings, rooms, assets, maintenance | None |
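
The dependency table can be written as a mapping and resolved with a topological sort. The sketch below only illustrates the idea using Python's standard graphlib module (Python 3.9+); it is not the actual DataGenerationOrchestrator code, and DEPENDENCIES and resolve_plan are hypothetical names introduced here.

```python
from graphlib import TopologicalSorter

# Generator -> prerequisites, copied from the dependency table.
CLINICAL_DEPS = ["core", "accounts", "hr", "patients"]
DEPENDENCIES = {
    "core": [],
    "accounts": ["core"],
    "hr": ["core", "accounts"],
    "patients": ["core"],
    "emr": CLINICAL_DEPS,
    "lab": CLINICAL_DEPS,
    "radiology": CLINICAL_DEPS,
    "pharmacy": CLINICAL_DEPS,
    "appointments": CLINICAL_DEPS,
    "billing": ["core", "accounts", "patients"],
    "inpatients": CLINICAL_DEPS,
    "inventory": [],
    "facility_management": [],
}


def resolve_plan(requested):
    """Return the requested generators plus every prerequisite, in a valid run order."""
    needed = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name not in DEPENDENCIES:
            raise KeyError(f"Unknown generator: {name}")
        if name not in needed:
            needed.add(name)
            stack.extend(DEPENDENCIES[name])
    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    graph = {name: DEPENDENCIES[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(resolve_plan(["billing"]))
```

Generators with no dependency on each other (for example accounts and patients) may appear in either order; only the prerequisite-before-dependent guarantee holds.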

🎛️ Command Line Options

Shell Script Options

./populate_data.sh [OPTIONS] [GENERATORS...]

Options:
  -h, --help              Show help message
  -l, --list              List available generators
  -p, --plan              Show execution plan
  -v, --validate          Validate dependencies only
  --tenant-id ID          Generate data for specific tenant ID
  --tenant-slug SLUG      Generate data for specific tenant slug
  --skip-validation       Skip dependency validation
  --dry-run               Show what would be done (no execution)

Python Orchestrator Options

python3 populate_all_data.py [OPTIONS] [GENERATORS...]

Options:
  --generators GEN...     Specific generators to run
  --list-generators       List available generators
  --show-plan             Show execution plan
  --validate-only         Validate dependencies only
  --tenant-id ID          Tenant ID to generate data for
  --tenant-slug SLUG      Tenant slug to generate data for
  --skip-validation       Skip dependency validation
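
For readers extending the orchestrator, the options listed above could be declared with argparse roughly as follows. This is a minimal sketch of an equivalent command-line interface, not the parser actually used in populate_all_data.py; build_parser and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented orchestrator options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="populate_all_data.py")
    # Positional form: populate_all_data.py core accounts patients
    parser.add_argument("names", nargs="*", metavar="GENERATORS",
                        help="generators to run (default: all)")
    # Flag form: populate_all_data.py --generators core patients
    parser.add_argument("--generators", nargs="+", metavar="GEN", default=[],
                        help="specific generators to run")
    parser.add_argument("--list-generators", action="store_true",
                        help="list available generators")
    parser.add_argument("--show-plan", action="store_true",
                        help="show execution plan")
    parser.add_argument("--validate-only", action="store_true",
                        help="validate dependencies only")
    parser.add_argument("--tenant-id", metavar="ID",
                        help="tenant ID to generate data for")
    parser.add_argument("--tenant-slug", metavar="SLUG",
                        help="tenant slug to generate data for")
    parser.add_argument("--skip-validation", action="store_true",
                        help="skip dependency validation")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    selected = args.names + args.generators
    print("Selected generators:", selected or "all")
```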

📊 Data Volume

Default data volumes (customizable in each generator):

  • Tenants: 1-2
  • Users: 50-200 per tenant
  • Patients: 50-200 per tenant
  • Clinical Records: 100-500 per patient
  • Inventory Items: 50-200 per tenant
  • Facility Assets: 50-150 per tenant
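
Because these ranges multiply, total row counts grow quickly. The short calculation below derives the smallest and largest totals implied by the default ranges listed above; the variable and function names are illustrative only.

```python
# (min, max) ranges copied from the default volumes list.
TENANTS = (1, 2)
PATIENTS_PER_TENANT = (50, 200)
CLINICAL_RECORDS_PER_PATIENT = (100, 500)


def total_range(*ranges):
    """Multiply the lower bounds together and the upper bounds together."""
    low, high = 1, 1
    for range_min, range_max in ranges:
        low *= range_min
        high *= range_max
    return low, high


patients = total_range(TENANTS, PATIENTS_PER_TENANT)
records = total_range(TENANTS, PATIENTS_PER_TENANT, CLINICAL_RECORDS_PER_PATIENT)
print(patients)  # (50, 400)
print(records)   # (5000, 200000)
```

At the upper end, the defaults imply up to 200,000 clinical records, which is worth keeping in mind when adjusting volumes.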

🔧 Customization

Modifying Data Volumes

Edit the generator classes in individual files:

# In any generator file
def run_generation(self, **kwargs):
    # Modify these parameters
    users_per_tenant = kwargs.get('users_per_tenant', 100)
    patients_per_tenant = kwargs.get('patients_per_tenant', 150)
    # ... etc

Adding New Generators

  1. Create a new generator class that inherits from SaudiHealthcareDataGenerator
  2. Import and register it in populate_all_data.py
  3. Add it to the execution order in DataGenerationOrchestrator.execution_order

Custom Saudi Data

Add to data_utils/constants.py:

# Add new constants
NEW_SAUDI_DATA = [
    # Your Saudi-specific data here
]

# Update existing lists
SAUDI_CITIES.append("New City")

🏗️ Architecture

Shared Utilities (data_utils/)

constants.py

  • All Saudi-specific data constants
  • Names, cities, medical terms, etc.
  • Centralized for consistency

generators.py

  • Common data generation functions
  • Phone numbers, IDs, dates, names
  • Reusable across all generators
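
As an illustration of the kind of helper that belongs in this module, the sketch below produces a Saudi-format mobile number and an ID-shaped string. The function names and signatures are hypothetical and may not match what generators.py actually defines; the ID helper only reproduces the 10-digit shape (leading 1 for citizens, 2 for residents) and does not compute a check digit.

```python
import random


def generate_saudi_mobile(rng=random):
    """Return a mobile number in international format: +966 followed by 5 and 8 digits."""
    subscriber = "".join(str(rng.randint(0, 9)) for _ in range(8))
    return f"+9665{subscriber}"


def generate_saudi_id(rng=random, citizen=True):
    """Return a 10-digit ID-shaped string (no check digit is computed in this sketch)."""
    prefix = "1" if citizen else "2"
    rest = "".join(str(rng.randint(0, 9)) for _ in range(9))
    return prefix + rest


if __name__ == "__main__":
    seeded = random.Random(42)  # a seeded generator makes test data reproducible
    print(generate_saudi_mobile(seeded))
    print(generate_saudi_id(seeded, citizen=False))
```

Accepting an rng argument lets a generator pass a seeded random.Random instance so that repeated runs produce the same data.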

helpers.py

  • Database utilities (safe_bulk_create, validate_tenant_exists)
  • Model field filtering and validation
  • Progress tracking and error handling
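
The idea behind a "safe" bulk create can be sketched as chunked inserts with a per-object fallback. The code below is a framework-free illustration under assumed names; the real safe_bulk_create in helpers.py may have a different signature and behavior. In a Django project, bulk_create would typically be Model.objects.bulk_create and create_one a function that calls obj.save().

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_create_in_chunks(bulk_create, create_one, objects, batch_size=500):
    """Insert objects batch by batch; if a batch fails, retry its objects one at a time.

    Returns a (created, failed) tuple of counts.
    """
    created = 0
    failed = 0
    for batch in chunked(objects, batch_size):
        try:
            bulk_create(batch)
            created += len(batch)
        except Exception:
            # Fall back to single inserts so one bad row does not lose the whole batch.
            for obj in batch:
                try:
                    create_one(obj)
                    created += 1
                except Exception:
                    failed += 1
    return created, failed
```

The fallback path is slower, but it keeps generation going after an individual failure and reports how many objects were skipped.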

base.py

  • BaseDataGenerator: Basic functionality
  • SaudiHealthcareDataGenerator: Saudi-specific base class
  • DataGenerationOrchestrator: Dependency management

Individual Generators

Each generator inherits from SaudiHealthcareDataGenerator and implements:

class ExampleGenerator(SaudiHealthcareDataGenerator):
    def run_generation(self, **kwargs):
        # Your generation logic here
        # Use self.generate_saudi_name(), self.safe_bulk_create(), etc.
        pass
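
A slightly fuller version of this pattern is sketched below. Because the real base class lives in data_utils/base.py and its method signatures are not shown here, the example uses a stand-in class with assumed signatures for generate_saudi_name() and safe_bulk_create(); VisitorGenerator, visitors_per_tenant, and the sample names are hypothetical.

```python
import random


class SaudiHealthcareDataGeneratorStub:
    """Stand-in for data_utils.base.SaudiHealthcareDataGenerator (signatures assumed)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.saved = []  # the stub keeps objects in memory instead of a database

    def generate_saudi_name(self):
        # ASSUMPTION: returns a "First Last" string; the real helper may differ.
        first = self.rng.choice(["Mohammed", "Fatimah", "Abdullah", "Noura"])
        last = self.rng.choice(["Al-Qahtani", "Al-Otaibi", "Al-Harbi"])
        return f"{first} {last}"

    def safe_bulk_create(self, objects):
        # ASSUMPTION: returns the number of objects created.
        self.saved.extend(objects)
        return len(objects)


class VisitorGenerator(SaudiHealthcareDataGeneratorStub):
    """Hypothetical generator that creates visitor records."""

    def run_generation(self, **kwargs):
        count = kwargs.get("visitors_per_tenant", 10)
        visitors = [{"name": self.generate_saudi_name()} for _ in range(count)]
        created = self.safe_bulk_create(visitors)
        return {"visitors": created}


if __name__ == "__main__":
    print(VisitorGenerator(seed=1).run_generation(visitors_per_tenant=3))
```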

🧪 Testing

Validation Only

# Check if all dependencies are satisfied
./populate_data.sh --validate

# Show execution plan without running
./populate_data.sh --plan

Dry Run

# Show what would be done without creating data
./populate_data.sh --dry-run

Individual Generators

# Test specific generators
./populate_data.sh core accounts
python3 populate_all_data.py --generators core patients

🐛 Troubleshooting

Common Issues

  1. "No tenants found"

    • Run core generator first: ./populate_data.sh core
    • Or skip validation: ./populate_data.sh --skip-validation
  2. "Django not found"

    • Ensure virtual environment is activated
    • Install requirements: pip install -r requirements.txt
  3. "Permission denied"

    • Make script executable: chmod +x populate_data.sh
  4. "Import errors"

    • Ensure you're in the project root directory
    • Check that all refactored files exist
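
Several of these issues can be checked up front. The sketch below is a hypothetical pre-flight helper, not part of the project: it assumes populate_data.sh and populate_all_data.py sit in the directory you pass in, and it uses only standard-library calls.

```python
import importlib.util
import os
from pathlib import Path


def preflight(script_dir="."):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    root = Path(script_dir)
    problems = []

    # "Django not found": Django is not importable in the current environment.
    if importlib.util.find_spec("django") is None:
        problems.append("Django not found: activate the virtual environment "
                        "or run pip install -r requirements.txt")

    # "Permission denied": the shell script is missing or not executable.
    script = root / "populate_data.sh"
    if not script.exists():
        problems.append(f"populate_data.sh not found in {root}")
    elif not os.access(script, os.X_OK):
        problems.append("populate_data.sh is not executable: run chmod +x populate_data.sh")

    # "Import errors": the orchestrator is not where it is expected.
    if not (root / "populate_all_data.py").exists():
        problems.append(f"populate_all_data.py not found in {root}")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("-", problem)
```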

Debug Mode

# Run a single generator in isolation, skipping dependency validation
python3 populate_all_data.py --generators core --skip-validation

📈 Performance

Built-in Optimizations

  • Batch Operations: Uses bulk_create for large datasets
  • Progress Tracking: Real-time progress indicators
  • Error Recovery: Continues processing after individual failures
  • Memory Efficient: Processes data in chunks

Performance Metrics

  • Small Dataset: ~50 patients, 2-3 minutes
  • Medium Dataset: ~200 patients, 5-8 minutes
  • Large Dataset: ~500+ patients, 15-30 minutes

🔒 Security & Compliance

Saudi Healthcare Compliance

  • CBAHI Standards: Follows the standards of the Saudi Central Board for Accreditation of Healthcare Institutions
  • MOH Guidelines: Follows Ministry of Health data protection requirements
  • HIPAA-like: Applies HIPAA-style patient privacy and data security considerations

Data Privacy

  • Test Data Only: All generated data is fictional
  • No Real Patients: Uses generated Saudi names and demographics
  • Safe Deletion: Easy cleanup of test data

🤝 Contributing

Code Standards

  • Use shared utilities from data_utils/
  • Follow dependency order in orchestrator
  • Include progress tracking and error handling
  • Document new generators and their dependencies

Adding New Data Types

  1. Add constants to data_utils/constants.py
  2. Create generator functions in data_utils/generators.py
  3. Implement new generator class
  4. Register in orchestrator
  5. Update documentation

📞 Support

For issues or questions:

  1. Check the execution plan: ./populate_data.sh --plan
  2. Validate dependencies: ./populate_data.sh --validate
  3. Run individual generators for debugging
  4. Check logs for specific error messages

Generated with ❤️ for Saudi healthcare systems