Saudi Healthcare Data Generation System
A comprehensive, refactored data generation system for Saudi healthcare applications with proper dependency management and code deduplication.
🎯 Overview
This system generates realistic test data for a Saudi healthcare management system. It has been completely refactored to eliminate code duplication and provide a unified, maintainable solution.
Key Improvements
- 60% code reduction through shared utilities
- Dependency management ensures correct execution order
- Saudi-specific data with authentic names, locations, and healthcare context
- Modular architecture with shared constants and generators
- Progress tracking and error handling
- Easy execution via shell script or Python orchestrator
📁 Project Structure
data_generation/
├── data_utils/ # Shared utilities package
│ ├── __init__.py # Package initialization
│ ├── constants.py # All Saudi-specific constants
│ ├── generators.py # Common data generation functions
│ ├── helpers.py # Database utilities and model helpers
│ └── base.py # Base classes and orchestrator
├── populate_all_data.py # Master Python orchestrator
├── populate_data.sh # Shell script for easy execution
├── [individual_data_files].py # Refactored individual generators
└── DATA_GENERATION_README.md # This documentation
🚀 Quick Start
Option 1: Shell Script (Recommended)
# Make script executable (already done)
chmod +x populate_data.sh
# Run all generators
./populate_data.sh
# Run specific generators
./populate_data.sh core accounts patients
# Show available options
./populate_data.sh --help
Option 2: Python Orchestrator
# Run all generators
python3 populate_all_data.py
# Run specific generators
python3 populate_all_data.py core accounts patients
# Show execution plan
python3 populate_all_data.py --show-plan
# List available generators
python3 populate_all_data.py --list-generators
📋 Execution Order & Dependencies
The system automatically manages dependencies:
- core → Tenants
- accounts → Users (requires: core)
- hr → Employees/Departments (requires: core, accounts)
- patients → Patients (requires: core)
- Clinical Modules (parallel, require: core, accounts, hr, patients):
  - emr → Encounters, vitals, problems, care plans, notes
  - lab → Lab tests, orders, results, specimens
  - radiology → Imaging studies, orders, reports
  - pharmacy → Medications, prescriptions, dispensations
- appointments → Appointments (requires: patients + providers)
- billing → Bills, payments, claims (requires: patients + encounters)
- inpatients → Admissions, transfers, discharges (requires: patients + staff)
- inventory → Medical supplies, stock (independent)
- facility_management → Buildings, rooms, assets (management command)
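The ordering rule above can be sketched as a topological sort. This is a minimal illustration, not the orchestrator's actual code: the `DEPENDENCIES` dict and the `execution_order` function are hypothetical names, and only a subset of the generators is shown.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mapping of generator -> generators it requires,
# copied from the dependency list in this README (subset only).
DEPENDENCIES = {
    "core": set(),
    "accounts": {"core"},
    "hr": {"core", "accounts"},
    "patients": {"core"},
    "emr": {"core", "accounts", "hr", "patients"},
    "lab": {"core", "accounts", "hr", "patients"},
    "inventory": set(),
}


def execution_order(dependencies):
    """Return generator names so each one comes after everything it requires."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(DEPENDENCIES)))
```

Generators with no ordering constraint between them (for example `emr` and `lab`) may appear in either order, which is why they can be treated as parallel.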
🛠️ Available Generators
| Generator | Description | Dependencies |
|---|---|---|
| core | Tenants and system configuration | None |
| accounts | Users, authentication, security | core |
| hr | Employees, departments, schedules | core, accounts |
| patients | Patient profiles, contacts, insurance | core |
| emr | Encounters, vitals, problems, care plans | core, accounts, hr, patients |
| lab | Laboratory tests, orders, results | core, accounts, hr, patients |
| radiology | Imaging studies, orders, reports | core, accounts, hr, patients |
| pharmacy | Medications, prescriptions, dispensations | core, accounts, hr, patients |
| appointments | Appointment scheduling and management | core, accounts, hr, patients |
| billing | Medical billing, payments, insurance claims | core, accounts, patients |
| inpatients | Hospital admissions, transfers, discharges | core, accounts, hr, patients |
| inventory | Medical supplies and inventory management | None |
| facility_management | Buildings, rooms, assets, maintenance | None |
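When only a few generators are requested, the table tells you which others must already have run. The sketch below expands a request into its full prerequisite set. It is an assumption-laden illustration: `DEPENDENCIES` and `with_dependencies` are hypothetical names, and whether the real orchestrator adds missing prerequisites automatically or only validates them is not stated here.

```python
# Hypothetical mapping built from the Dependencies column of the table.
CLINICAL = {"core", "accounts", "hr", "patients"}
DEPENDENCIES = {
    "core": set(),
    "accounts": {"core"},
    "hr": {"core", "accounts"},
    "patients": {"core"},
    "emr": set(CLINICAL),
    "lab": set(CLINICAL),
    "radiology": set(CLINICAL),
    "pharmacy": set(CLINICAL),
    "appointments": set(CLINICAL),
    "billing": {"core", "accounts", "patients"},
    "inpatients": set(CLINICAL),
    "inventory": set(),
    "facility_management": set(),
}


def with_dependencies(requested, dependencies=DEPENDENCIES):
    """Return the requested generators plus everything they transitively require."""
    needed = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name not in dependencies:
            raise KeyError(f"unknown generator: {name}")
        if name not in needed:
            needed.add(name)
            stack.extend(dependencies[name])
    return needed


if __name__ == "__main__":
    print(sorted(with_dependencies(["billing"])))
```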
🎛️ Command Line Options
Shell Script Options
./populate_data.sh [OPTIONS] [GENERATORS...]
Options:
-h, --help Show help message
-l, --list List available generators
-p, --plan Show execution plan
-v, --validate Validate dependencies only
--tenant-id ID Generate data for specific tenant ID
--tenant-slug SLUG Generate data for specific tenant slug
--skip-validation Skip dependency validation
--dry-run Show what would be done (no execution)
Python Orchestrator Options
python3 populate_all_data.py [OPTIONS] [GENERATORS...]
Options:
--generators GEN... Specific generators to run
--list-generators List available generators
--show-plan Show execution plan
--validate-only Validate dependencies only
--tenant-id ID Tenant ID to generate data for
--tenant-slug SLUG Tenant slug to generate data for
--skip-validation Skip dependency validation
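For readers who want to see how such an option set fits together, here is a minimal `argparse` sketch that mirrors the flags listed above. It is not the real parser from `populate_all_data.py`: the function names, the `positional` destination, and the integer type for `--tenant-id` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="populate_all_data.py")
    # Generators may be given positionally or through --generators.
    parser.add_argument("positional", nargs="*", metavar="GENERATORS")
    parser.add_argument("--generators", nargs="+", metavar="GEN", default=[])
    parser.add_argument("--list-generators", action="store_true")
    parser.add_argument("--show-plan", action="store_true")
    parser.add_argument("--validate-only", action="store_true")
    parser.add_argument("--tenant-id", type=int)  # assumed to be numeric
    parser.add_argument("--tenant-slug")
    parser.add_argument("--skip-validation", action="store_true")
    return parser


def selected_generators(args):
    """Merge both ways of naming generators, keeping first-seen order."""
    merged = []
    for name in [*args.positional, *args.generators]:
        if name not in merged:
            merged.append(name)
    return merged


if __name__ == "__main__":
    demo = build_parser().parse_args(["core", "accounts", "--show-plan"])
    print(selected_generators(demo), demo.show_plan)
```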
📊 Data Volume
Default data volumes (customizable in each generator):
- Tenants: 1-2
- Users: 50-200 per tenant
- Patients: 50-200 per tenant
- Clinical Records: 100-500 per patient
- Inventory Items: 50-200 per tenant
- Facility Assets: 50-150 per tenant
🔧 Customization
Modifying Data Volumes
Edit the generator classes in individual files:
# In any generator file
def run_generation(self, **kwargs):
    # Modify these parameters
    users_per_tenant = kwargs.get('users_per_tenant', 100)
    patients_per_tenant = kwargs.get('patients_per_tenant', 150)
    # ... etc
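The `kwargs.get(name, default)` pattern silently ignores misspelled parameter names. One possible way to guard against that is sketched below. `DEFAULT_VOLUMES` and `resolve_volumes` are hypothetical names that do not appear in the project; the two defaults are the ones shown in the snippet above.

```python
# Hypothetical defaults, taken from the run_generation snippet above.
DEFAULT_VOLUMES = {
    "users_per_tenant": 100,
    "patients_per_tenant": 150,
}


def resolve_volumes(**overrides):
    """Merge overrides into the defaults, rejecting unknown or invalid values."""
    unknown = set(overrides) - set(DEFAULT_VOLUMES)
    if unknown:
        raise TypeError(f"unknown volume parameters: {sorted(unknown)}")
    volumes = {**DEFAULT_VOLUMES, **overrides}
    for name, value in volumes.items():
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    return volumes


if __name__ == "__main__":
    print(resolve_volumes(patients_per_tenant=50))
```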
Adding New Generators
- Create a new generator class inheriting from SaudiHealthcareDataGenerator
- Add it to the populate_all_data.py imports and registration
- Update the execution order in DataGenerationOrchestrator.execution_order
Custom Saudi Data
Add to data_utils/constants.py:
# Add new constants
NEW_SAUDI_DATA = [
# Your Saudi-specific data here
]
# Update existing lists
SAUDI_CITIES.append("New City")
🏗️ Architecture
Shared Utilities (data_utils/)
constants.py
- All Saudi-specific data constants
- Names, cities, medical terms, etc.
- Centralized for consistency
generators.py
- Common data generation functions
- Phone numbers, IDs, dates, names
- Reusable across all generators
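As an illustration of the kind of helper that could live in generators.py, the sketch below builds a fictional Saudi mobile number and picks a city. The function names and the sample city list are assumptions, not the project's actual API. The number format relies on Saudi mobile numbers using country code 966 followed by nine digits starting with 5.

```python
import random

# Sample values only; the real list lives in data_utils/constants.py.
SAUDI_CITIES = ["Riyadh", "Jeddah", "Dammam"]


def generate_saudi_mobile(rng=random):
    """Return a fictional mobile number in the form +9665XXXXXXXX."""
    subscriber = "".join(str(rng.randint(0, 9)) for _ in range(8))
    return f"+9665{subscriber}"


def pick_city(rng=random, cities=SAUDI_CITIES):
    """Return one city chosen uniformly at random from the given list."""
    return rng.choice(cities)


if __name__ == "__main__":
    seeded = random.Random(42)  # a seeded generator makes test data reproducible
    print(generate_saudi_mobile(seeded), pick_city(seeded))
```

Passing a seeded `random.Random` instance keeps repeated runs reproducible, which helps when comparing generated datasets.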
helpers.py
- Database utilities (safe_bulk_create, validate_tenant_exists)
- Model field filtering and validation
- Progress tracking and error handling
base.py
- BaseDataGenerator: Basic functionality
- SaudiHealthcareDataGenerator: Saudi-specific base class
- DataGenerationOrchestrator: Dependency management
Individual Generators
Each generator inherits from SaudiHealthcareDataGenerator and implements:
class ExampleGenerator(SaudiHealthcareDataGenerator):
    def run_generation(self, **kwargs):
        # Your generation logic here
        # Use self.generate_saudi_name(), self.safe_bulk_create(), etc.
        pass
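To make the skeleton above concrete without a database, the sketch below swaps in a stand-in base class. Everything about the stand-in is an assumption: the real `SaudiHealthcareDataGenerator` in data_utils/base.py may use different signatures for `generate_saudi_name()` and `safe_bulk_create()`, and the `records` keyword is hypothetical.

```python
class SaudiHealthcareDataGenerator:
    """Stand-in for the real base class; the method signatures are assumed."""

    def __init__(self):
        self.created = []

    def generate_saudi_name(self):
        # The real helper presumably draws from data_utils/constants.py.
        return "Test Patient"

    def safe_bulk_create(self, records):
        # The real helper presumably writes to the database; this one
        # stores the rows in memory and reports how many it accepted.
        self.created.extend(records)
        return len(records)


class ExampleGenerator(SaudiHealthcareDataGenerator):
    def run_generation(self, **kwargs):
        count = kwargs.get("records", 5)  # hypothetical volume parameter
        rows = [
            {"name": self.generate_saudi_name(), "sequence": index}
            for index in range(count)
        ]
        return {"created": self.safe_bulk_create(rows)}


if __name__ == "__main__":
    print(ExampleGenerator().run_generation(records=3))
```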
🧪 Testing
Validation Only
# Check if all dependencies are satisfied
./populate_data.sh --validate
# Show execution plan without running
./populate_data.sh --plan
Dry Run
# Show what would be done without creating data
./populate_data.sh --dry-run
Individual Generators
# Test specific generators
./populate_data.sh core accounts
python3 populate_all_data.py --generators core patients
🐛 Troubleshooting
Common Issues
- "No tenants found"
  - Run core generator first: ./populate_data.sh core
  - Or skip validation: ./populate_data.sh --skip-validation
- "Django not found"
  - Ensure virtual environment is activated
  - Install requirements: pip install -r requirements.txt
- "Permission denied"
  - Make script executable: chmod +x populate_data.sh
- "Import errors"
  - Ensure you're in the project root directory
  - Check that all refactored files exist
Debug Mode
# Run a single generator in isolation, skipping dependency validation
python3 populate_all_data.py --generators core --skip-validation
📈 Performance
Optimization Tips
- Batch Operations: Uses bulk_create for large datasets
- Progress Tracking: Real-time progress indicators
- Error Recovery: Continues processing after individual failures
- Memory Efficient: Processes data in chunks
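The batching, chunking, and error-recovery points above can be sketched without Django as follows. This is not the project's `safe_bulk_create`: `chunked`, `create_in_batches`, and the `create_batch` callback are hypothetical names. In a Django project the callback would typically wrap `Model.objects.bulk_create`, which also accepts its own `batch_size` argument.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def create_in_batches(items, create_batch, batch_size=500):
    """Call create_batch per chunk; keep going when one chunk fails.

    Returns a (created, failed) tuple of item counts.
    """
    created = failed = 0
    for number, batch in enumerate(chunked(items, batch_size), start=1):
        try:
            create_batch(batch)
            created += len(batch)
        except Exception as exc:  # record the failure and continue
            failed += len(batch)
            print(f"batch {number} failed: {exc}")
    return created, failed


if __name__ == "__main__":
    stored = []
    print(create_in_batches(range(10), stored.extend, batch_size=4))
```

Because only one chunk is held in memory at a time, the input can be a generator rather than a fully built list.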
Performance Metrics
- Small Dataset: ~50 patients, 2-3 minutes
- Medium Dataset: ~200 patients, 5-8 minutes
- Large Dataset: ~500+ patients, 15-30 minutes
🔒 Security & Compliance
Saudi Healthcare Compliance
- CBAHI Standards: Follows Central Board for Accreditation of Healthcare Institutions
- MOH Guidelines: Ministry of Health data protection requirements
- HIPAA-like: Patient privacy and data security considerations
Data Privacy
- Test Data Only: All generated data is fictional
- No Real Patients: Uses generated Saudi names and demographics
- Safe Deletion: Easy cleanup of test data
🤝 Contributing
Code Standards
- Use shared utilities from data_utils/
- Follow dependency order in orchestrator
- Include progress tracking and error handling
- Document new generators and their dependencies
Adding New Data Types
- Add constants to data_utils/constants.py
- Create generator functions in data_utils/generators.py
- Implement new generator class
- Register in orchestrator
- Update documentation
📞 Support
For issues or questions:
- Check the execution plan: ./populate_data.sh --plan
- Validate dependencies: ./populate_data.sh --validate
- Run individual generators for debugging
- Check logs for specific error messages
Generated with ❤️ for Saudi healthcare systems