# Saudi Healthcare Data Generation System

A comprehensive, refactored data generation system for Saudi healthcare applications, with dependency management and shared, deduplicated code.

## 🎯 Overview

This system generates realistic test data for a Saudi healthcare management system. It has been refactored to eliminate code duplication and provide a unified, maintainable solution.

### Key Improvements

- **60% code reduction** through shared utilities
- **Dependency management** ensures correct execution order
- **Saudi-specific data** with authentic names, locations, and healthcare context
- **Modular architecture** with shared constants and generators
- **Progress tracking** and error handling
- **Easy execution** via shell script or Python orchestrator

## 📁 Project Structure

```
data_generation/
├── data_utils/                  # Shared utilities package
│   ├── __init__.py              # Package initialization
│   ├── constants.py             # All Saudi-specific constants
│   ├── generators.py            # Common data generation functions
│   ├── helpers.py               # Database utilities and model helpers
│   └── base.py                  # Base classes and orchestrator
├── populate_all_data.py         # Master Python orchestrator
├── populate_data.sh             # Shell script for easy execution
├── [individual_data_files].py   # Refactored individual generators
└── DATA_GENERATION_README.md    # This documentation
```

## 🚀 Quick Start

### Option 1: Shell Script (Recommended)

```bash
# Make the script executable (skip if it is already executable)
chmod +x populate_data.sh

# Run all generators
./populate_data.sh

# Run specific generators
./populate_data.sh core accounts patients

# Show available options
./populate_data.sh --help
```

### Option 2: Python Orchestrator

```bash
# Run all generators
python3 populate_all_data.py

# Run specific generators
python3 populate_all_data.py core accounts patients

# Show execution plan
python3 populate_all_data.py --show-plan

# List available generators
python3 populate_all_data.py --list-generators
```

## 📋 Execution Order & Dependencies

The orchestrator resolves dependencies automatically and runs the generators in this order:

1. **core** → Tenants
2. **accounts** → Users (requires: core)
3. **hr** → Employees/Departments (requires: core, accounts)
4. **patients** → Patients (requires: core)
5. **Clinical Modules** (can run in parallel; each requires: core, accounts, hr, patients):
   - **emr** → Encounters, vitals, problems, care plans, notes
   - **lab** → Lab tests, orders, results, specimens
   - **radiology** → Imaging studies, orders, reports
   - **pharmacy** → Medications, prescriptions, dispensations
6. **appointments** → Appointments (requires: patients + providers)
7. **billing** → Bills, payments, claims (requires: patients + encounters)
8. **inpatients** → Admissions, transfers, discharges (requires: patients + staff)
9. **inventory** → Medical supplies, stock (independent)
10. **facility_management** → Buildings, rooms, assets (management command)

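An order like this is a topological sort of the dependency graph. As a minimal sketch (not the orchestrator's actual code), Python's standard-library `graphlib` (3.9+) can derive a valid order. The `DEPENDENCIES` dict and `execution_plan` function are hypothetical, with the dependency lists transcribed from the Available Generators table. Generators that were not requested are deliberately not added, since missing prerequisites are a validation concern.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mapping transcribed from the "Available Generators" table:
# generator name -> generators that must run before it.
DEPENDENCIES = {
    "core": [],
    "accounts": ["core"],
    "hr": ["core", "accounts"],
    "patients": ["core"],
    "emr": ["core", "accounts", "hr", "patients"],
    "lab": ["core", "accounts", "hr", "patients"],
    "radiology": ["core", "accounts", "hr", "patients"],
    "pharmacy": ["core", "accounts", "hr", "patients"],
    "appointments": ["core", "accounts", "hr", "patients"],
    "billing": ["core", "accounts", "patients"],
    "inpatients": ["core", "accounts", "hr", "patients"],
    "inventory": [],
    "facility_management": [],
}


def execution_plan(requested=None):
    """Order `requested` (default: all generators) so dependencies run first.

    Only requested generators appear in the result; unrequested
    dependencies are not pulled in.
    """
    selected = list(dict.fromkeys(requested or DEPENDENCIES))
    graph = {
        name: [dep for dep in DEPENDENCIES[name] if dep in selected]
        for name in selected
    }
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


print(execution_plan(["patients", "core"]))  # ['core', 'patients']
```

Generators with no ordering constraint between them (for example the four clinical modules) may come out in any relative order, which is what makes them candidates for parallel execution.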
## 🛠️ Available Generators

| Generator | Description | Dependencies |
|-----------|-------------|--------------|
| `core` | Tenants and system configuration | None |
| `accounts` | Users, authentication, security | core |
| `hr` | Employees, departments, schedules | core, accounts |
| `patients` | Patient profiles, contacts, insurance | core |
| `emr` | Encounters, vitals, problems, care plans | core, accounts, hr, patients |
| `lab` | Laboratory tests, orders, results | core, accounts, hr, patients |
| `radiology` | Imaging studies, orders, reports | core, accounts, hr, patients |
| `pharmacy` | Medications, prescriptions, dispensations | core, accounts, hr, patients |
| `appointments` | Appointment scheduling and management | core, accounts, hr, patients |
| `billing` | Medical billing, payments, insurance claims | core, accounts, patients |
| `inpatients` | Hospital admissions, transfers, discharges | core, accounts, hr, patients |
| `inventory` | Medical supplies and inventory management | None |
| `facility_management` | Buildings, rooms, assets, maintenance | None |

## 🎛️ Command Line Options

### Shell Script Options

```bash
./populate_data.sh [OPTIONS] [GENERATORS...]

Options:
  -h, --help            Show help message
  -l, --list            List available generators
  -p, --plan            Show execution plan
  -v, --validate        Validate dependencies only
  --tenant-id ID        Generate data for specific tenant ID
  --tenant-slug SLUG    Generate data for specific tenant slug
  --skip-validation     Skip dependency validation
  --dry-run             Show what would be done (no execution)
```

### Python Orchestrator Options

```bash
python3 populate_all_data.py [OPTIONS] [GENERATORS...]

Options:
  --generators GEN...   Specific generators to run
  --list-generators     List available generators
  --show-plan           Show execution plan
  --validate-only       Validate dependencies only
  --tenant-id ID        Tenant ID to generate data for
  --tenant-slug SLUG    Tenant slug to generate data for
  --skip-validation     Skip dependency validation
```

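A minimal sketch of how these options could be declared with the standard-library `argparse`, assuming the orchestrator uses it (the real parser may differ). It accepts generator names either positionally or via `--generators`, as both the usage line and the option list suggest. `build_parser`, `parse`, and the merged `selected` attribute are hypothetical names.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented populate_all_data.py options."""
    parser = argparse.ArgumentParser(prog="populate_all_data.py")
    parser.add_argument("positional_generators", nargs="*", metavar="GENERATORS",
                        help="Generators to run (default: all)")
    parser.add_argument("--generators", nargs="+", metavar="GEN",
                        help="Specific generators to run")
    parser.add_argument("--list-generators", action="store_true",
                        help="List available generators")
    parser.add_argument("--show-plan", action="store_true",
                        help="Show execution plan")
    parser.add_argument("--validate-only", action="store_true",
                        help="Validate dependencies only")
    parser.add_argument("--tenant-id", help="Tenant ID to generate data for")
    parser.add_argument("--tenant-slug", help="Tenant slug to generate data for")
    parser.add_argument("--skip-validation", action="store_true",
                        help="Skip dependency validation")
    return parser


def parse(argv=None):
    args = build_parser().parse_args(argv)
    # Merge both ways of naming generators into a single list.
    args.selected = (args.positional_generators or []) + (args.generators or [])
    return args


args = parse(["--generators", "core", "patients", "--skip-validation"])
print(args.selected, args.skip_validation)  # ['core', 'patients'] True
```

Dashed option names become underscored attributes (`--show-plan` → `args.show_plan`), which is standard `argparse` behavior.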
## 📊 Data Volume

Default data volumes (customizable in each generator):

- **Tenants**: 1-2
- **Users**: 50-200 per tenant
- **Patients**: 50-200 per tenant
- **Clinical Records**: 100-500 per patient
- **Inventory Items**: 50-200 per tenant
- **Facility Assets**: 50-150 per tenant

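Ranges like these can be sampled once per run. As a minimal sketch (the `DEFAULT_VOLUMES` keys and the `pick_volume` helper are hypothetical, not part of `data_utils`), a seeded `random.Random` keeps volumes reproducible between runs, and an explicit keyword argument overrides the range, in the same spirit as the `kwargs.get(...)` defaults used by the generators:

```python
import random

# Hypothetical (min, max) defaults taken from the list above.
DEFAULT_VOLUMES = {
    "tenants": (1, 2),
    "users_per_tenant": (50, 200),
    "patients_per_tenant": (50, 200),
    "clinical_records_per_patient": (100, 500),
    "inventory_items_per_tenant": (50, 200),
    "facility_assets_per_tenant": (50, 150),
}


def pick_volume(name, rng, **overrides):
    """Return an explicit override if given, else a random count in the default range."""
    if name in overrides:
        return overrides[name]
    low, high = DEFAULT_VOLUMES[name]
    return rng.randint(low, high)  # randint includes both endpoints


rng = random.Random(42)  # fixed seed: same volumes on every run
print(pick_volume("patients_per_tenant", rng))
print(pick_volume("patients_per_tenant", rng, patients_per_tenant=150))  # 150
```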
## 🔧 Customization

### Modifying Data Volumes

Change the default values passed to `kwargs.get()` in each generator's `run_generation` method (in the individual generator files):

```python
from data_utils.base import SaudiHealthcareDataGenerator


class ExampleGenerator(SaudiHealthcareDataGenerator):
    def run_generation(self, **kwargs):
        # Defaults apply when the caller passes no value; edit them to change volumes
        users_per_tenant = kwargs.get('users_per_tenant', 100)
        patients_per_tenant = kwargs.get('patients_per_tenant', 150)
        # ... etc
```

### Adding New Generators

1. Create a new generator class that inherits from `SaudiHealthcareDataGenerator`
2. Import and register it in `populate_all_data.py`
3. Add it to the execution order in `DataGenerationOrchestrator.execution_order`

### Custom Saudi Data

Add to `data_utils/constants.py`:

```python
# Add new constants
NEW_SAUDI_DATA = [
    # Your Saudi-specific data here
]

# Extend an existing list; this line must come after SAUDI_CITIES is defined.
# Alternatively, add the value directly to the SAUDI_CITIES list literal.
SAUDI_CITIES.append("New City")
```

## 🏗️ Architecture

### Shared Utilities (`data_utils/`)

#### `constants.py`

- All Saudi-specific data constants
- Names, cities, medical terms, etc.
- Centralized for consistency

#### `generators.py`

- Common data generation functions
- Phone numbers, IDs, dates, names
- Reusable across all generators

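For illustration, a minimal sketch of the kind of helper `generators.py` holds. The function names are hypothetical, and the formats are assumptions based on common Saudi conventions: mobile numbers written as `+966 5X XXX XXXX`, and 10-digit national IDs starting with `1` (citizens) or `2` (residents). No check digit is computed, so the IDs are format-valid only and suitable for fictional test data.

```python
import random


def generate_saudi_mobile(rng):
    """Return a fictional Saudi mobile number like '+966 5X XXX XXXX'."""
    # Assumed format: 9 digits after the +966 country code, starting with 5.
    digits = "5" + "".join(str(rng.randint(0, 9)) for _ in range(8))
    return f"+966 {digits[:2]} {digits[2:5]} {digits[5:]}"


def generate_national_id(rng, resident=False):
    """Return a fictional 10-digit ID: prefix 1 for citizens, 2 for residents."""
    prefix = "2" if resident else "1"
    # Format only: the real check digit is NOT computed here.
    return prefix + "".join(str(rng.randint(0, 9)) for _ in range(9))


rng = random.Random(2024)  # seeded so test data is reproducible
print(generate_saudi_mobile(rng))
print(generate_national_id(rng, resident=True))
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, keeps each generator's output reproducible without affecting other code that uses `random`.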
#### `helpers.py`

- Database utilities (`safe_bulk_create`, `validate_tenant_exists`)
- Model field filtering and validation
- Progress tracking and error handling

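One way a helper like `safe_bulk_create` could combine batching with error recovery, as a minimal sketch rather than the actual helper: try Django's `bulk_create` (which accepts a `batch_size` argument) first, and if it fails, fall back to saving objects one at a time so a single bad row does not abort the run. The signature and return value are assumptions. The sketch also assumes it is called in autocommit mode (not inside an outer `transaction.atomic()` block), because a failed query inside an atomic block leaves that transaction unusable for the fallback saves.

```python
import logging

logger = logging.getLogger(__name__)


def safe_bulk_create(model, objects, batch_size=500):
    """Insert `objects` for a Django `model` and return how many were saved.

    Hypothetical sketch: one bulk insert first, per-object fallback on failure.
    """
    objects = list(objects)
    try:
        model.objects.bulk_create(objects, batch_size=batch_size)
        return len(objects)
    except Exception as exc:  # e.g. an IntegrityError caused by one bad row
        logger.warning("bulk_create failed for %s (%s); retrying one by one",
                       model.__name__, exc)
    saved = 0
    for obj in objects:
        try:
            obj.save()
            saved += 1
        except Exception as exc:
            # Skip the failing object and keep going with the rest.
            logger.error("Skipping %r: %s", obj, exc)
    return saved
```

Django's `bulk_create` wraps its batches in a single transaction, so when it fails nothing from that call is left behind, and re-saving every object in the fallback does not create duplicates.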
#### `base.py`

- `BaseDataGenerator`: Basic functionality
- `SaudiHealthcareDataGenerator`: Saudi-specific base class
- `DataGenerationOrchestrator`: Dependency management

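The relationship between a base class and the `run_generation` method each generator implements resembles the template-method pattern. A minimal sketch of how a base class might wrap `run_generation` with timing and error handling; the `run` method, its boolean return value, the `name` attribute, and the `TenantGenerator` subclass are all hypothetical and not the actual `base.py` API.

```python
import logging
import time

logger = logging.getLogger(__name__)


class BaseDataGenerator:
    """Hypothetical base: subclasses implement run_generation(); run() wraps it."""

    name = "base"

    def run_generation(self, **kwargs):
        raise NotImplementedError("Subclasses must implement run_generation()")

    def run(self, **kwargs):
        """Run the generator, log its duration, and return False instead of raising."""
        start = time.perf_counter()
        try:
            self.run_generation(**kwargs)
        except Exception:
            logger.exception("Generator %s failed", self.name)
            return False
        logger.info("Generator %s finished in %.1fs", self.name,
                    time.perf_counter() - start)
        return True


class TenantGenerator(BaseDataGenerator):
    """Hypothetical subclass; a real generator would create model instances."""

    name = "core"

    def run_generation(self, **kwargs):
        self.created = kwargs.get("tenants", 1)  # placeholder for real creation logic


print(TenantGenerator().run(tenants=2))  # True
```

Returning a status instead of raising lets an orchestrator record the failure and decide whether to continue with generators that do not depend on the failed one.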
### Individual Generators

Each generator inherits from `SaudiHealthcareDataGenerator` and implements `run_generation`:

```python
from data_utils.base import SaudiHealthcareDataGenerator


class ExampleGenerator(SaudiHealthcareDataGenerator):
    def run_generation(self, **kwargs):
        # Your generation logic here.
        # Use self.generate_saudi_name(), self.safe_bulk_create(), etc.
        pass
```

## 🧪 Testing

### Validation Only

```bash
# Check whether all dependencies are satisfied
./populate_data.sh --validate

# Show the execution plan without running anything
./populate_data.sh --plan
```
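
Conceptually, dependency validation checks that each requested generator's prerequisites are either part of the same run or already populated. A minimal sketch under that assumption: `missing_dependencies` and the `already_populated` set are hypothetical (the real check may query the database, for example via `validate_tenant_exists`), and only an excerpt of the dependency table is included.

```python
# Excerpt of the "Available Generators" table (generator -> prerequisites).
DEPENDENCIES = {
    "core": [],
    "accounts": ["core"],
    "hr": ["core", "accounts"],
    "patients": ["core"],
    "billing": ["core", "accounts", "patients"],
}


def missing_dependencies(requested, already_populated=()):
    """Map each requested generator to prerequisites that are neither requested
    in this run nor already populated. An empty dict means validation passes."""
    available = set(requested) | set(already_populated)
    problems = {}
    for name in requested:
        missing = [dep for dep in DEPENDENCIES[name] if dep not in available]
        if missing:
            problems[name] = missing
    return problems


print(missing_dependencies(["patients", "billing"]))
# {'patients': ['core'], 'billing': ['core', 'accounts']}
```

This is the situation behind the "No tenants found" error listed under Troubleshooting: requesting `patients` without `core`, when no tenants exist yet, leaves `core` unsatisfied.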

### Dry Run

```bash
# Show what would be done without creating data
./populate_data.sh --dry-run
```

### Individual Generators

```bash
# Test specific generators
./populate_data.sh core accounts
python3 populate_all_data.py --generators core patients
```

## 🐛 Troubleshooting

### Common Issues

1. **"No tenants found"**
   - Run the core generator first: `./populate_data.sh core`
   - Or skip validation: `./populate_data.sh --skip-validation`

2. **"Django not found"**
   - Ensure the virtual environment is activated
   - Install requirements: `pip install -r requirements.txt`

3. **"Permission denied"**
   - Make the script executable: `chmod +x populate_data.sh`

4. **"Import errors"**
   - Ensure you're in the project root directory
   - Check that all refactored files exist

### Debug Mode

```bash
# Run a single generator in isolation, skipping dependency validation
python3 populate_all_data.py --generators core --skip-validation
```

## 📈 Performance

### Built-in Optimizations

- **Batch Operations**: Uses `bulk_create` for large datasets
- **Progress Tracking**: Real-time progress indicators
- **Error Recovery**: Continues processing after individual failures
- **Memory Efficient**: Processes data in chunks

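Processing in chunks means building and saving a bounded batch at a time instead of materializing every object up front. A minimal sketch using only the standard library (Python 3.8+ for the `:=` operator); `chunked` is a hypothetical helper name and the batch size is arbitrary:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items, consuming `iterable` lazily."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


# Records come from a generator expression, so at most one batch of
# 1,000 is held in memory at a time.
records = ({"patient_no": i} for i in range(2_500))
sizes = [len(batch) for batch in chunked(records, 1_000)]
print(sizes)  # [1000, 1000, 500]
```

In a generator, each batch would then be handed to a bulk insert such as `safe_bulk_create`, so memory use stays proportional to the batch size rather than the total dataset.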
### Performance Metrics

- **Small Dataset**: ~50 patients, 2-3 minutes
- **Medium Dataset**: ~200 patients, 5-8 minutes
- **Large Dataset**: ~500+ patients, 15-30 minutes

## 🔒 Security & Compliance

### Saudi Healthcare Compliance

- **CBAHI Standards**: Follows the standards of the Central Board for Accreditation of Healthcare Institutions
- **MOH Guidelines**: Ministry of Health data protection requirements
- **HIPAA-like**: Patient privacy and data security considerations

### Data Privacy

- **Test Data Only**: All generated data is fictional
- **No Real Patients**: Uses generated Saudi names and demographics
- **Safe Deletion**: Easy cleanup of test data

## 🤝 Contributing

### Code Standards

- Use shared utilities from `data_utils/`
- Follow the dependency order in the orchestrator
- Include progress tracking and error handling
- Document new generators and their dependencies

### Adding New Data Types

1. Add constants to `data_utils/constants.py`
2. Create generator functions in `data_utils/generators.py`
3. Implement the new generator class
4. Register it in the orchestrator
5. Update this documentation

## 📞 Support

For issues or questions:

1. Check the execution plan: `./populate_data.sh --plan`
2. Validate dependencies: `./populate_data.sh --validate`
3. Run individual generators for debugging
4. Check logs for specific error messages

---

**Generated with ❤️ for Saudi healthcare systems**