
# Saudi-Influenced Test Data Generation Guide

## Overview

This project includes a Django management command, `generate_test_data`, that generates realistic Saudi-influenced test data for every app in the AgdarCentre healthcare platform.

## Features

The `generate_test_data` command creates:

### Saudi Cultural Context
- **Arabic Names**: Both English transliteration and Arabic script
  - Male names: محمد (Mohammed), عبدالله (Abdullah), فهد (Fahad), etc.
  - Female names: نورة (Noura), فاطمة (Fatima), سارة (Sarah), etc.
  - Family names: العتيبي (Al-Otaibi), الغامدي (Al-Ghamdi), etc.

- **Saudi Phone Numbers**: Proper format with Saudi mobile prefixes
  - Format: +966 5X XXX XXXX
  - Valid prefixes: 50, 53, 54, 55, 56, 57, 58, 59

- **National IDs**: 10-digit Saudi national ID format
  - Format: 1XXXXXXXXX (Saudi citizen) or 2XXXXXXXXX (resident)

- **Addresses**: Saudi cities and districts
  - Cities: Riyadh, Jeddah, Mecca, Medina, Dammam, etc.
  - Riyadh districts: Al-Olaya, Al-Malaz, Al-Naseem, etc.

- **Currency**: All financial data in SAR (Saudi Riyals)

- **Work Schedule**: Saudi work week (Sunday-Thursday)
  - Morning shift: 08:00-12:00
  - Afternoon shift: 15:00-19:00 (after prayer break)

- **Insurance**: Saudi insurance companies
  - Bupa Arabia, Tawuniya, Medgulf, Malath, etc.
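
As a rough illustration of the phone number and national ID formats listed above, here is a minimal Python sketch. The function names `saudi_mobile` and `national_id` are hypothetical and are not taken from the actual command. The ID generator reproduces only the leading digit and the length; it does not compute any checksum.

```python
import random

# Prefixes copied from the list above.
SAUDI_MOBILE_PREFIXES = ["50", "53", "54", "55", "56", "57", "58", "59"]


def saudi_mobile(rng: random.Random) -> str:
    """Return a number shaped like +966 5X XXX XXXX (hypothetical helper)."""
    prefix = rng.choice(SAUDI_MOBILE_PREFIXES)
    subscriber = f"{rng.randrange(10**7):07d}"  # 7 digits, zero-padded
    return f"+966 {prefix} {subscriber[:3]} {subscriber[3:]}"


def national_id(rng: random.Random, resident: bool = False) -> str:
    """Return a 10-digit ID starting with 1 (Saudi) or 2 (resident).

    Only the shape is reproduced; no checksum is computed.
    """
    first_digit = "2" if resident else "1"
    return first_digit + f"{rng.randrange(10**9):09d}"


rng = random.Random(42)  # fixed seed so repeated runs give the same values
print(saudi_mobile(rng))
print(national_id(rng))
print(national_id(rng, resident=True))
```

Passing an explicit `random.Random` instance keeps the generators reproducible, which tends to be convenient when test data feeds into assertions.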

### Generated Data

The command generates data for all apps:

1. **Core App**
   - Tenants (healthcare organizations)
   - Users (with various roles: doctors, nurses, therapists, admin, etc.)
   - Patients (with Saudi demographics)
   - Clinics/Departments
   - Files and SubFiles
   - Notification Preferences

2. **Appointments App**
   - Providers
   - Rooms
   - Schedules (Sunday-Thursday)
   - Appointments (with varied statuses)
   - Appointment Reminders
   - Appointment Confirmations

3. **Finance App**
   - Services (billable services)
   - Packages (session bundles)
   - Payers (insurance companies)
   - Invoices (with VAT)
   - Payments
   - Package Purchases

4. **Clinical Apps**
   - Medical Consultations
   - Nursing Encounters (with vital signs)
   - ABA Consultations
   - OT Sessions
   - SLP Interventions

5. **Notifications App**
   - Message Templates (bilingual)
   - Messages (SMS/WhatsApp)

6. **Referrals App**
   - Internal and external referrals

## Usage

### Basic Usage

```bash
python manage.py generate_test_data
```

This will create:
- 1 tenant
- 50 patients per tenant
- 100 appointments per tenant
- Associated clinical, financial, and communication records

### Command Options

```bash
python manage.py generate_test_data [OPTIONS]
```

**Options:**

- `--tenants N`: Number of tenants to create (default: 1)
  ```bash
  python manage.py generate_test_data --tenants 2
  ```

- `--patients N`: Number of patients per tenant (default: 50)
  ```bash
  python manage.py generate_test_data --patients 100
  ```

- `--appointments N`: Number of appointments per tenant (default: 100)
  ```bash
  python manage.py generate_test_data --appointments 200
  ```

- `--clear`: Clear existing data before generating new data
  ```bash
  python manage.py generate_test_data --clear
  ```
  **⚠️ Warning**: This will delete all existing data except superuser accounts!
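
The options above map naturally onto `argparse`, which Django passes to a command's `add_arguments` method. The sketch below is an assumption about how such options could be declared, not the command's actual source; it uses a plain `argparse.ArgumentParser` so it runs without Django.

```python
import argparse


def add_arguments(parser: argparse.ArgumentParser) -> None:
    """Register the documented options with their documented defaults."""
    parser.add_argument("--tenants", type=int, default=1,
                        help="Number of tenants to create")
    parser.add_argument("--patients", type=int, default=50,
                        help="Number of patients per tenant")
    parser.add_argument("--appointments", type=int, default=100,
                        help="Number of appointments per tenant")
    parser.add_argument("--clear", action="store_true",
                        help="Clear existing data before generating new data")


# Stand-in for the parser Django would hand to the command.
parser = argparse.ArgumentParser(prog="generate_test_data")
add_arguments(parser)

options = parser.parse_args(["--tenants", "3", "--patients", "30"])
print(vars(options))
```

Because `--clear` is a `store_true` flag, it is `False` unless given explicitly, which matches the destructive nature of the option.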

### Examples

**Generate data for a single tenant with default settings:**
```bash
python manage.py generate_test_data
```

**Generate data for multiple tenants:**
```bash
python manage.py generate_test_data --tenants 3 --patients 30 --appointments 80
```

**Clear existing data and generate fresh data:**
```bash
python manage.py generate_test_data --clear --patients 100 --appointments 200
```

**Generate a large dataset for testing or demos:**
```bash
python manage.py generate_test_data --patients 200 --appointments 500
```

## Data Characteristics

### Patient Demographics
- **Age Distribution**: Weighted towards children (therapy center context)
  - 60% children (2-12 years)
  - 25% teenagers (13-18 years)
  - 15% adults (19-60 years)
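
One way to obtain the age split above is weighted sampling over age bands. This is a minimal sketch under that assumption; `AGE_BANDS` and `random_age` are hypothetical names, and ages within a band are drawn uniformly, which the guide does not specify.

```python
import random

# (min_age, max_age) bands with the weights listed above.
AGE_BANDS = [
    ((2, 12), 0.60),   # children
    ((13, 18), 0.25),  # teenagers
    ((19, 60), 0.15),  # adults
]


def random_age(rng: random.Random) -> int:
    """Pick a band by weight, then a uniform age inside it (inclusive)."""
    bands = [band for band, _ in AGE_BANDS]
    weights = [weight for _, weight in AGE_BANDS]
    low, high = rng.choices(bands, weights=weights, k=1)[0]
    return rng.randint(low, high)


rng = random.Random(7)
ages = [random_age(rng) for _ in range(1000)]
children = sum(1 for age in ages if age <= 12)
print(f"children: {children / len(ages):.0%}")
```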

### Appointment Distribution
- **Time Range**: From 3 months in the past to 1 month in the future
- **Status Distribution**:
  - Past appointments: 75% completed, 15% no-show, 10% cancelled
  - Today's appointments: Mix of confirmed, arrived, in-progress
  - Future appointments: 70% confirmed, 30% booked
- **Scheduling**: Excludes Saudi weekends (Friday & Saturday)
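
The date and status rules above could be sketched as follows. This is illustrative only: the status strings, the function names, the approximation of "3 months" as 90 days and "next month" as 30 days, and the uniform mix for today's appointments are all assumptions rather than details of the actual command.

```python
import random
from datetime import date, timedelta

# date.weekday() counts Monday as 0, so Friday is 4 and Saturday is 5.
SAUDI_WEEKEND = {4, 5}


def random_working_day(rng: random.Random, today: date) -> date:
    """Pick a date from 90 days back to 30 days ahead, skipping Fri/Sat."""
    while True:
        day = today + timedelta(days=rng.randint(-90, 30))
        if day.weekday() not in SAUDI_WEEKEND:
            return day


def pick_status(rng: random.Random, day: date, today: date) -> str:
    """Choose a status using the percentages documented above."""
    if day < today:
        return rng.choices(["completed", "no_show", "cancelled"],
                           weights=[75, 15, 10], k=1)[0]
    if day == today:
        # No ratio is documented for today's mix; uniform is an assumption.
        return rng.choice(["confirmed", "arrived", "in_progress"])
    return rng.choices(["confirmed", "booked"], weights=[70, 30], k=1)[0]


rng = random.Random(3)
today = date.today()
for _ in range(5):
    day = random_working_day(rng, today)
    print(day.isoformat(), day.strftime("%A"), pick_status(rng, day, today))
```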

### Financial Data
- **Services**: 5 service types per clinic
- **Pricing**: Realistic Saudi healthcare pricing (200-400 SAR)
- **VAT**: 15% tax applied to all invoices
- **Insurance**: Mix of self-pay and insured patients
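
To make the VAT arithmetic concrete, here is a small sketch using `decimal.Decimal` to avoid floating-point rounding on currency values. The function name `invoice_totals` and the half-up rounding to two decimal places are assumptions, not details confirmed by the command.

```python
from decimal import Decimal, ROUND_HALF_UP

VAT_RATE = Decimal("0.15")  # 15% VAT, as stated above
HALALA = Decimal("0.01")    # round to two decimal places of a riyal


def invoice_totals(subtotal: Decimal) -> dict:
    """Return subtotal, VAT, and total for an invoice amount in SAR."""
    vat = (subtotal * VAT_RATE).quantize(HALALA, rounding=ROUND_HALF_UP)
    return {"subtotal": subtotal, "vat": vat, "total": subtotal + vat}


print(invoice_totals(Decimal("300.00")))
```

For a 300.00 SAR service this gives 45.00 SAR of VAT and a total of 345.00 SAR.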

### Clinical Records
- Generated for completed appointments only
- Includes realistic vital signs and measurements
- Age-appropriate clinical data

## Dependencies

The command requires the following Python packages:
- `django` - Django framework
- `faker` - For generating fake data
- `phonenumbers` / `django-phonenumber-field` - For phone number handling

These should already be installed in your project environment.
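
If you want to confirm this before running the command, a quick check along these lines may help. It is a sketch, not part of the project: `missing_modules` is a hypothetical helper, and it assumes both phone number packages are needed. Note that `django-phonenumber-field` is imported as `phonenumber_field`.

```python
from importlib.util import find_spec

# Import names, which can differ from the pip package names.
REQUIRED_MODULES = ["django", "faker", "phonenumbers", "phonenumber_field"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```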

## Output

The command provides detailed progress output:

```
Starting Saudi-influenced test data generation...

Generating data for tenant: Agdar Rehabilitation Center
  Created 18 users
  Created 5 clinics
  Created 50 patients
  Created 15 providers
  Created 14 rooms
  Created 150 schedules
  Created 100 appointments
  Created clinical records
  Created 25 services
  Created 5 packages
  Created financial records
  Created communication records
  Created integration records

============================================================
DATA GENERATION SUMMARY
============================================================
  Appointments: 100
  Clinics: 5
  Patients: 50
  Providers: 15
  Rooms: 14
  Schedules: 150
  Services: 25
  Tenants: 1
  Users: 18
============================================================

✓ Test data generation completed successfully!
```

## Notes

- Generated data is synthetic but modeled on Saudi conventions for names, phone numbers, addresses, and schedules
- Patient names are bilingual (English transliteration and Arabic script)
- Phone numbers follow the Saudi mobile format (+966 5X XXX XXXX)
- Addresses use real Saudi cities and districts
- Work schedules follow the Saudi work week (Sunday-Thursday) with a prayer break between shifts
- Financial data uses SAR with 15% VAT on invoices
- Clinical data is age-appropriate

## Troubleshooting

**Issue**: Command not found
```bash
python manage.py generate_test_data
# Error: No module named 'core.management.commands.generate_test_data'
```
**Solution**: Ensure the management command directory structure exists and that the `core` app is listed in `INSTALLED_APPS`:
```
core/
  management/
    __init__.py
    commands/
      __init__.py
      generate_test_data.py
```

**Issue**: Import errors
```bash
# Error: No module named 'faker'
```
**Solution**: Install required dependencies:
```bash
pip install faker
```

**Issue**: Database errors
```bash
# Error: UNIQUE constraint failed
```
**Solution**: Use the `--clear` flag to clear existing data first (this deletes all existing data except superuser accounts):
```bash
python manage.py generate_test_data --clear
```

## Best Practices

1. **Development**: Use smaller datasets for faster generation
   ```bash
   python manage.py generate_test_data --patients 20 --appointments 40
   ```

2. **Testing**: Use moderate datasets
   ```bash
   python manage.py generate_test_data --patients 50 --appointments 100
   ```

3. **Demo**: Use larger datasets for realistic demos
   ```bash
   python manage.py generate_test_data --patients 200 --appointments 500
   ```

4. **Fresh Start**: Use `--clear` whenever you want to reset data
   ```bash
   python manage.py generate_test_data --clear
   ```

## Support

For issues or questions about the data generation command, please refer to the project documentation or contact the development team.