hospital-management/tools/markdown/EMR_DATA_GENERATOR_README.md
Marwan Alwali 263292f6be update
2025-11-04 00:50:06 +03:00

# EMR Data Generator - Comprehensive Guide
This document explains how to use the unified `emr_data.py` script that combines sample EMR data generation with full ICD-10-CM XML import capabilities.
## Overview
The `emr_data.py` script provides three modes of operation:
1. **Standard Mode**: Generate sample EMR data with ~35 sample ICD-10 codes
2. **Full ICD-10 Import Mode**: Generate sample EMR data + import complete ICD-10-CM from XML
3. **ICD-10 Only Mode**: Import only ICD-10-CM codes, skip other EMR data
## Prerequisites
### Required
- Python 3.8+
- Django project properly configured
- Existing tenants in the database (run `core_data.py` first)
- Existing patients in the database (run `patients_data.py` first)
### Optional (for ICD-10 XML import)
- `xmlschema` library: `pip install xmlschema`
- ICD-10-CM XML files (download from CDC/CMS)
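
The environment requirements above can be checked before running the generator. The following is a minimal sketch, not part of `emr_data.py`; the function name `preflight` and its parameters are hypothetical.

```python
import importlib.util
import sys


def preflight(require_xml_import=False, version_info=sys.version_info):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The prerequisites list Python 3.8+ as required.
    if tuple(version_info[:2]) < (3, 8):
        problems.append("Python 3.8+ is required")
    # xmlschema is only needed for the ICD-10 XML import modes.
    if require_xml_import and importlib.util.find_spec("xmlschema") is None:
        problems.append("xmlschema is missing: pip install xmlschema")
    return problems


if __name__ == "__main__":
    for problem in preflight(require_xml_import="--import-icd10" in sys.argv):
        print(problem)
```

This sketch does not verify the Django configuration or the presence of tenants and patients; those checks depend on the project's models.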
## Usage Examples
### 1. Standard Mode (Default)
Generate sample EMR data with sample ICD-10 codes:
```bash
python3 emr_data.py
```
**What it creates:**
- Note templates (5 templates)
- Patient encounters (~900-1000)
- Vital signs (~1800-2000 records)
- Problem lists (~140-150 entries)
- Care plans (~70-80 plans)
- Clinical notes (~800-900 notes)
- **Sample ICD-10 codes (~35 codes)**
- Clinical recommendations (~100-110)
- Allergy alerts (~20-30)
- Treatment protocols (5 protocols)
- Clinical guidelines (10 guidelines)
- Critical alerts (~5-10)
- Diagnostic suggestions (~10-20)
---
### 2. Full ICD-10 Import Mode
Generate sample EMR data + import complete ICD-10-CM from XML:
```bash
python3 emr_data.py \
--import-icd10 \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml
```
**What it creates:**
- All standard EMR data (as above)
- **Complete ICD-10-CM codes (~70,000+ codes)** instead of sample codes

**Optional: Truncate existing codes first**
```bash
python3 emr_data.py \
--import-icd10 \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml \
--truncate
```
---
### 3. ICD-10 Only Mode
Import only ICD-10-CM codes, skip all other EMR data generation:
```bash
python3 emr_data.py \
--icd10-only \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml
```
**What it creates:**
- **Only ICD-10-CM codes (~70,000+ codes)**
- Skips all other EMR data generation

**With truncate:**
```bash
python3 emr_data.py \
--icd10-only \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml \
--truncate
```
---
## Command-Line Arguments
| Argument | Description | Required | Default |
|----------|-------------|----------|---------|
| `--import-icd10` | Import full ICD-10 codes from XML files | No | False |
| `--xsd` | Path to ICD-10 XSD schema file | Yes (with `--import-icd10` or `--icd10-only`) | None |
| `--xml` | Path to ICD-10 XML data file | Yes (with `--import-icd10` or `--icd10-only`) | None |
| `--icd10-only` | Only import ICD-10, skip other EMR data | No | False |
| `--truncate` | Delete existing ICD-10 codes before importing | No | False |
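
For readers who want to see how such a table maps onto code, here is a hedged sketch using the standard `argparse` module. It mirrors the flags listed above but is not taken from `emr_data.py`; the names `build_parser` and `parse_args` are hypothetical, and the rule that both XML-based modes require `--xsd` and `--xml` is an assumption based on the usage examples.

```python
import argparse


def build_parser():
    """Build a parser with the five flags from the table above."""
    parser = argparse.ArgumentParser(prog="emr_data.py")
    parser.add_argument("--import-icd10", action="store_true",
                        help="Import full ICD-10 codes from XML files")
    parser.add_argument("--xsd", default=None,
                        help="Path to ICD-10 XSD schema file")
    parser.add_argument("--xml", default=None,
                        help="Path to ICD-10 XML data file")
    parser.add_argument("--icd10-only", action="store_true",
                        help="Only import ICD-10, skip other EMR data")
    parser.add_argument("--truncate", action="store_true",
                        help="Delete existing ICD-10 codes before importing")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: both XML-based modes need the schema and the data file.
    if (args.import_icd10 or args.icd10_only) and not (args.xsd and args.xml):
        # parser.error prints the message and exits with status 2.
        parser.error("--xsd and --xml are required when importing ICD-10 from XML")
    return args
```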
---
## Obtaining ICD-10-CM XML Files
### Official Sources
1. **CDC (Centers for Disease Control and Prevention)**
   - URL: https://www.cdc.gov/nchs/icd/icd-10-cm.htm
   - Download the "ICD-10-CM Tabular List" XML files
2. **CMS (Centers for Medicare & Medicaid Services)**
   - URL: https://www.cms.gov/medicare/coding-billing/icd-10-codes
   - Download the complete ICD-10-CM code set
### Required Files
You need two files, where `YYYY` is the release year (e.g., 2026):
- `icd10cm-tabular-YYYY.xsd` (Schema definition)
- `icd10cm-tabular-YYYY.xml` (Actual codes)
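
As a convenience, the file pair for a given year can be located and checked before passing it to `--xsd` and `--xml`. This is a small sketch that assumes the files follow the naming pattern shown above and sit in one directory; `find_icd10_files` is a hypothetical helper, not part of the script.

```python
from pathlib import Path


def find_icd10_files(directory, year):
    """Return (xsd_path, xml_path) for the given release year."""
    base = Path(directory)
    xsd = base / f"icd10cm-tabular-{year}.xsd"
    xml = base / f"icd10cm-tabular-{year}.xml"
    # Report every missing file at once instead of failing on the first.
    missing = [path.name for path in (xsd, xml) if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {base}: {', '.join(missing)}")
    return xsd, xml
```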
---
## Multi-Tenant Support
The script automatically creates ICD-10 codes for **all tenants** in your database:
```python
# One copy of each code is created per tenant
for tenant in tenants:
    # Each ICD-10 row is linked to its tenant
    Icd10.objects.create(
        tenant=tenant,
        code='E11.9',
        description='Type 2 diabetes mellitus without complications',
        # ... (remaining fields elided)
    )
```
```
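
Because each tenant receives its own copy of every code, the number of rows grows as tenants times codes. The sketch below illustrates that expansion in plain Python, without Django; `rows_per_tenant` and the tenant IDs are hypothetical.

```python
from itertools import product


def rows_per_tenant(tenant_ids, codes):
    """Yield one (tenant_id, code, description) row per tenant and code."""
    for tenant_id, (code, description) in product(tenant_ids, codes):
        yield tenant_id, code, description


sample = [
    ("E11.9", "Type 2 diabetes mellitus without complications"),
    ("I10", "Essential (primary) hypertension"),
]
rows = list(rows_per_tenant([1, 2, 3], sample))
print(len(rows))  # 3 tenants x 2 codes = 6 rows
```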
---
## Performance Considerations
### Standard Mode (Default)
- **Runtime**: ~30-60 seconds
- **Database Impact**: Minimal (~4,000 total records, going by the per-type counts listed under Standard Mode)
- **Recommended for**: Development, testing, demos
### Full ICD-10 Import Mode
- **Runtime**: ~5-15 minutes (depending on system)
- **Database Impact**: Significant (~70,000+ ICD-10 codes + ~4,000 other records)
- **Recommended for**: Production, staging, comprehensive testing
### ICD-10 Only Mode
- **Runtime**: ~3-10 minutes
- **Database Impact**: ~70,000+ ICD-10 codes only
- **Recommended for**: Updating ICD-10 codes without regenerating other data
---
## Error Handling
### Missing xmlschema Library
```
❌ Error: xmlschema library not installed.
Install it with: pip install xmlschema
```
**Solution**: `pip install xmlschema`
### Missing XML/XSD Files
```
❌ Error: --xsd and --xml are required when using --import-icd10
```
**Solution**: Provide both `--xsd` and `--xml` arguments
### No Tenants Found
```
❌ No tenants found. Please run the core data generator first.
```
**Solution**: Run `python3 core_data.py` first
### XML Parsing Errors
```
❌ Failed to parse XML: [error details]
```
**Solution**:
- Verify the XML file's integrity (for example, that the download completed)
- Check that the paths passed to `--xsd` and `--xml` are correct
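
One quick way to rule out a truncated or corrupted download is a well-formedness check with the standard library. This sketch is an assumption about how one might diagnose the problem, not the script's own logic, and it does not validate against the XSD; `check_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_xml(path):
    """Return None if the file parses, else a short error description."""
    file = Path(path)
    if not file.is_file():
        return f"file not found: {file}"
    try:
        with file.open("rb") as handle:
            # iterparse streams the document; clearing finished elements
            # keeps memory use low on large files.
            for _event, element in ET.iterparse(handle, events=("end",)):
                element.clear()
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    return None
```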
---
## Data Generated
### Sample ICD-10 Codes (Standard Mode)
The script creates ~35 sample codes covering:
- Infectious diseases (A00-A04)
- Neoplasms (C00-C04)
- Circulatory diseases (I00-I06)
- Respiratory diseases (J00-J04)
- Digestive diseases (K00-K04)
- Genitourinary diseases (N00-N04)
- Symptoms and signs (R00-R04)
### Full ICD-10-CM Import
Complete ICD-10-CM code set including:
- All chapters (1-22)
- All sections
- All diagnosis codes
- Parent-child relationships
- Code descriptions
- Chapter and section names
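
To make the hierarchy concrete, the sketch below walks a tiny inline document and records each code with its chapter, section, and parent. The element names (`chapter`, `section`, `diag`, `name`, `desc`) are an assumption about the tabular XML layout and should be checked against the downloaded XSD; `walk_diag` and `iter_codes` are hypothetical helpers, not functions from `emr_data.py`.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ICD10CM.tabular>
  <chapter>
    <name>4</name>
    <desc>Endocrine, nutritional and metabolic diseases (E00-E89)</desc>
    <section id="E08-E13">
      <desc>Diabetes mellitus (E08-E13)</desc>
      <diag>
        <name>E11</name>
        <desc>Type 2 diabetes mellitus</desc>
        <diag>
          <name>E11.9</name>
          <desc>Type 2 diabetes mellitus without complications</desc>
        </diag>
      </diag>
    </section>
  </chapter>
</ICD10CM.tabular>"""


def walk_diag(diag, chapter, section, parent=None):
    """Yield one dict per <diag>, recursing into nested child codes."""
    code = diag.findtext("name")
    yield {"code": code, "description": diag.findtext("desc"),
           "chapter": chapter, "section": section, "parent": parent}
    for child in diag.findall("diag"):
        yield from walk_diag(child, chapter, section, parent=code)


def iter_codes(root):
    """Yield every code in document order, top-level codes before children."""
    for chapter in root.findall("chapter"):
        chapter_name = chapter.findtext("desc")
        for section in chapter.findall("section"):
            section_name = section.findtext("desc")
            for diag in section.findall("diag"):
                yield from walk_diag(diag, chapter_name, section_name)


codes = list(iter_codes(ET.fromstring(SAMPLE)))
print([(c["code"], c["parent"]) for c in codes])
# [('E11', None), ('E11.9', 'E11')]
```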
---
## Integration with Other Data Generators
### Recommended Execution Order
1. **Core Data** (Required first)
   ```bash
   python3 core_data.py
   ```
2. **Patients Data** (Required before EMR)
   ```bash
   python3 patients_data.py
   ```
3. **EMR Data** (This script)
   ```bash
   python3 emr_data.py
   # or with full ICD-10 import
   python3 emr_data.py --import-icd10 --xsd path/to/file.xsd --xml path/to/file.xml
   ```
4. **Other Modules** (Optional, any order)
   ```bash
   python3 appointments_data.py
   python3 billing_data.py
   python3 pharmacy_data.py
   # etc.
   ```
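
The required part of this order (steps 1 to 3) can also be driven from a small wrapper. This is a sketch under the assumption that the three scripts live in the current working directory; `build_commands` and `run_all` are hypothetical names. Passing `check=True` makes the chain stop at the first failing script, so the EMR generator never runs without tenants and patients.

```python
import subprocess
import sys


def build_commands(xsd=None, xml=None):
    """Return the generator invocations in dependency order."""
    emr = [sys.executable, "emr_data.py"]
    if xsd and xml:
        # Switch the EMR step to the full ICD-10 import mode.
        emr += ["--import-icd10", "--xsd", xsd, "--xml", xml]
    return [
        [sys.executable, "core_data.py"],      # tenants must exist first
        [sys.executable, "patients_data.py"],  # patients must exist before EMR data
        emr,
    ]


def run_all(commands):
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```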
---
## Troubleshooting
### Issue: Duplicate ICD-10 Codes
**Symptom**: `UNIQUE constraint failed` errors

**Solution**: Use the `--truncate` flag to delete existing codes before importing
### Issue: Slow Performance
**Symptom**: Script takes very long to complete

**Solution**:
- Ensure database indexes are created
- Use SSD storage
- Consider using PostgreSQL instead of SQLite for large datasets
### Issue: Memory Errors
**Symptom**: Out-of-memory errors during import

**Solution**: The script already processes records in batches (1000 at a time), which limits peak memory use. If errors persist:
- Increase available system memory
- Close other applications
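
For reference, batch processing of this kind can be sketched as a generator that hands out fixed-size chunks, so only one chunk is held in memory at a time. This illustrates the general technique and is not the script's actual code; `batched` is a hypothetical helper. In a Django project each chunk could then be passed to `bulk_create`.

```python
from itertools import islice


def batched(items, size=1000):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


sizes = [len(batch) for batch in batched(range(2500), size=1000)]
print(sizes)  # [1000, 1000, 500]
```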
---
## Advanced Usage
### Custom Days Back for Encounters
Modify the script to change the number of days of encounter history:
```python
# In main() function, change:
encounters = create_encounters(tenants, days_back=30)
# To:
encounters = create_encounters(tenants, days_back=90) # 90 days of history
```
### Adjusting Data Volume
Modify the random ranges in each function:
```python
# Example: More encounters per day
daily_encounters = random.randint(20, 50) # Default
daily_encounters = random.randint(50, 100) # More data
```
---
## Migration from Old System
If you were using the separate `emr/management/commands/import_icd10.py` Django management command:
### Old Way (Django Management Command)
```bash
python manage.py import_icd10 \
--xsd path/to/file.xsd \
--xml path/to/file.xml \
--truncate
```
### New Way (Unified Script)
```bash
python3 emr_data.py \
--icd10-only \
--xsd path/to/file.xsd \
--xml path/to/file.xml \
--truncate
```
**Benefits of new approach:**
- No need for Django management command infrastructure
- Consistent with other data generators
- Can combine with EMR data generation
- Simpler command-line interface
---
## Support
For issues or questions:
1. Check this README
2. Review error messages carefully
3. Verify prerequisites are met
4. Check file paths are correct
5. Ensure database is accessible
---
## Version History
- **v2.0** (Current): Merged ICD-10 XML import functionality
- **v1.0**: Original EMR data generator with sample ICD-10 codes
---
## License
This script is part of the Hospital Management System v4 project.