# EMR Data Generator - Comprehensive Guide

This document explains how to use the unified `emr_data.py` script that combines sample EMR data generation with full ICD-10-CM XML import capabilities.

## Overview

The `emr_data.py` script provides three modes of operation:

1. **Standard Mode**: Generate sample EMR data with ~35 sample ICD-10 codes
2. **Full ICD-10 Import Mode**: Generate sample EMR data + import complete ICD-10-CM from XML
3. **ICD-10 Only Mode**: Import only ICD-10-CM codes, skip other EMR data

## Prerequisites

### Required
- Python 3.8+
- Django project properly configured
- Existing tenants in the database (run `core_data.py` first)
- Existing patients in the database (run `patients_data.py` first)

### Optional (for ICD-10 XML import)
- `xmlschema` library: `pip install xmlschema`
- ICD-10-CM XML files (download from CDC/CMS)

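
The prerequisites above can be checked before a long import starts. The following is a minimal sketch, not part of `emr_data.py`: the function name `preflight` and its messages are hypothetical, and it only covers the Python version, the optional `xmlschema` dependency, and the two input files. It does not check the Django configuration or the presence of tenants and patients.

```python
import importlib.util
import sys
from pathlib import Path


def preflight(xsd_path=None, xml_path=None, min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)

    # Only the XML import path needs xmlschema and the two files.
    if xsd_path is not None or xml_path is not None:
        if importlib.util.find_spec("xmlschema") is None:
            problems.append("xmlschema is not installed (pip install xmlschema)")
        for label, path in (("--xsd", xsd_path), ("--xml", xml_path)):
            if path is None:
                problems.append(f"{label} was not provided")
            elif not Path(path).is_file():
                problems.append(f"{label} file not found: {path}")
    return problems


if __name__ == "__main__":
    for problem in preflight(*sys.argv[1:3]):
        print("Problem:", problem)
```
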
## Usage Examples

### 1. Standard Mode (Default)
Generate sample EMR data with sample ICD-10 codes:

```bash
python3 emr_data.py
```

**What it creates:**
- Note templates (5 templates)
- Patient encounters (~900-1000)
- Vital signs (~1800-2000 records)
- Problem lists (~140-150 entries)
- Care plans (~70-80 plans)
- Clinical notes (~800-900 notes)
- **Sample ICD-10 codes (~35 codes)**
- Clinical recommendations (~100-110)
- Allergy alerts (~20-30)
- Treatment protocols (5 protocols)
- Clinical guidelines (10 guidelines)
- Critical alerts (~5-10)
- Diagnostic suggestions (~10-20)

---
### 2. Full ICD-10 Import Mode
Generate sample EMR data + import complete ICD-10-CM from XML:

```bash
python3 emr_data.py \
  --import-icd10 \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml
```

**What it creates:**
- All standard EMR data (as above)
- **Complete ICD-10-CM codes (~70,000+ codes)** instead of sample codes

**Optional: Truncate existing codes first**
```bash
python3 emr_data.py \
  --import-icd10 \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml \
  --truncate
```

---
### 3. ICD-10 Only Mode
Import only ICD-10-CM codes, skip all other EMR data generation:

```bash
python3 emr_data.py \
  --icd10-only \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml
```

**What it creates:**
- **Only ICD-10-CM codes (~70,000+ codes)**
- Skips all other EMR data generation

**With truncate:**
```bash
python3 emr_data.py \
  --icd10-only \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml \
  --truncate
```

---
## Command-Line Arguments

| Argument | Description | Required | Default |
|----------|-------------|----------|---------|
| `--import-icd10` | Import full ICD-10 codes from XML files | No | False |
| `--xsd` | Path to ICD-10 XSD schema file | Yes (with `--import-icd10`) | None |
| `--xml` | Path to ICD-10 XML data file | Yes (with `--import-icd10`) | None |
| `--icd10-only` | Only import ICD-10, skip other EMR data | No | False |
| `--truncate` | Delete existing ICD-10 codes before importing | No | False |

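
For readers who want to see how the flags in the table fit together, here is a minimal `argparse` sketch. It is an illustration, not the parser from `emr_data.py`: the function names and the error text are hypothetical, and it assumes that `--icd10-only` needs `--xsd` and `--xml` just as `--import-icd10` does, which the usage examples suggest but the table does not state.

```python
import argparse


def build_parser():
    """Declare the five flags from the table with the documented defaults."""
    parser = argparse.ArgumentParser(prog="emr_data.py")
    parser.add_argument("--import-icd10", action="store_true",
                        help="Import full ICD-10 codes from XML files")
    parser.add_argument("--xsd", default=None,
                        help="Path to ICD-10 XSD schema file")
    parser.add_argument("--xml", default=None,
                        help="Path to ICD-10 XML data file")
    parser.add_argument("--icd10-only", action="store_true",
                        help="Only import ICD-10, skip other EMR data")
    parser.add_argument("--truncate", action="store_true",
                        help="Delete existing ICD-10 codes before importing")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce the conditional requirement on --xsd/--xml."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: both import modes need the schema and the data file.
    if (args.import_icd10 or args.icd10_only) and not (args.xsd and args.xml):
        parser.error("--xsd and --xml are required for an ICD-10 import")
    return args


if __name__ == "__main__":
    print(parse_args())
```
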
---
## Obtaining ICD-10-CM XML Files

### Official Sources

1. **CDC (Centers for Disease Control and Prevention)**
   - URL: https://www.cdc.gov/nchs/icd/icd-10-cm.htm
   - Download the "ICD-10-CM Tabular List" XML files

2. **CMS (Centers for Medicare & Medicaid Services)**
   - URL: https://www.cms.gov/medicare/coding-billing/icd-10-codes
   - Download the complete ICD-10-CM code set

### Required Files

You need two files:
- `icd10cm-tabular-YYYY.xsd` (Schema definition)
- `icd10cm-tabular-YYYY.xml` (Actual codes)

Where `YYYY` is the year (e.g., 2026)

---
## Multi-Tenant Support

The script automatically creates ICD-10 codes for **all tenants** in your database:

```python
# Codes are created for each tenant
for tenant in tenants:
    # Creates ICD-10 codes with tenant relationship
    Icd10.objects.create(
        tenant=tenant,
        code='E11.9',
        description='Type 2 diabetes mellitus without complications',
        # ... remaining fields
    )
```

---
## Performance Considerations

### Standard Mode (Default)
- **Runtime**: ~30-60 seconds
- **Database Impact**: Minimal (~2,000 total records)
- **Recommended for**: Development, testing, demos

### Full ICD-10 Import Mode
- **Runtime**: ~5-15 minutes (depending on system)
- **Database Impact**: Significant (~70,000+ ICD-10 codes + ~2,000 other records)
- **Recommended for**: Production, staging, comprehensive testing

### ICD-10 Only Mode
- **Runtime**: ~3-10 minutes
- **Database Impact**: ~70,000+ ICD-10 codes only
- **Recommended for**: Updating ICD-10 codes without regenerating other data

---
## Error Handling

### Missing xmlschema Library
```
❌ Error: xmlschema library not installed.
Install it with: pip install xmlschema
```
**Solution**: `pip install xmlschema`

### Missing XML/XSD Files
```
❌ Error: --xsd and --xml are required when using --import-icd10
```
**Solution**: Provide both `--xsd` and `--xml` arguments

### No Tenants Found
```
❌ No tenants found. Please run the core data generator first.
```
**Solution**: Run `python3 core_data.py` first

### XML Parsing Errors
```
❌ Failed to parse XML: [error details]
```
**Solution**: Verify XML file integrity, ensure correct file paths

---
## Data Generated

### Sample ICD-10 Codes (Default Mode)
The script creates ~35 sample codes covering:
- Infectious diseases (A00-A04)
- Neoplasms (C00-C04)
- Circulatory diseases (I00-I06)
- Respiratory diseases (J00-J04)
- Digestive diseases (K00-K04)
- Genitourinary diseases (N00-N04)
- Symptoms and signs (R00-R04)

### Full ICD-10-CM Import
Complete ICD-10-CM code set including:
- All chapters (1-22)
- All sections
- All diagnosis codes
- Parent-child relationships
- Code descriptions
- Chapter and section names

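
To make the hierarchy listed above concrete, the sketch below flattens a tiny hand-written XML sample into one record per code, keeping the chapter name, section name, and parent code. It is an assumption-laden illustration: the element names (`chapter`, `section`, `diag`, `name`, `desc`) follow the general shape of the ICD-10-CM tabular file but should be checked against the XSD you download, and the script itself may parse the file differently (for example through `xmlschema`). The functions `walk_diag` and `flatten` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the real file is far larger and its layout is assumed here.
SAMPLE = """\
<ICD10CM.tabular>
  <chapter>
    <name>4</name>
    <desc>Endocrine, nutritional and metabolic diseases (E00-E89)</desc>
    <section id="E08-E13">
      <desc>Diabetes mellitus (E08-E13)</desc>
      <diag>
        <name>E11</name>
        <desc>Type 2 diabetes mellitus</desc>
        <diag>
          <name>E11.9</name>
          <desc>Type 2 diabetes mellitus without complications</desc>
        </diag>
      </diag>
    </section>
  </chapter>
</ICD10CM.tabular>
"""


def walk_diag(diag, chapter, section, parent=None):
    """Yield one flat record per <diag>, parents before their children."""
    code = diag.findtext("name")
    yield {
        "code": code,
        "description": diag.findtext("desc"),
        "chapter": chapter,
        "section": section,
        "parent": parent,
    }
    for child in diag.findall("diag"):
        yield from walk_diag(child, chapter, section, parent=code)


def flatten(root):
    """Walk chapter -> section -> diag and yield flat records."""
    for chapter in root.findall("chapter"):
        chapter_name = chapter.findtext("desc")
        for section in chapter.findall("section"):
            section_name = section.findtext("desc")
            for diag in section.findall("diag"):
                yield from walk_diag(diag, chapter_name, section_name)


records = list(flatten(ET.fromstring(SAMPLE)))
for record in records:
    print(record["code"], "<-", record["parent"])
```
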
---
## Integration with Other Data Generators

### Recommended Execution Order

1. **Core Data** (Required first)
   ```bash
   python3 core_data.py
   ```

2. **Patients Data** (Required before EMR)
   ```bash
   python3 patients_data.py
   ```

3. **EMR Data** (This script)
   ```bash
   python3 emr_data.py
   # or with full ICD-10 import
   python3 emr_data.py --import-icd10 --xsd path/to/file.xsd --xml path/to/file.xml
   ```

4. **Other Modules** (Optional, any order)
   ```bash
   python3 appointments_data.py
   python3 billing_data.py
   python3 pharmacy_data.py
   # etc.
   ```

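
The execution order above can also be driven from one small wrapper. This is a sketch under assumptions: it expects the generator scripts to sit in the current working directory, the helper names `plan_commands` and `run_all` are hypothetical, and it defaults to a dry run that only prints the commands. With `dry_run=False`, `check=True` makes the run stop at the first script that exits with an error.

```python
import subprocess
import sys

# Order taken from the list above: core data, then patients, then EMR.
REQUIRED_ORDER = ["core_data.py", "patients_data.py", "emr_data.py"]


def plan_commands(emr_args=(), optional_scripts=()):
    """Build the command lines in dependency order without running them."""
    commands = []
    for script in REQUIRED_ORDER:
        cmd = [sys.executable, script]
        if script == "emr_data.py":
            cmd.extend(emr_args)  # e.g. ["--import-icd10", "--xsd", ..., "--xml", ...]
        commands.append(cmd)
    # Optional modules can follow in any order.
    commands.extend([sys.executable, script] for script in optional_scripts)
    return commands


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(plan_commands(optional_scripts=["appointments_data.py"]))
```
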
---
## Troubleshooting

### Issue: Duplicate ICD-10 Codes
**Symptom**: `UNIQUE constraint failed` errors
**Solution**: Use the `--truncate` flag to delete existing codes first

### Issue: Slow Performance
**Symptom**: Script takes very long to complete
**Solution**:
- Ensure database indexes are created
- Use SSD storage
- Consider using PostgreSQL instead of SQLite for large datasets

### Issue: Memory Errors
**Symptom**: Out of memory errors during import
**Solution**:
- Note that the script already uses batch processing (1000 records at a time)
- Increase available system memory
- Close other applications

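
The batch processing mentioned above can be pictured with a small generic helper. This is a sketch of the general technique, not the code in `emr_data.py`: `batched` is a hypothetical name, and the idea that each batch would then go to something like Django's `bulk_create` is an assumption. Only one batch is held in memory at a time, which is why batch size affects peak memory use.

```python
from itertools import islice


def batched(iterable, size=1000):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in for parsed ICD-10 records; a real import would stream them.
    fake_records = ({"code": f"X{i:05d}"} for i in range(2500))
    for number, batch in enumerate(batched(fake_records, size=1000), start=1):
        # Assumption: this is where a bulk insert of `batch` would happen.
        print(f"batch {number}: {len(batch)} records")
```
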
---
## Advanced Usage

### Custom Days Back for Encounters
Modify the script to change the number of days of encounter history:

```python
# In main() function, change:
encounters = create_encounters(tenants, days_back=30)
# To:
encounters = create_encounters(tenants, days_back=90)  # 90 days of history
```

### Adjusting Data Volume
Modify the random ranges in each function:

```python
# Example: More encounters per day
daily_encounters = random.randint(20, 50)  # Default
daily_encounters = random.randint(50, 100)  # More data
```

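
Before changing `days_back` or the `random.randint` range, a rough estimate of the resulting volume can help. The sketch below is plain arithmetic, not code from the script: `expected_encounters` is a hypothetical helper, and it assumes one `random.randint(low, high)` draw per day with nothing else (tenants, patient availability, other filtering) changing the total, so real counts may differ.

```python
def expected_encounters(days_back, low, high):
    """Expected total when one randint(low, high) draw is made per day.

    random.randint includes both endpoints, so its mean is (low + high) / 2.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return days_back * (low + high) / 2


if __name__ == "__main__":
    for days, low, high in [(30, 20, 50), (90, 20, 50), (30, 50, 100)]:
        estimate = expected_encounters(days, low, high)
        print(f"days_back={days}, randint({low}, {high}): about {estimate:.0f}")
```
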
---
## Migration from Old System

If you were using the separate `emr/management/commands/import_icd10.py` Django management command:

### Old Way (Django Management Command)
```bash
python manage.py import_icd10 \
  --xsd path/to/file.xsd \
  --xml path/to/file.xml \
  --truncate
```

### New Way (Unified Script)
```bash
python3 emr_data.py \
  --icd10-only \
  --xsd path/to/file.xsd \
  --xml path/to/file.xml \
  --truncate
```

**Benefits of new approach:**
- No need for Django management command infrastructure
- Consistent with other data generators
- Can combine with EMR data generation
- Simpler command-line interface

---
## Support

For issues or questions:
1. Check this README
2. Review error messages carefully
3. Verify prerequisites are met
4. Check file paths are correct
5. Ensure database is accessible

---
## Version History

- **v2.0** (Current): Merged ICD-10 XML import functionality
- **v1.0**: Original EMR data generator with sample ICD-10 codes

---
## License

This script is part of the Hospital Management System v4 project.