# EMR Data Generator - Comprehensive Guide

This document explains how to use the unified `emr_data.py` script, which combines sample EMR data generation with full ICD-10-CM XML import capabilities.

## Overview

The `emr_data.py` script provides three modes of operation:

1. **Standard Mode**: Generate sample EMR data with ~35 sample ICD-10 codes
2. **Full ICD-10 Import Mode**: Generate sample EMR data + import complete ICD-10-CM from XML
3. **ICD-10 Only Mode**: Import only ICD-10-CM codes, skip other EMR data

## Prerequisites

### Required

- Python 3.8+
- Django project properly configured
- Existing tenants in the database (run `core_data.py` first)
- Existing patients in the database (run `patients_data.py` first)

### Optional (for ICD-10 XML import)

- `xmlschema` library: `pip install xmlschema`
- ICD-10-CM XML files (download from CDC/CMS)

## Usage Examples

### 1. Standard Mode (Default)

Generate sample EMR data with sample ICD-10 codes:

```bash
python3 emr_data.py
```

**What it creates:**

- Note templates (5 templates)
- Patient encounters (~900-1000)
- Vital signs (~1800-2000 records)
- Problem lists (~140-150 entries)
- Care plans (~70-80 plans)
- Clinical notes (~800-900 notes)
- **Sample ICD-10 codes (~35 codes)**
- Clinical recommendations (~100-110)
- Allergy alerts (~20-30)
- Treatment protocols (5 protocols)
- Clinical guidelines (10 guidelines)
- Critical alerts (~5-10)
- Diagnostic suggestions (~10-20)

---

### 2. Full ICD-10 Import Mode

Generate sample EMR data + import complete ICD-10-CM from XML:

```bash
python3 emr_data.py \
  --import-icd10 \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml
```

**What it creates:**

- All standard EMR data (as above)
- **Complete ICD-10-CM codes (~70,000+ codes)** instead of sample codes

**Optional: Truncate existing codes first**

```bash
python3 emr_data.py \
  --import-icd10 \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml \
  --truncate
```

---

### 3. ICD-10 Only Mode

Import only ICD-10-CM codes, skip all other EMR data generation:

```bash
python3 emr_data.py \
  --icd10-only \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml
```

**What it creates:**

- **Only ICD-10-CM codes (~70,000+ codes)**
- Skips all other EMR data generation

**With truncate:**

```bash
python3 emr_data.py \
  --icd10-only \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml \
  --truncate
```

---

## Command-Line Arguments

| Argument | Description | Required | Default |
|----------|-------------|----------|---------|
| `--import-icd10` | Import full ICD-10 codes from XML files | No | False |
| `--xsd` | Path to ICD-10 XSD schema file | Yes (with `--import-icd10` or `--icd10-only`) | None |
| `--xml` | Path to ICD-10 XML data file | Yes (with `--import-icd10` or `--icd10-only`) | None |
| `--icd10-only` | Only import ICD-10, skip other EMR data | No | False |
| `--truncate` | Delete existing ICD-10 codes before importing | No | False |

---

## Obtaining ICD-10-CM XML Files

### Official Sources

1. **CDC (Centers for Disease Control and Prevention)**
   - URL: https://www.cdc.gov/nchs/icd/icd-10-cm.htm
   - Download the "ICD-10-CM Tabular List" XML files

2. **CMS (Centers for Medicare & Medicaid Services)**
   - URL: https://www.cms.gov/medicare/coding-billing/icd-10-codes
   - Download the complete ICD-10-CM code set

### Required Files

You need two files:

- `icd10cm-tabular-YYYY.xsd` (Schema definition)
- `icd10cm-tabular-YYYY.xml` (Actual codes)

Where `YYYY` is the year (e.g., 2026).

---

## Multi-Tenant Support

The script automatically creates ICD-10 codes for **all tenants** in your database:

```python
# Codes are created for each tenant
for tenant in tenants:
    # Creates ICD-10 codes with tenant relationship
    Icd10.objects.create(
        tenant=tenant,
        code='E11.9',
        description='Type 2 diabetes mellitus without complications',
        ...
    )
```

---

## Performance Considerations

### Sample Mode (Default)

- **Runtime**: ~30-60 seconds
- **Database Impact**: Minimal (~4,000 total records, the sum of the counts listed under Standard Mode)
- **Recommended for**: Development, testing, demos

### Full ICD-10 Import Mode

- **Runtime**: ~5-15 minutes (depending on system)
- **Database Impact**: Significant (~70,000+ ICD-10 codes + ~4,000 other records)
- **Recommended for**: Production, staging, comprehensive testing

### ICD-10 Only Mode

- **Runtime**: ~3-10 minutes
- **Database Impact**: ~70,000+ ICD-10 codes only
- **Recommended for**: Updating ICD-10 codes without regenerating other data

---

## Error Handling

### Missing xmlschema Library

```
❌ Error: xmlschema library not installed. Install it with: pip install xmlschema
```

**Solution**: `pip install xmlschema`

### Missing XML/XSD Files

```
❌ Error: --xsd and --xml are required when using --import-icd10
```

**Solution**: Provide both `--xsd` and `--xml` arguments

### No Tenants Found

```
❌ No tenants found. Please run the core data generator first.
```

**Solution**: Run `python3 core_data.py` first

### XML Parsing Errors

```
❌ Failed to parse XML: [error details]
```

**Solution**: Verify XML file integrity and ensure the file paths are correct

---

## Data Generated

### Sample ICD-10 Codes (Default Mode)

The script creates ~35 sample codes covering:

- Infectious diseases (A00-A04)
- Neoplasms (C00-C04)
- Circulatory diseases (I00-I06)
- Respiratory diseases (J00-J04)
- Digestive diseases (K00-K04)
- Genitourinary diseases (N00-N04)
- Symptoms and signs (R00-R04)

### Full ICD-10-CM Import

Complete ICD-10-CM code set including:

- All chapters (1-22)
- All sections
- All diagnosis codes
- Parent-child relationships
- Code descriptions
- Chapter and section names

---

## Integration with Other Data Generators

### Recommended Execution Order

1. **Core Data** (Required first)

   ```bash
   python3 core_data.py
   ```

2. **Patients Data** (Required before EMR)

   ```bash
   python3 patients_data.py
   ```

3. **EMR Data** (This script)

   ```bash
   python3 emr_data.py
   # or with full ICD-10 import
   python3 emr_data.py --import-icd10 --xsd path/to/file.xsd --xml path/to/file.xml
   ```

4. **Other Modules** (Optional, any order)

   ```bash
   python3 appointments_data.py
   python3 billing_data.py
   python3 pharmacy_data.py
   # etc.
   ```

---

## Troubleshooting

### Issue: Duplicate ICD-10 Codes

**Symptom**: `UNIQUE constraint failed` errors

**Solution**: Use the `--truncate` flag to delete existing codes first

### Issue: Slow Performance

**Symptom**: Script takes very long to complete

**Solution**:

- Ensure database indexes are created
- Use SSD storage
- Consider using PostgreSQL instead of SQLite for large datasets

### Issue: Memory Errors

**Symptom**: Out of memory errors during import

**Solution**:

- The script uses batch processing (1000 records at a time)
- Increase available system memory
- Close other applications

---

## Advanced Usage

### Custom Days Back for Encounters

Modify the script to change the number of days of encounter history:

```python
# In main() function, change:
encounters = create_encounters(tenants, days_back=30)

# To:
encounters = create_encounters(tenants, days_back=90)  # 90 days of history
```

### Adjusting Data Volume

Modify the random ranges in each function:

```python
# Example: More encounters per day
daily_encounters = random.randint(20, 50)   # Default
daily_encounters = random.randint(50, 100)  # More data
```

---

## Migration from Old System

If you were using the separate `emr/management/commands/import_icd10.py` Django management command:

### Old Way (Django Management Command)

```bash
python manage.py import_icd10 \
  --xsd path/to/file.xsd \
  --xml path/to/file.xml \
  --truncate
```

### New Way (Unified Script)

```bash
python3 emr_data.py \
  --icd10-only \
  --xsd path/to/file.xsd \
  --xml path/to/file.xml \
  --truncate
```

**Benefits of new approach:**

- No need for Django management command infrastructure
- Consistent with other data generators
- Can combine with EMR data generation
- Simpler command-line interface

---

## Support

For issues or questions:

1. Check this README
2. Review error messages carefully
3. Verify prerequisites are met
4. Check file paths are correct
5. Ensure database is accessible

---

## Version History

- **v2.0** (Current): Merged ICD-10 XML import functionality
- **v1.0**: Original EMR data generator with sample ICD-10 codes

---

## License

This script is part of the Hospital Management System v4 project.
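
---

## Appendix: Argument Handling Sketch

The flag rules in the Command-Line Arguments table, and the three modes they select, can be illustrated with the standard-library `argparse` module. This is a minimal sketch and not the actual code in `emr_data.py`: the function names `build_parser` and `resolve_mode` and the mode labels are hypothetical, and it assumes `--icd10-only` needs `--xsd` and `--xml` just as `--import-icd10` does, as the usage examples suggest.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the five flags from the arguments table."""
    parser = argparse.ArgumentParser(prog="emr_data.py")
    parser.add_argument("--import-icd10", action="store_true",
                        help="Import full ICD-10 codes from XML files")
    parser.add_argument("--xsd", default=None,
                        help="Path to ICD-10 XSD schema file")
    parser.add_argument("--xml", default=None,
                        help="Path to ICD-10 XML data file")
    parser.add_argument("--icd10-only", action="store_true",
                        help="Only import ICD-10, skip other EMR data")
    parser.add_argument("--truncate", action="store_true",
                        help="Delete existing ICD-10 codes before importing")
    return parser


def resolve_mode(args: argparse.Namespace) -> str:
    """Return a hypothetical mode label: 'standard', 'full_import', or 'icd10_only'."""
    importing = args.import_icd10 or args.icd10_only
    # Assumption: any XML import needs both the schema and the data file.
    if importing and not (args.xsd and args.xml):
        raise ValueError("--xsd and --xml are required for an ICD-10 import")
    if args.icd10_only:
        return "icd10_only"
    if args.import_icd10:
        return "full_import"
    return "standard"


if __name__ == "__main__":
    print(resolve_mode(build_parser().parse_args()))
```

Keeping the mode decision in a separate function from the parser makes the rule easy to check without touching the database or the XML files.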