hospital-management/tools/markdown/EMR_DATA_GENERATOR_README.md
Marwan Alwali 263292f6be update
2025-11-04 00:50:06 +03:00

EMR Data Generator - Comprehensive Guide

This document explains how to use the unified emr_data.py script that combines sample EMR data generation with full ICD-10-CM XML import capabilities.

Overview

The emr_data.py script provides three modes of operation:

  1. Standard Mode: Generate sample EMR data with ~35 sample ICD-10 codes
  2. Full ICD-10 Import Mode: Generate sample EMR data + import complete ICD-10-CM from XML
  3. ICD-10 Only Mode: Import only ICD-10-CM codes, skip other EMR data

Prerequisites

Required

  • Python 3.8+
  • Django project properly configured
  • Existing tenants in the database (run core_data.py first)
  • Existing patients in the database (run patients_data.py first)

Optional (for ICD-10 XML import)

  • xmlschema library: pip install xmlschema
  • ICD-10-CM XML files (download from CDC/CMS)

Usage Examples

1. Standard Mode (Default)

Generate sample EMR data with sample ICD-10 codes:

python3 emr_data.py

What it creates:

  • Note templates (5 templates)
  • Patient encounters (~900-1000)
  • Vital signs (~1800-2000 records)
  • Problem lists (~140-150 entries)
  • Care plans (~70-80 plans)
  • Clinical notes (~800-900 notes)
  • Sample ICD-10 codes (~35 codes)
  • Clinical recommendations (~100-110)
  • Allergy alerts (~20-30)
  • Treatment protocols (5 protocols)
  • Clinical guidelines (10 guidelines)
  • Critical alerts (~5-10)
  • Diagnostic suggestions (~10-20)

2. Full ICD-10 Import Mode

Generate sample EMR data + import complete ICD-10-CM from XML:

python3 emr_data.py \
  --import-icd10 \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml

What it creates:

  • All standard EMR data (as above)
  • Complete ICD-10-CM codes (~70,000+ codes) instead of sample codes

Optional: Truncate existing codes first

python3 emr_data.py \
  --import-icd10 \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml \
  --truncate

3. ICD-10 Only Mode

Import only ICD-10-CM codes, skip all other EMR data generation:

python3 emr_data.py \
  --icd10-only \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml

What it creates:

  • Only ICD-10-CM codes (~70,000+ codes)
  • Skips all other EMR data generation

With truncate:

python3 emr_data.py \
  --icd10-only \
  --xsd /path/to/icd10cm-tabular-2026.xsd \
  --xml /path/to/icd10cm-tabular-2026.xml \
  --truncate

Command-Line Arguments

| Argument | Description | Required | Default |
|---|---|---|---|
| --import-icd10 | Import full ICD-10 codes from XML files | No | False |
| --xsd | Path to ICD-10 XSD schema file | Yes (with --import-icd10) | None |
| --xml | Path to ICD-10 XML data file | Yes (with --import-icd10) | None |
| --icd10-only | Only import ICD-10, skip other EMR data | No | False |
| --truncate | Delete existing ICD-10 codes before importing | No | False |
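
The arguments above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual parser: `build_parser` and `parse_args` are hypothetical names, and applying the --xsd/--xml check to --icd10-only as well as --import-icd10 is an assumption based on the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the five documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Generate EMR data and optionally import ICD-10-CM"
    )
    parser.add_argument("--import-icd10", action="store_true",
                        help="Import full ICD-10 codes from XML files")
    parser.add_argument("--xsd", default=None,
                        help="Path to ICD-10 XSD schema file")
    parser.add_argument("--xml", default=None,
                        help="Path to ICD-10 XML data file")
    parser.add_argument("--icd10-only", action="store_true",
                        help="Only import ICD-10, skip other EMR data")
    parser.add_argument("--truncate", action="store_true",
                        help="Delete existing ICD-10 codes before importing")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and enforce the XSD/XML requirement."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: both import modes need the schema and the data file.
    if (args.import_icd10 or args.icd10_only) and not (args.xsd and args.xml):
        # parser.error prints the message and exits with status 2.
        parser.error("--xsd and --xml are required when using --import-icd10")
    return args
```

With this sketch, `parse_args(["--import-icd10"])` exits with a usage error, while `parse_args([])` returns a namespace with every flag at its default.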

Obtaining ICD-10-CM XML Files

Official Sources

  1. CDC (Centers for Disease Control and Prevention)

  2. CMS (Centers for Medicare & Medicaid Services)

Required Files

You need two files:

  • icd10cm-tabular-YYYY.xsd (Schema definition)
  • icd10cm-tabular-YYYY.xml (Actual codes)

Here YYYY is the release year (e.g., 2026).
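
Before starting a long import it can help to confirm that both files are present. The sketch below builds the two file names from the naming pattern above and reports any that are missing; `icd10_file_paths` and `missing_files` are hypothetical helpers, not functions from emr_data.py.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def icd10_file_paths(directory: str, year: int) -> Tuple[Path, Path]:
    """Return the (xsd, xml) paths that follow the icd10cm-tabular-YYYY pattern."""
    base = Path(directory)
    return (
        base / f"icd10cm-tabular-{year}.xsd",
        base / f"icd10cm-tabular-{year}.xml",
    )


def missing_files(paths: Iterable[Path]) -> List[Path]:
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not p.is_file()]
```

An empty list from `missing_files(icd10_file_paths("/path/to", 2026))` means both files are in place.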


Multi-Tenant Support

The script automatically creates ICD-10 codes for all tenants in your database:

# Codes are created for each tenant
for tenant in tenants:
    # Creates ICD-10 codes with tenant relationship
    Icd10.objects.create(
        tenant=tenant,
        code='E11.9',
        description='Type 2 diabetes mellitus without complications',
        # ... remaining fields omitted
    )

Performance Considerations

Sample Mode (Default)

  • Runtime: ~30-60 seconds
  • Database Impact: Minimal (~4,000 total records, going by the counts listed under Standard Mode)
  • Recommended for: Development, testing, demos

Full ICD-10 Import Mode

  • Runtime: ~5-15 minutes (depending on system)
  • Database Impact: Significant (~70,000+ ICD-10 codes + ~4,000 other records, going by the counts listed under Standard Mode)
  • Recommended for: Production, staging, comprehensive testing

ICD-10 Only Mode

  • Runtime: ~3-10 minutes
  • Database Impact: ~70,000+ ICD-10 codes only
  • Recommended for: Updating ICD-10 codes without regenerating other data

Error Handling

Missing xmlschema Library

❌ Error: xmlschema library not installed.
   Install it with: pip install xmlschema

Solution: pip install xmlschema
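
A check of this kind can run before any import work starts. The following is a sketch only; `module_available` and `require_xmlschema` are hypothetical names, and the real script may detect the missing library differently.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def require_xmlschema() -> None:
    """Exit with the documented message when xmlschema is not installed."""
    if not module_available("xmlschema"):
        raise SystemExit(
            "❌ Error: xmlschema library not installed.\n"
            "   Install it with: pip install xmlschema"
        )
```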

Missing XML/XSD Files

❌ Error: --xsd and --xml are required when using --import-icd10

Solution: Provide both --xsd and --xml arguments

No Tenants Found

❌ No tenants found. Please run the core data generator first.

Solution: Run python3 core_data.py first

XML Parsing Errors

❌ Failed to parse XML: [error details]

Solution: Verify the XML file's integrity and check that the file paths are correct
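
One way to verify integrity is to check that the file is well-formed XML before running the import. The sketch below uses only the standard library and does not validate against the XSD; `check_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def check_xml(path: str) -> Optional[str]:
    """Return None if the file is well-formed XML, else a short error message."""
    file = Path(path)
    if not file.is_file():
        return f"file not found: {path}"
    try:
        with file.open("rb") as handle:
            # Read element by element and discard each one after it closes.
            for _event, elem in ET.iterparse(handle, events=("end",)):
                elem.clear()
    except ET.ParseError as exc:
        return f"Failed to parse XML: {exc}"
    return None
```

A `None` result only means the file parses; schema problems would still surface during the actual import.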


Data Generated

Sample ICD-10 Codes (Default Mode)

The script creates ~35 sample codes covering:

  • Infectious diseases (A00-A04)
  • Neoplasms (C00-C04)
  • Circulatory diseases (I00-I06)
  • Respiratory diseases (J00-J04)
  • Digestive diseases (K00-K04)
  • Genitourinary diseases (N00-N04)
  • Symptoms and signs (R00-R04)

Full ICD-10-CM Import

Complete ICD-10-CM code set including:

  • All chapters (1-22)
  • All sections
  • All diagnosis codes
  • Parent-child relationships
  • Code descriptions
  • Chapter and section names
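
Parent-child relationships can be read off the nesting of codes in the XML. The sketch below assumes each `diag` element carries `name` and `desc` children and nests its sub-codes as further `diag` elements; the inline sample is hand-written for illustration, and `walk_diags` is a hypothetical helper rather than the script's own importer.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# Hand-written fragment in the assumed layout, not copied from a real file.
SAMPLE = """
<section id="E08-E13">
  <desc>Diabetes mellitus (E08-E13)</desc>
  <diag>
    <name>E11</name>
    <desc>Type 2 diabetes mellitus</desc>
    <diag>
      <name>E11.9</name>
      <desc>Type 2 diabetes mellitus without complications</desc>
    </diag>
  </diag>
</section>
"""

Row = Tuple[str, str, Optional[str]]


def walk_diags(node: ET.Element, parent: Optional[str] = None) -> Iterator[Row]:
    """Yield (code, description, parent_code) for every nested diag element."""
    for diag in node.findall("diag"):  # direct children only
        code = diag.findtext("name", default="").strip()
        desc = diag.findtext("desc", default="").strip()
        yield code, desc, parent
        # Recurse so that sub-codes record this code as their parent.
        yield from walk_diags(diag, parent=code)


rows = list(walk_diags(ET.fromstring(SAMPLE.strip())))
for code, desc, parent in rows:
    print(code, "|", desc, "| parent:", parent)
```

Top-level codes in a section get `None` as their parent, and each nested code points at the code one level up.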

Integration with Other Data Generators

  1. Core Data (Required first)

    python3 core_data.py
    
  2. Patients Data (Required before EMR)

    python3 patients_data.py
    
  3. EMR Data (This script)

    python3 emr_data.py
    # or with full ICD-10 import
    python3 emr_data.py --import-icd10 --xsd path/to/file.xsd --xml path/to/file.xml
    
  4. Other Modules (Optional, any order)

    python3 appointments_data.py
    python3 billing_data.py
    python3 pharmacy_data.py
    # etc.
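
The required order above can also be scripted. This is a sketch under the assumption that the three scripts sit in the current working directory; `build_commands` and `run_all` are hypothetical helpers, and the optional modules are left out.

```python
import subprocess
import sys
from typing import List, Sequence

# Order taken from the list above: core data, then patients, then EMR.
REQUIRED_ORDER = ["core_data.py", "patients_data.py", "emr_data.py"]


def build_commands(emr_args: Sequence[str] = ()) -> List[List[str]]:
    """Return one command per script; extra flags go to emr_data.py only."""
    commands = []
    for script in REQUIRED_ORDER:
        cmd = [sys.executable, script]
        if script == "emr_data.py":
            cmd.extend(emr_args)
        commands.append(cmd)
    return commands


def run_all(emr_args: Sequence[str] = ()) -> None:
    """Run the scripts in order, stopping at the first non-zero exit."""
    for cmd in build_commands(emr_args):
        subprocess.run(cmd, check=True)
```

Because `check=True` raises on failure, a failing core_data.py run stops the chain before patients_data.py or emr_data.py start.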
    

Troubleshooting

Issue: Duplicate ICD-10 Codes

Symptom: UNIQUE constraint failed errors

Solution: Use the --truncate flag to delete existing codes first

Issue: Slow Performance

Symptom: Script takes very long to complete

Solution:

  • Ensure database indexes are created
  • Use SSD storage
  • Consider using PostgreSQL instead of SQLite for large datasets

Issue: Memory Errors

Symptom: Out of memory errors during import

Solution:

  • The script uses batch processing (1000 records at a time)
  • Increase available system memory
  • Close other applications
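
For reference, batch processing of the kind mentioned above can be sketched as a generator that hands out fixed-size chunks. This illustrates the general technique, not the script's actual code; `batched` is a hypothetical helper and the default size of 1000 mirrors the figure quoted above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield lists of at most `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

In a Django context each batch could be passed to a model manager's `bulk_create`, so that only one batch of objects is held in memory at a time.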

Advanced Usage

Custom Days Back for Encounters

Modify the script to change the number of days of encounter history:

# In main() function, change:
encounters = create_encounters(tenants, days_back=30)
# To:
encounters = create_encounters(tenants, days_back=90)  # 90 days of history

Adjusting Data Volume

Modify the random ranges in each function:

# Example: more encounters per day. Change:
daily_encounters = random.randint(20, 50)  # Default
# To:
daily_encounters = random.randint(50, 100)  # More data
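
To get a feel for how these two settings interact, the expected volume can be estimated before running the script. The sketch assumes one `random.randint(low, high)` draw per day of history, which is a simplification; how the script spreads encounters across tenants is not covered here, and `expected_encounters` is a hypothetical helper.

```python
def expected_encounters(days_back: int, low: int, high: int) -> float:
    """Expected total when each day draws random.randint(low, high) encounters."""
    # randint includes both endpoints, so the mean draw is the midpoint.
    return days_back * (low + high) / 2


print(expected_encounters(30, 20, 50))   # default range, 30 days
print(expected_encounters(90, 50, 100))  # larger range, 90 days
```

Raising both the range and `days_back` multiplies the volume, so it is worth estimating the total before combining the two changes.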

Migration from Old System

If you were using the separate emr/management/commands/import_icd10.py Django management command:

Old Way (Django Management Command)

python manage.py import_icd10 \
  --xsd path/to/file.xsd \
  --xml path/to/file.xml \
  --truncate

New Way (Unified Script)

python3 emr_data.py \
  --icd10-only \
  --xsd path/to/file.xsd \
  --xml path/to/file.xml \
  --truncate

Benefits of the new approach:

  • No need for Django management command infrastructure
  • Consistent with other data generators
  • Can combine with EMR data generation
  • Simpler command-line interface

Support

For issues or questions:

  1. Check this README
  2. Review error messages carefully
  3. Verify prerequisites are met
  4. Check file paths are correct
  5. Ensure database is accessible

Version History

  • v2.0 (Current): Merged ICD-10 XML import functionality
  • v1.0: Original EMR data generator with sample ICD-10 codes

License

This script is part of the Hospital Management System v4 project.