EMR Data Generator - Comprehensive Guide
This document explains how to use the unified emr_data.py script that combines sample EMR data generation with full ICD-10-CM XML import capabilities.
Overview
The emr_data.py script provides three modes of operation:
- Standard Mode: Generate sample EMR data with ~35 sample ICD-10 codes
- Full ICD-10 Import Mode: Generate sample EMR data + import complete ICD-10-CM from XML
- ICD-10 Only Mode: Import only ICD-10-CM codes, skip other EMR data
Prerequisites
Required
- Python 3.8+
- Django project properly configured
- Existing tenants in the database (run core_data.py first)
- Existing patients in the database (run patients_data.py first)
Optional (for ICD-10 XML import)
- xmlschema library: pip install xmlschema
- ICD-10-CM XML files (download from CDC/CMS)
Usage Examples
1. Standard Mode (Default)
Generate sample EMR data with sample ICD-10 codes:
python3 emr_data.py
What it creates:
- Note templates (5 templates)
- Patient encounters (~900-1000)
- Vital signs (~1800-2000 records)
- Problem lists (~140-150 entries)
- Care plans (~70-80 plans)
- Clinical notes (~800-900 notes)
- Sample ICD-10 codes (~35 codes)
- Clinical recommendations (~100-110)
- Allergy alerts (~20-30)
- Treatment protocols (5 protocols)
- Clinical guidelines (10 guidelines)
- Critical alerts (~5-10)
- Diagnostic suggestions (~10-20)
2. Full ICD-10 Import Mode
Generate sample EMR data + import complete ICD-10-CM from XML:
python3 emr_data.py \
--import-icd10 \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml
What it creates:
- All standard EMR data (as above)
- Complete ICD-10-CM codes (~70,000+ codes) instead of sample codes
Optional: Truncate existing codes first
python3 emr_data.py \
--import-icd10 \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml \
--truncate
3. ICD-10 Only Mode
Import only ICD-10-CM codes, skip all other EMR data generation:
python3 emr_data.py \
--icd10-only \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml
What it creates:
- Only ICD-10-CM codes (~70,000+ codes)
- Skips all other EMR data generation
With truncate:
python3 emr_data.py \
--icd10-only \
--xsd /path/to/icd10cm-tabular-2026.xsd \
--xml /path/to/icd10cm-tabular-2026.xml \
--truncate
Command-Line Arguments
| Argument | Description | Required | Default |
|---|---|---|---|
| --import-icd10 | Import full ICD-10 codes from XML files | No | False |
| --xsd | Path to ICD-10 XSD schema file | Yes (with --import-icd10) | None |
| --xml | Path to ICD-10 XML data file | Yes (with --import-icd10) | None |
| --icd10-only | Only import ICD-10, skip other EMR data | No | False |
| --truncate | Delete existing ICD-10 codes before importing | No | False |
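For orientation, the flags in the table above could be wired up with argparse roughly as follows. This is a minimal sketch, not the script's actual parser: the function names `build_parser` and `parse_args` are hypothetical, and the rule that `--icd10-only` also requires `--xsd` and `--xml` is an assumption based on the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the five documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="EMR data generator (sketch)")
    parser.add_argument("--import-icd10", action="store_true",
                        help="Import full ICD-10 codes from XML files")
    parser.add_argument("--icd10-only", action="store_true",
                        help="Only import ICD-10, skip other EMR data")
    parser.add_argument("--xsd", default=None,
                        help="Path to ICD-10 XSD schema file")
    parser.add_argument("--xml", default=None,
                        help="Path to ICD-10 XML data file")
    parser.add_argument("--truncate", action="store_true",
                        help="Delete existing ICD-10 codes before importing")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and enforce the documented dependency between flags."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: both import modes need the schema and the data file.
    wants_import = args.import_icd10 or args.icd10_only
    if wants_import and not (args.xsd and args.xml):
        # parser.error prints the message and exits with status 2
        parser.error("--xsd and --xml are required when using --import-icd10")
    return args
```

Calling `parse_args([])` yields the defaults from the table (all flags False, both paths None).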
Obtaining ICD-10-CM XML Files
Official Sources
- CDC (Centers for Disease Control and Prevention)
  - URL: https://www.cdc.gov/nchs/icd/icd-10-cm.htm
  - Download the "ICD-10-CM Tabular List" XML files
- CMS (Centers for Medicare & Medicaid Services)
  - URL: https://www.cms.gov/medicare/coding-billing/icd-10-codes
  - Download the complete ICD-10-CM code set
Required Files
You need two files:
- icd10cm-tabular-YYYY.xsd (Schema definition)
- icd10cm-tabular-YYYY.xml (Actual codes)
Where YYYY is the year (e.g., 2026)
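Before starting a long import it can help to confirm that both files for a given year are present. The helper below is a small sketch using only the standard library; `tabular_pair` is a hypothetical name and is not part of emr_data.py.

```python
from pathlib import Path


def tabular_pair(directory, year):
    """Return (xsd_path, xml_path) for a release year, or raise if either is missing."""
    base = Path(directory)
    xsd = base / f"icd10cm-tabular-{year}.xsd"
    xml = base / f"icd10cm-tabular-{year}.xml"
    # Collect every missing file so the error names all of them at once
    missing = [p.name for p in (xsd, xml) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing ICD-10-CM file(s): {', '.join(missing)}")
    return xsd, xml
```

The two returned paths can then be passed to `--xsd` and `--xml`.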
Multi-Tenant Support
The script automatically creates ICD-10 codes for all tenants in your database:
# Codes are created for each tenant
for tenant in tenants:
    # Creates ICD-10 codes with tenant relationship
    Icd10.objects.create(
        tenant=tenant,
        code='E11.9',
        description='Type 2 diabetes mellitus without complications',
        # ... (remaining fields elided)
    )
Performance Considerations
Standard Mode (Default)
- Runtime: ~30-60 seconds
- Database Impact: Minimal (roughly 4,000 total records, going by the per-type counts listed under Standard Mode)
- Recommended for: Development, testing, demos
Full ICD-10 Import Mode
- Runtime: ~5-15 minutes (depending on system)
- Database Impact: Significant (~70,000+ ICD-10 codes plus the roughly 4,000 other records created in Standard Mode)
- Recommended for: Production, staging, comprehensive testing
ICD-10 Only Mode
- Runtime: ~3-10 minutes
- Database Impact: ~70,000+ ICD-10 codes only
- Recommended for: Updating ICD-10 codes without regenerating other data
Error Handling
Missing xmlschema Library
❌ Error: xmlschema library not installed.
Install it with: pip install xmlschema
Solution: pip install xmlschema
Missing XML/XSD Files
❌ Error: --xsd and --xml are required when using --import-icd10
Solution: Provide both --xsd and --xml arguments
No Tenants Found
❌ No tenants found. Please run the core data generator first.
Solution: Run python3 core_data.py first
XML Parsing Errors
❌ Failed to parse XML: [error details]
Solution: Verify XML file integrity, ensure correct file paths
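The checks behind these messages can be approximated up front. The sketch below is not the script's own error handling: `has_module` and `preflight` are hypothetical helpers, and the XML check only tests well-formedness with the standard library rather than validating against the XSD.

```python
import importlib.util
import xml.etree.ElementTree as ET


def has_module(name):
    """True if the named top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def preflight(xsd_path, xml_path, module_check=has_module):
    """Return a list of problems; an empty list means the import can start."""
    problems = []
    if not module_check("xmlschema"):
        problems.append(
            "xmlschema library not installed. Install it with: pip install xmlschema"
        )
    if not (xsd_path and xml_path):
        problems.append("--xsd and --xml are required when using --import-icd10")
        return problems  # nothing to parse without paths
    try:
        # Well-formedness only; schema validation is a separate step
        ET.parse(xml_path)
    except (ET.ParseError, OSError) as exc:
        problems.append(f"Failed to parse XML: {exc}")
    return problems
```

The `module_check` parameter exists only so the dependency check can be stubbed out in tests.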
Data Generated
Sample ICD-10 Codes (Default Mode)
The script creates ~35 sample codes covering:
- Infectious diseases (A00-A04)
- Neoplasms (C00-C04)
- Circulatory diseases (I00-I06)
- Respiratory diseases (J00-J04)
- Digestive diseases (K00-K04)
- Genitourinary diseases (N00-N04)
- Symptoms and signs (R00-R04)
Full ICD-10-CM Import
Complete ICD-10-CM code set including:
- All chapters (1-22)
- All sections
- All diagnosis codes
- Parent-child relationships
- Code descriptions
- Chapter and section names
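To illustrate how parent-child relationships and chapter/section names can be recovered, here is a sketch that walks a tiny hand-written fragment. It assumes the tabular XML nests `diag` elements inside `section` inside `chapter`, each with `name` and `desc` children; verify this against the XSD you downloaded. The function `iter_codes` and the dict keys are hypothetical, not the script's actual importer.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; the real file is far larger.
SAMPLE = """<ICD10CM.tabular>
  <chapter>
    <name>4</name>
    <desc>Endocrine, nutritional and metabolic diseases (E00-E89)</desc>
    <section id="E08-E13">
      <desc>Diabetes mellitus (E08-E13)</desc>
      <diag>
        <name>E11</name>
        <desc>Type 2 diabetes mellitus</desc>
        <diag>
          <name>E11.9</name>
          <desc>Type 2 diabetes mellitus without complications</desc>
        </diag>
      </diag>
    </section>
  </chapter>
</ICD10CM.tabular>"""


def iter_codes(root):
    """Yield one dict per <diag>, carrying chapter, section and parent code."""

    def walk(diag, chapter, section, parent):
        code = diag.findtext("name")
        yield {
            "code": code,
            "description": diag.findtext("desc"),
            "chapter": chapter,
            "section": section,
            "parent": parent,
        }
        # Nested <diag> elements are children of the current code
        for child in diag.findall("diag"):
            yield from walk(child, chapter, section, code)

    for chapter in root.findall("chapter"):
        chapter_name = chapter.findtext("desc")
        for section in chapter.findall("section"):
            section_name = section.findtext("desc")
            for diag in section.findall("diag"):
                yield from walk(diag, chapter_name, section_name, None)


rows = list(iter_codes(ET.fromstring(SAMPLE)))
```

On this fragment `rows` holds two entries: E11 with no parent, and E11.9 with parent E11.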
Integration with Other Data Generators
Recommended Execution Order
1. Core Data (Required first)
   python3 core_data.py
2. Patients Data (Required before EMR)
   python3 patients_data.py
3. EMR Data (This script)
   python3 emr_data.py
   # or with full ICD-10 import
   python3 emr_data.py --import-icd10 --xsd path/to/file.xsd --xml path/to/file.xml
4. Other Modules (Optional, any order)
   python3 appointments_data.py
   python3 billing_data.py
   python3 pharmacy_data.py
   # etc.
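The required part of this order can be scripted. The following is a sketch under the assumption that all three generators live in the current working directory; `build_commands` and `run_all` are hypothetical helpers, and the optional modules are left out.

```python
import subprocess
import sys

# Order taken from the list above; optional modules are omitted.
REQUIRED_ORDER = ["core_data.py", "patients_data.py", "emr_data.py"]


def build_commands(emr_args=()):
    """Return one argv list per script, passing extra flags to emr_data.py only."""
    commands = []
    for script in REQUIRED_ORDER:
        extra = list(emr_args) if script == "emr_data.py" else []
        commands.append([sys.executable, script, *extra])
    return commands


def run_all(emr_args=()):
    """Run the generators in order, stopping at the first failure."""
    for argv in build_commands(emr_args):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(argv, check=True)
```

For example, `run_all(["--import-icd10", "--xsd", "path/to/file.xsd", "--xml", "path/to/file.xml"])` would run the core and patient generators first and then the full import.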
Troubleshooting
Issue: Duplicate ICD-10 Codes
Symptom: UNIQUE constraint failed errors
Solution: Use --truncate flag to delete existing codes first
Issue: Slow Performance
Symptom: Script takes very long to complete
Solution:
- Ensure database indexes are created
- Use SSD storage
- Consider using PostgreSQL instead of SQLite for large datasets
Issue: Memory Errors
Symptom: Out of memory errors during import
Solution:
- The script uses batch processing (1000 records at a time)
- Increase available system memory
- Close other applications
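The batching idea mentioned above can be pictured with a small generator. This is a generic sketch, not the script's own code; `batched` is a hypothetical helper, and only the default size of 1000 comes from this README.

```python
from itertools import islice


def batched(iterable, size=1000):
    """Yield lists of at most `size` items without materialising the whole input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

In a Django importer each chunk could, for instance, be handed to `bulk_create` so that only one batch of model instances is held in memory at a time.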
Advanced Usage
Custom Days Back for Encounters
Modify the script to change the number of days of encounter history:
# In main() function, change:
encounters = create_encounters(tenants, days_back=30)
# To:
encounters = create_encounters(tenants, days_back=90) # 90 days of history
Adjusting Data Volume
Modify the random ranges in each function:
# Example: More encounters per day
daily_encounters = random.randint(20, 50) # Default
daily_encounters = random.randint(50, 100) # More data
Migration from Old System
If you were using the separate emr/management/commands/import_icd10.py Django management command:
Old Way (Django Management Command)
python manage.py import_icd10 \
--xsd path/to/file.xsd \
--xml path/to/file.xml \
--truncate
New Way (Unified Script)
python3 emr_data.py \
--icd10-only \
--xsd path/to/file.xsd \
--xml path/to/file.xml \
--truncate
Benefits of new approach:
- No need for Django management command infrastructure
- Consistent with other data generators
- Can combine with EMR data generation
- Simpler command-line interface
Support
For issues or questions:
- Check this README
- Review error messages carefully
- Verify prerequisites are met
- Check file paths are correct
- Ensure database is accessible
Version History
- v2.0 (Current): Merged ICD-10 XML import functionality
- v1.0: Original EMR data generator with sample ICD-10 codes
License
This script is part of the Hospital Management System v4 project.