Commands
Table of contents
The tool is distributed as the Docker image ghcr.io/answerdigital/oxford-omop-data-mapper:latest. Each command below is shown as a full docker run invocation so it can be copied and pasted.
Every command reads its ConnectionString, VocabularyDirectory and BatchSize settings from appsettings.json or from -e environment variables. See the configuration guide for the full list.
In the examples below, replace /path/to/database/folder and /path/to/your/data with directories on the host. The database folder must be the one that contains (or will contain) the DuckDB omop.db file.
Init command
Creates the DuckDB database file and imports the Athena vocabulary.
Example
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
-e VocabularyDirectory="/vocabulary" \
--rm \
-v /path/to/athena/extract:/vocabulary \
-v /path/to/database/folder:/data \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
init
Remarks
init creates the dbo, cdm and omop_staging schemas, all OMOP CDM tables, the tool’s sequences and Oxford-specific concept tables, then COPYs the Athena CSVs (CONCEPT.csv, CONCEPT_ANCESTOR.csv, CONCEPT_CLASS.csv, CONCEPT_RELATIONSHIP.csv, CONCEPT_SYNONYM.csv, DOMAIN.csv, DRUG_STRENGTH.csv, RELATIONSHIP.csv, VOCABULARY.csv) from VocabularyDirectory into cdm.*.
Run init once against a fresh database location. To reset the database, delete the .db file and re-run init. See the Database Setup guide for more detail.
Docs command
Generates transformation documentation and writes it to a specified directory.
Example
Generates transformation documentation into the host directory mounted at /out.
docker run \
--rm \
-v /path/to/output:/out \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
docs /out
Remarks
The documentation comprises:
- Markdown documents
- SVG diagrams
- Machine readable JSON transformation explanations
Each OMOP field has a document that describes how it is mapped from all known data sources. This could be a plain copy, or include complex transformations or lookups. If a transformation includes a SQL query, the query is included alongside an explanation of its output.
The docs command does not require a database connection.
JSON example
{
"omopTable": "Observation",
"origin": "SUS-APC",
"omopColumns": [
{
"name": "nhs_number",
"operation_description": "Value copied from `NHSNumber`",
"dataSource": [
{
"name": "NHSNumber",
"description": "Patient NHS Number",
"origins": [
{
"origin": "NHS NUMBER",
"url": "https://www.datadictionary.nhs.uk/data_elements/nhs_number.html"
}
]
}
],
"query": "\nselect \n\tl1.NHSNumber, \n\tmax(l1.CDSActivityDate) as CDSActivityDate, \n\tl1.CarerSupportIndicator,\n\tl5.HospitalProviderSpellNumber,\n\tl1.RecordConnectionIdentifier\nfrom omop_staging.cds_line01 l1\n\tleft outer join omop_staging.cds_line05 l5\n\t\ton l1.MessageId = l5.MessageId\nwhere NHSNumber is not null\n\tand CarerSupportIndicator is not null\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\ngroup by \n\tl1.NHSNumber, \n\tl1.CarerSupportIndicator,\n\tl5.HospitalProviderSpellNumber,\n\tl1.RecordConnectionIdentifier;\n\t",
"lookup_table_markdown": null
}
]
}
Stage command
Loads staging data from the file system into the omop_staging schema.
Options
| Option | Applies to | Remarks |
|---|---|---|
-t, --type <type> | all | Required. One of sus-apc, sus-op, sus-ae, cosd, sact, rtds, oxford-gp. |
<filename> (positional) | all except oxford-gp | Path to the source file inside the container. |
--ccmds <path> | sus-apc | Optional path to a matching CCMDS file for the same SUS APC extract. |
--allowed_nhs_number_list_path | all | Optional. File containing a list of allowed patient NHS numbers, one per line. Records for patients outside this list are dropped (National Data Opt-Out). |
--demographics <path> | oxford-gp | Required. Path to the Oxford GP demographics CSV. |
--appointments <path> | oxford-gp | Required. Path to the Oxford GP appointments CSV. |
--events <path> | oxford-gp | Required. Path to the Oxford GP events CSV. |
--medications <path> | oxford-gp | Required. Path to the Oxford GP medications CSV. |
Supported data formats include:
- SUS+ SEM CSV Extracts -
sus-apc,sus-op,sus-ae - COSD v8-1 / v9-0-1 XML -
cosd - SACT v3.0 -
sact - RTDS -
rtds - Oxford GP extract (demographics, appointments, events, medications CSVs) -
oxford-gp
Staged data is appended to existing staging tables. To clear the staging tables first, run stage clear.
Stage a SUS APC file
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage load --type sus-apc "/input/April 2022 Submission.zip" --ccmds "/input/April 2022 CCMDS.zip"
Stage a SUS outpatient file
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage load --type sus-op /input/OS_SEM_1234_Outpatient_Q1_12345678_aaaaaaaa.csv
Stage a SUS A&E file
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage load --type sus-ae /input/OS_SEM_1234_AE_Q1_12345678_aaaaaaaa.csv
Stage a COSD file
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage load --type cosd "/input/April 2022 Submission.zip"
Stage a SACT file
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage load --type sact /input/SACT_v3-20200101-20200131.csv
Stage an RTDS file
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage load --type rtds /input/Rtds.zip
Stage an Oxford GP extract
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage load --type oxford-gp \
--demographics /input/demographics.csv \
--appointments /input/appointments.csv \
--events /input/events.csv \
--medications /input/medications.csv
Opt-out example
stage load --type sus-op /input/OS_SEM_1234_Outpatient_Q1_12345678_aaaaaaaa.csv \
--allowed_nhs_number_list_path /input/allowed_patients.txt
Records for any patient whose NHS number is not in allowed_patients.txt are excluded from staging.
Clear staging command
Clears the staging tables for a given type.
Example
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
stage clear --type sus-apc
Remarks
Supported --type flags: sus-apc, sus-op, sus-ae, cosd, sact, rtds, oxford-gp.
Transform command
Transforms the staged data (and records data provenance) into the OMOP tables.
Options
| Option | Remarks |
|---|---|
-t, --type <type> | Required. One of sus-apc, sus-op, sus-ae, cosd, sact, rtds, oxford-gp, oxford-prescribing, oxford-lab, oxford-death. |
-d, --duckdb-source <expr> | Optional. A DuckDB source expression used in place of the default staging table, eg "read_csv('/data/medication_results_short.csv', all_varchar=true)". Useful for transforming directly from a CSV without a separate staging step. |
--dry-run | Optional. Runs the transformation without writing to the OMOP tables. Defaults to false. |
--batch-size <n> | Optional. Number of records to process per batch. Defaults to 4000000. |
Example
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
transform --type sus-apc
Example - transform directly from a CSV
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
-v /path/to/your/data:/input \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
transform --type oxford-prescribing \
--duckdb-source "read_csv('/input/medication_results_short.csv', all_varchar=true)"
Remarks
Note that oxford-prescribing, oxford-lab and oxford-death are transform-only types - there is no stage load path for them. They are typically used with --duckdb-source against a CSV, or against data materialised into the omop_staging schema by an earlier process.
The OMOP data provenance is recorded as each data set is transformed. See data provenance.
Finalise command
Finalises the OMOP dataset by:
- Pruning incomplete OMOP records.
- Rebuilding era tables (
condition_eraanddrug_era). - Applying additional data fixes.
Example
docker run \
-e ConnectionString="DataSource=/data/omop.db;memory_limit=4GB" \
--rm \
-v /path/to/database/folder:/data \
ghcr.io/answerdigital/oxford-omop-data-mapper:latest \
finalise
Remarks
Removes any person records that either have no gender or no ethnicity, and any locations that are not referenced by a retained person.