This file lists all the configuration properties available for an underlay, including the data mapping and the data pointers for indexing and service deployment. This documentation is generated from annotations in the configuration classes.
- SZAttribute
- SZAttributeSearch
- SZBigQuery
- SZCorePlugin
- SZCriteriaOccurrence
- SZCriteriaRelationship
- SZCriteriaSelector
- SZCriteriaSelectorDisplay
- SZCriteriaSelectorModifier
- SZDataType
- SZDataflow
- SZEntity
- SZGroupItems
- SZHierarchy
- SZIndexData
- SZIndexer
- SZMetadata
- SZOccurrenceEntity
- SZPrepackagedCriteria
- SZPrimaryCriteriaRelationship
- SZPrimaryRelationship
- SZRollupCountsSql
- SZService
- SZSourceData
- SZSourceQuery
- SZTemporalQuery
- SZTextSearch
- SZUnderlay
- SZVisualization
Attribute or property of an entity.
Define an attribute for each column you want to display (e.g. condition.vocabulary_id) or filter on (e.g. conditionOccurrence.person_id).
required SZDataType
Data type of the attribute.
optional String
Field or column name in the all instances SQL file that maps to the display string of this attribute. If unset, we assume the attribute has only a value, no separate display.
A separate display field is useful for enum-type attributes, which often use a foreign key to another table to get a readable string from a code (e.g. in OMOP, person.gender_concept_id and concept.concept_name).
optional Double
The maximum value to display when filtering on this attribute. This is useful when the underlying data has outliers that we want to exclude from the display, but not from the available data.
e.g. A person has an invalid date of birth that produces an age range that spans very large numbers. This causes the slider when filtering by age to span very large numbers also. Setting this property sets the right end of the slider. It does not remove the person with the invalid date of birth from the table. So if they have asthma, they would still show up in a cohort filtering on this condition.
The #szattributedisplayhintrangemin may be set as well, but the two are not required to be set together. The #szattributeiscomputedisplayhint is also independent of this property: you can still calculate the actual maximum in the data even if you set this property.
optional Double
The minimum value to display when filtering on this attribute. This is useful when the underlying data has outliers that we want to exclude from the display, but not from the available data.
e.g. A person has an invalid date of birth that produces an age range that spans negative numbers. This causes the slider when filtering by age to span negative numbers also. Setting this property sets the left end of the slider. It does not remove the person with the invalid date of birth from the table. So if they have asthma, they would still show up in a cohort filtering on this condition.
The #szattributedisplayhintrangemax may be set as well, but the two are not required to be set together. The #szattributeiscomputedisplayhint is also independent of this property: you can still calculate the actual minimum in the data even if you set this property.
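The way a configured range interacts with the values computed from the data can be sketched as follows. This is a hypothetical illustration, not the service's implementation: the function name `slider_bounds` and its parameters are assumptions, and it assumes that a configured value simply replaces the corresponding end of the slider.

```python
from typing import Optional, Tuple


def slider_bounds(
    data_min: float,
    data_max: float,
    range_min: Optional[float] = None,
    range_max: Optional[float] = None,
) -> Tuple[float, float]:
    """Return the (left, right) ends of the filter slider.

    A configured end overrides the value computed from the data; an unset
    end falls back to the computed value. The two ends are independent.
    """
    left = data_min if range_min is None else range_min
    right = data_max if range_max is None else range_max
    return left, right


# Ages computed from the data span -3 to 412 because of invalid birth dates.
print(slider_bounds(-3, 412, range_min=0, range_max=120))
print(slider_bounds(-3, 412, range_min=0))  # only the left end is configured
```

Only the displayed range changes in this sketch; the underlying rows are not filtered.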
optional boolean
When set to true, an indexing job will try to compute a display hint for this attribute (e.g. set of enum values and counts, range of numeric values). Not all data types are supported by the indexing job yet.
Default value: false
optional boolean
True if the data type is repeated (e.g. an array of ints).
Default value: false
optional boolean
True if this attribute is suppressed for export (i.e. not available for selection in data feature sets).
Default value: false
required String
Name of the attribute.
This is the unique identifier for the attribute. In a single entity, the attribute names cannot overlap.
Name may not include spaces or special characters, only letters and numbers. The first character must be a letter.
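The naming rule above can be checked with a short regular expression. This is a minimal sketch: the helper name `is_valid_name` is hypothetical, and it assumes "letters and numbers" means ASCII letters and digits.

```python
import re

# Assumption: "letters and numbers" means ASCII letters and digits only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_name(name: str) -> bool:
    """True if the name starts with a letter and contains only letters and digits."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["personId", "age", "person_id", "2ndVisit", "visit type"]:
    print(candidate, is_valid_name(candidate))
```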
optional SZDataType
Data type of the attribute at runtime.
If the runtime SQL wrapper is set, this field must also be set. The data type at runtime may be different from the data type at rest when the column is passed to a function at runtime. Otherwise, the data type at runtime will always match the attribute data type, so no need to specify it again here.
optional String
SQL function to apply at runtime (i.e. when running the query), instead of at indexing time. Useful for attributes we expect to be updated dynamically (e.g. a person's age).
For a simple function call that just wraps the column (e.g. UPPER(column)), you can specify just the function name (e.g. UPPER). For a more complicated function call, put ${fieldSql} where the column name should be substituted (e.g. CAST(FLOOR(TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), ${fieldSql}, DAY) / 365.25) AS INT64)).
Note that BigQuery disallows query caching and pagination for certain non-deterministic functions. This negatively impacts query performance and prevents certain application behaviors that we want to support. So we work around this for certain functions (CURRENT_DATE and CURRENT_TIMESTAMP) commonly used in runtime SQL wrappers by replacing the function with the current date/timestamp literal at runtime. You can include these functions normally in this property; the replacement will happen automatically.
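The two wrapper forms described above (a bare function name, or a template containing ${fieldSql}) could be expanded roughly like this. The function `apply_runtime_wrapper` is a hypothetical name used for illustration, and the sketch does not reproduce the CURRENT_DATE/CURRENT_TIMESTAMP literal replacement.

```python
FIELD_PLACEHOLDER = "${fieldSql}"


def apply_runtime_wrapper(wrapper: str, field_sql: str) -> str:
    """Expand a runtime SQL wrapper around a column reference.

    If the wrapper contains the placeholder, substitute the column there;
    otherwise treat the wrapper as a function name that wraps the column.
    """
    if FIELD_PLACEHOLDER in wrapper:
        return wrapper.replace(FIELD_PLACEHOLDER, field_sql)
    return f"{wrapper}({field_sql})"


print(apply_runtime_wrapper("UPPER", "name"))
print(
    apply_runtime_wrapper(
        "CAST(FLOOR(TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), ${fieldSql}, DAY) / 365.25) AS INT64)",
        "date_of_birth",
    )
)
```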
optional SZSourceQuery
How to generate a query against the source data that includes this attribute.
If unspecified and exporting queries against the source data is enabled for this entity (i.e. #szentitysourcequerytablename is specified), we assume the field name in the source table (#szentitysourcequerytablename) corresponding to this attribute is the same as the #szattributevaluefieldname.
optional String
Field or column name in the all instances SQL file that maps to the value of this attribute. If unset, we assume the field name is the same as the attribute name.
Configuration to optimize entity search by attributes.
Define the list of attributes to group together for optimization and specify whether search for null attribute values is supported.
required List [ String ]
List of attributes grouped together for search optimization.
Order matters. Each entry is a list of attributes that are searched for together. For example, search is typically performed for contig and position together.
optional boolean
True if search for null values in attributes is supported.
Default value: false
Pointers to the source and index BigQuery datasets.
required String
Valid locations for BigQuery are listed in the GCP documentation.
optional List [ String ]
Comma separated list of all GCS bucket names that all export models can use. Only include the bucket name, not the gs:// prefix. Required if there are any export models that need to write to GCS.
These buckets must live in the query project specified above.
You can also specify these export buckets per-deployment, instead of per-underlay, by using the service application properties.
Example value: bq-export-uscentral1,bq-export-useast1
optional List [ String ]
Comma separated list of all BQ dataset ids that all export models can use. Required if there are any export models that need to export from BQ to GCS.
These datasets must live in the query project specified above.
You can also specify these export datasets per-deployment, instead of per-underlay, by using the service application properties.
Example value: service_export_us,service_export_uscentral1
required SZIndexData
Pointer to the index BigQuery dataset.
required String
Queries will be run in this project.
This is the project that will be billed for running queries. For the indexer, this project is also where the Dataflow jobs will be kicked off. Often this project will be the same project as the one where the index and/or source datasets live.
However, sometimes it will be different. For example, the source dataset may be a public dataset that we don't have billing access to. In that case, the indexer configuration must specify a different query project id. As another example, the source and index datasets may live in a project that is shared across service deployments. In that case, the service configurations may specify a different query project id for each deployment.
required SZSourceData
Pointer to the source BigQuery dataset.
Names of core plugins in the criteria selector and prepackaged criteria definitions.
required SZCorePlugin
Use plugin: "attribute".
required SZCorePlugin
Use plugin: "entityGroup".
required SZCorePlugin
Use plugin: "filterableGroup".
required SZCorePlugin
Use plugin: "multiAttribute".
required SZCorePlugin
Use plugin: "outputUnfiltered".
required SZCorePlugin
Use plugin: "survey".
required SZCorePlugin
Use plugin: "search".
required SZCorePlugin
Use plugin: "unhinted-value".
Criteria-Occurrence entity group configuration.
Define a version of this file for each entity group of this type. This entity group type defines a relationship between three entities. For each criteria entity instance and primary entity instance, there are one or more occurrence entity instances.
required String
Name of the criteria entity.
required String
Name of the entity group.
This is the unique identifier for the entity group. In a single underlay, the entity group names of any group type cannot overlap. Name may not include spaces or special characters, only letters and numbers. The first character must be a letter.
required Set [ SZCriteriaOccurrence$OccurrenceEntity ]
Set of occurrence entity configurations.
Most entity groups of this type will have a single occurrence entity (e.g. SNOMED condition code only maps to condition occurrences), but we also support the case of multiple (e.g. ICD9-CM condition code maps to condition, measurement, observation and procedure occurrences).
required SZPrimaryCriteriaRelationship
Relationship or join between the primary and criteria entities.
Relationship or join between an occurrence entity and the criteria entity (e.g. condition occurrence and ICD9-CM).
optional String
Name of the field or column name that maps to the criteria entity id. Required if the id pairs SQL is defined.
Example value: criteria_id
optional String
Attribute of the occurrence entity that is a foreign key to the id attribute of the criteria entity. If this property is set, then the id pairs SQL must be unset.
optional String
Name of the occurrence entity - criteria entity id pairs SQL file. File must be in the same directory as the entity group file. Name includes file extension. If this property is set, then the foreign key attribute must be unset.
There can be other columns selected in the SQL file (e.g. SELECT * FROM relationships), but the occurrence and criteria entity ids are required.
Example value: occurrenceCriteria.sql
optional String
Name of the field or column name that maps to the occurrence entity id. Required if the id pairs SQL is defined.
Example value: occurrence_id
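The constraints between the foreign key attribute and the id pairs SQL properties can be summarized as a small check. This is a sketch only: the function and parameter names are hypothetical stand-ins for the properties described above, and it does not decide whether leaving both unset is allowed, since the text does not say.

```python
from typing import List, Optional


def relationship_errors(
    foreign_key_attribute: Optional[str] = None,
    id_pairs_sql_file: Optional[str] = None,
    occurrence_id_field: Optional[str] = None,
    criteria_id_field: Optional[str] = None,
) -> List[str]:
    """Return a list of violations of the rules stated for this relationship."""
    errors = []
    # The foreign key attribute and the id pairs SQL are mutually exclusive.
    if foreign_key_attribute and id_pairs_sql_file:
        errors.append("set either the foreign key attribute or the id pairs SQL, not both")
    # Both id field names are required once the id pairs SQL is defined.
    if id_pairs_sql_file:
        if not occurrence_id_field:
            errors.append("occurrence entity id field is required with id pairs SQL")
        if not criteria_id_field:
            errors.append("criteria entity id field is required with id pairs SQL")
    return errors


print(relationship_errors(foreign_key_attribute="condition"))
print(relationship_errors(id_pairs_sql_file="occurrenceCriteria.sql"))
```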
Criteria selector configuration.
Define a version of this file for each set of UI plugins + configuration.
required SZCriteriaSelectorDisplay
Display information.
required String
Display name.
required String
Name of a Java class that implements the FilterBuilder interface. This class will take in the selector configuration and user selections and produce an EntityFilter on either the primary entity (for a cohort) or another entity (for a data feature).
required boolean
True if this criteria selector should be displayed in the cohort builder.
required boolean
True if this criteria selector should be displayed in the data feature set builder.
required List [ SZCriteriaSelector$Modifier ]
Configuration for modifiers.
required String
Name of the criteria selector.
This is the unique identifier for the selector. The selector names cannot overlap within an underlay.
Name may not include spaces or special characters, only letters and numbers.
This name is stored in the application database for cohorts and data feature sets, so once there are artifacts associated with a criteria selector, you can't change the selector name.
required String
Name of the primary UI display plugin. (e.g. selector for condition, not any of the modifiers).
This plugin name is stored in the application database, so once there are cohorts or data features that use this selector, you can't change the plugin names.
The plugin can either be one of the core plugins (e.g. core/attribute; all possibilities are listed here) or a dataset-specific plugin (e.g. sd/biovu).
required String
Serialized configuration of the primary UI display plugin e.g. "{"attribute":"gender"}".
required String
Name of the file that contains the serialized configuration of the primary UI display plugin.
This file should be in the same directory as the criteria selector (e.g. gender.json).
If this property is specified, the value of the pluginConfig property is ignored.
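The precedence between the inline configuration and the configuration file can be sketched as below. The function `resolve_plugin_config` and its parameter names are hypothetical; the only behavior taken from the text is that a specified file wins over the inline value and is looked up in the criteria selector directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def resolve_plugin_config(
    selector_dir: Path,
    plugin_config: Optional[str] = None,
    plugin_config_file: Optional[str] = None,
) -> Optional[str]:
    """Return the serialized plugin configuration.

    When a file name is given, its contents are used and the inline
    value is ignored.
    """
    if plugin_config_file:
        return (selector_dir / plugin_config_file).read_text(encoding="utf-8")
    return plugin_config


# Demonstration with a temporary directory standing in for the selector directory.
with tempfile.TemporaryDirectory() as tmp:
    selector_dir = Path(tmp)
    (selector_dir / "gender.json").write_text('{"attribute":"gender"}', encoding="utf-8")
    from_file = json.loads(
        resolve_plugin_config(selector_dir, '{"attribute":"ignored"}', "gender.json")
    )
    inline = json.loads(resolve_plugin_config(selector_dir, '{"attribute":"gender"}'))

print(from_file, inline)
```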
required boolean
True if this criteria selector supports temporal queries.
Default value: false
Criteria selector display configuration.
required String
Category that the criteria selector is listed under when a user goes to add a new criteria (e.g. "Vitals").
required List [ String ]
Tags that the criteria selector should match when a user uses the dropdown in the add new criteria page (e.g. "Source Codes").
Criteria selector modifier configuration.
required String
Display name.
required String
Name of the criteria selector modifier.
This is the unique identifier for the modifier. The modifier names cannot overlap within a selector.
Name may not include spaces or special characters, only letters and numbers.
This name is stored in the application database for cohorts and data feature sets, so once there are artifacts associated with a modifier, you can't change the modifier name.
required String
Name of the modifier UI display plugin. (e.g. selector for condition visit type).
This plugin name is stored in the application database, so once there are cohorts or data features that use this modifier, you can't change the plugin names.
The plugin can either be one of the core plugins (e.g. core/attribute; all possibilities are listed here) or a dataset-specific plugin (e.g. sd/biovu).
required String
Serialized configuration of the modifier UI display plugin e.g. "{"attribute":"visitType"}".
required String
Name of the file that contains the serialized configuration of the modifier UI display plugin.
This file should be in the same directory as the criteria selector (e.g. visitType.json).
If this property is specified, the value of the pluginConfig property is ignored.
required boolean
True if this modifier supports temporal queries.
Default value: false
Supported data types. Each type corresponds to one or more data types in the underlying database.
required SZDataType
Maps to BigQuery BOOLEAN data type.
required SZDataType
Maps to BigQuery DATE data type.
required SZDataType
Maps to BigQuery NUMERIC and FLOAT data types.
required SZDataType
Maps to BigQuery INTEGER data type.
required SZDataType
Maps to BigQuery STRING data type.
required SZDataType
Maps to BigQuery TIMESTAMP data type.
Properties to pass to Dataflow when kicking off jobs.
required String
Location where the Dataflow runners will be launched.
This must be compatible with the location of the source and index BigQuery datasets. Note the valid locations for BigQuery and Dataflow are not identical. In particular, BigQuery has multi-regions (e.g. US) and Dataflow does not. If the BigQuery datasets are located in a region, the Dataflow location must match. If the BigQuery datasets are located in a multi-region, the Dataflow location must be one of the sub-regions (e.g. US for BigQuery, us-central1 for Dataflow).
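The compatibility rule above can be approximated with a simple check. This is a rough sketch under stated assumptions: the function name is hypothetical, membership in a multi-region is approximated by a region-name prefix, and only the US multi-region from the example is listed. Consult the GCP documentation for the authoritative list of locations.

```python
# Assumption: a multi-region's sub-regions can be recognized by a name prefix.
# Only the US example from the text is listed; extend as needed.
MULTI_REGION_PREFIXES = {"US": "us-"}


def dataflow_location_is_compatible(bigquery_location: str, dataflow_location: str) -> bool:
    """Check a Dataflow location against the BigQuery dataset location."""
    bq = bigquery_location.strip().upper()
    df = dataflow_location.strip().lower()
    prefix = MULTI_REGION_PREFIXES.get(bq)
    if prefix is not None:
        # Multi-region dataset: the Dataflow location must be one of its sub-regions.
        return df.startswith(prefix)
    # Regional dataset: the Dataflow location must match exactly.
    return df == bq.lower()


print(dataflow_location_is_compatible("US", "us-central1"))
print(dataflow_location_is_compatible("us-east1", "us-central1"))
```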
optional String
GCS directory where the Dataflow runners will write temporary files.
The bucket location must match the Dataflow location. This cannot be a path to a top-level bucket; it must contain at least one directory (e.g. gs://mybucket/temp/, not gs://mybucket). If this property is unset, Dataflow will attempt to create a bucket in the correct location. This may fail if the credentials don't have permissions to create buckets. More information in the Dataflow pipeline basic options documentation and other related documentation.
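The "at least one directory" rule for the temporary location can be checked before launching a job. This is a minimal sketch with a hypothetical helper name; it validates only the shape of the path, not the bucket's location or permissions.

```python
from urllib.parse import urlparse


def is_valid_temp_location(path: str) -> bool:
    """True if the path is a gs:// URL with a bucket and at least one directory."""
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        return False
    # Any non-empty path segment after the bucket counts as a directory.
    return any(segment for segment in parsed.path.split("/"))


print(is_valid_temp_location("gs://mybucket/temp/"))
print(is_valid_temp_location("gs://mybucket"))
```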
required String
Email of the service account that the Dataflow runners will use.
The credentials used to kick off the indexing must have the iam.serviceAccounts.actAs permission on this service account. More details in the GCP documentation.
optional boolean
Specifies whether the Dataflow runners use external IP addresses.
If set to false, make sure that Private Google Access is enabled for the VPC sub-network that the Dataflow runners will use. More information in the Dataflow pipeline security and networking options documentation. We have seen noticeable improvements in speed of running indexing jobs with this set to false.
Default value: true
optional String
Specifies which VPC sub-network the Dataflow runners use.
This property is the name of the sub-network (e.g. mysubnetwork), not the full URL path to it (e.g. https://www.googleapis.com/compute/v1/projects/my-cloud-project/regions/us-central1/subnetworks/mysubnetwork). If this property is unset, Dataflow will try to use a VPC network called "default".
If you have a custom-mode VPC network, you must set this property. Dataflow can only choose the sub-network automatically for auto-mode VPC networks. More information in the Dataflow network and subnetwork documentation.
Default value: true
optional String
Machine type of the Dataflow runners.
The available options are documented for GCP Compute Engine. If this property is unset, Dataflow will choose a machine type. More information in the Dataflow pipeline worker-level options documentation.
We have been using the n1-standard-4 machine type for all underlays so far. Given that the machine type Dataflow will choose may not remain the same in the future, we recommend setting this property.
Entity configuration.
Define a version of this file for each entity.
required String
Name of the all instances SQL file.
File must be in the same directory as the entity file. Name includes file extension.
Example value: all.sql
required List [ SZAttribute ]
List of all the entity attributes.
The generated index table will preserve the order of the attributes as defined here. The list must include the id attribute.
optional String
Description of the entity.
optional String
Display name for the entity.
Unlike the entity name, it may include spaces and special characters.
optional Set [ SZHierarchy ]
List of hierarchies.
While the code supports multiple hierarchies, we currently only have examples with zero or one hierarchy.
required String
Name of the id attribute.
This must be a unique identifier for each entity instance. It must also have the INT64 data type.
required String
Name of the entity.
This is the unique identifier for the entity. In a single underlay, the entity names cannot overlap.
Name may not include spaces or special characters, only letters and numbers. The first character must be a letter.
optional List [ String ]
List of attributes to optimize for group by queries.
The typical use case for this is to optimize cohort breakdown queries on the primary entity. For example, to optimize breakdowns by age, race, gender, specify those attributes here. Order matters.
You can currently specify a maximum of four attributes, because we implement this using BigQuery clustering which has this limitation.
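A configuration check for this list might look like the following. The function name is hypothetical, and the check that each name refers to an attribute of the entity is an assumption; the four-attribute limit is the one stated above.

```python
from typing import List, Sequence

MAX_OPTIMIZED_ATTRIBUTES = 4  # limit stated above, inherited from BigQuery clustering


def group_by_optimization_errors(
    optimize_attributes: Sequence[str], entity_attributes: Sequence[str]
) -> List[str]:
    """Return problems with the list of attributes to optimize for group by."""
    errors = []
    if len(optimize_attributes) > MAX_OPTIMIZED_ATTRIBUTES:
        errors.append(
            f"at most {MAX_OPTIMIZED_ATTRIBUTES} attributes are allowed, "
            f"got {len(optimize_attributes)}"
        )
    # Assumption: every listed name must be an attribute defined on the entity.
    known = set(entity_attributes)
    for name in optimize_attributes:
        if name not in known:
            errors.append(f"unknown attribute: {name}")
    return errors


entity_attributes = ["id", "age", "race", "gender", "ethnicity", "yearOfBirth"]
print(group_by_optimization_errors(["age", "race", "gender"], entity_attributes))
```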
optional List [ SZAttributeSearch ]
List of search configs to optimize entity search by attributes.
The typical use case for this is to optimize attribute-based search on large entity tables that cannot be optimized for search on multiple attribute fields. For example, to optimize search on the variant table using attribute values for gene and rs_number. Each entry is a list of attributes that are searched for together. For example, search is typically performed for contig and position together.
optional String
Full name of the table to use when exporting a query against the source data.
SQL substitutions are supported in this table name.
If unspecified, exporting a query against the source data is unsupported.
Example value: ${omopDataset}.condition_occurrence
optional SZTemporalQuery
How to generate a temporal query for this entity.
If unspecified, temporal queries that include this output entity are not allowed.
optional SZTextSearch
Text search configuration.
This is used when filtering a list of instances of this entity (e.g. list of conditions) by text. If unset, filtering by text is unsupported.
Group-Items entity group configuration.
Define a version of this file for each entity group of this type. This entity group type defines a one-to-many relationship between two entities. For each group entity instance, there are one or more items entity instances.
optional String
Attribute of the items entity that is a foreign key to the id attribute of the group entity.
If this property is set, then the id pairs SQL must be unset.
required String
Name of the group entity.
optional String
Name of the field or column name that maps to the group entity id.
Required if the id pairs SQL is defined.
Example value: group_id
optional String
Name of the group entity - items entity id pairs SQL file.
File must be in the same directory as the entity group file. Name includes file extension.
There can be other columns selected in the SQL file (e.g. SELECT * FROM relationships), but the group and items entity ids are required. If this property is set, then the foreign key attribute must be unset.
Example value: idPairs.sql
required String
Name of the items entity.
optional String
Name of the field or column name that maps to the items entity id.
Required if the id pairs SQL is defined.
Example value: items_id
required String
Name of the entity group.
This is the unique identifier for the entity group. In a single underlay, the entity group names of any group type cannot overlap. Name may not include spaces or special characters, only letters and numbers. The first character must be a letter.
optional SZRollupCountsSql
Pointer to SQL that returns entity id - rollup count (= number of related entity instances) pairs.
optional boolean
True to skip copying the id-pairs SQL into a new index table, and use the source SQL directly.
Ignored if the id pairs SQL is undefined.
Default value: false
Hierarchy for an entity.
required String
Name of the field or column name in the child parent id pairs SQL that maps to the child id.
Example value: child
required String
Name of the child parent id pairs SQL file.
File must be in the same directory as the entity file. Name includes file extension.
There can be other columns selected in the SQL file (e.g. SELECT * FROM relationships), but the child and parent ids are required.
Example value: childParent.sql
optional boolean
When false, indexing jobs will not clean hierarchy nodes that have both a zero item count and a zero rollup count. When true, indexing jobs will clean such nodes.
Default value: false
optional boolean
An orphan node has no parents or children. When false, indexing jobs will filter out orphan nodes. When true, indexing jobs skip this filtering step and we keep the orphan nodes in the hierarchy.
Default value: false
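The orphan definition above (a node with no parents and no children) can be illustrated on a list of child-parent id pairs. This is a plain Python sketch with a hypothetical function name, not the indexing job itself.

```python
from typing import Iterable, Set, Tuple


def find_orphans(
    node_ids: Iterable[int], child_parent_pairs: Iterable[Tuple[int, int]]
) -> Set[int]:
    """Return the ids that never appear as a child or a parent in any pair."""
    connected: Set[int] = set()
    for child, parent in child_parent_pairs:
        connected.add(child)
        connected.add(parent)
    return set(node_ids) - connected


# Nodes 2 and 3 are children of 1; node 4 is not part of any pair.
print(find_orphans([1, 2, 3, 4], [(2, 1), (3, 1)]))
```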
required int
Maximum depth of the hierarchy. If there are branches of the hierarchy that are deeper than the number specified here, they will be truncated.
optional String
Name of the hierarchy.
This is the unique identifier for the hierarchy. In a single entity, the hierarchy names cannot overlap. Name may not include spaces or special characters, only letters and numbers. The first character must be a letter.
If there is only one hierarchy, the name is optional and, if unspecified, will be set to default. If there are multiple hierarchies, the name is required for each one.
Default value: default
required String
Name of the field or column name in the child parent id pairs SQL that maps to the parent id.
Example value: parent
optional String
Name of the field or column name that maps to the root id.
If the root node ids SQL is defined, then this property is required. If the root node ids set is defined, then this property must be unset.
Example value: root_id
optional Set [ Long ]
Set of root ids. Indexing jobs will filter out any hierarchy root nodes that are not in this set. If the root node ids SQL is defined, then this property must be unset.
optional String
Name of the root id SQL file. File must be in the same directory as the entity file. Name includes file extension.
There can be other columns selected in the SQL file (e.g. SELECT * FROM roots), but the root id is required. Indexing jobs will filter out any hierarchy root nodes that are not returned by this query. If the root node ids set is defined, then this property must be unset.
Example value: rootNode.sql
Pointer to the index BigQuery dataset.
required String
Dataset id of the index BigQuery dataset.
required String
Project id of the index BigQuery dataset.
optional String
Prefix for the generated index tables.
An underscore will be inserted between this prefix and the table name (e.g. prefix T will generate a table called "T_ENT_person"). The prefix may not include spaces or special characters, only letters and numbers. The first character must be a letter. This can be useful when the index tables will be written to a dataset that includes other non-Tanagra tables.
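The prefixing rule can be sketched in a few lines. The helper name `prefixed_table_name` is hypothetical, and the base table name ENT_person is inferred from the example above.

```python
from typing import Optional


def prefixed_table_name(prefix: Optional[str], table_name: str) -> str:
    """Join the optional prefix and the table name with an underscore."""
    if not prefix:
        return table_name
    return f"{prefix}_{table_name}"


print(prefixed_table_name("T", "ENT_person"))
print(prefixed_table_name(None, "ENT_person"))
```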
Indexer configuration.
Define a version of this file for each place you will run indexing. If you later copy the index dataset to other places, you do not need a separate configuration for those.
required SZBigQuery
Pointers to the source and index BigQuery datasets.
required SZDataflow
Dataflow configuration.
Required for indexing jobs that use batch processing (e.g. computing the ancestor-descendant pairs for a hierarchy).
required String
Name of the underlay to index.
Name is specified in the underlay file, and also matches the name of the config/underlay sub-directory in the underlay sub-project resources.
Example value: cmssynpuf
Metadata for the underlay.
Information in this object is not used in the operation of the indexer or service, it is for display purposes only.
optional String
Description of the underlay.
required String
Display name for the underlay.
Unlike the underlay name, it may include spaces and special characters.
optional Map [ String, String ]
Key-value map of underlay properties.
Keys may not include spaces or special characters, only letters and numbers.
Occurrence entity configuration.
required Set [ String ]
Names of attributes that we want to calculate instance-level hints for.
Instance-level hints are ranges of possible values for a particular criteria instance. They are used to support criteria-specific modifiers (e.g. range of values for measurement code "glucose test").
required Set [ String ]
Names of attributes that we want to calculate instance-level hints for, and whose values should also be rolled up and included in their ancestors' hints.
required SZCriteriaRelationship
Relationship or join between this occurrence entity and the criteria entity (e.g. condition occurrence and ICD9-CM).
required String
Name of occurrence entity.
required SZPrimaryRelationship
Relationship or join between this occurrence entity and the primary entity (e.g. condition occurrence and person).
Prepackaged criteria configuration.
required String
Name of the criteria selector this criteria is associated with.
The criteria selector must be defined for the underlay. (e.g. The condition selector must be defined in order to define a prepackaged data feature for condition = Type 2 Diabetes.)
required String
Display name.
required String
Name of the prepackaged criteria.
This is the unique identifier for the criteria. The criteria names cannot overlap within an underlay.
Name may not include spaces or special characters, only letters and numbers.
This name is stored in the application database for data feature sets, so once there are artifacts associated with a prepackaged criteria, you can't change the criteria name.
required String
Serialized data for the UI display plugin e.g. "{"conceptId":"201826"}".
required String
Name of the file that contains the serialized data for the UI display plugin.
This file should be in the same directory as the prepackaged criteria (e.g. condition.json).
If this property is specified, the value of the pluginData property is ignored.
Relationship or join between the primary and criteria entities (e.g. condition and person).
required String
Name of the field or column name that maps to the criteria entity id.
Example value: criteria_id
required String
Name of the primary entity - criteria entity id pairs SQL file. File must be in the same directory as the entity group file. Name includes file extension. There can be other columns selected in the SQL file (e.g. SELECT * FROM relationships), but the primary and criteria entity ids are required.
Example value: primaryCriteria.sql
required String
Name of the field or column name that maps to the primary entity id.
Example value: primary_id
Relationship or join between an occurrence entity and the primary entity (e.g. condition occurrence and person).
optional String
Attribute of the occurrence entity that is a foreign key to the id attribute of the primary entity. If this property is set, then the id pairs SQL must be unset.
optional String
Name of the occurrence entity - primary entity id pairs SQL file. File must be in the same directory as the entity group file. Name includes file extension. If this property is set, then the foreign key attribute must be unset.
There can be other columns selected in the SQL file (e.g. SELECT * FROM relationships), but the occurrence and primary entity ids are required.
Example value: occurrencePrimary.sql
optional String
Name of the field or column name that maps to the occurrence entity id. Required if the id pairs SQL is defined.
Example value: occurrence_id
optional String
Name of the field or column name that maps to the primary entity id. Required if the id pairs SQL is defined.
Example value: primary_id
Pointer to SQL that returns entity id - rollup count (= number of related entity instances) pairs (e.g. variant - number of people). Useful when there's an easy way to calculate these in SQL and we want to avoid ingesting the full entity - related entity relationship id pairs table into Dataflow.
required String
Name of the field or column name that maps to the entity id.
Example value: entity_id
required String
Name of the field or column name that maps to the rollup count per entity id.
Example value: rollup_count
required String
Name of the entity id - rollup counts (= number of items entity instances) pairs SQL file.
File must be in the same directory as the entity/group file. Name includes file extension.
There can be other columns selected in the SQL file (e.g. SELECT * FROM relationships), but the entity id and rollup count fields are required.
Example value: rollupCounts.sql
Service configuration.
Define a version of this file for each place you will deploy the service. If you share the same index dataset across multiple service deployments, you need a separate configuration for each.
required SZBigQuery
Pointers to the source and index BigQuery datasets.
required String
Name of the underlay to make available in the service deployment.
If a single deployment serves multiple underlays, you need a separate configuration for each. Name is specified in the underlay file, and also matches the name of the config/underlay sub-directory in the underlay sub-project resources.
Example value: cmssynpuf
Pointer to the source BigQuery dataset.
required String
Dataset id of the source BigQuery dataset.
required String
Project id of the source BigQuery dataset.
optional Map [ String, String ]
Key-value map of substitutions to make in the input SQL files.
Wherever the keys appear in the input SQL files wrapped in braces and preceded by a dollar sign, they will be substituted by the values before running the queries. For example, [key] omopDataset -> [value] bigquery-public-data.cms_synthetic_patient_data_omop means ${omopDataset} in any of the input SQL files will be replaced by bigquery-public-data.cms_synthetic_patient_data_omop.
Keys may not include spaces or special characters, only letters and numbers. This is simple string substitution logic and does not handle more complicated cases, such as nested substitutions.
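The substitution behavior described above can be sketched as a single-pass string replacement. This is a minimal sketch, not the actual implementation: the function name is hypothetical, and leaving unknown keys untouched is an assumption.

```python
import re
from typing import Dict

# Keys are letters and numbers only, wrapped as ${key}.
_KEY_PATTERN = re.compile(r"\$\{([A-Za-z0-9]+)\}")


def substitute(sql: str, substitutions: Dict[str, str]) -> str:
    """Replace each ${key} in `sql` with its value in one pass.

    Replaced values are not rescanned, so nested substitutions are
    not resolved. Unknown keys are left as-is (an assumption).
    """
    def replace(match):
        key = match.group(1)
        return substitutions.get(key, match.group(0))

    return _KEY_PATTERN.sub(replace, sql)


sql = "SELECT * FROM `${omopDataset}.concept`"
result = substitute(
    sql, {"omopDataset": "bigquery-public-data.cms_synthetic_patient_data_omop"}
)
```

Because the pattern is applied once, a value that itself contains ${otherKey} is inserted verbatim, which matches the note that nested substitutions are not handled.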
Information to generate a SQL query against the source dataset for a given attribute.
This query isn't actually run by the service, only generated as an export option (e.g. as part of a notebook file).
optional String
Name of the field to use for the attribute display in the source dataset.
If unspecified, exporting a query with this attribute against the source data will not include a separate display field.
The table can optionally be specified in #szsourcequerydisplayfieldtable.
Example value: concept_name
optional String
Full name of the table to JOIN with the main table (#szentitysourcequerytablename) to get the attribute display field in the source dataset.
SQL substitutions are supported in this table name.
If unspecified, and #szsourcequerydisplayfieldname is specified, then we assume that the source display field is also in the main table, same as the source value field.
The #szsourcequerydisplayfieldtablejoinfieldname is required if this property is specified.
Example value: ${omopDataset}.concept
optional String
Name of the field in the display table (#szsourcequerydisplayfieldtable) that is used to JOIN to the main table (#szentitysourcequerytablename) using the source value field (#szsourcequeryvaluefieldname).
This is required if the #szsourcequerydisplayfieldtable is specified.
Example value: concept_id
optional String
Name of the field to use for the attribute value in the source dataset table (#szentitysourcequerytablename).
If unspecified, we assume the field name in the source table (#szentitysourcequerytablename) corresponding to this attribute is the same as the #szattributevaluefieldname.
Example value: condition_concept_id
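To show how the four properties above fit together, here is a hedged sketch of generating an export query. The function, its parameter names, the table aliases, the use of an inner JOIN, and the condition_occurrence table name are all assumptions for illustration; only the example field values come from this page.

```python
from typing import Optional


def build_source_query(main_table: str, value_field: str,
                       display_field: Optional[str] = None,
                       display_table: Optional[str] = None,
                       join_field: Optional[str] = None) -> str:
    """Build an illustrative SELECT against the source dataset."""
    # The join field is required whenever a display table is given.
    if display_table is not None and join_field is None:
        raise ValueError("join field is required when a display table is set")
    # No display field: export the value only.
    if display_field is None:
        return f"SELECT m.{value_field} FROM {main_table} AS m"
    # Display field without a table: assume it lives in the main table.
    if display_table is None:
        return f"SELECT m.{value_field}, m.{display_field} FROM {main_table} AS m"
    # Display field in a separate table: JOIN on the source value field.
    return (f"SELECT m.{value_field}, d.{display_field} "
            f"FROM {main_table} AS m "
            f"JOIN {display_table} AS d ON m.{value_field} = d.{join_field}")


query = build_source_query(
    "${omopDataset}.condition_occurrence",  # hypothetical main table
    "condition_concept_id",
    display_field="concept_name",
    display_table="${omopDataset}.concept",
    join_field="concept_id",
)
```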
Information to build a temporal query with this entity.
required String
Name of the attribute to use for the visit date in a temporal query.
Example value: start_date
required String
Name of the attribute to use for the visit (occurrence) id in a temporal query.
Example value: visit_occurrence_id
Text search configuration for an entity.
optional Set [ String ]
Set of attributes to allow text search on. Text search on attributes not included here is unsupported.
optional String
Name of the field or column that maps to the entity id. If the id text pairs SQL is defined, then this property is required.
Example value: id
optional String
Name of the id text pairs SQL file. File must be in the same directory as the entity file. Name includes file extension.
There can be other columns selected in the SQL file (e.g. SELECT * FROM synonyms), but the entity id and text string fields are required. The SQL query may return multiple rows per entity id.
Example value: textSearch.sql
optional String
Name of the field or column that maps to the text search string. If the id text pairs SQL is defined, then this property is required.
Example value: text
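The conditional requirement above (the id and text fields become required once the id text pairs SQL file is defined) could be checked along these lines. This is only a sketch: the property names idTextPairsSqlFile, idFieldName, and textFieldName are assumed for illustration and are not confirmed by this page.

```python
def text_search_errors(config: dict) -> list:
    """Return validation messages for a text search configuration dict.

    Property names here are hypothetical placeholders.
    """
    errors = []
    if config.get("idTextPairsSqlFile"):
        # Both field names are needed to read the id text pairs.
        for prop in ("idFieldName", "textFieldName"):
            if not config.get(prop):
                errors.append(
                    f"{prop} is required when idTextPairsSqlFile is set"
                )
    return errors


# Only attribute-based text search: nothing else is required.
attributes_only = text_search_errors({"attributes": ["name"]})
# SQL file defined but the text field is missing.
incomplete = text_search_errors(
    {"idTextPairsSqlFile": "textSearch.sql", "idFieldName": "id"}
)
```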
Underlay configuration.
Define a version of this file for each dataset. If you index and/or serve a dataset in multiple places or deployments, you only need one version of this file.
required Set [ String ]
List of paths of criteria-occurrence type entity groups.
A criteria-occurrence type entity group defines a relationship between three entities.
Path consists of two parts: [Data-Mapping Group]/[Entity Group Name] (e.g. omop/conditionPerson).
[Data-Mapping Group] is the name of a sub-directory of the config/datamapping/ sub-directory in the underlay sub-project resources (e.g. omop).
[Entity Group Name] is specified in the entity group file, and also matches the name of the sub-directory of the config/datamapping/[Data-Mapping Group]/entitygroup sub-directory in the underlay sub-project resources (e.g. conditionPerson).
Using the path here instead of just the entity group name allows us to share entity group definitions across underlays. For example, the omop data-mapping group contains template entity group definitions for standing up a new underlay.
required List [ String ]
List of paths of all the criteria selectors.
A criteria selector is an option for defining a filter on an entity (e.g. select a condition). It corresponds to one or more UI display plugins (e.g. the condition selector uses the entity group plugin for selecting the condition, the attribute plugin for selecting the visit type modifier, and the unhinted-value plugin for selecting the occurrence count modifier).
Path consists of two parts: [Display Group]/[Criteria Selector Name] (e.g. omop/gender).
[Display Group] is the name of a sub-directory of the config/display/ sub-directory in the underlay sub-project resources (e.g. omop).
[Criteria Selector Name] is specified in the selector file, and also matches the name of the sub-directory of the config/display/[Display Group]/criteriaselector sub-directory in the underlay sub-project resources (e.g. gender).
Using the path here instead of just the selector name allows us to share selector definitions across underlays. For example, the omop display group contains template selector definitions for standing up a new underlay.
required Set [ String ]
List of paths of all the entities.
An entity is any object that the UI might show a list of (e.g. list of persons, conditions, condition occurrences). The list must include the primary entity.
Path consists of two parts: [Data-Mapping Group]/[Entity Name] (e.g. omop/condition).
[Data-Mapping Group] is the name of a sub-directory of the config/datamapping/ sub-directory in the underlay sub-project resources (e.g. omop).
[Entity Name] is specified in the entity file, and also matches the name of the sub-directory of the config/datamapping/[Data-Mapping Group]/entity sub-directory in the underlay sub-project resources (e.g. condition).
Using the path here instead of just the entity name allows us to share entity definitions across underlays. For example, the omop data-mapping group contains template entity definitions for standing up a new underlay.
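The two-part entity path maps onto a resource directory as described above. The following is a minimal sketch of that mapping; the function name is hypothetical, and the returned path is relative to the underlay sub-project resources.

```python
from pathlib import PurePosixPath


def entity_directory(path: str) -> PurePosixPath:
    """Resolve '[Data-Mapping Group]/[Entity Name]' to its config directory."""
    parts = path.split("/")
    # Exactly two non-empty parts are expected.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected [Data-Mapping Group]/[Entity Name], got {path!r}")
    group, entity_name = parts
    return PurePosixPath("config/datamapping") / group / "entity" / entity_name


resolved = entity_directory("omop/condition")
```

The same shape applies to entity groups, with entitygroup in place of entity in the directory path.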
required Set [ String ]
List of paths of group-items type entity groups.
A group-items type entity group defines a relationship between two entities.
Path consists of two parts: [Data-Mapping Group]/[Entity Group Name] (e.g. omop/brandIngredient).
[Data-Mapping Group] is the name of a sub-directory of the config/datamapping/ sub-directory in the underlay sub-project resources (e.g. omop).
[Entity Group Name] is specified in the entity group file, and also matches the name of the sub-directory of the config/datamapping/[Data-Mapping Group]/entitygroup sub-directory in the underlay sub-project resources (e.g. brandIngredient).
Using the path here instead of just the entity group name allows us to share entity group definitions across underlays. For example, the omop data-mapping group contains template entity group definitions for standing up a new underlay.
required SZMetadata
Metadata for the underlay.
required String
Name of the underlay.
This is the unique identifier for the underlay. If you serve multiple underlays in a single service deployment, the underlay names cannot overlap. Name may not include spaces or special characters, only letters and numbers.
This name is stored in the application database for cohorts and data feature sets, so once there are artifacts associated with an underlay, you can't change the underlay name.
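The naming rules above (letters and numbers only, no overlap within a service deployment) can be expressed as two small checks. This is an illustrative sketch with hypothetical function names, and it assumes "letters and numbers" means ASCII letters and digits.

```python
import re
from collections import Counter


def is_valid_underlay_name(name: str) -> bool:
    """True if the name is non-empty and contains only ASCII letters and digits."""
    return re.fullmatch(r"[A-Za-z0-9]+", name) is not None


def overlapping_names(names) -> list:
    """Return the underlay names that appear more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


valid = is_valid_underlay_name("cmssynpuf")
invalid = is_valid_underlay_name("cms synpuf")
duplicates = overlapping_names(["cmssynpuf", "aouSynthetic", "cmssynpuf"])
```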
required Set [ String ]
List of paths of all the prepackaged data features.
A prepackaged data feature is a predefined data feature for exporting data (e.g. demographics). It contains data for zero or more UI display plugins (e.g. the type 2 diabetes data feature defines data for the entity group plugin).
Path consists of two parts: [Display Group]/[Prepackaged Data Feature Name] (e.g. omop/demographics).
[Display Group] is the name of a sub-directory of the config/display/ sub-directory in the underlay sub-project resources (e.g. omop).
[Prepackaged Data Feature Name] is specified in the prepackaged file, and also matches the name of the sub-directory of the config/display/[Display Group]/prepackagedcriteria sub-directory in the underlay sub-project resources (e.g. demographics).
Using the path here instead of just the prepackaged criteria name allows us to share criteria definitions across underlays. For example, the omop display group contains template criteria definitions for standing up a new underlay.
required String
Name of the primary entity.
A cohort contains instances of the primary entity (e.g. persons).
required String
Name of the UI config file.
File must be in the same directory as the underlay file. Name includes file extension.
Example value: ui.json
required List [ String ]
List of paths of all the visualizations.
A visualization contains all of the configuration to display an underlay-level or cohort-level visualization in the UI.
Path consists of two parts: [Display Group]/[Visualization Name] (e.g. omop/peopleByAge).
[Display Group] is the name of a sub-directory of the config/ui/ sub-directory in the underlay sub-project resources (e.g. omop).
[Visualization Name] is specified in the visualization file, and also matches the name of the sub-directory of the config/ui/[Display Group]/viz sub-directory in the underlay sub-project resources (e.g. peopleByAge).
Using the path here instead of just the visualization name allows us to share visualization definitions across underlays. For example, the omop visualization group contains template visualization definitions for standing up a new underlay.
Configuration for a single visualization.
required String
Serialized configuration of the visualization. VizConfig protocol buffer as JSON.
required String
Name of the file that contains the serialized configuration of the visualization.
This file should be in the same directory as the visualization (e.g. gender.json).
If this property is specified, the value of the config property is ignored.
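The precedence between the inline config value and the config file can be sketched as follows. The function and parameter names are hypothetical, and reading the file as UTF-8 text is an assumption; the only behavior taken from this page is that the file, when specified, wins over the inline value.

```python
from pathlib import Path
from typing import Optional


def resolve_viz_config(viz_dir: str, config: Optional[str] = None,
                       config_file: Optional[str] = None) -> Optional[str]:
    """Return the serialized visualization config as a JSON string.

    If a config file name is given, it is read from the visualization
    directory and the inline `config` value is ignored.
    """
    if config_file:
        return (Path(viz_dir) / config_file).read_text(encoding="utf-8")
    return config
```

The same precedence is described for the plugin configuration and its file counterpart.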
required String
Name of the visualization.
This is the unique identifier for the visualization. The visualization names cannot overlap within an underlay.
Name may not include spaces or special characters, only letters and numbers.
required String
Name of the visualization UI plugin.
required String
Serialized configuration of the visualization UI plugin as JSON.
required String
Name of the file that contains the serialized configuration of the visualization UI plugin.
This file should be in the same directory as the visualization (e.g. gender.json).
If this property is specified, the value of the pluginConfig property is ignored.
required String
Visible title of the visualization.