Merge branch 'master' into feature/synthetic-source-recovery-challenge
salvatore-campagna committed Jan 10, 2025
2 parents e2a6d71 + 2b30bc2 commit 790d1a5
Showing 35 changed files with 1,820 additions and 6 deletions.
2 changes: 1 addition & 1 deletion elastic/logs/README.md
@@ -220,7 +220,7 @@ The following parameters are available:
* `corpora_uri_base` (default: `https://rally-tracks.elastic.co`) - Specify the base location of the datasets used by this track.
* `lifecycle` (default: unset to fall back on Serverless detection) - Specifies the lifecycle management feature to use for data streams. Use `ilm` for index lifecycle management or `dlm` for data lifecycle management. By default, `dlm` will be used for benchmarking Serverless Elasticsearch.
* `workflow-request-cache` (default: `true`) - Explicit control of request cache query parameter in searches executed in a workflow. This can be further overridden at an operation level with the `request-cache` parameter.
-* `synthetic_source_keep` (default: unset) - Allows overriding the default synthetic source behaviour for all field types with the following values: `none` (equivalent to unset) - no source is stored, `arrays` - source stored as is only for multi-value (array) fields.
+* `synthetic_source_keep` (default: unset): If specified, configures the `index.mapping.synthetic_source_keep` index setting.
* `source_mode` (default: unset) - Specifies the source mode to be used.
* `use_synthetic_source_recovery` (default: unset): Whether synthetic recovery source will be used.
* `recovery_target_index` (required) - The target index for fetching shard changes via the recovery API.
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/auditbeat-mappings.json
@@ -8,6 +8,13 @@
}
},
"refresh_interval": "30s",
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "kubernetes.pod.uid", "log.logger", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
{# non-serverless-index-settings-marker-start #}{%- if build_flavor != "serverless" or serverless_operator == true -%}
"max_docvalue_fields_search": 200,
"number_of_shards": 1,
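Reviewer note: the `sort` block added in this hunk requests index sorting on three fields in ascending order followed by `@timestamp` in descending order. The Python sketch below only illustrates the resulting document order on made-up data; the sample documents and the `sort_key` helper are hypothetical, and this is not how Elasticsearch implements sorting internally.

```python
# Illustrative only: mimics the ordering requested by the "sort" block
# (host.name, kubernetes.pod.uid, log.logger ascending; @timestamp descending).
docs = [
    {"host.name": "b", "kubernetes.pod.uid": "p1", "log.logger": "x", "@timestamp": 10},
    {"host.name": "a", "kubernetes.pod.uid": "p2", "log.logger": "x", "@timestamp": 10},
    {"host.name": "a", "kubernetes.pod.uid": "p1", "log.logger": "x", "@timestamp": 10},
    {"host.name": "a", "kubernetes.pod.uid": "p1", "log.logger": "x", "@timestamp": 20},
]


def sort_key(doc):
    # Negating the numeric timestamp yields descending order for that field.
    return (
        doc["host.name"],
        doc["kubernetes.pod.uid"],
        doc["log.logger"],
        -doc["@timestamp"],
    )


ordered = sorted(docs, key=sort_key)
```

Documents that share the same host, pod and logger end up adjacent, with the newest one first.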
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-apache.access-1.18.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "user_agent.name", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-apache.error-1.18.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "user_agent.name", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-kafka.log-1.13.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "event.type", "kafka.log.component", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-mysql.error-1.19.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "event.type", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-mysql.slowlog-1.19.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "user.name", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-nginx.access-1.20.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "user_agent.name", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-nginx.error-1.20.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "input.type", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-postgresql.log-1.20.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "error.code", "event.code", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-redis.log-1.15.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "redis.log.role", "log.level", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-redis.slowlog-1.15.0",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "redis.slowlog.key", "@timestamp" ],
"order": [ "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-system.auth-1.58.1",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "event.code", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
7 changes: 7 additions & 0 deletions elastic/logs/templates/component/[email protected]
@@ -7,6 +7,13 @@
{%- if disable_pipelines is not true %}
"default_pipeline": "logs-system.syslog-1.58.1",
{%- endif %}
{% if route_on_sort_fields | default(false) is true %}
"sort": {
"field": [ "host.name", "event.code", "log.file.path", "@timestamp" ],
"order": [ "asc", "asc", "asc", "desc" ]
},
"logsdb.route_on_sort_fields": true,
{% endif %}
"mapping": {
"total_fields": {
"limit": "10000"
17 changes: 15 additions & 2 deletions elastic/logs/templates/component/track-custom-mappings.json
@@ -1,14 +1,15 @@
{
"template" : {
"mappings" : {
+{% if index_mode | default('standard') is equalto 'standard' %}
"runtime": {
"rally.doc_size": {
"type": "long"
},
"rally.message_size": {
"type": "long"
}
-},
+},{% endif %}
"properties" : {
"event": {
"properties": {
@@ -21,7 +22,19 @@
"format": "strict_date_optional_time"
}
}
-}
+}{% if index_mode | default('standard') is equalto 'logsdb' %},
+"rally": {
+"properties" : {
+"doc_size": {
+"type": "long",
+"index": false
+},
+"message_size": {
+"type": "long",
+"index": false
+}
+}
+}{% endif %}
}
}
},
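Reviewer note: after this change the `rally.doc_size` and `rally.message_size` fields are declared as runtime fields only in `standard` mode, and as regular unindexed `long` properties in `logsdb` mode. The Python function below is a hypothetical mirror of those two conditionals, covering only the `rally` fields; `rally_size_mappings` is an illustrative name and not part of the track.

```python
def rally_size_mappings(index_mode="standard"):
    """Sketch of the mappings fragment the template emits per index mode."""
    size_fields = ("doc_size", "message_size")
    mappings = {}
    if index_mode == "standard":
        # Runtime fields, as in the original template.
        mappings["runtime"] = {
            f"rally.{name}": {"type": "long"} for name in size_fields
        }
    elif index_mode == "logsdb":
        # Regular mapped fields that are not indexed.
        mappings["properties"] = {
            "rally": {
                "properties": {
                    name: {"type": "long", "index": False}
                    for name in size_fields
                }
            }
        }
    # Any other mode gets neither declaration from these two conditionals.
    return mappings
```

As written in the template, any other `index_mode` value (for example `time_series`) matches neither condition, so neither declaration is emitted.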
@@ -4,7 +4,10 @@
{% if index_mode %}
"index": {
"mode": {{ index_mode | tojson }}
-{% if synthetic_source_keep and synthetic_source_keep != 'none' %}
+{% if use_synthetic_source_recovery %}
+,"recovery.use_synthetic_source": {{use_synthetic_source_recovery | tojson}}
+{% endif %}
+{% if synthetic_source_keep %}
,"mapping.synthetic_source_keep": {{ synthetic_source_keep | tojson }}
{% endif %}
}
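Reviewer note: the behavioural change in this hunk is easy to miss. Previously a `synthetic_source_keep` value of `none` was filtered out; now any non-empty value is passed through to the index setting, and `recovery.use_synthetic_source` is emitted when `use_synthetic_source_recovery` is truthy. The function below is a hypothetical Python mirror of the new conditionals, not code from the track; `index_settings` is an illustrative name.

```python
def index_settings(index_mode=None,
                   use_synthetic_source_recovery=None,
                   synthetic_source_keep=None):
    """Sketch of the settings fragment rendered after this change."""
    if not index_mode:
        # The whole "index" block is guarded by `{% if index_mode %}`.
        return {}
    index = {"mode": index_mode}
    if use_synthetic_source_recovery:
        index["recovery.use_synthetic_source"] = use_synthetic_source_recovery
    if synthetic_source_keep:
        # No special case for "none" any more: the value is passed as is.
        index["mapping.synthetic_source_keep"] = synthetic_source_keep
    return {"index": index}
```

Because the guard is a truthiness check, passing `False` for `use_synthetic_source_recovery` omits the setting rather than writing an explicit `false`.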
2 changes: 1 addition & 1 deletion elastic/security/README.md
@@ -84,7 +84,7 @@ The following parameters are available:
* `wait_for_status` (default: `green`) - The track creates Data Streams prior to indexing. All created Data Streams must at least reach this status before indexing commences. Reduce to `yellow` for clusters where green isn't possible e.g. single node.
* `corpora_uri_base` (default: `https://rally-tracks.elastic.co`) - Specify the base location of the datasets used by this track.
* `index_mode` (default: unset) - A parameter meant to be used internally which defines one of the available indexing modes, "standard", "logsdb" or "time_series". If not set, "standard" is used.
-* `synthetic_source_keep` (default: unset) - Allows overriding the default synthetic source behaviour for all field types with the following values: `none` (equivalent to unset) - no source is stored, `arrays` - source stored as is only for multi-value (array) fields.
+* `synthetic_source_keep` (default: unset): If specified, configures the `index.mapping.synthetic_source_keep` index setting.
* `source_mode` (default: unset) - Specifies the source mode to be used.
* `use_synthetic_source_recovery` (default: unset): Whether synthetic recovery source will be used.
* `recovery_target_index` (required) - The target index for fetching shard changes via the recovery API.
@@ -4,7 +4,10 @@
{% if index_mode %}
"index": {
"mode": {{ index_mode | tojson }},
-{% if synthetic_source_keep and synthetic_source_keep != 'none' %}
+{% if use_synthetic_source_recovery %}
+"recovery.use_synthetic_source": {{use_synthetic_source_recovery | tojson}},
+{% endif %}
+{% if synthetic_source_keep %}
"mapping.synthetic_source_keep": {{ synthetic_source_keep | tojson}},
{% endif %}
"sort.field": [ "host.id", "@timestamp" ],
52 changes: 52 additions & 0 deletions joins/README.md
@@ -0,0 +1,52 @@
## JOINS track

This track contains an artificial dataset intended to test JOIN operations with different key cardinalities.

The dataset can be generated using the scripts in the `_tools` directory.

### Example Documents

Main index:

```json
{
"id": 56,
"@timestamp": 946728056,
"key_1000": "56",
"key_100000": "56",
"key_200000": "56",
"key_500000": "56",
"key_1000000": "56",
"key_5000000": "56",
"key_100000000": "56",
"field_0": "text with value 0_56",
"field_1": "text with value 1_56",
"field_2": "text with value 2_56",
...
"field_99": "text with value 99_56"
}
```

The cardinality of each key matches its name, e.g. `key_1000` has 1000 distinct values in the dataset,
from `0` to `999` (unless the dataset is too small to contain every key of a given cardinality,
e.g. in a dataset of 1000 documents, `key_100000000` contains only 1000 distinct keys, one per document).
The IDs and the timestamps are sequential.
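
The description above can be sketched in Python as follows. This is a minimal illustration and not the generator from the `_tools` directory: deriving each key as `id % cardinality`, the `BASE_TIMESTAMP` constant and the `make_doc` name are assumptions chosen so that document 56 reproduces the example shown earlier.

```python
# Assumptions (not taken from the _tools scripts): keys are derived as
# id % cardinality and timestamps start at BASE_TIMESTAMP.
KEY_CARDINALITIES = [1000, 100000, 200000, 500000, 1000000, 5000000, 100000000]
BASE_TIMESTAMP = 946728000  # chosen so that id 56 matches the example document
NUM_TEXT_FIELDS = 100       # field_0 .. field_99


def make_doc(doc_id):
    """Build one main-index document for a sequential id."""
    doc = {"id": doc_id, "@timestamp": BASE_TIMESTAMP + doc_id}
    for cardinality in KEY_CARDINALITIES:
        doc[f"key_{cardinality}"] = str(doc_id % cardinality)
    for i in range(NUM_TEXT_FIELDS):
        doc[f"field_{i}"] = f"text with value {i}_{doc_id}"
    return doc
```

Under these assumptions, a dataset of 5000 documents contains exactly 1000 distinct `key_1000` values, while every larger key has one distinct value per document.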

### Parameters

This track allows overriding the following parameters using `--track-params`:

* `bulk_size` (default: 10000)
* `bulk_indexing_clients` (default: 8): Number of clients that issue bulk indexing requests.
* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested. It is applied to the main index and to the large join indexes (i.e. not to join indexes with up to 500K documents).
* `number_of_replicas` (default: 1): This only applies to the main index (not to lookup indexes)
* `number_of_shards` (default: 5): This only applies to the main index (not to lookup indexes)
* `source_mode` (default: stored): Should the `_source` be `stored` to disk exactly as sent (the default), thrown away (`disabled`), or reconstructed on the fly (`synthetic`)
* `index_settings`: A list of index settings. Index settings defined elsewhere need to be overridden explicitly.
* `cluster_health` (default: "green"): The minimum required cluster health.
* `include_non_serverless_index_settings` (default: true for non-serverless clusters, false for serverless clusters): Whether to include non-serverless index settings.
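
As a small sketch of how such overrides might be assembled, the snippet below builds the comma-separated `key:value` string that `--track-params` accepts. The chosen values and the `to_track_params` helper are hypothetical; only the parameter names come from the list above.

```python
# Hypothetical overrides; parameter names are taken from the list above.
track_params = {
    "bulk_size": 5000,
    "bulk_indexing_clients": 4,
    "ingest_percentage": 50,
    "source_mode": "synthetic",
}


def to_track_params(params):
    """Join overrides into the key:value,key:value form used on the CLI."""
    return ",".join(f"{key}:{value}" for key, value in params.items())


cli_value = to_track_params(track_params)
```

The resulting string can be passed as the value of `--track-params` when starting a race.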


### License

According to the [Open Data Law](https://opendata.cityofnewyork.us/open-data-law/) this data is available as public domain.