ppl project table command #936

Open
wants to merge 22 commits into
base: main
6aa2a21
add project table command to allow materializing queries into a concr…
YANG-DB Nov 20, 2024
95e9504
update tests & command spec changes
YANG-DB Nov 25, 2024
b8e02fc
update tests & command spec changes
YANG-DB Nov 26, 2024
11900d9
Merge branch 'main' into ppl-projection-command
YANG-DB Nov 26, 2024
7d643ae
update tests & simplify grammar
YANG-DB Nov 26, 2024
73b4a05
add support for options table spec
YANG-DB Nov 26, 2024
6de1a20
add support for location table spec
YANG-DB Nov 26, 2024
36038aa
update documentation & examples
YANG-DB Nov 26, 2024
9a84325
update tests with projected join query
YANG-DB Nov 26, 2024
726ae24
update tests with projected join query
YANG-DB Nov 26, 2024
10eb8a1
update tests with projected partitioning verification of correctness
YANG-DB Nov 27, 2024
066eaca
Merge branch 'main' into ppl-projection-command
YANG-DB Dec 2, 2024
550a238
Merge branch 'main' into ppl-projection-command
YANG-DB Dec 4, 2024
a061844
Merge branch 'main' into ppl-projection-command
YANG-DB Dec 7, 2024
a90f9b1
Merge branch 'main' into ppl-projection-command
YANG-DB Dec 9, 2024
c9c5b14
Merge branch 'main' into ppl-projection-command
YANG-DB Dec 12, 2024
bfbb555
update syntax from `project` to `view`
YANG-DB Dec 13, 2024
0d55aa8
update syntax from `project` to `view`
YANG-DB Dec 13, 2024
e6bb6b2
Merge branch 'main' into ppl-projection-command
YANG-DB Dec 16, 2024
2337358
Merge branch 'main' into ppl-projection-command
YANG-DB Dec 21, 2024
bf90692
Merge branch 'main' into ppl-projection-command
YANG-DB Jan 11, 2025
85ed116
Merge branch 'main' into ppl-projection-command
YANG-DB Jan 15, 2025
17 changes: 17 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -282,6 +282,23 @@ source = table | where ispresent(a) |
- `source=accounts | parse address '(?<streetNumber>\d+) (?<street>.+)' | eval streetNumberInt = cast(streetNumber as integer) | where streetNumberInt > 500 | sort streetNumberInt | fields streetNumber, street`
- Limitation: [see limitations](ppl-parse-command.md#limitations)

#### **view**
[See additional command details](ppl-view-command.md)

```sql
view newTableName using csv | source = table | where fieldA > value | stats count(fieldA) by fieldB

view ageDistribByCountry using parquet OPTIONS('parquet.bloom.filter.enabled'='true', 'parquet.bloom.filter.enabled#age'='false') partitioned by (age, country) |
source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats
avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as
avg_adult_country_age by country

view ageDistribByCountry using parquet OPTIONS('parquet.bloom.filter.enabled'='true', 'parquet.bloom.filter.enabled#age'='false') partitioned by (age, country) location 's3://demo-app/my-bucket' |
source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats
avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as
avg_adult_country_age by country
```

#### **Grok**
[See additional command details](ppl-grok-command.md)

2 changes: 2 additions & 0 deletions docs/ppl-lang/README.md
@@ -41,6 +41,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).
- [`grok command`](ppl-grok-command.md)

- [`parse command`](ppl-parse-command.md)
- [`view command`](ppl-view-command.md)

- [`patterns command`](ppl-patterns-command.md)

88 changes: 88 additions & 0 deletions docs/ppl-lang/ppl-view-command.md
@@ -0,0 +1,88 @@
## PPL `view` command

### Description
Use the `view` command to materialize the results of a query into a dedicated view.
In some cases the results of a query need to be persisted as a view so they can be reused.
The view can later serve as the source for follow-up queries that further slice and dice the data. Such views can also be saved into a materialized view (MV) table that is pushed into OpenSearch and used for visualization and faster queries.

The command can also function as an ETL process, in which the original data source is transformed and ingested into the output view using the PPL transformation and aggregation operators.
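
As a minimal sketch of the follow-up usage described above, assume a view named `newTableName` has already been created by the first statement in the Examples section below, so that it contains a `fieldB` column. A later query could then read from it like any other table (the `sort` and `head` steps are only illustrative):

```sql
source = newTableName | sort fieldB | head 10
```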

### Syntax
`VIEW (IF NOT EXISTS)? viewName (USING datasource)? (OPTIONS optionsList)? (PARTITIONED BY partitionColumnNames)? location?`

- **viewName**
Specifies a view name, which may be optionally qualified with a database name.

- **USING datasource**
The data source is the input format used to create the table, for example CSV, TXT, ORC, JDBC, or PARQUET.

- **OPTIONS optionsList**
Specifies a set of key-value pairs used to configure the data source. These options vary depending on the chosen data source and may include properties such as file paths, authentication details, format-specific parameters, etc.

- **PARTITIONED BY**
Specifies the columns on which the data should be partitioned. Partitioning splits the data into separate logical divisions based on distinct values of the specified column(s), which can optimize query performance.

- **location**
Specifies the physical location where the view or table data is stored. This could be a path in a distributed file system such as HDFS, S3 object storage, or a local filesystem.

- **QUERY**
The resulting view (`viewName`) is populated with the rows returned by the PPL query that follows the `view` clause after the first pipe (`|`).

### Usage Guidelines
The `view` command produces a view from the rows returned by the query.
Any query can be used to populate the view, so pay attention to the data volume and compute cost that such queries may incur.

As a precaution, run `explain cost | source = table | ...` before the `view` statement to get a better estimate.
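
The precaution above could look like the following sketch, which reuses the first statement from the Examples section. `table`, `fieldA`, `fieldB`, and `value` are placeholders, and the plan output is not shown because it depends on the data and the Spark version:

```sql
explain cost | source = table | where fieldA > value | stats count(fieldA) by fieldB

view newTableName using csv | source = table | where fieldA > value | stats count(fieldA) by fieldB
```

Running the `explain cost` statement first shows the estimated plan statistics for the same pipeline without writing any data.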

### Examples:
```sql
view newTableName using csv | source = table | where fieldA > value | stats count(fieldA) by fieldB

view ipRanges using parquet | source = table | where isV6 = true | eval inRange = case(cidrmatch(ipAddress, '2003:db8::/32'), 'in' else 'out') | fields ip, inRange

view avgBridgesByCountry using json | source = table | fields country, bridges | flatten bridges | fields country, length | stats avg(length) as avg by country

view ageDistribByCountry using parquet partitioned by (age, country) |
source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats
avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as
avg_adult_country_age by country

view ageDistribByCountry using parquet OPTIONS('parquet.bloom.filter.enabled'='true', 'parquet.bloom.filter.enabled#age'='false') partitioned by (age, country) |
source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats
avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as
avg_adult_country_age by country

view ageDistribByCountry using parquet OPTIONS('parquet.bloom.filter.enabled'='true', 'parquet.bloom.filter.enabled#age'='false') partitioned by (age, country) location 's3://demo-app/my-bucket' |
source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats
avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as
avg_adult_country_age by country

```

### Effective SQL push-down query
The `view` command is translated into an equivalent SQL `CREATE TABLE <viewName> [USING <datasource>] AS <statement>`, following the Spark syntax shown here:

```sql
CREATE TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
USING data_source
[ OPTIONS ( key1=val1, key2=val2, ... ) ]
[ PARTITIONED BY ( col_name1, col_name2, ... ) ]
[ CLUSTERED BY ( col_name3, col_name4, ... )
[ SORTED BY ( col_name [ ASC | DESC ], ... ) ]
INTO num_buckets BUCKETS ]
[ LOCATION path ]
[ COMMENT table_comment ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
```
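
To make the mapping concrete, the sketch below shows roughly how the first PPL example might be expressed as a `CREATE TABLE ... AS SELECT` statement. It is an illustrative assumption rather than the exact statement the translator emits; `table`, `fieldA`, `fieldB`, and `value` are the same placeholders used in the PPL example:

```sql
-- PPL input:
--   view newTableName using csv | source = table | where fieldA > value | stats count(fieldA) by fieldB
-- Approximate Spark SQL equivalent (illustrative only; the generated plan may differ):
CREATE TABLE newTableName
USING csv
AS
SELECT count(fieldA), fieldB
FROM table
WHERE fieldA > value
GROUP BY fieldB
```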


```sql
SELECT customer, exploded_productId
FROM table
LATERAL VIEW explode(productId) AS exploded_productId
```

### References
- https://spark.apache.org/docs/3.5.3/sql-ref-syntax-ddl-create-table-datasource.html