-
Notifications
You must be signed in to change notification settings - Fork 16
Querying Data from Table Storage
To use the FSharp.Azure.Storage querying functions open the TableStorage
module:
open FSharp.Azure.Storage.Table
To provide an idiomatic F# experience when querying table storage, FSharp.Azure.Storage supports the use of record types when querying. For example, the following record type would be used to read a table with columns that match the field names:
type Game =
{ Name : string
Developer : string
HasMultiplayer : bool
Notes : string }
We will use this record type in the examples below. We will also assume, for the sake of these examples, that the Developer field is also used as the PartitionKey and the Name field is used as the RowKey. To understand how FSharp.Azure.Storage conceptually separates record fields away from the PK and RK table properties, please read Modifying Data in Table Storage.
FSharp.Azure.Storage also supports querying class types that implement the Microsoft.Azure.Cosmos.Table.ITableEntity
interface.
The easiest way to use the FSharp.Azure.Storage API is to define a quick helper function that allows you to query for rows from a particular table:
open Microsoft.Azure.Cosmos.Table
let account = CloudStorageAccount.Parse "UseDevelopmentStorage=true;" //Or your connection string here
let tableClient = account.CreateCloudTableClient()
let fromGameTable q = fromTable tableClient "Games" q
The fromGameTable
function fixes the tableClient
and table name parameters of the fromTable
function, so you don't have to keep passing them. This technique is very common when using the FSharp.Azure.Storage API.
Here's how we'd query for all rows in the "Games" table:
let games = Query.all<Game> |> fromGameTable
games
above is of type seq<Game * EntityMetadata>
. The EntityMetadata
type contains the Etag and Timestamp of each Game. Here's how you might work with that:
let gameRecords = games |> Seq.map fst
let etags = games |> Seq.map (fun game, metadata -> metadata.Etag)
The etags in particular are useful when updating those records in table storage, because they allow you to utilise Azure Table Storage's optimistic concurrency protection to ensure nothing else has changed the record since you queried for it. See Modifying Data in Table Storage for more information.
If you'd like the ETag and Timestamp values to be written to fields on your record type as well, you can add Etag
and Timestamp
attributes to the record fields like so:
type Game =
{ Name : string
Developer : string
HasMultiplayer : bool
Notes : string
[<Timestamp>] Modified : DateTimeOffset option
[<Etag>] Etag : string option }
Note that these fields will be ignored when you save back to table storage, as they are system-controlled properties.
The Query.where
function allows you to use an F# quotation of a lambda to specify what conditions you want to filter by. The lambda you specify must be of type:
'T -> SystemProperties -> bool
The SystemProperties
type allows you to construct filters against system properties such as the Partition Key and Row Key, which are the only two properties that are indexed by Table Storage, and therefore the ones over which you will most likely be performing filtering.
For example, this is how we'd get an individual record by PartitionKey and RowKey:
let halo4, metadata =
Query.all<Game>
|> Query.where <@ fun g s -> s.PartitionKey = "343 Industries" && s.RowKey = "Halo 4" @>
|> fromGameTable
|> Seq.head
You can, however, query over properties on your record type too. Be aware that queries over those properties are not indexed by Table Storage and as such will suffer performance penalties.
For example, if we wanted to find all multiplayer games made by Valve, we could write:
let multiplayerValveGames =
Query.all<Game>
|> Query.where <@ fun g s -> s.PartitionKey = "Valve" && g.HasMultiplayer @>
|> fromGameTable
The following operators/functions are supported for use inside the where lambda:
- The
=
,<>
,<
,<=
,>
,>=
operators - The
not
function
Table storage allows you to limit the query results to be only the first 'n' results it finds. Naturally, FSharp.Azure.Storage supports this.
Here's an example query that limits the results to the first 5 multiplayer games made by Valve:
let multiplayerValveGames =
Query.all<Game>
|> Query.where <@ fun g s -> s.PartitionKey = "Valve" && g.HasMultiplayer @>
|> Query.take 5
|> fromGameTable
Azure table storage may not return all the results that match the query in one go. Instead it may split the results over multiple segments, each of which must be queried for separately and sequentially. According to the documentation, table storage will start segmenting results if:
- The resultset contains more than 1000 items
- The query took longer than five seconds
- The query crossed a partition boundary
FSharp.Azure.Storage supports handling query segmentation manually as well as automatically. The fromTable
function we used in the previous examples returns a seq
that will automatically query for additional segments as you iterate.
If you want to handle segmentation manually, you can use the fromTableSegmented
function instead of fromTable
. First, define a helper function:
let fromGameTableSegmented c q = fromTableSegmented tableClient "Games" c q
The fromGameTableSegmented
function will have the type:
TableContinuationToken option -> EntityQuery<'T> -> List<'T * EntityMetadata> * TableContinuationToken option
This means it takes an optional continuation token and the query, and returns the list of results in that segment, and optionally the continuation token used to access the next segment, if any.
Here's an example that gets the first two segments of query results:
let query = Query.all<Game>
let games1, segmentToken1 =
query |> fromGameTableSegmented None //None means querying for the first segment (ie. no continuation)
//We're making the assumption segmentToken1 here is not None and therefore
//there is another segment to read. In practice, this is a very poor assumption
//to make, since segmentation is performed arbitrarily by table storage
if segmentToken1.IsNone then failwith "No segment 2!"
let games2, segmentToken2 =
query |> fromGameTableSegmented segmentToken1
In practice, you'd probably write a recursive function or a loop to iterate through the segments until a certain condition, or use the asynchronous sequences support (see below).
FSharp.Azure.Storage also supports asynchronous equivalents of fromTable
and fromTableSegmented
. To use them, you would first create your helper functions:
let fromGameTableAsync q = fromTableAsync tableClient "Games" q
let fromGameTableSegmentedAsync c q = fromTableSegmentedAsync tableClient "Games" c q
fromTableAsync
automatically and asynchronously makes requests for all the segments and returns all the results in a single seq
. Note that unlike fromTable
, all segments are queried for during the asynchronous operation, not during sequence iteration. (This is because seq
doesn't support asynchronous iteration.)
Here's an example of using fromTableAsync
:
let valveGames =
Query.all<Game>
|> Query.where <@ fun g s -> s.PartitionKey = "Valve" @>
|> fromGameTableAsync
|> Async.RunSynchronously
And finally, an example using the asynchronous segmentation variant:
let asyncOp = async {
let query = Query.all<Game>
let! games1, segmentToken1 =
query |> fromGameTableSegmentedAsync None //None means querying for the first segment (ie. no continuation)
//We're making the assumption segmentToken1 here is not None and therefore
//there is another segment to read. In practice, this is a very poor assumption
//to make, since segmentation is performed arbitrarily by table storage
if segmentToken1.IsNone then failwith "No segment 2!"
let! games2, segmentToken2 =
query |> fromGameTableSegmentedAsync segmentToken1
return games1 @ games2
}
let games = asyncOp |> Async.RunSynchronously
If you need to query a large number of rows and you don't want to materialise them all in memory simultaneously, you can stream through them by using the fromTableSegmentedAsyncSeq
function. This function returns an AsyncSeq
(from the FSharp.Control.AsyncSeq
library), where each item yielded by the sequence is a segment's worth of rows.
let fromGameTableSegmentedAsyncSeq q = fromTableSegmentedAsyncSeq tableClient "Games" q
let printAllGames = async {
do! Query.all<Game>
|> fromGameTableSegmentedAsyncSeq
|> AsyncSeq.iter (fun rows ->
for (game, metadata) in rows do
printfn "%A" game
)
}
For more information about what you can do with asynchronous sequences, see the FSharp.Control.AsyncSeq
documentation.