strings option 5 - Use branded strings with extended prototype #7

Open
wants to merge 3 commits into
base: main
@@ -164,15 +164,30 @@ Whilst we still can't accept string literals on their own, the tagged template i

Having `bytes` and `str` behave like a primitive value type (value equality) whilst not _actually_ being a primitive is not strictly semantically compatible with EcmaScript. However, the lowercase type names (plus a factory with no `new` keyword) communicate the intention of it being a primitive value type, and there is an existing precedent for introducing new value types to the language in a similar pattern (`bigint` and `BigInt`). Essentially, if EcmaScript were to have a primitive bytes type, this is most likely what it would look like.


### Option 5 - Use branded strings with extended prototype

Option 2 has the developer experience that will be the most familiar to developers (coming from TypeScript or TEALScript), but suffers from semantic incompatibility. In particular, index-based functions would not work as expected (or would be very expensive to implement) because EcmaScript indexes strings by characters, not bytes.

For example, `'á'[0]` would return `'á'` in EcmaScript, but would return `0xC3` in TEALScript because it gets the first byte (and this character is a two-byte sequence in UTF-8).
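The mismatch can be reproduced in plain TypeScript. The snippet below is only an illustration: it uses the standard `TextEncoder` to stand in for the byte view the AVM would see, and the variable names are made up for the example.

```typescript
// Character indexing (EcmaScript) versus byte indexing (UTF-8).
// '\u00e1' is 'á' written as an escape so the source file encoding cannot change it.
const text = '\u00e1';

// EcmaScript indexes by UTF-16 code unit, so index 0 is the whole character.
const firstChar = text[0];

// Encoding to UTF-8 exposes the underlying bytes: 0xC3 0xA1.
const utf8 = new TextEncoder().encode(text);
const firstByte = utf8[0];

console.log(firstChar === text);  // true
console.log(firstByte === 0xc3);  // true
console.log(text.length, utf8.length); // 1 2
```

The two lengths differ as well, so any API that mixes `length` with byte offsets would hit the same problem.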

To solve this, we could extend the prototype of `string` to have byte-specific functions. For example, `.getByte(i)` instead of `[i]` and `.sliceBytes(i)` instead of `.slice(i)`. If a developer tries to use the character-based functions, the compiler can throw an error. We can also show an error in the IDE via TypeScript plugins.
Collaborator

Suggested change
To solve this, we could extend the prototype of `string` to have byte-specific functions. For example, `.getByte(i)` instead of `[i]` and `.sliceBytes(i)` instead of `.slice(i)`. If a developer tries to use the character-based functions, the compiler can throw an error. We can also show an error in the IDE via TypeScript plugins.
To work around this, we could extend the prototype of `string` to have byte-specific functions. For example, `.getByte(i)` instead of `[i]` and `.sliceBytes(i)` instead of `.slice(i)`. If a developer tries to use the character-based functions, the compiler can throw an error. We can also show an error in the IDE via TypeScript plugins.


If the AVM were to ever support character-based operations, we could enable the character-based functions.

The main downside of this approach is "extra" methods in the `string` prototype that are not applicable to the AVM. This, however, is currently how TEALScript functions with many native types and it has not been a problem for developers (provided the error is clear). As mentioned, this can also be solved at the IDE level via TypeScript plugins.
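As a rough illustration of the byte-specific functions described in this option, the sketch below augments `String.prototype` at runtime. Only the names `getByte` and `sliceBytes` come from the text above. Everything else is an assumption made for the example: the UTF-8 encoding via `TextEncoder`, the `Uint8Array` return type of `sliceBytes`, and the `RangeError` on a bad index. A real compiler would lower these calls to AVM operations rather than run this code.

```typescript
export {}; // make this file a module so the global augmentation is allowed

declare global {
  interface String {
    /** Hypothetical: the UTF-8 byte at byte offset `i`. */
    getByte(i: number): number;
    /** Hypothetical: the UTF-8 bytes from `start` (inclusive) to `end` (exclusive). */
    sliceBytes(start: number, end?: number): Uint8Array;
  }
}

const utf8Encoder = new TextEncoder();

String.prototype.getByte = function (this: string, i: number): number {
  const bytes = utf8Encoder.encode(String(this));
  if (!Number.isInteger(i) || i < 0 || i >= bytes.length) {
    throw new RangeError(`byte index ${i} is out of range`);
  }
  return bytes[i];
};

String.prototype.sliceBytes = function (this: string, start: number, end?: number): Uint8Array {
  // Assumption: a byte slice stays as bytes, since cutting inside a
  // multi-byte character would not produce a valid string.
  return utf8Encoder.encode(String(this)).slice(start, end);
};

const sample = '\u00e1b'; // "áb" is three UTF-8 bytes: 0xC3 0xA1 0x62
console.log(sample.getByte(0) === 0xc3);       // true
console.log(Array.from(sample.sliceBytes(2))); // [ 98 ]
```

The character-based members such as `[i]` and `.slice(i)` are still present on the prototype here, which is the "extra methods" downside noted above. Rejecting them would be the job of the compiler or an IDE plugin.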

## Preferred option

Option 3 can be excluded because the requirement for a `new` keyword feels unnatural for representing a primitive value type.

Options 1 and 2 are not preferred as they make maintaining semantic compatibility with EcmaScript impractical.

Option 5 offers the most familiar developer experience at the expense of extra methods in the prototype.

Option 4 gives us the most natural-feeling API whilst still giving us full control over the API surface. It doesn't support the `+` operator, but supports interpolation and `.concat`, which gives us most of what `+` provides other than augmented assignment (i.e. `+=`).
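To make the trade-off concrete, here is a minimal sketch of how a tagged-template factory in the style of option 4 could behave at runtime. The `Str` tag name follows the examples used in this discussion. The `StrValue` class, its `bytes` field and the UTF-8 handling are hypothetical and are not the project's actual API.

```typescript
const strEncoder = new TextEncoder();
const strDecoder = new TextDecoder();

// Hypothetical value wrapper; the real type would be defined by the language library.
class StrValue {
  readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  // Returns a new value holding this value's bytes followed by the others'.
  concat(...others: StrValue[]): StrValue {
    const parts = [this.bytes, ...others.map((o) => o.bytes)];
    const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
    let offset = 0;
    for (const part of parts) {
      out.set(part, offset);
      offset += part.length;
    }
    return new StrValue(out);
  }

  toString(): string {
    return strDecoder.decode(this.bytes);
  }
}

// Tagged-template factory: no `new` keyword, interpolation of other values allowed.
function Str(strings: TemplateStringsArray, ...values: StrValue[]): StrValue {
  let result = new StrValue(strEncoder.encode(strings[0]));
  values.forEach((value, i) => {
    result = result.concat(value, new StrValue(strEncoder.encode(strings[i + 1])));
  });
  return result;
}

const who = Str`world`;
const greeting = Str`hello ${who}`;

// `message += Str`!`` is not available, so reassign the result of `.concat`.
let message = greeting;
message = message.concat(Str`!`);
console.log(message.toString()); // hello world!
```

Because `StrValue` is an ordinary class, TypeScript rejects `greeting + who` at compile time, which matches the missing `+` support described above.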

We should select an appropriate name for the type representing an AVM string. It should not conflict with the semantically incompatible EcmaScript type `string`.
Option 4 would also require us to select an appropriate name for the type representing an AVM string. It should not conflict with the semantically incompatible EcmaScript type `string`.
- `str`/`Str`:
- ✅ Short
- ✅ obvious what it is
@@ -189,8 +204,8 @@ We should select an appropriate name for the type representing an AVM string. It
- ✅ very obvious what it is
- ✅ obvious how it differs to `string`


Option 5 would be the preferred option if we were to prioritize developer experience, whereas option 4 would be best if we prioritized control over the prototype.
Collaborator

prioritize developer experience

This feels a little subjective. What aspects of the developer experience are prioritized by this option? The obvious one I can see is saving a couple of characters in declaration, but at the expense of having to explicitly type variables:

const demo: bytes = "my bytes"
vs.
const demo = Str`my bytes`

someFunc("my bytes")
vs.
someFunc(Str`my bytes`)

Are there any others?

Contributor Author

I think "a couple of characters in declaration" is a bit reductive of the impact it has on the developer experience. When developers use strings they expect to be able to put literals between quotation marks. Adding any friction to that can be a bit jarring. We saw this frequently with the PyTeal `Bytes` constructor.

That being said, I agree that "developer experience" is too subjective, so I've changed it to "familiarity".


## Selected option

Option 4 has been selected as the best option
TBD