The text types are the types that, in addition to string, support text and string manipulation.
Following the principle of UTF-8 Everywhere, strings are encoded in UTF-8 by default, both in memory and everywhere else. The encoding classes must be used to read and write text in other encodings.
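As a rough illustration of keeping UTF-8 as the default and converting only at the boundary, the sketch below uses Python's built-in codecs as a stand-in for the encoding classes. The function name transcode is hypothetical and is not part of the design described here.

```python
# Python's codecs stand in for the "encoding classes"; transcode is a
# hypothetical helper, not an API defined by this document.
text = "naïve ♠"

utf8_bytes = text.encode("utf-8")       # the default representation
utf16_bytes = text.encode("utf-16-le")  # another encoding, requested explicitly


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(source).encode(target)
```

Reading UTF-16 input would then amount to transcode(utf16_bytes, "utf-16-le", "utf-8") at the point where the data enters the program.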
To best support cross-platform development, newlines are represented in code simply as \n. Libraries should convert them when needed for output (for example, to the console). Text reading libraries should support the newline types. The default is Mixed, which accepts any mix of the standard newline forms and converts them to \n. When writing text, libraries should support the newline types. The default is line feed (\n), but other types can be selected, including native.
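The reading and writing behavior described above could be sketched as follows. This is a minimal sketch that assumes the standard newline forms are CRLF, CR, and LF; the names read_mixed, write_text, and NEWLINE_FORMS are hypothetical, and Python's os.linesep stands in for the native option.

```python
import os

# Hypothetical option names; only Mixed, line feed, and native are named
# in the text above.
NEWLINE_FORMS = {"lf": "\n", "crlf": "\r\n", "cr": "\r", "native": os.linesep}


def read_mixed(raw: str) -> str:
    """Accept any mix of CRLF, CR, and LF and normalize all of them to LF."""
    # Replace CRLF first so that its CR is not converted a second time.
    return raw.replace("\r\n", "\n").replace("\r", "\n")


def write_text(text: str, newline: str = "lf") -> str:
    """Convert internal LF newlines to the selected output form."""
    return text.replace("\n", NEWLINE_FORMS[newline])
```

With this shape, code in between only ever sees \n, and the choice of output form is made once, at the point of writing.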
string is the standard string type. It encodes strings in UTF-8 and supports string literal values. It is designed to keep developers from performing operations that do not work for Unicode strings. As such, it exposes iterators of grapheme clusters, scalar_values, and bytes. It does not allow slicing by index, but instead encourages slicing a string at textually appropriate positions.
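To show why the three views differ, here is a minimal Python sketch of the three iterators. The function names are hypothetical. The grapheme cluster iterator is deliberately simplified: it only attaches combining marks to the preceding character and does not implement the full Unicode segmentation rules.

```python
import unicodedata


def byte_values(s: str):
    """Iterate over the UTF-8 bytes of the string."""
    return iter(s.encode("utf-8"))


def scalar_values(s: str):
    """Iterate over the scalar values (for well-formed text, Python's
    per-character iteration yields exactly these)."""
    return iter(s)


def grapheme_clusters(s: str):
    """Simplified clustering: a combining mark joins the preceding cluster.
    Assumption: this ignores the other rules of Unicode text segmentation."""
    cluster = ""
    for ch in s:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


sample = "e\u0301x"  # "e" + combining acute accent, then "x"
```

For sample, the three views have different lengths: 2 grapheme clusters, 3 scalar values, and 4 bytes. An integer index is therefore ambiguous, and a byte index can land in the middle of a character.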
The string type is a constant copy struct. Internally, it contains a byte length and a reference to the UTF-8 bytes. This allows a string slice to reference a subsection of a larger string.
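The length-plus-reference layout can be modeled roughly as below. This is a sketch under assumptions: the class StringSlice and its methods are hypothetical, and because Python has no raw pointers, the reference is modeled as a shared bytes object plus a start offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringSlice:
    data: bytes       # shared, immutable UTF-8 bytes (stands in for the reference)
    start: int        # byte offset; a real pointer would fold this into the reference
    byte_length: int  # number of bytes visible through this slice

    @staticmethod
    def from_text(text: str) -> "StringSlice":
        encoded = text.encode("utf-8")
        return StringSlice(encoded, 0, len(encoded))

    def subslice(self, start: int, byte_length: int) -> "StringSlice":
        """Return a view of part of this slice without copying any bytes.
        The caller must pass offsets that fall on character boundaries."""
        if start < 0 or byte_length < 0 or start + byte_length > self.byte_length:
            raise ValueError("subslice out of range")
        return StringSlice(self.data, self.start + start, byte_length)

    def to_text(self) -> str:
        end = self.start + self.byte_length
        return self.data[self.start:end].decode("utf-8")
```

Taking a subslice creates only a new small struct; the underlying bytes are shared with the larger string.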
The String_Builder type provides an efficient way to build up and work with mutable strings that can then be converted to string. String_Builder should be internally represented as either a growing Raw_Bounded_List[byte] or possibly a Raw_Bounded_List[Raw_Bounded_List[byte]] that stores multiple string chunks. C# uses a linked list stored in reverse order. That is, the end of the string is in the first node, and traversing the nodes moves toward the beginning. However, this does not provide constant-time access to all the data of the String_Builder.
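The two candidate representations can be compared with a small Python sketch. The class and method names are hypothetical, with bytearray standing in for Raw_Bounded_List[byte] and a list of bytes objects standing in for the list of chunks.

```python
class ContiguousBuilder:
    """One growing byte buffer (analogous to Raw_Bounded_List[byte])."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def append(self, text: str) -> None:
        self._buffer += text.encode("utf-8")

    def byte_at(self, index: int) -> int:
        return self._buffer[index]  # direct indexing into one buffer

    def to_string(self) -> str:
        return self._buffer.decode("utf-8")


class ChunkedBuilder:
    """A list of chunks (analogous to Raw_Bounded_List[Raw_Bounded_List[byte]])."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []

    def append(self, text: str) -> None:
        self._chunks.append(text.encode("utf-8"))  # no copying of earlier data

    def byte_at(self, index: int) -> int:
        if index < 0:
            raise IndexError("index out of range")
        # Chunks vary in size here, so lookup walks the chunk list.
        for chunk in self._chunks:
            if index < len(chunk):
                return chunk[index]
            index -= len(chunk)
        raise IndexError("index out of range")

    def to_string(self) -> str:
        return b"".join(self._chunks).decode("utf-8")
```

In this sketch the chunked variant avoids recopying on append, but because its chunks vary in size, a lookup is linear in the number of chunks. Fixed-size chunks would allow the chunk to be located by index arithmetic instead.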
A 32-bit type used for a Unicode code point. Code points support user literals of a single character such as 'c' or '♠'. As defined in the Unicode standard, a code point is any value in the Unicode codespace. That is, the range of integers from 0 to 0x10FFFF.
A 32-bit type used for a Unicode scalar value. Scalar values support user literals of a single character such as 'c' or '♠'. As defined in the Unicode standard, a scalar value is any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to 0xD7FF and 0xE000 to 0x10FFFF inclusive.
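The difference between the two ranges can be expressed as a pair of validity checks. This is a minimal sketch; the function and constant names are hypothetical, and the numeric bounds are the ones given by the Unicode standard.

```python
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST = 0xD800  # first high-surrogate code point
SURROGATE_LAST = 0xDFFF   # last low-surrogate code point


def is_code_point(value: int) -> bool:
    """True for any value in the Unicode codespace, 0 to 0x10FFFF."""
    return 0 <= value <= MAX_CODE_POINT


def is_scalar_value(value: int) -> bool:
    """True for any code point that is not a surrogate."""
    return is_code_point(value) and not (SURROGATE_FIRST <= value <= SURROGATE_LAST)
```

Every scalar value is a code point, but a surrogate such as 0xD800 is a code point that is not a scalar value, which is why it cannot be encoded in well-formed UTF-8.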