Add basic string interpolation / string templates #743

dvdvgt · 2024-12-12T11:50:46Z

Fixes #722 and adds basic support for string templates.

There are a few caveats to keep in mind with the current initial implementation:

All arguments already have to be strings. It is basically just syntactic sugar so that you can write "${a.show} and ${b.show}" instead of a.show ++ " and " ++ b.show where a and b are some 'showable' values. Internally, however, this is just how it is desugared.
String templates cannot be used to match string literals in pattern matches

The generated JS code looks a bit weird:

const v_r_897 = $effekt.println(((((((((("GET ") + (domain_0)))) + (((("/users/") + (user_0))))))) + (((("/resource/") + (('' + resourceId_0))))))));

If one forgets to call show on arguments which are not already of type string, the error messages are hideous and possibly confusing:

error

-There are multiple overloads, which all fail to check:
-Possible overload: effekt::println of type Bool => Unit
-  Expected String but got Int.
-  Expected Bool but got String.
-
-Possible overload: list::println of type List[Bool] => Unit
-  Expected String but got Int.
-  Expected List[Bool] but got String.
-
-Possible overload: array::println of type Array[Int] => Unit
-  Expected String but got Int.
-  Expected Array[Int] but got String.
-
-Possible overload: effekt::println of type String => Unit
-  Expected String but got Int.
-
-Possible overload: list::println of type List[Int] => Unit
-  Expected String but got Int.
-  Expected List[Int] but got String.
-
-Possible overload: effekt::println of type Byte => Unit
-  Expected String but got Int.
-  Expected Byte but got String.
-
-Possible overload: list::println of type List[String] => Unit
-  Expected String but got Int.
-  Expected List[String] but got String.
-
-Possible overload: list::println of type List[Double] => Unit
-  Expected String but got Int.
-  Expected List[Double] but got String.
-
-Possible overload: array::println of type Array[String] => Unit
-  Expected String but got Int.
-  Expected Array[String] but got String.
-
-Possible overload: effekt::println of type Ordering => Unit
-  Expected String but got Int.
-  Expected Ordering but got String.
-
-Possible overload: array::println of type Array[Bool] => Unit
-  Expected String but got Int.
-  Expected Array[Bool] but got String.
-
-Possible overload: effekt::println of type Unit => Unit
-  Expected String but got Int.
-  Expected Unit but got String.
-
-Possible overload: option::println of type Option[Int] => Unit
-  Expected String but got Int.
-  Expected Option[Int] but got String.
-
-Possible overload: option::println of type Option[Double] => Unit
-  Expected String but got Int.
-  Expected Option[Double] but got String.
-
-Possible overload: array::println of type Array[Double] => Unit
-  Expected String but got Int.
-  Expected Array[Double] but got String.
-
-Possible overload: effekt::println of type Double => Unit
-  Expected String but got Int.
-  Expected Double but got String.
-
-Possible overload: effekt::println of type Int => Unit
-  Expected String but got Int.
-  Expected Int but got String.
-
-Possible overload: option::println of type Option[Bool] => Unit
-  Expected String but got Int.
-  Expected Option[Bool] but got String.
-
-  println("GET ${domain}/users/${user}/resource/${resourceId}")

Apart from that, the changes are really lightweight because of the early desugaring in the parser.

I am looking forward to your feedback and ideas.

jiribenes · 2024-12-12T12:03:05Z

This is a great first implementation, but I'm not sure that desugaring in parser is the best choice here :/
We talked about it at a meeting when #722 was created, but I don't remember what we agreed on.

Personally, I'd much rather see templates up until the transformation to core so that we can:

have reasonable error messages in typer
later on resolve typeclasses or something for the splices
possibly have a more performant impl (?)

dvdvgt · 2024-12-12T12:16:48Z

Personally, I'd much rather see templates up until the transformation to core so that we can:

have reasonable error messages in typer

later on resolve typeclasses or something for the splices

possibly have a more performant impl (?)

You raise some very good concerns and I completely agree. I will try to go the 'non-lazy route' and I actually completely thread through templates to core.

jiribenes · 2024-12-12T12:25:55Z

Don't get me wrong, having a version is still super helpful; I truly have no idea how difficult it really is to do the thing I outlined above. But I hope it's worth it, especially wrt technical debt. :)

b-studios · 2024-12-12T15:33:33Z

I am not completely against desugaring in parser, but maybe we could actually desugar

"foo ${exp1} bar ${exp2} baz"

to something like:

splice(["foo ", " bar ", "baz"], [exp1, exp2])

this way, as a user I can still overload splice for types other than exp1 exp2: String :) Not that don't have enough overloaded functions already...

dvdvgt · 2024-12-12T16:05:05Z

I just pushed a prototype that threads through string templates all the way until code generation.

This has the added benefit that we remain somewhat flexible for later changes, as Jiří mentioned. Furthermore, we can re-use the string interpolation capabilities of each respective backend, e.g. currently the JS string templates are generated as template literals. As a trade-off, we need to add a TemplateStr AST node to Core, CPS and to each backend. As I have already heard, not everybody is a proponent of further expanding Core ;)

Note: I currently only fully implemented it for JS. If we come to an agreement that this is the right way to proceed, I will implement for the remaining backends.

(I already started working on this before Jonathan's remark and I'd rather discuss this implementation briefly before scrapping it right away :) )

jiribenes · 2024-12-12T16:23:21Z

I am not completely against desugaring in parser, but maybe we could actually desugar
"foo ${exp1} bar ${exp2} baz"
to something like:
splice(["foo ", " bar ", "baz"], [exp1, exp2])
this way, as a user I can still overload splice for types other than exp1 exp2: String :) Not that don't have enough overloaded functions already...

This relates to some strategy we'll need to come up with in order to represent custom string interpolators (which is very outside of the scope of this PR, I believe...)

Then I'd expect some desugaring like, for example:

val name = "Joe"
val age = 42

////

val s: SQL::Stmt = 
  sql"SELECT * FROM Person p WHERE p.last_name = ${name} AND p.age = ${age}"

// ~>

val s: SQL::Stmt = sql { // sql: { prog: () => Unit / { Splice[String], Splice[Int] } }: SQL::Stmt
  do literal("SELECT * FROM Person p WHERE p.last_name = ")
  do splice(name)
  do literal(" AND p.age = ")
  do splice(age)
}

dvdvgt · 2025-01-15T16:05:21Z

There are three things keeping this from merging:

The test fails on LLVM with error code 139 (segmentation fault)
For some reason the test now also fails on JS even though the output is apparently the same as the expected. Maybe some weird non-displayable characters are inserted by accident?
I would like to include the splice module in the prelude however, I am prompted with Effekt Compiler Crash: null

jiribenes · 2025-01-15T16:09:48Z

I would like to include the splice module in the prelude however, I am prompted with Effekt Compiler Crash: null

I'll merge #780 ASAP, then it should at least tell you what's wrong :)
(You can also review it and merge it ;D)

Resolves the `Effect Compiler Crash: null` part of #760. The actual problem there was that `e.getMessage` can return `null` and it does for... _StackOverflowException_ 🥳. Looking at the docs, it's recommended to use `e.toString`. While I was at it, I modified stack trace printing, now we do it properly by using `.printStackTrace`. This is very related to #731, I just want to get this out quickly to unblock other problems.

jiribenes · 2025-01-17T12:14:07Z

libraries/common/stringbuffer.effekt

+      // resize buffer to strip trailing zeros that otherwise would be converted into 0x00 characters
+      buffer = bytearray::resize(buffer, pos)
      val str = buffer.toString()
+      // after flushing, the stringbuffer should be empty again
      buffer = bytearray::allocate(initialCapacity)


I think this could be made somewhat more efficient -- resize copies into a newly allocated bytearray, but toString doesn't (since on LLVM strings are bytearrays)
So it should be possible to just clear the buffer without allocating it again?

Suggested change

// resize buffer to strip trailing zeros that otherwise would be converted into 0x00 characters

buffer = bytearray::resize(buffer, pos)

val str = buffer.toString()

// after flushing, the stringbuffer should be empty again

buffer = bytearray::allocate(initialCapacity)

// resize buffer to strip trailing zeros that otherwise would be converted into 0x00 characters

val str = bytearray::resize(buffer, pos).toString()

// after flushing, the stringbuffer should be empty again

buffer = buffer.fill(0)

where buffer.fill(0) means memset(buffer, 0) (which we don't have yet) :)

Yes, you could just clear the buffer, however, then the buffer's size will not be reset to a (potentially) smaller size and instead just buffer's previous size is kept. This probably would not be a big deal, though.

libraries/common/stringbuffer.effekt

Closes #783. And closes #778. I believe the way we currently handle size calculations for stack pushes/pops of non-structural types has some flaws for non-pointer values. Some changes in how we use the GEP instruction will probably fix this, but I find its syntax really confusing. I have a proof-of-concept alternate fix at the end. To bypass the calculation issues, we can also just change the size of a byte to 8 byte instead. This *hotfix* fixes the segfault in the original issue as well as all of my minified versions. Some additional explanations of the issue using an example program: ```scala def foo { action: Byte => Unit }: Unit = action(42.toByte) def main() = { foo { b => // does not segfault when printlns are swapped println(42) println(b) } } ``` <details> <summary>Pseudo-flow with relevant (for the segfault) lines marked</summary> ``` main(stack) main(stack): alloc(3 * 8 = 24) push returnAddress_1 push sharer_5 push eraser_7 blockLit = vtable[blockLit_3289_clause_13] foo(blockLit, stack) foo(blockLit, stack): alloc(3 * 8 = 24) push returnAddress_40 push sharer_5 push eraser_7 literal = 42 byte = toByte(literal) vtable = blockLit[0] closure = blockLit[1] fp = *vtable // = blockLit_3289_clause_13 fp(closure, byte, stack) blockLit_3289_clause_13(closure, byte, stack): alloc(3 * 8 + 1 = 25) // <--------------------------- push byte push returnAddress_14 push sharer_26 push eraser_30 literal = 42 printlnInt(literal, stack) printlnInt(value, stack): // _4 str = show(value) unit = println(str) dealloc(3 * 8 = 24) pop _ pop _ pop returnAddress // _14 returnAddress(unit, stack) // <--------------------------- returnAddress_14(ret, stack): dealloc(1) pop byte erase(ret) alloc(3 * 8 = 24) push returnAddress_17 push sharer_5 push eraser_7 printlnByte(byte, stack) printlnByte(byte, stack): str = show(byte) unit = println(str) dealloc(24) pop _ pop _ pop returnAddress // _17 returnAddress(unit, stack) returnAddress_17(unit, stack): dealloc(24) pop _ pop _ pop returnAddress // _40 returnAddress(unit, stack) returnAddress_40(unit, stack) dealloc(24) pop _ pop _ pop returnAddress // _1 returnAddress(unit, stack) returnAddress_1(unit, stack): dealloc(24) pop _ pop _ pop returnAddress returnAddress(unit, stack) ``` </details> The corresponding LLVM code of the relevant sections: ```llvm define tailcc void @blockLit_3289_clause_13(%Object %closure, i8 %b_2365, %Stack %stack) { entry: ; new blockLit_3289_clause_13, 1 parameters %stackPointer_33 = call ccc %StackPointer @stackAllocate(%Stack %stack, i64 25) ; <--------------- %b_2365_pointer_34 = getelementptr {i8}, %StackPointer %stackPointer_33, i64 0, i32 0 store i8 %b_2365, ptr %b_2365_pointer_34, !noalias !2 %returnAddress_pointer_35 = getelementptr {{i8}, %FrameHeader}, %StackPointer %stackPointer_33, i64 0, i32 1, i32 0 %sharer_pointer_36 = getelementptr {{i8}, %FrameHeader}, %StackPointer %stackPointer_33, i64 0, i32 1, i32 1 %eraser_pointer_37 = getelementptr {{i8}, %FrameHeader}, %StackPointer %stackPointer_33, i64 0, i32 1, i32 2 store ptr @returnAddress_14, ptr %returnAddress_pointer_35, !noalias !2 store ptr @sharer_26, ptr %sharer_pointer_36, !noalias !2 store ptr @eraser_30, ptr %eraser_pointer_37, !noalias !2 ; literalInt longLiteral_3291, n=42 %longLiteral_3291 = add i64 42, 0 ; substitution ; substitution [value_3 !-> longLiteral_3291] ; jump println_4 musttail call tailcc void @println_4(i64 %longLiteral_3291, %Stack %stack) ret void } ... define tailcc void @println_4(i64 %value_3, %Stack %stack) { entry: ; definition println_4, environment length 1 ; foreignCall pureApp_3284 : String(), foreign show_14, 1 values %pureApp_3284 = call ccc %Pos @show_14(i64 %value_3) ; foreignCall pureApp_3283 : Positive(), foreign println_1, 1 values %pureApp_3283 = call ccc %Pos @println_1(%Pos %pureApp_3284) ; substitution ; substitution [v_r_3164_3278 !-> pureApp_3283] ; return, 1 values %stackPointer_56 = call ccc %StackPointer @stackDeallocate(%Stack %stack, i64 24) %returnAddress_pointer_57 = getelementptr %FrameHeader, %StackPointer %stackPointer_56, i64 0, i32 0 %returnAddress_55 = load %ReturnAddress, ptr %returnAddress_pointer_57, !noalias !2 musttail call tailcc void %returnAddress_55(%Pos %pureApp_3283, %Stack %stack) ; <------------------------- ; here is the segfault, since returnAddress_55 is missing one byte! ret void } ``` --- A fix I came up with that does not involve making the size of a byte large: ``` llvm define tailcc void @blockLit_3289_clause_13(%Object %closure, %Byte %b_2365, %Stack %stack) { entry: ; new blockLit_3289_clause_13, 1 parameters ; SPLIT the allocation into two segments <----------------------- %stackPointar_33 = call ccc %StackPointer @stackAllocate(%Stack %stack, i64 1) %b_2365_pointer_34 = getelementptr %Byte, %StackPointer %stackPointar_33, i64 0 store %Byte %b_2365, ptr %b_2365_pointer_34, !noalias !2 %stackPointer_33 = call ccc %StackPointer @stackAllocate(%Stack %stack, i64 24) %returnAddress_pointer_35 = getelementptr %FrameHeader, %StackPointer %stackPointer_33, i64 0, i32 0 %sharer_pointer_36 = getelementptr %FrameHeader, %StackPointer %stackPointer_33, i64 0, i32 1 %eraser_pointer_37 = getelementptr %FrameHeader, %StackPointer %stackPointer_33, i64 0, i32 2 store ptr @returnAddress_14, ptr %returnAddress_pointer_35, !noalias !2 store ptr @sharer_26, ptr %sharer_pointer_36, !noalias !2 store ptr @eraser_30, ptr %eraser_pointer_37, !noalias !2 ; literalInt longLiteral_3291, n=42 %longLiteral_3291 = add i64 42, 0 ; substitution ; substitution [value_3 !-> longLiteral_3291] ; jump println_4 musttail call tailcc void @println_4(i64 %longLiteral_3291, %Stack %stack) ret void } ... ``` --------- Co-authored-by: Philipp Schuster <[email protected]>

b-studios · 2025-01-21T16:47:07Z

Regarding the prelude, we have an undocumented invariant: all imports of prelude modules have to be in the prelude as well...

That is, the prelude in the JS runner needs to look like this:

override def prelude: List[String] = List("effekt", "option", "list", "result", "exception", "array", "string", "ref", "bytearray", "stringbuffer", "splice")

and bytearray, and stringbuffer also need to import effekt and dependencies that they need from the prelude manually.

That is, the following changes would be necessary (+ LLVM Runner)

diff --git a/effekt/jvm/src/main/scala/effekt/Runner.scala b/effekt/jvm/src/main/scala/effekt/Runner.scala
index ab26700e..6dbaac0b 100644
--- a/effekt/jvm/src/main/scala/effekt/Runner.scala
+++ b/effekt/jvm/src/main/scala/effekt/Runner.scala
@@ -151,7 +151,7 @@ object JSNodeRunner extends Runner[String] {
 
   override def includes(path: File): List[File] = List(path / ".." / "js")
 
-  override def prelude: List[String] = List("effekt", "option", "list", "result", "exception", "array", "string", "ref")
+  override def prelude: List[String] = List("effekt", "option", "list", "result", "exception", "array", "string", "ref", "bytearray", "stringbuffer", "splice")
 
   def checkSetup(): Either[String, Unit] =
     if canRunExecutable("node", "--version") then Right(())
diff --git a/examples/pos/string_interpolation.effekt b/examples/pos/string_interpolation.effekt
index 60aeb0b0..54bf9a20 100644
--- a/examples/pos/string_interpolation.effekt
+++ b/examples/pos/string_interpolation.effekt
@@ -1,6 +1,3 @@
-import splice
-import stringbuffer
-
 type Expr {
   Var(id: String)
   Abs(param: String, body: Expr)
@@ -38,4 +35,3 @@ def main() = {
   val fixpoint = Abs("f", App(Abs("x", App(Var("f"), App(Var("x"), Var("x")))), Abs("x", App(Var("f"), App(Var("x"), Var("x"))))))
   println(pretty"Fix point combinator: ${fixpoint}")
 }
-
diff --git a/libraries/common/bytearray.effekt b/libraries/common/bytearray.effekt
index d7552919..d7232e44 100644
--- a/libraries/common/bytearray.effekt
+++ b/libraries/common/bytearray.effekt
@@ -1,5 +1,7 @@
 module bytearray
 
+import effekt
+
 /**
  * A memory managed, mutable, fixed-length array of bytes.
  */
diff --git a/libraries/common/splice.effekt b/libraries/common/splice.effekt
index 9ba74d15..7de68c0d 100644
--- a/libraries/common/splice.effekt
+++ b/libraries/common/splice.effekt
@@ -1,5 +1,7 @@
 module splice
 
+import effekt
+
 import stringbuffer
 
 effect literal(s: String): Unit
diff --git a/libraries/common/stringbuffer.effekt b/libraries/common/stringbuffer.effekt
index 3b6d3acb..bfa57e62 100644
--- a/libraries/common/stringbuffer.effekt
+++ b/libraries/common/stringbuffer.effekt
@@ -1,5 +1,6 @@
 module stringbuffer
 
+import effekt
 import bytearray
 
 interface StringBuffer {

dvdvgt added 2 commits December 12, 2024 12:37

first draft

d4661c3

add tests

8135ab6

add rudimentary desugaring in parser

3526cea

dvdvgt force-pushed the feature/string-interpolation branch 2 times, most recently from 187d5b0 to 3ff4df3 Compare January 9, 2025 21:43

add splice module to prelude

9ba3039

dvdvgt force-pushed the feature/string-interpolation branch from 3ff4df3 to 9ba3039 Compare January 9, 2025 21:45

dvdvgt added 2 commits January 9, 2025 22:45

splice handler and stringbuffer

2353267

allow s"..." syntax

497b538

dvdvgt force-pushed the feature/string-interpolation branch from 639f835 to 497b538 Compare January 10, 2025 18:35

dvdvgt added 4 commits January 13, 2025 13:05

ignore new test in chez backend

77d0e7f

resolve error in WebTests

b52629a

rework buffer's growth heuristic

1017d8a

rewrite to impl stringbuffer as a handler

fe5eddf

dvdvgt force-pushed the feature/string-interpolation branch from 3bef638 to 5ac9604 Compare January 15, 2025 15:47

add second test case using custom interpolation impl

e4f24a6

dvdvgt force-pushed the feature/string-interpolation branch from 5ac9604 to e4f24a6 Compare January 15, 2025 16:01

jiribenes and others added 2 commits January 17, 2025 10:50

fix flushing so that no 0x00 chars appear

cc4ca22

jiribenes reviewed Jan 17, 2025

View reviewed changes

libraries/common/stringbuffer.effekt Outdated Show resolved Hide resolved

dvdvgt added the blocked Blocked on other issues / PRs label Jan 17, 2025

fix unneeded allocation

2e36f3d

dvdvgt mentioned this pull request Jan 17, 2025

Segmentation fault (LLVM) using bytearrays #783

Closed

marvinborner and others added 2 commits January 21, 2025 16:58

Merge branch 'master' into feature/string-interpolation

e7c16c7

dvdvgt removed the blocked Blocked on other issues / PRs label Jan 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add basic string interpolation / string templates #743

Add basic string interpolation / string templates #743

dvdvgt commented Dec 12, 2024

jiribenes commented Dec 12, 2024

dvdvgt commented Dec 12, 2024

jiribenes commented Dec 12, 2024

b-studios commented Dec 12, 2024 •

edited

Loading

dvdvgt commented Dec 12, 2024 •

edited

Loading

jiribenes commented Dec 12, 2024 •

edited

Loading

dvdvgt commented Jan 15, 2025 •

edited

Loading

jiribenes commented Jan 15, 2025 •

edited

Loading

jiribenes Jan 17, 2025

dvdvgt Jan 17, 2025

b-studios commented Jan 21, 2025 •

edited

Loading

Add basic string interpolation / string templates #743

Are you sure you want to change the base?

Add basic string interpolation / string templates #743

Conversation

dvdvgt commented Dec 12, 2024

jiribenes commented Dec 12, 2024

dvdvgt commented Dec 12, 2024

jiribenes commented Dec 12, 2024

b-studios commented Dec 12, 2024 • edited Loading

dvdvgt commented Dec 12, 2024 • edited Loading

jiribenes commented Dec 12, 2024 • edited Loading

dvdvgt commented Jan 15, 2025 • edited Loading

jiribenes commented Jan 15, 2025 • edited Loading

jiribenes Jan 17, 2025

Choose a reason for hiding this comment

dvdvgt Jan 17, 2025

Choose a reason for hiding this comment

b-studios commented Jan 21, 2025 • edited Loading

b-studios commented Dec 12, 2024 •

edited

Loading

dvdvgt commented Dec 12, 2024 •

edited

Loading

jiribenes commented Dec 12, 2024 •

edited

Loading

dvdvgt commented Jan 15, 2025 •

edited

Loading

jiribenes commented Jan 15, 2025 •

edited

Loading

b-studios commented Jan 21, 2025 •

edited

Loading