Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic string interpolation / string templates #743

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

dvdvgt
Copy link
Collaborator

@dvdvgt dvdvgt commented Dec 12, 2024

Fixes #722 and adds basic support for string templates.

There are a few caveats to keep in mind with the current initial implementation:

  • All arguments already have to be strings. It is basically just syntactic sugar so that you can write "${a.show} and ${b.show}" instead of a.show ++ " and " ++ b.show where a and b are some 'showable' values. Internally, however, this is just how it is desugared.
  • String templates cannot be used to match string literals in pattern matches
  • The generated JS code looks a bit weird:
    const v_r_897 = $effekt.println(((((((((("GET ") + (domain_0)))) + (((("/users/") + (user_0))))))) + (((("/resource/") + (('' + resourceId_0))))))));
    
  • If one forgets to call show on arguments which are not already of type string, the error messages are hideous and possibly confusing:
error
-There are multiple overloads, which all fail to check:
-Possible overload: effekt::println of type Bool => Unit
-  Expected String but got Int.
-  Expected Bool but got String.
-
-Possible overload: list::println of type List[Bool] => Unit
-  Expected String but got Int.
-  Expected List[Bool] but got String.
-
-Possible overload: array::println of type Array[Int] => Unit
-  Expected String but got Int.
-  Expected Array[Int] but got String.
-
-Possible overload: effekt::println of type String => Unit
-  Expected String but got Int.
-
-Possible overload: list::println of type List[Int] => Unit
-  Expected String but got Int.
-  Expected List[Int] but got String.
-
-Possible overload: effekt::println of type Byte => Unit
-  Expected String but got Int.
-  Expected Byte but got String.
-
-Possible overload: list::println of type List[String] => Unit
-  Expected String but got Int.
-  Expected List[String] but got String.
-
-Possible overload: list::println of type List[Double] => Unit
-  Expected String but got Int.
-  Expected List[Double] but got String.
-
-Possible overload: array::println of type Array[String] => Unit
-  Expected String but got Int.
-  Expected Array[String] but got String.
-
-Possible overload: effekt::println of type Ordering => Unit
-  Expected String but got Int.
-  Expected Ordering but got String.
-
-Possible overload: array::println of type Array[Bool] => Unit
-  Expected String but got Int.
-  Expected Array[Bool] but got String.
-
-Possible overload: effekt::println of type Unit => Unit
-  Expected String but got Int.
-  Expected Unit but got String.
-
-Possible overload: option::println of type Option[Int] => Unit
-  Expected String but got Int.
-  Expected Option[Int] but got String.
-
-Possible overload: option::println of type Option[Double] => Unit
-  Expected String but got Int.
-  Expected Option[Double] but got String.
-
-Possible overload: array::println of type Array[Double] => Unit
-  Expected String but got Int.
-  Expected Array[Double] but got String.
-
-Possible overload: effekt::println of type Double => Unit
-  Expected String but got Int.
-  Expected Double but got String.
-
-Possible overload: effekt::println of type Int => Unit
-  Expected String but got Int.
-  Expected Int but got String.
-
-Possible overload: option::println of type Option[Bool] => Unit
-  Expected String but got Int.
-  Expected Option[Bool] but got String.
-
-  println("GET ${domain}/users/${user}/resource/${resourceId}")

Apart from that, the changes are really lightweight because of the early desugaring in the parser.

I am looking forward to your feedback and ideas.

@jiribenes
Copy link
Contributor

This is a great first implementation, but I'm not sure that desugaring in parser is the best choice here :/
We talked about it at a meeting when #722 was created, but I don't remember what we agreed on.

Personally, I'd much rather see templates up until the transformation to core so that we can:

  • have reasonable error messages in typer
  • later on resolve typeclasses or something for the splices
  • possibly have a more performant impl (?)

@dvdvgt
Copy link
Collaborator Author

dvdvgt commented Dec 12, 2024

Personally, I'd much rather see templates up until the transformation to core so that we can:

  • have reasonable error messages in typer
  • later on resolve typeclasses or something for the splices
  • possibly have a more performant impl (?)

You raise some very good concerns and I completely agree. I will try to go the 'non-lazy route' and I actually completely thread through templates to core.

@jiribenes
Copy link
Contributor

Don't get me wrong, having a version is still super helpful; I truly have no idea how difficult it really is to do the thing I outlined above. But I hope it's worth it, especially wrt technical debt. :)

@b-studios
Copy link
Collaborator

b-studios commented Dec 12, 2024

I am not completely against desugaring in parser, but maybe we could actually desugar

"foo ${exp1} bar ${exp2} baz"

to something like:

splice(["foo ", " bar ", "baz"], [exp1, exp2])

this way, as a user I can still overload splice for types other than exp1 exp2: String :) Not that don't have enough overloaded functions already...

@dvdvgt
Copy link
Collaborator Author

dvdvgt commented Dec 12, 2024

I just pushed a prototype that threads through string templates all the way until code generation.

This has the added benefit that we remain somewhat flexible for later changes, as Jiří mentioned. Furthermore, we can re-use the string interpolation capabilities of each respective backend, e.g. currently the JS string templates are generated as template literals. As a trade-off, we need to add a TemplateStr AST node to Core, CPS and to each backend. As I have already heard, not everybody is a proponent of further expanding Core ;)

Note: I currently only fully implemented it for JS. If we come to an agreement that this is the right way to proceed, I will implement for the remaining backends.

(I already started working on this before Jonathan's remark and I'd rather discuss this implementation briefly before scrapping it right away :) )

@jiribenes
Copy link
Contributor

jiribenes commented Dec 12, 2024

I am not completely against desugaring in parser, but maybe we could actually desugar

"foo ${exp1} bar ${exp2} baz"

to something like:

splice(["foo ", " bar ", "baz"], [exp1, exp2])

this way, as a user I can still overload splice for types other than exp1 exp2: String :) Not that don't have enough overloaded functions already...

This relates to some strategy we'll need to come up with in order to represent custom string interpolators (which is very outside of the scope of this PR, I believe...)

Then I'd expect some desugaring like, for example:

val name = "Joe"
val age = 42

////

val s: SQL::Stmt = 
  sql"SELECT * FROM Person p WHERE p.last_name = ${name} AND p.age = ${age}"

// ~>

val s: SQL::Stmt = sql { // sql: { prog: () => Unit / { Splice[String], Splice[Int] } }: SQL::Stmt
  do literal("SELECT * FROM Person p WHERE p.last_name = ")
  do splice(name)
  do literal(" AND p.age = ")
  do splice(age)
}

@dvdvgt dvdvgt force-pushed the feature/string-interpolation branch 2 times, most recently from 187d5b0 to 3ff4df3 Compare January 9, 2025 21:43
@dvdvgt dvdvgt force-pushed the feature/string-interpolation branch from 3ff4df3 to 9ba3039 Compare January 9, 2025 21:45
@dvdvgt dvdvgt force-pushed the feature/string-interpolation branch from 639f835 to 497b538 Compare January 10, 2025 18:35
@dvdvgt dvdvgt force-pushed the feature/string-interpolation branch from 3bef638 to 5ac9604 Compare January 15, 2025 15:47
@dvdvgt dvdvgt force-pushed the feature/string-interpolation branch from 5ac9604 to e4f24a6 Compare January 15, 2025 16:01
@dvdvgt
Copy link
Collaborator Author

dvdvgt commented Jan 15, 2025

There are three things keeping this from merging:

  • The test fails on LLVM with error code 139 (segmentation fault)
  • For some reason the test now also fails on JS even though the output is apparently the same as the expected. Maybe some weird non-displayable characters are inserted by accident?
  • I would like to include the splice module in the prelude however, I am prompted with Effekt Compiler Crash: null

@jiribenes
Copy link
Contributor

jiribenes commented Jan 15, 2025

I would like to include the splice module in the prelude however, I am prompted with Effekt Compiler Crash: null

I'll merge #780 ASAP, then it should at least tell you what's wrong :)
(You can also review it and merge it ;D)

jiribenes and others added 2 commits January 17, 2025 10:50
Resolves the `Effect Compiler Crash: null` part of #760.
The actual problem there was that `e.getMessage` can return `null` and
it does for... _StackOverflowException_ 🥳.
Looking at the docs, it's recommended to use `e.toString`.

While I was at it, I modified stack trace printing, now we do it
properly by using `.printStackTrace`.

This is very related to #731, I just want to get this out quickly to
unblock other problems.
Comment on lines 39 to 43
// resize buffer to strip trailing zeros that otherwise would be converted into 0x00 characters
buffer = bytearray::resize(buffer, pos)
val str = buffer.toString()
// after flushing, the stringbuffer should be empty again
buffer = bytearray::allocate(initialCapacity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be made somewhat more efficient -- resize copies into a newly allocated bytearray, but toString doesn't (since on LLVM strings are bytearrays)
So it should be possible to just clear the buffer without allocating it again?

Suggested change
// resize buffer to strip trailing zeros that otherwise would be converted into 0x00 characters
buffer = bytearray::resize(buffer, pos)
val str = buffer.toString()
// after flushing, the stringbuffer should be empty again
buffer = bytearray::allocate(initialCapacity)
// resize buffer to strip trailing zeros that otherwise would be converted into 0x00 characters
val str = bytearray::resize(buffer, pos).toString()
// after flushing, the stringbuffer should be empty again
buffer = buffer.fill(0)

where buffer.fill(0) means memset(buffer, 0) (which we don't have yet) :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you could just clear the buffer, however, then the buffer's size will not be reset to a (potentially) smaller size and instead just buffer's previous size is kept. This probably would not be a big deal, though.

@dvdvgt dvdvgt added the blocked Blocked on other issues / PRs label Jan 17, 2025
marvinborner and others added 2 commits January 21, 2025 16:58
Closes #783. And closes #778.

I believe the way we currently handle size calculations for stack
pushes/pops of non-structural types has some flaws for non-pointer
values. Some changes in how we use the GEP instruction will probably fix
this, but I find its syntax really confusing. I have a proof-of-concept
alternate fix at the end.

To bypass the calculation issues, we can also just change the size of a
byte to 8 byte instead. This *hotfix* fixes the segfault in the original
issue as well as all of my minified versions.

Some additional explanations of the issue using an example program:

```scala
def foo { action: Byte => Unit }: Unit =
  action(42.toByte)

def main() = {
  foo { b =>
    // does not segfault when printlns are swapped
    println(42)
    println(b)
  }
}
```

<details>

<summary>Pseudo-flow with relevant (for the segfault) lines
marked</summary>

```
main(stack)

main(stack):
alloc(3 * 8 = 24)
push returnAddress_1
push sharer_5
push eraser_7
blockLit = vtable[blockLit_3289_clause_13]
foo(blockLit, stack)

foo(blockLit, stack):
alloc(3 * 8 = 24)
push returnAddress_40
push sharer_5
push eraser_7
literal = 42
byte = toByte(literal)
vtable = blockLit[0]
closure = blockLit[1]
fp = *vtable // = blockLit_3289_clause_13
fp(closure, byte, stack)

blockLit_3289_clause_13(closure, byte, stack):
alloc(3 * 8 + 1 = 25) // <---------------------------
push byte
push returnAddress_14
push sharer_26
push eraser_30
literal = 42
printlnInt(literal, stack)

printlnInt(value, stack): // _4
str = show(value)
unit = println(str)
dealloc(3 * 8 = 24)
pop _
pop _
pop returnAddress // _14
returnAddress(unit, stack) // <---------------------------

returnAddress_14(ret, stack):
dealloc(1)
pop byte
erase(ret)
alloc(3 * 8 = 24)
push returnAddress_17
push sharer_5
push eraser_7
printlnByte(byte, stack)

printlnByte(byte, stack):
str = show(byte)
unit = println(str)
dealloc(24)
pop _
pop _
pop returnAddress // _17
returnAddress(unit, stack)

returnAddress_17(unit, stack):
dealloc(24)
pop _
pop _
pop returnAddress // _40
returnAddress(unit, stack)

returnAddress_40(unit, stack)
dealloc(24)
pop _
pop _
pop returnAddress // _1
returnAddress(unit, stack)

returnAddress_1(unit, stack):
dealloc(24)
pop _
pop _
pop returnAddress
returnAddress(unit, stack)
```

</details>

The corresponding LLVM code of the relevant sections:

```llvm
define tailcc void @blockLit_3289_clause_13(%Object %closure, i8 %b_2365, %Stack %stack) {

    entry:

        ; new blockLit_3289_clause_13, 1 parameters
        %stackPointer_33 = call ccc %StackPointer @stackAllocate(%Stack %stack, i64 25) ; <---------------
        %b_2365_pointer_34 = getelementptr {i8}, %StackPointer %stackPointer_33, i64 0, i32 0
        store i8 %b_2365, ptr %b_2365_pointer_34, !noalias !2
        %returnAddress_pointer_35 = getelementptr {{i8}, %FrameHeader}, %StackPointer %stackPointer_33, i64 0, i32 1, i32 0
        %sharer_pointer_36 = getelementptr {{i8}, %FrameHeader}, %StackPointer %stackPointer_33, i64 0, i32 1, i32 1
        %eraser_pointer_37 = getelementptr {{i8}, %FrameHeader}, %StackPointer %stackPointer_33, i64 0, i32 1, i32 2
        store ptr @returnAddress_14, ptr %returnAddress_pointer_35, !noalias !2
        store ptr @sharer_26, ptr %sharer_pointer_36, !noalias !2
        store ptr @eraser_30, ptr %eraser_pointer_37, !noalias !2

        ; literalInt longLiteral_3291, n=42
        %longLiteral_3291 = add i64 42, 0

        ; substitution

        ; substitution [value_3 !-> longLiteral_3291]

        ; jump println_4
        musttail call tailcc void @println_4(i64 %longLiteral_3291, %Stack %stack)
        ret void
}

...

define tailcc void @println_4(i64 %value_3, %Stack %stack) {

    entry:

        ; definition println_4, environment length 1

        ; foreignCall pureApp_3284 : String(), foreign show_14, 1 values
        %pureApp_3284 = call ccc %Pos @show_14(i64 %value_3)

        ; foreignCall pureApp_3283 : Positive(), foreign println_1, 1 values
        %pureApp_3283 = call ccc %Pos @println_1(%Pos %pureApp_3284)

        ; substitution

        ; substitution [v_r_3164_3278 !-> pureApp_3283]

        ; return, 1 values
        %stackPointer_56 = call ccc %StackPointer @stackDeallocate(%Stack %stack, i64 24)
        %returnAddress_pointer_57 = getelementptr %FrameHeader, %StackPointer %stackPointer_56, i64 0, i32 0
        %returnAddress_55 = load %ReturnAddress, ptr %returnAddress_pointer_57, !noalias !2
        musttail call tailcc void %returnAddress_55(%Pos %pureApp_3283, %Stack %stack) ; <-------------------------
        ; here is the segfault, since returnAddress_55 is missing one byte!
        ret void
}
```

---

A fix I came up with that does not involve making the size of a byte
large:

``` llvm
define tailcc void @blockLit_3289_clause_13(%Object %closure, %Byte %b_2365, %Stack %stack) {

    entry:
        ; new blockLit_3289_clause_13, 1 parameters
        ; SPLIT the allocation into two segments <-----------------------
        %stackPointar_33 = call ccc %StackPointer @stackAllocate(%Stack %stack, i64 1)
        %b_2365_pointer_34 = getelementptr %Byte, %StackPointer %stackPointar_33, i64 0
        store %Byte %b_2365, ptr %b_2365_pointer_34, !noalias !2

        %stackPointer_33 = call ccc %StackPointer @stackAllocate(%Stack %stack, i64 24)

        %returnAddress_pointer_35 = getelementptr %FrameHeader, %StackPointer %stackPointer_33, i64 0, i32 0
        %sharer_pointer_36 = getelementptr %FrameHeader, %StackPointer %stackPointer_33, i64 0, i32 1
        %eraser_pointer_37 = getelementptr %FrameHeader, %StackPointer %stackPointer_33, i64 0, i32 2
        store ptr @returnAddress_14, ptr %returnAddress_pointer_35, !noalias !2
        store ptr @sharer_26, ptr %sharer_pointer_36, !noalias !2
        store ptr @eraser_30, ptr %eraser_pointer_37, !noalias !2

        ; literalInt longLiteral_3291, n=42
        %longLiteral_3291 = add i64 42, 0

        ; substitution

        ; substitution [value_3 !-> longLiteral_3291]

        ; jump println_4
        musttail call tailcc void @println_4(i64 %longLiteral_3291, %Stack %stack)
        ret void
}

...
```

---------

Co-authored-by: Philipp Schuster <[email protected]>
@dvdvgt dvdvgt removed the blocked Blocked on other issues / PRs label Jan 21, 2025
@b-studios
Copy link
Collaborator

b-studios commented Jan 21, 2025

Regarding the prelude, we have an undocumented invariant: all imports of prelude modules have to be in the prelude as well...

That is, the prelude in the JS runner needs to look like this:

override def prelude: List[String] = List("effekt", "option", "list", "result", "exception", "array", "string", "ref", "bytearray", "stringbuffer", "splice")

and bytearray, and stringbuffer also need to import effekt and dependencies that they need from the prelude manually.

That is, the following changes would be necessary (+ LLVM Runner)

diff --git a/effekt/jvm/src/main/scala/effekt/Runner.scala b/effekt/jvm/src/main/scala/effekt/Runner.scala
index ab26700e..6dbaac0b 100644
--- a/effekt/jvm/src/main/scala/effekt/Runner.scala
+++ b/effekt/jvm/src/main/scala/effekt/Runner.scala
@@ -151,7 +151,7 @@ object JSNodeRunner extends Runner[String] {
 
   override def includes(path: File): List[File] = List(path / ".." / "js")
 
-  override def prelude: List[String] = List("effekt", "option", "list", "result", "exception", "array", "string", "ref")
+  override def prelude: List[String] = List("effekt", "option", "list", "result", "exception", "array", "string", "ref", "bytearray", "stringbuffer", "splice")
 
   def checkSetup(): Either[String, Unit] =
     if canRunExecutable("node", "--version") then Right(())
diff --git a/examples/pos/string_interpolation.effekt b/examples/pos/string_interpolation.effekt
index 60aeb0b0..54bf9a20 100644
--- a/examples/pos/string_interpolation.effekt
+++ b/examples/pos/string_interpolation.effekt
@@ -1,6 +1,3 @@
-import splice
-import stringbuffer
-
 type Expr {
   Var(id: String)
   Abs(param: String, body: Expr)
@@ -38,4 +35,3 @@ def main() = {
   val fixpoint = Abs("f", App(Abs("x", App(Var("f"), App(Var("x"), Var("x")))), Abs("x", App(Var("f"), App(Var("x"), Var("x"))))))
   println(pretty"Fix point combinator: ${fixpoint}")
 }
-
diff --git a/libraries/common/bytearray.effekt b/libraries/common/bytearray.effekt
index d7552919..d7232e44 100644
--- a/libraries/common/bytearray.effekt
+++ b/libraries/common/bytearray.effekt
@@ -1,5 +1,7 @@
 module bytearray
 
+import effekt
+
 /**
  * A memory managed, mutable, fixed-length array of bytes.
  */
diff --git a/libraries/common/splice.effekt b/libraries/common/splice.effekt
index 9ba74d15..7de68c0d 100644
--- a/libraries/common/splice.effekt
+++ b/libraries/common/splice.effekt
@@ -1,5 +1,7 @@
 module splice
 
+import effekt
+
 import stringbuffer
 
 effect literal(s: String): Unit
diff --git a/libraries/common/stringbuffer.effekt b/libraries/common/stringbuffer.effekt
index 3b6d3acb..bfa57e62 100644
--- a/libraries/common/stringbuffer.effekt
+++ b/libraries/common/stringbuffer.effekt
@@ -1,5 +1,6 @@
 module stringbuffer
 
+import effekt
 import bytearray
 
 interface StringBuffer {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Basic string templates / interpolation in normal strings
4 participants