xgo: monkey patching in go using -toolexec

xhd2015 edited this page Apr 5, 2024 · 6 revisions

Original Blog: https://blog.xhd2015.xyz/posts/xgo-monkey-patching-in-go-using-toolexec/

Overview

In this post, I will break down the implementation details of xgo.

In case you don't know, the xgo project is at https://github.com/xhd2015/xgo.

What it does is simple: it adds a trap at the beginning of each go function. Other features such as Mock, Patch and Trace are then built on top of this technique.

What is Trap?

A trap is a code snippet inserted into the beginning of a function body. Given a function named greet:

func greet(s string) string {
    return "hello " + s
}

After being instrumented by xgo, the code seen by the compiler will be:

import "runtime"

func greet(s string) (r0 string) {
    stop, post := runtime.__xgo_trap(greet, &s, &r0)
    if stop {
        return
    }
    defer post()
    return "hello " + s
}

The difference can be visualized by a diagram:

image

As shown in the diagram, once the function is called, its control flow is first transferred to Trap. A list of interceptors then examines whether the current call should be mocked, modified, recorded or stopped, according to their purpose.
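To make the control flow concrete, here is a minimal, self-contained sketch of the rewritten greet. The trap variable and the trapFunc type are hypothetical stand-ins for runtime.__xgo_trap, which only exists inside the instrumented toolchain; the signature is simplified to a single string argument and result.

```go
package main

import "fmt"

// trapFunc is a hypothetical stand-in for runtime.__xgo_trap. It receives the
// function name plus pointers to the argument and the result, and returns
// stop=true to skip the original body, along with a post callback.
type trapFunc func(name string, arg *string, res *string) (stop bool, post func())

// trap is the currently installed handler; the default one does nothing.
var trap trapFunc = func(string, *string, *string) (bool, func()) {
	return false, func() {}
}

// greet is written the way the instrumented version looks.
func greet(s string) (r0 string) {
	stop, post := trap("greet", &s, &r0)
	if stop {
		return
	}
	defer post()
	return "hello " + s
}

func main() {
	fmt.Println(greet("world")) // prints "hello world"

	// Install a handler that mocks greet: it writes the result through the
	// pointer and asks the caller to stop, so the original body never runs.
	trap = func(name string, arg *string, res *string) (bool, func()) {
		*res = "mock " + *arg
		return true, nil
	}
	fmt.Println(greet("world")) // prints "mock world"
}
```

Because the trap receives pointers, a handler can both read the arguments and overwrite the result without the function body knowing about it.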

The idea is simple, but it raises a few questions:

  • how can the go compiler see the instrumented code?
  • what the heck is import runtime?

These two questions reflect xgo's two basic parts: instrumenting the compiler and instrumenting the runtime.

Let's take a look at the first one.

How can the go compiler see the instrumented code?

To let the go compiler see code that differs from its original source, something must happen in the middle.

Luckily, go build has a flag called -toolexec:

$ go help build
...
-toolexec 'cmd args'
        a program to use to invoke toolchain programs like vet and asm.
        For example, instead of running asm, the go command will run
        'cmd args /path/to/asm <arguments for asm>'.
        The TOOLEXEC_IMPORTPATH environment variable will be set,
        matching 'go list -f {{.ImportPath}}' for the package being built.
...

If you search for go toolexec, you will find that the go source tree even has an example: https://go.dev/src/cmd/go/testdata/script/toolexec.txt.

In short, the -toolexec flag lets the user intercept each compile and link command invoked by go, and perform some kind of instrumentation if needed, as illustrated below:

image

Notice that when you add the -toolexec=my_tool flag to go build, go no longer calls compile args and link args directly. Instead, it forwards these calls as my_tool <cmd> args.

So xgo uses this flag to intercept the compile command, forwarding all compilations to the instrumented compiler.

The instrumented compiler then inserts the trap call into each function, giving the runtime a chance to capture a function call before the function body actually runs.
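To see what a -toolexec program looks like, here is a minimal sketch of a wrapper that logs compile invocations and then forwards every command unchanged. It is not xgo's exec_tool; the helper names toolName and run are made up for illustration. It relies only on the documented contract: the first argument is the path of the toolchain program, the rest are its arguments, and TOOLEXEC_IMPORTPATH names the package being built.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// toolName returns the base name of a toolchain binary without a ".exe"
// suffix, e.g. "/path/to/compile" becomes "compile".
func toolName(path string) string {
	return strings.TrimSuffix(filepath.Base(path), ".exe")
}

// run forwards "tool args..." to the real tool and returns its exit code.
func run(argv []string) int {
	if len(argv) == 0 {
		fmt.Fprintln(os.Stderr, "usage: my_tool <tool> [args...]")
		return 2
	}
	tool, args := argv[0], argv[1:]

	if toolName(tool) == "compile" {
		// A real tool could substitute an instrumented compiler here.
		// This sketch only logs which package is being compiled.
		fmt.Fprintf(os.Stderr, "my_tool: compile %s\n", os.Getenv("TOOLEXEC_IMPORTPATH"))
	}

	cmd := exec.Command(tool, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			return exitErr.ExitCode()
		}
		fmt.Fprintln(os.Stderr, err)
		return 1
	}
	return 0
}

func main() {
	os.Exit(run(os.Args[1:]))
}
```

Assuming the binary is built as my_tool, it would be used as go build -toolexec=/path/to/my_tool ./. Passing through stdout and the exit code matters, because go also invokes the tools to query their versions.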

What the heck is import runtime?

Now that the compiler has added the trap call for us, how does the trap know what kind of checks need to be done?

We cannot make every package depend on xgo, because most of them do not need it.

Instead, the runtime is also instrumented to forward the call to xgo, because in go every package implicitly depends on the runtime package. The control flow is illustrated below:

image

This is effectively dependency injection. This way, no code has to depend on xgo explicitly.

The above code can be found at patch/trap_runtime/xgo_trap.go and runtime/trap/trap.go.
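The forwarding idea can be sketched in a single file. This is only an illustration of the pattern, not the code in the files above: trapImpl, setTrap and trap are hypothetical names, and the three sections stand in for what would be three separate packages.

```go
package main

import "fmt"

// --- stands in for the patched runtime package ---

// trapImpl stays nil until a higher-level package registers a handler.
var trapImpl func(fn string) (stop bool)

// setTrap is a hypothetical registration hook exposed by the low-level package.
func setTrap(impl func(fn string) bool) { trapImpl = impl }

// trap is what instrumented functions call. With no handler registered it is
// a cheap no-op, so code that never uses xgo behaves as before.
func trap(fn string) bool {
	if trapImpl == nil {
		return false
	}
	return trapImpl(fn)
}

// --- stands in for ordinary user code, which only knows about trap ---

func work() string {
	if trap("work") {
		return "intercepted"
	}
	return "original"
}

// --- stands in for the xgo side, which registers the real logic ---

func main() {
	fmt.Println(work()) // prints "original"

	setTrap(func(fn string) bool { return fn == "work" })
	fmt.Println(work()) // prints "intercepted"
}
```

The low-level package never imports the high-level one; it only exposes a slot that the high-level package fills in at startup.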

To make the trap extensible, xgo has abstracted a concept called interceptor. It has the following signature:

type Interceptor struct {
	Pre  func(ctx context.Context, f *core.FuncInfo, args core.Object, result core.Object) (data interface{}, err error)
	Post func(ctx context.Context, f *core.FuncInfo, args core.Object, result core.Object, data interface{}) error
}

An interceptor is made up of two functions, Pre and Post:

  • Pre is called before the function's logic,
  • Post is called after the function's logic, using a defer statement.

Wrap up

Let's wrap up everything we've talked about.

When you run xgo test ./, it does the following things:

  1. find the GOROOT,
  2. copy the GOROOT into ~/.xgo/go-instruments/GOROOT to prepare for instrumenting,
  3. apply patches to ~/.xgo/go-instruments/GOROOT, for both the compiler and the runtime,
  4. build the instrumented compiler,
  5. invoke go build with an extra flag: go build -toolexec=exec_tool ./,
  6. the exec_tool then forwards all compile commands to the instrumented compiler,
  7. once all compilation has finished, go invokes link to generate the executable, and you get an instrumented binary!
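Step 5 can be sketched as a small helper that prepares the go build command. The function name buildCmd, the example GOROOT path, and the choice to set the GOROOT environment variable are all assumptions for illustration; xgo's real invocation may differ.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// buildCmd prepares (but does not run) "go build -toolexec=<execTool> <pkgs>"
// using the go binary from the given GOROOT copy. Both paths are supplied by
// the caller; nothing here reflects xgo's actual directory layout.
func buildCmd(goroot, execTool string, pkgs ...string) *exec.Cmd {
	args := append([]string{"build", "-toolexec=" + execTool}, pkgs...)
	cmd := exec.Command(filepath.Join(goroot, "bin", "go"), args...)
	// Assumption: pointing GOROOT at the copy makes go pick up the patched
	// runtime sources instead of the system ones.
	cmd.Env = append(os.Environ(), "GOROOT="+goroot)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd
}

func main() {
	// A hypothetical instrumented GOROOT location, used only for display.
	cmd := buildCmd("/tmp/instrumented-goroot", "exec_tool", "./")
	fmt.Println(strings.Join(cmd.Args[1:], " ")) // prints "build -toolexec=exec_tool ./"
}
```

Calling cmd.Run() on the returned value would start the build; the sketch stops short of that so it can be read and run without an instrumented GOROOT in place.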

Pros and Cons

As a result, xgo has gained both pros and cons from the above mechanism. Pros:

  • Concurrency safety: it does not replace functions in place, which would require modifying a global address, so each goroutine can set up its own interceptors and remove them individually,
  • Compatibility: it rewrites source code instead of machine instructions, so it is OS and arch agnostic,
  • Extensibility: it provides a general interceptor, so its use is not limited to mocking; anything you can do with a gRPC interceptor can be done here, such as tracing (already implemented), caching, logging...

Cons:

  • The user needs to install xgo to enable trapping.

Thanks for reading. That covers the core implementation of xgo. What do you think of it? Leave a comment here and let us discuss!