🇫🇷 Version Française disponible ici
I am currently working on a project written in Go where closures are massively used. Knowing that calling a function is not a neutral thing, I wonder here about the impact of these closures on code performance. I therefore propose to understand how all this works and to make a comparison with a more classic error management method based on "goto".
Here are three approaches to manage a sequence of operations where we want to stop at the first error:
Closure: The closure fully uses the context capture system
func test_closure(str string) {
var err error
func() {
if len(str) > 3 {
err = globalErr
return
}
if len(str) > 4 {
err = globalErr
return
}
} ()
if err != nil {
globalErrorCount++
}
}
Lambda: similar to a closure but does not use the context capture system at all.
func test_lambda(str string) {
var err error
err = func(str_test string) error {
if len(str_test) > 3 {
return globalErr
}
if len(str_test) > 4 {
return globalErr
}
return nil
} (str)
if err != nil {
globalErrorCount++
}
}
Goto: Using a "goto" to interrupt the flow.
func test_goto(str string) {
var err error
if len(str) > 3 {
err = globalErr
goto process_error
}
if len(str) > 4 {
err = globalErr
goto process_error
}
process_error:
if err != nil {
globalErrorCount++
}
}
Each function is tested in a case where the operation is performed without error, and in a case where the operation is performed with an error from the first processing. The test of each type of operation is measured during 1,000,000,000 iterations in order to have an average measurement.
Here are the results. This is the duration of execution of the 1,000,000,000 iterations and the difference from the fastest in percentage.
Closure | Lambda | Goto | |
---|---|---|---|
Return error | 3.87s +7.82% | 4.4s +23.6% | 3.56s |
Return no error | 4.37s +23.4% | 4.06s +14.7% | 3.54s |
Closure or lambda systems are significantly slower. Let's see why:
A closure is a function that maintains a reference to its lexical environment. The lexical environment includes the variables accessible at the place where the closure is defined.
Technically, the compiler analyzes the closure and determines the variables it must use. For these variables to be accessible, they will be allocated in the heap rather than in the stack.
Calling a function is not something without impact. The program must sometimes save its registers in the stack (the Go compiler optimizes this), position the call parameters in registers according to the OS convention and finally execute the CALL instruction which will save the function's return point and jump to the first instruction of the function. Furthermore, a CALL can imply a flush of the CPU's instruction pipeline.
The heap is a large memory area accessible from the entire program. In order to manage variables positioned in the heap, we generally use a memory allocation system. In the case of Go, we also use a "garbage collector". Memory in the stack is simpler to allocate, its quantity is known in advance by the compiler and it is enough to change the value of the stack pointer.
What is allocated in the stack is faster to allocate and is not subject to the "garbage collector". On the other hand, the stack space is limited.
Here is the assembly code generated for these three methods. We can note that the lambda and closure methods result in two functions while the goto method has only one:
Here is the code of each decompiled function.
Closure
TEXT main.test_closure(SB)
comp.go:119: CMPQ SP, 0x10(R14)
comp.go:119: JBE 0x10b49ed
comp.go:119: PUSHQ BP
comp.go:119: MOVQ SP, BP
comp.go:119: SUBQ $0x28, SP
comp.go:119: MOVQ AX, 0x38(SP)
comp.go:119: MOVQ BX, 0x40(SP)
comp.go:120: MOVUPS X15, 0x18(SP)
comp.go:131: MOVQ 0x38(SP), AX
comp.go:131: MOVQ 0x40(SP), BX
comp.go:131: LEAQ 0x18(SP), CX
comp.go:131: CALL main.test_closure.func1(SB)
comp.go:133: CMPQ 0x18(SP), $0x0
comp.go:133: JNE 0x10b49dc
comp.go:133: JMP 0x10b49e5
comp.go:134: INCQ main.globalErrorCount(SB)
comp.go:134: JMP 0x10b49e7
comp.go:133: JMP 0x10b49e7
comp.go:136: ADDQ $0x28, SP
comp.go:136: POPQ BP
comp.go:136: RET
comp.go:119: MOVQ AX, 0x8(SP)
comp.go:119: MOVQ BX, 0x10(SP)
comp.go:119: CALL runtime.morestack_noctxt.abi0(SB)
comp.go:119: MOVQ 0x8(SP), AX
comp.go:119: MOVQ 0x10(SP), BX
comp.go:119: JMP main.test_closure(SB)
TEXT main.test_closure.func1(SB)
comp.go:122: CMPQ SP, 0x10(R14)
comp.go:122: JBE 0x10b4ad3
comp.go:122: PUSHQ BP
comp.go:122: MOVQ SP, BP
comp.go:122: SUBQ $0x8, SP
comp.go:122: MOVQ AX, 0x18(SP)
comp.go:122: MOVQ BX, 0x20(SP)
comp.go:122: MOVQ CX, 0x28(SP)
comp.go:123: MOVQ BX, 0(SP)
comp.go:123: CMPQ BX, $0x3
comp.go:123: JG 0x10b4a4d
comp.go:123: JMP 0x10b4a87
comp.go:124: MOVQ main.globalErr+8(SB), AX
comp.go:124: MOVQ main.globalErr(SB), DX
comp.go:124: MOVQ DX, 0(CX)
comp.go:124: CMPL runtime.writeBarrier(SB), $0x0
comp.go:124: JE 0x10b4a69
comp.go:124: JMP 0x10b4a6b
comp.go:124: JMP 0x10b4a7d
comp.go:124: CALL runtime.gcWriteBarrier2(SB)
comp.go:124: MOVQ AX, 0(R11)
comp.go:124: MOVQ 0x8(CX), DX
comp.go:124: MOVQ DX, 0x8(R11)
comp.go:124: JMP 0x10b4a7d
comp.go:124: MOVQ AX, 0x8(CX)
comp.go:125: ADDQ $0x8, SP
comp.go:125: POPQ BP
comp.go:125: RET
comp.go:127: MOVQ BX, 0(SP)
comp.go:127: CMPQ BX, $0x4
comp.go:127: JG 0x10b4a93
comp.go:127: JMP 0x10b4acd
comp.go:128: MOVQ main.globalErr+8(SB), AX
comp.go:128: MOVQ main.globalErr(SB), DX
comp.go:128: MOVQ DX, 0(CX)
comp.go:128: CMPL runtime.writeBarrier(SB), $0x0
comp.go:128: JE 0x10b4aaf
comp.go:128: JMP 0x10b4ab1
comp.go:128: JMP 0x10b4ac3
comp.go:128: CALL runtime.gcWriteBarrier2(SB)
comp.go:128: MOVQ AX, 0(R11)
comp.go:128: MOVQ 0x8(CX), DX
comp.go:128: MOVQ DX, 0x8(R11)
comp.go:128: JMP 0x10b4ac3
comp.go:128: MOVQ AX, 0x8(CX)
comp.go:129: ADDQ $0x8, SP
comp.go:129: POPQ BP
comp.go:129: RET
comp.go:131: ADDQ $0x8, SP
comp.go:131: POPQ BP
comp.go:131: RET
comp.go:122: MOVQ AX, 0x8(SP)
comp.go:122: MOVQ BX, 0x10(SP)
comp.go:122: MOVQ CX, 0x18(SP)
comp.go:122: CALL runtime.morestack_noctxt.abi0(SB)
comp.go:122: MOVQ 0x8(SP), AX
comp.go:122: MOVQ 0x10(SP), BX
comp.go:122: MOVQ 0x18(SP), CX
comp.go:122: JMP main.test_closure.func1(SB)
Lambda
TEXT main.test_lambda(SB)
comp.go:138: CMPQ SP, 0x10(R14)
comp.go:138: JBE 0x10b4b4f
comp.go:138: PUSHQ BP
comp.go:138: MOVQ SP, BP
comp.go:138: SUBQ $0x20, SP
comp.go:138: MOVQ AX, 0x30(SP)
comp.go:138: MOVQ BX, 0x38(SP)
comp.go:139: MOVUPS X15, 0x10(SP)
comp.go:149: MOVQ 0x30(SP), AX
comp.go:149: MOVQ 0x38(SP), BX
comp.go:149: CALL main.test_lambda.func1(SB)
comp.go:141: MOVQ AX, 0x10(SP)
comp.go:141: MOVQ BX, 0x18(SP)
comp.go:151: TESTQ AX, AX
comp.go:151: JNE 0x10b4b3e
comp.go:151: JMP 0x10b4b47
comp.go:152: INCQ main.globalErrorCount(SB)
comp.go:152: JMP 0x10b4b49
comp.go:151: JMP 0x10b4b49
comp.go:154: ADDQ $0x20, SP
comp.go:154: POPQ BP
comp.go:154: RET
comp.go:138: MOVQ AX, 0x8(SP)
comp.go:138: MOVQ BX, 0x10(SP)
comp.go:138: CALL runtime.morestack_noctxt.abi0(SB)
comp.go:138: MOVQ 0x8(SP), AX
comp.go:138: MOVQ 0x10(SP), BX
comp.go:138: JMP main.test_lambda(SB)
TEXT main.test_lambda.func1(SB)
comp.go:141: PUSHQ BP
comp.go:141: MOVQ SP, BP
comp.go:141: SUBQ $0x18, SP
comp.go:141: MOVQ AX, 0x28(SP)
comp.go:141: MOVQ BX, 0x30(SP)
comp.go:141: MOVUPS X15, 0x8(SP)
comp.go:142: MOVQ 0x30(SP), CX
comp.go:142: MOVQ CX, 0(SP)
comp.go:142: CMPQ CX, $0x3
comp.go:142: JG 0x10b4c49
comp.go:142: JMP 0x10b4c67
comp.go:143: MOVQ main.globalErr(SB), AX
comp.go:143: MOVQ main.globalErr+8(SB), BX
comp.go:143: MOVQ AX, 0x8(SP)
comp.go:143: MOVQ BX, 0x10(SP)
comp.go:143: ADDQ $0x18, SP
comp.go:143: POPQ BP
comp.go:143: RET
comp.go:145: MOVQ 0x30(SP), CX
comp.go:145: MOVQ CX, 0(SP)
comp.go:145: CMPQ CX, $0x4
comp.go:145: JG 0x10b4c78
comp.go:145: JMP 0x10b4c96
comp.go:146: MOVQ main.globalErr(SB), AX
comp.go:146: MOVQ main.globalErr+8(SB), BX
comp.go:146: MOVQ AX, 0x8(SP)
comp.go:146: MOVQ BX, 0x10(SP)
comp.go:146: ADDQ $0x18, SP
comp.go:146: POPQ BP
comp.go:146: RET
comp.go:148: MOVUPS X15, 0x8(SP)
comp.go:148: XORL AX, AX
comp.go:148: XORL BX, BX
comp.go:148: ADDQ $0x18, SP
comp.go:148: POPQ BP
comp.go:148: RET
Goto
TEXT main.test_goto(SB)
comp.go:156: PUSHQ BP
comp.go:156: MOVQ SP, BP
comp.go:156: SUBQ $0x18, SP
comp.go:156: MOVQ AX, 0x28(SP)
comp.go:156: MOVQ BX, 0x30(SP)
comp.go:157: MOVUPS X15, 0x8(SP)
comp.go:159: MOVQ 0x30(SP), AX
comp.go:159: MOVQ AX, 0(SP)
comp.go:159: CMPQ AX, $0x3
comp.go:159: JG 0x10b4ba9
comp.go:159: JMP 0x10b4bc3
comp.go:160: MOVQ main.globalErr(SB), AX
comp.go:160: MOVQ main.globalErr+8(SB), CX
comp.go:160: MOVQ AX, 0x8(SP)
comp.go:160: MOVQ CX, 0x10(SP)
comp.go:161: JMP 0x10b4bf0
comp.go:163: MOVQ 0x30(SP), AX
comp.go:163: MOVQ AX, 0(SP)
comp.go:163: CMPQ AX, $0x4
comp.go:163: JG 0x10b4bd4
comp.go:163: JMP 0x10b4bee
comp.go:164: MOVQ main.globalErr(SB), AX
comp.go:164: MOVQ main.globalErr+8(SB), CX
comp.go:164: MOVQ AX, 0x8(SP)
comp.go:164: MOVQ CX, 0x10(SP)
comp.go:165: JMP 0x10b4bf0
comp.go:170: JMP 0x10b4bf0
comp.go:170: CMPQ 0x8(SP), $0x0
comp.go:170: JNE 0x10b4bfa
comp.go:170: JMP 0x10b4c03
comp.go:171: INCQ main.globalErrorCount(SB)
comp.go:171: JMP 0x10b4c05
comp.go:170: JMP 0x10b4c05
comp.go:173: ADDQ $0x18, SP
comp.go:173: POPQ BP
comp.go:173: RET
Here is what we can note about the compiled code:
Here is an old debate that has no reason to be. This debate still lives because many people repeat ad nauseam the diatribe "You should never use 'goto' because it can always be replaced by a structure based on loops". Unfortunately these good people don't know why they repeat that, and their assurance creates followers who in turn preach the good word.
This old debate comes from Dijkstra who considers that "goto" tends to make code unreadable, and that it can always be replaced by something else, which he defends in this article https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf. It's an interesting point of view. Only, Dijkstra does algorithmics and not development, in algorithmics, we free ourselves from error case management, resource liberation, connection loss, and other concepts that are easily treated by a "goto". When we develop, we cannot free ourselves from these cases, so "goto" is a particularly elegant way to treat them.
Furthermore, if the preacher makes an appeal to authority by citing "Dijkstra", it will be easy to find other authority figures who support the contrary such as for example "Torvalds" https://lkml.org/lkml/2003/1/12/128.
Yes, the abuse of "goto" makes code difficult to read. Should we therefore radicalize and proscribe "goto" from all code? The important thing is that the use or not of this practice corresponds to your issues. Personally, I have never encountered an issue that requires proscribing the reasonable use of "goto".
All the code of these tests are compiled by disabling optimizations. If we activate them (they are activated by default) the compiler detects that these 3 functions do the same thing and it reduces them to their smallest expression, notably by removing closures and lambda functions. It therefore produces the same code 3 times.
TEXT main.test_closure(SB)
comp.go:119: MOVQ AX, 0x8(SP)
comp.go:119: NOPL
comp.go:123: CMPQ BX, $0x3
comp.go:123: JLE 0x108b395
comp.go:124: MOVQ main.globalErr(SB), AX
comp.go:131: JMP 0x108b397
comp.go:131: XORL AX, AX
comp.go:133: TESTQ AX, AX
comp.go:133: JE 0x108b3a3
comp.go:134: INCQ main.globalErrorCount(SB)
comp.go:136: RET
TEXT main.test_lambda(SB)
comp.go:138: MOVQ AX, 0x8(SP)
comp.go:138: NOPL
comp.go:142: CMPQ BX, $0x3
comp.go:142: JLE 0x108b3d5
comp.go:149: MOVQ main.globalErr(SB), AX
comp.go:149: JMP 0x108b3d7
comp.go:149: XORL AX, AX
comp.go:151: TESTQ AX, AX
comp.go:151: JE 0x108b3e3
comp.go:152: INCQ main.globalErrorCount(SB)
comp.go:154: RET
TEXT main.test_goto(SB)
comp.go:156: MOVQ AX, 0x8(SP)
comp.go:159: CMPQ BX, $0x3
comp.go:159: JLE 0x108b414
comp.go:160: MOVQ main.globalErr(SB), AX
comp.go:161: JMP 0x108b416
comp.go:161: XORL AX, AX
comp.go:170: TESTQ AX, AX
comp.go:170: JE 0x108b422
comp.go:171: INCQ main.globalErrorCount(SB)
comp.go:173: RET
Although the compiler is very efficient at detecting situations to optimize, isn't it preferable to directly write economical code, rather than writing expansive code for which we hope for optimization by the compiler?
For the rest, this test mainly measures the time of calling a function. In the case where the code contained in the closure is substantial, this call time will become negligible compared to the execution time. In this case, I would say that code readability should be prioritized.
When I read code with a lambda function or a closure, I cannot read the code linearly, I am forced to jump to the end of the function (→ and ←) in order to know if I define a function that will be called later or immediately and in this second case, what are the arguments that are passed to it.
On the other hand, reading a function based on goto is perfectly linear and therefore more understandable.
The source files: