Cost of closures in GO: closure, lambda or goto

I am currently working on a project written in Go where closures are massively used. Knowing that calling a function is not a neutral thing, I wonder here about the impact of these closures on code performance. I therefore propose to understand how all this works and to make a comparison with a more classic error management method based on "goto".

Here are three approaches to manage a sequence of operations where we want to stop at the first error:

Closure: The closure fully uses the context capture system

func test_closure(str string) {
	var err error

	func() {
		if len(str) > 3 {
			err = globalErr
			return
		}
		if len(str) > 4 {
			err = globalErr
			return
		}
	} ()

	if err != nil {
		globalErrorCount++
	}
}

Lambda: similar to a closure but does not use the context capture system at all.

func test_lambda(str string) {
	var err error

	err = func(str_test string) error {
		if len(str_test) > 3 {
			return globalErr
		}
		if len(str_test) > 4 {
			return globalErr
		}
		return nil
	} (str)

	if err != nil {
		globalErrorCount++
	}
}

Goto: Using a "goto" to interrupt the flow.

func test_goto(str string) {
	var err error

	if len(str) > 3 {
		err = globalErr
		goto process_error
	}
	if len(str) > 4 {
		err = globalErr
		goto process_error
	}

process_error:

	if err != nil {
		globalErrorCount++
	}
}

Each function is tested in a case where the operation is performed without error, and in a case where the operation is performed with an error from the first processing. The test of each type of operation is measured during 1,000,000,000 iterations in order to have an average measurement.

Here are the results. This is the duration of execution of the 1,000,000,000 iterations and the difference from the fastest in percentage.

	Closure	Lambda	Goto
Return error	3.87s `+7.82%`	4.4s `+23.6%`	3.56s
Return no error	4.37s `+23.4%`	4.06s `+14.7%`	3.54s

Closure or lambda systems are significantly slower. Let's see why:

What is a closure and how does it work?

A closure is a function that maintains a reference to its lexical environment. The lexical environment includes the variables accessible at the place where the closure is defined.

Technically, the compiler analyzes the closure and determines the variables it must use. For these variables to be accessible, they will be allocated in the heap rather than in the stack.

What does a function call imply?

Calling a function is not something without impact. The program must sometimes save its registers in the stack (the Go compiler optimizes this), position the call parameters in registers according to the OS convention and finally execute the CALL instruction which will save the function's return point and jump to the first instruction of the function. Furthermore, a CALL can imply a flush of the CPU's instruction pipeline.

What is the difference between the heap and the stack?

The heap is a large memory area accessible from the entire program. In order to manage variables positioned in the heap, we generally use a memory allocation system. In the case of Go, we also use a "garbage collector". Memory in the stack is simpler to allocate, its quantity is known in advance by the compiler and it is enough to change the value of the stack pointer.

What is allocated in the stack is faster to allocate and is not subject to the "garbage collector". On the other hand, the stack space is limited.

Here is the assembly code generated for these three methods. We can note that the lambda and closure methods result in two functions while the goto method has only one:

Analysis of compiled code

Here is the code of each decompiled function.

Closure

TEXT main.test_closure(SB)
  comp.go:119:  CMPQ SP, 0x10(R14)			
  comp.go:119:  JBE 0x10b49ed				
  comp.go:119:  PUSHQ BP				
  comp.go:119:  MOVQ SP, BP				
  comp.go:119:  SUBQ $0x28, SP				
  comp.go:119:  MOVQ AX, 0x38(SP)			
  comp.go:119:  MOVQ BX, 0x40(SP)			
  comp.go:120:  MOVUPS X15, 0x18(SP)			
  comp.go:131:  MOVQ 0x38(SP), AX			
  comp.go:131:  MOVQ 0x40(SP), BX			
  comp.go:131:  LEAQ 0x18(SP), CX			
  comp.go:131:  CALL main.test_closure.func1(SB)	
  comp.go:133:  CMPQ 0x18(SP), $0x0			
  comp.go:133:  JNE 0x10b49dc				
  comp.go:133:  JMP 0x10b49e5				
  comp.go:134:  INCQ main.globalErrorCount(SB)		
  comp.go:134:  JMP 0x10b49e7				
  comp.go:133:  JMP 0x10b49e7				
  comp.go:136:  ADDQ $0x28, SP				
  comp.go:136:  POPQ BP					
  comp.go:136:  RET					
  comp.go:119:  MOVQ AX, 0x8(SP)			
  comp.go:119:  MOVQ BX, 0x10(SP)			
  comp.go:119:  CALL runtime.morestack_noctxt.abi0(SB)	
  comp.go:119:  MOVQ 0x8(SP), AX			
  comp.go:119:  MOVQ 0x10(SP), BX			
  comp.go:119:  JMP main.test_closure(SB)		

TEXT main.test_closure.func1(SB)
  comp.go:122:  CMPQ SP, 0x10(R14)			
  comp.go:122:  JBE 0x10b4ad3				
  comp.go:122:  PUSHQ BP				
  comp.go:122:  MOVQ SP, BP				
  comp.go:122:  SUBQ $0x8, SP				
  comp.go:122:  MOVQ AX, 0x18(SP)			
  comp.go:122:  MOVQ BX, 0x20(SP)			
  comp.go:122:  MOVQ CX, 0x28(SP)			
  comp.go:123:  MOVQ BX, 0(SP)				
  comp.go:123:  CMPQ BX, $0x3				
  comp.go:123:  JG 0x10b4a4d				
  comp.go:123:  JMP 0x10b4a87				
  comp.go:124:  MOVQ main.globalErr+8(SB), AX		
  comp.go:124:  MOVQ main.globalErr(SB), DX		
  comp.go:124:  MOVQ DX, 0(CX)				
  comp.go:124:  CMPL runtime.writeBarrier(SB), $0x0	
  comp.go:124:  JE 0x10b4a69				
  comp.go:124:  JMP 0x10b4a6b				
  comp.go:124:  JMP 0x10b4a7d				
  comp.go:124:  CALL runtime.gcWriteBarrier2(SB)	
  comp.go:124:  MOVQ AX, 0(R11)				
  comp.go:124:  MOVQ 0x8(CX), DX			
  comp.go:124:  MOVQ DX, 0x8(R11)			
  comp.go:124:  JMP 0x10b4a7d				
  comp.go:124:  MOVQ AX, 0x8(CX)			
  comp.go:125:  ADDQ $0x8, SP				
  comp.go:125:  POPQ BP					
  comp.go:125:  RET					
  comp.go:127:  MOVQ BX, 0(SP)				
  comp.go:127:  CMPQ BX, $0x4				
  comp.go:127:  JG 0x10b4a93				
  comp.go:127:  JMP 0x10b4acd				
  comp.go:128:  MOVQ main.globalErr+8(SB), AX		
  comp.go:128:  MOVQ main.globalErr(SB), DX		
  comp.go:128:  MOVQ DX, 0(CX)				
  comp.go:128:  CMPL runtime.writeBarrier(SB), $0x0	
  comp.go:128:  JE 0x10b4aaf				
  comp.go:128:  JMP 0x10b4ab1				
  comp.go:128:  JMP 0x10b4ac3				
  comp.go:128:  CALL runtime.gcWriteBarrier2(SB)	
  comp.go:128:  MOVQ AX, 0(R11)				
  comp.go:128:  MOVQ 0x8(CX), DX			
  comp.go:128:  MOVQ DX, 0x8(R11)			
  comp.go:128:  JMP 0x10b4ac3				
  comp.go:128:  MOVQ AX, 0x8(CX)			
  comp.go:129:  ADDQ $0x8, SP				
  comp.go:129:  POPQ BP					
  comp.go:129:  RET					
  comp.go:131:  ADDQ $0x8, SP				
  comp.go:131:  POPQ BP					
  comp.go:131:  RET					
  comp.go:122:  MOVQ AX, 0x8(SP)			
  comp.go:122:  MOVQ BX, 0x10(SP)			
  comp.go:122:  MOVQ CX, 0x18(SP)			
  comp.go:122:  CALL runtime.morestack_noctxt.abi0(SB)	
  comp.go:122:  MOVQ 0x8(SP), AX			
  comp.go:122:  MOVQ 0x10(SP), BX			
  comp.go:122:  MOVQ 0x18(SP), CX			
  comp.go:122:  JMP main.test_closure.func1(SB)

Lambda

TEXT main.test_lambda(SB)
  comp.go:138:  CMPQ SP, 0x10(R14)			
  comp.go:138:  JBE 0x10b4b4f				
  comp.go:138:  PUSHQ BP				
  comp.go:138:  MOVQ SP, BP				
  comp.go:138:  SUBQ $0x20, SP				
  comp.go:138:  MOVQ AX, 0x30(SP)			
  comp.go:138:  MOVQ BX, 0x38(SP)			
  comp.go:139:  MOVUPS X15, 0x10(SP)			
  comp.go:149:  MOVQ 0x30(SP), AX			
  comp.go:149:  MOVQ 0x38(SP), BX			
  comp.go:149:  CALL main.test_lambda.func1(SB)		
  comp.go:141:  MOVQ AX, 0x10(SP)			
  comp.go:141:  MOVQ BX, 0x18(SP)			
  comp.go:151:  TESTQ AX, AX				
  comp.go:151:  JNE 0x10b4b3e				
  comp.go:151:  JMP 0x10b4b47				
  comp.go:152:  INCQ main.globalErrorCount(SB)		
  comp.go:152:  JMP 0x10b4b49				
  comp.go:151:  JMP 0x10b4b49				
  comp.go:154:  ADDQ $0x20, SP				
  comp.go:154:  POPQ BP					
  comp.go:154:  RET					
  comp.go:138:  MOVQ AX, 0x8(SP)			
  comp.go:138:  MOVQ BX, 0x10(SP)			
  comp.go:138:  CALL runtime.morestack_noctxt.abi0(SB)	
  comp.go:138:  MOVQ 0x8(SP), AX			
  comp.go:138:  MOVQ 0x10(SP), BX			
  comp.go:138:  JMP main.test_lambda(SB)		

TEXT main.test_lambda.func1(SB)
  comp.go:141:  PUSHQ BP			
  comp.go:141:  MOVQ SP, BP			
  comp.go:141:  SUBQ $0x18, SP			
  comp.go:141:  MOVQ AX, 0x28(SP)		
  comp.go:141:  MOVQ BX, 0x30(SP)		
  comp.go:141:  MOVUPS X15, 0x8(SP)		
  comp.go:142:  MOVQ 0x30(SP), CX		
  comp.go:142:  MOVQ CX, 0(SP)			
  comp.go:142:  CMPQ CX, $0x3			
  comp.go:142:  JG 0x10b4c49			
  comp.go:142:  JMP 0x10b4c67			
  comp.go:143:  MOVQ main.globalErr(SB), AX	
  comp.go:143:  MOVQ main.globalErr+8(SB), BX	
  comp.go:143:  MOVQ AX, 0x8(SP)		
  comp.go:143:  MOVQ BX, 0x10(SP)		
  comp.go:143:  ADDQ $0x18, SP			
  comp.go:143:  POPQ BP				
  comp.go:143:  RET				
  comp.go:145:  MOVQ 0x30(SP), CX		
  comp.go:145:  MOVQ CX, 0(SP)			
  comp.go:145:  CMPQ CX, $0x4			
  comp.go:145:  JG 0x10b4c78			
  comp.go:145:  JMP 0x10b4c96			
  comp.go:146:  MOVQ main.globalErr(SB), AX	
  comp.go:146:  MOVQ main.globalErr+8(SB), BX	
  comp.go:146:  MOVQ AX, 0x8(SP)		
  comp.go:146:  MOVQ BX, 0x10(SP)		
  comp.go:146:  ADDQ $0x18, SP			
  comp.go:146:  POPQ BP				
  comp.go:146:  RET				
  comp.go:148:  MOVUPS X15, 0x8(SP)		
  comp.go:148:  XORL AX, AX			
  comp.go:148:  XORL BX, BX			
  comp.go:148:  ADDQ $0x18, SP			
  comp.go:148:  POPQ BP				
  comp.go:148:  RET

Goto

TEXT main.test_goto(SB)
  comp.go:156:  PUSHQ BP			
  comp.go:156:  MOVQ SP, BP			
  comp.go:156:  SUBQ $0x18, SP			
  comp.go:156:  MOVQ AX, 0x28(SP)		
  comp.go:156:  MOVQ BX, 0x30(SP)		
  comp.go:157:  MOVUPS X15, 0x8(SP)		
  comp.go:159:  MOVQ 0x30(SP), AX		
  comp.go:159:  MOVQ AX, 0(SP)			
  comp.go:159:  CMPQ AX, $0x3			
  comp.go:159:  JG 0x10b4ba9			
  comp.go:159:  JMP 0x10b4bc3			
  comp.go:160:  MOVQ main.globalErr(SB), AX	
  comp.go:160:  MOVQ main.globalErr+8(SB), CX	
  comp.go:160:  MOVQ AX, 0x8(SP)		
  comp.go:160:  MOVQ CX, 0x10(SP)		
  comp.go:161:  JMP 0x10b4bf0			
  comp.go:163:  MOVQ 0x30(SP), AX		
  comp.go:163:  MOVQ AX, 0(SP)			
  comp.go:163:  CMPQ AX, $0x4			
  comp.go:163:  JG 0x10b4bd4			
  comp.go:163:  JMP 0x10b4bee			
  comp.go:164:  MOVQ main.globalErr(SB), AX	
  comp.go:164:  MOVQ main.globalErr+8(SB), CX	
  comp.go:164:  MOVQ AX, 0x8(SP)		
  comp.go:164:  MOVQ CX, 0x10(SP)		
  comp.go:165:  JMP 0x10b4bf0			
  comp.go:170:  JMP 0x10b4bf0			
  comp.go:170:  CMPQ 0x8(SP), $0x0		
  comp.go:170:  JNE 0x10b4bfa			
  comp.go:170:  JMP 0x10b4c03			
  comp.go:171:  INCQ main.globalErrorCount(SB)	
  comp.go:171:  JMP 0x10b4c05			
  comp.go:170:  JMP 0x10b4c05			
  comp.go:173:  ADDQ $0x18, SP			
  comp.go:173:  POPQ BP				
  comp.go:173:  RET

Here is what we can note about the compiled code:

For closure and lambda

We can observe the use of two distinct functions (main.test_closure and main.test_closure.func1), which confirms the creation of an anonymous function.
We see the use of CALL and RET instructions which are expensive in execution time.
The generated assembly code contains more instructions, and will therefore probably be longer to execute (this is not always true).

For Goto

In the case of goto, we only see one function (main.test_goto), which shows a simpler structure.
The JMP instructions in the Goto version are the direct implementation of goto.

The goto controversy

Here is an old debate that has no reason to be. This debate still lives because many people repeat ad nauseam the diatribe "You should never use 'goto' because it can always be replaced by a structure based on loops". Unfortunately these good people don't know why they repeat that, and their assurance creates followers who in turn preach the good word.

This old debate comes from Dijkstra who considers that "goto" tends to make code unreadable, and that it can always be replaced by something else, which he defends in this article https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf. It's an interesting point of view. Only, Dijkstra does algorithmics and not development, in algorithmics, we free ourselves from error case management, resource liberation, connection loss, and other concepts that are easily treated by a "goto". When we develop, we cannot free ourselves from these cases, so "goto" is a particularly elegant way to treat them.

Furthermore, if the preacher makes an appeal to authority by citing "Dijkstra", it will be easy to find other authority figures who support the contrary such as for example "Torvalds" https://lkml.org/lkml/2003/1/12/128.

Yes, the abuse of "goto" makes code difficult to read. Should we therefore radicalize and proscribe "goto" from all code? The important thing is that the use or not of this practice corresponds to your issues. Personally, I have never encountered an issue that requires proscribing the reasonable use of "goto".

Compiler optimization

All the code of these tests are compiled by disabling optimizations. If we activate them (they are activated by default) the compiler detects that these 3 functions do the same thing and it reduces them to their smallest expression, notably by removing closures and lambda functions. It therefore produces the same code 3 times.

TEXT main.test_closure(SB)
  comp.go:119:  MOVQ AX, 0x8(SP)		
  comp.go:119:  NOPL				
  comp.go:123:  CMPQ BX, $0x3			
  comp.go:123:  JLE 0x108b395			
  comp.go:124:  MOVQ main.globalErr(SB), AX	
  comp.go:131:  JMP 0x108b397			
  comp.go:131:  XORL AX, AX			
  comp.go:133:  TESTQ AX, AX			
  comp.go:133:  JE 0x108b3a3			
  comp.go:134:  INCQ main.globalErrorCount(SB)	
  comp.go:136:  RET				

TEXT main.test_lambda(SB)
  comp.go:138:  MOVQ AX, 0x8(SP)		
  comp.go:138:  NOPL				
  comp.go:142:  CMPQ BX, $0x3			
  comp.go:142:  JLE 0x108b3d5			
  comp.go:149:  MOVQ main.globalErr(SB), AX	
  comp.go:149:  JMP 0x108b3d7			
  comp.go:149:  XORL AX, AX			
  comp.go:151:  TESTQ AX, AX			
  comp.go:151:  JE 0x108b3e3			
  comp.go:152:  INCQ main.globalErrorCount(SB)	
  comp.go:154:  RET				

TEXT main.test_goto(SB)
  comp.go:156:  MOVQ AX, 0x8(SP)		
  comp.go:159:  CMPQ BX, $0x3			
  comp.go:159:  JLE 0x108b414			
  comp.go:160:  MOVQ main.globalErr(SB), AX	
  comp.go:161:  JMP 0x108b416			
  comp.go:161:  XORL AX, AX			
  comp.go:170:  TESTQ AX, AX			
  comp.go:170:  JE 0x108b422			
  comp.go:171:  INCQ main.globalErrorCount(SB)	
  comp.go:173:  RET

What to take away from all this

Although the compiler is very efficient at detecting situations to optimize, isn't it preferable to directly write economical code, rather than writing expansive code for which we hope for optimization by the compiler?

For the rest, this test mainly measures the time of calling a function. In the case where the code contained in the closure is substantial, this call time will become negligible compared to the execution time. In this case, I would say that code readability should be prioritized.

When I read code with a lambda function or a closure, I cannot read the code linearly, I am forced to jump to the end of the function (→ and ←) in order to know if I define a function that will be called later or immediately and in this second case, what are the arguments that are passed to it.

On the other hand, reading a function based on goto is perfectly linear and therefore more understandable.

The source files: