Objective-C Blocks

David Stes
Molenstraat 5
2018 Antwerp, Flanders, Belgium
email: stes@pandora.be

March 4, 1998

Abstract:

This paper proposes an implementation of Blocks for Objective-C. The language and runtime, as described in [Cox86], do not include Blocks, but work on adding support for Blocks was done in [Cox91]. We give a detailed account of our experiences, discussing both the syntax and the semantics of an implementation in which Blocks do not have to be evaluated in LIFO (last in, first out) order and in which they support non-local returns. We also discuss an important application to exception handling in Objective-C.

Introduction

In Smalltalk, one would iterate over the contents of a group of objects, using the do: method which takes a Block, i.e. a piece of Smalltalk code, written inline, between square brackets, as argument. The Block is evaluated for each element :

aCltn do: [:each | self remove: each].

Blocks are heavily used in Smalltalk for flow control. Objective-C uses the conventional C keywords (if, while, for, break, continue etc.) for this purpose. However, there are still very good reasons to add support for Blocks, such as error handling using Blocks, as we will show.

Our implementation works out some of the ideas that are outlined in the TaskMaster paper [Cox91], a paper that is concerned with support for threads, processes and exceptions in Objective-C. TaskMaster pointed out the need for support for Blocks in Objective-C, but no implementation has been available until now. Our article also complements TaskMaster, since we present a different approach to non-LIFO Blocks in Objective-C.

The implementation described here was tested with the Portable Object Compiler, a new, freeware compiler for Objective-C. We are developing this compiler and distribute it under the terms of the GNU Library General Public License. It is, as far as we know, the first Objective-C compiler that has a complete implementation of Blocks.

Choosing a Syntax for Objective-C Blocks

The Smalltalk square bracket syntax for Blocks,

aCltn do: [:each | self remove: each].

is not available for use in Objective-C because, in Objective-C, one mixes plain C with a special syntax for sending messages, which precisely consists of using square brackets. For example,

int i;
for(i=0;i<[aCltn size];i++) { ... }

The example shows that a pair of receiver and method selector (and arguments, if any), are placed between square brackets, and this is the syntax that is used in Objective-C to send a message.

Square brackets are, in Objective-C, not used for creating a Block. Instead, for Blocks, the proposal is to use curly braces, as in,

[aCltn do: { :each | [self remove: each];} ];

The braced group, argument of the do: method, is a Block with one argument.

Curly braces are used in C for compound statements, and since there is some similarity, we believe that this syntax for Blocks fits nicely with the rest of the C and Objective-C language.

Terminology

A Block can either have no arguments, it can have one argument, or it can have two or more arguments.

If the Block has no arguments, it is evaluated by sending a value message to it. Evaluating the Block means that its statements are executed. The methods value: and value:value: do the same thing, but they allow objects to be passed as arguments to the Block when it is evaluated.

A Block can be thought of as an anonymous method, with arguments, local variables and statements, but existing at runtime as data, as an instance of the Block class.

Ideally, instances of the Block class can be used wherever ordinary Objects are used. It should be possible to add Blocks to a Collection object, to assign a Block to some variable, to use it as an argument of a message and so on.
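The idea of a Block that exists at runtime as data can be sketched in plain C. This is only an illustration under our own assumptions, not the runtime of the Portable Object Compiler: the names BlockFn, blockValue, blockValue1, blockValue2 and lastArgument are hypothetical, and objects are represented as untyped pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Compiled body of a Block: up to two object arguments plus the shared scope.
   (Hypothetical layout, chosen for this sketch only.) */
typedef void *(*BlockFn)(void *a, void *b, void *scope);

typedef struct Block {
    int nargs;     /* number of arguments the Block expects (0, 1 or 2 here) */
    BlockFn fn;    /* function evaluated when the Block receives value */
    void *scope;   /* variables shared with the enclosing function, if any */
} Block;

/* Creates a Block instance on the heap, so it can be stored and passed around. */
static Block *newBlock(int nargs, BlockFn fn, void *scope)
{
    Block *b = malloc(sizeof *b);
    assert(b != NULL);
    b->nargs = nargs;
    b->fn = fn;
    b->scope = scope;
    return b;
}

/* Counterparts of the value, value: and value:value: messages. */
static void *blockValue(Block *b)
{
    assert(b->nargs == 0);
    return b->fn(NULL, NULL, b->scope);
}

static void *blockValue1(Block *b, void *a1)
{
    assert(b->nargs == 1);
    return b->fn(a1, NULL, b->scope);
}

static void *blockValue2(Block *b, void *a1, void *a2)
{
    assert(b->nargs == 2);
    return b->fn(a1, a2, b->scope);
}

/* Example body: answers the last argument that was supplied, or NULL. */
static void *lastArgument(void *a, void *b, void *scope)
{
    (void)scope;
    return b != NULL ? b : a;
}
```

Because a Block in this sketch is an ordinary heap object, a pointer to it can be stored in an array, assigned to a variable, or passed to another function, which is the property asked for above.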

Syntax for Arguments

The syntax proposed in TaskMaster [Cox91] for these Block objects was as follows (we list three different Blocks, with respectively no, one and two arguments) :

{ int i; foo(...); }
{ id a; | int i;foo(...); }
{ id a,b; | int i;foo(...); }

A vertical bar separates arguments from local variables and statements, as in Smalltalk.

When there's no argument (as in the first example out of three), no vertical bar is needed.

We implemented a syntax like this for our compiler, as well as another one in which arguments were declared as an ANSI C style, comma separated list. After experimentation, however, we found that the arguments of a Block are in practice usually objects (i.e. variables of type id, the type that stands for an Object instance in Objective-C).

Therefore, we propose a new syntax with a default (or implicit) argument type of id. This is consistent with the defaults for method return and argument types : arguments of Objective-C methods default to type id, much as arguments of C functions default to int.

These considerations led us to the following syntax, which turns out to be close to the Smalltalk syntax; our proposal is to use :

{ int i; foo(...); }
{ :a | int i;foo(...); }
{ :a :b | int i;foo(...); }

In the case of a Block without arguments, there is no difference with the TaskMaster proposal. Arguments are preceded by a colon, much as is the case for method selectors in Objective-C.

An extension that we did not implement, because we currently have no use for it, might be to allow a cast in front of the argument, in the same manner as it is done sometimes for arguments of ordinary Objective-C methods :

{ :(BOOL)a | int i;foo(...); }

The point is that the default type, when the cast is omitted, should in any case be id. In a later section, about return values of a Block, we discuss the type issue in greater detail.

Examples

A first example uses the same Block to iterate over two different groups of objects, and prints all objects to stderr :

static void
printThem(firstCltn,otherCltn)
  id firstCltn,otherCltn;
{
    id aBlock = { :each | [each printOn:stderr]; };
    [firstCltn do:aBlock];
    [otherCltn do:aBlock];
}

Another example counts the number of Objects in a collection that are identical to some given object :

static unsigned
countThem(aCltn,anObject,count)
    id aCltn,anObject;
    unsigned count;
{
    [aCltn do: { :each | if (anObject == each) count++; }];
    return count;
}

Curly Braces

It is not obvious that curly braces can be used for Blocks, that is, whether this choice can be implemented in a way that is compatible with the C language. Objective-C is a tool for C programmers and for C development, and the goal is to be fully compatible with the C language (see [Kernighan88]).

In fact, the TaskMaster paper [Cox91] prints the braces in boldface, to indicate that the syntax would ideally use ordinary braces, but that it was not yet clear at that time whether and how this could be done, so the possibility of using some other delimiter was left open.

Consider the following example, where the right hand side of each assignment consists of an ordinary C expression, which happens to be a Block without arguments in the second case :

int count = 100;
id aBlock = { [aStream echo:text]; };

C initializers also use curly braces, so the following code is valid and traditional Objective-C. It is totally unrelated to Blocks, but syntactically very close to the previous example :

int counters[1] = { 100 };
id points[1] = { [Point new] };

This is the C syntax for initializing a C array, which happens to be a C array consisting of a single, Objective-C Point object, in the second assignment.

An approach needs to be developed so that the parser, after processing a C declarator, knows whether to expect a C initializer for an array, or an expression (which might happen to be a Block).

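The rule is that the curly brace syntax for Blocks is allowed in two cases: (1) where the Block is the receiver or an argument of a message expression, and (2) where the Block is the initializer of a variable of type id.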

In the case of the Portable Object Compiler, the implementation is based on yacc feedback to the lex lexical analyzer (see [LexYacc92]), using embedded actions, for example :

MessageExpression : '[' {okBlock=1;} Receiver Arguments ']'

The parser tells the lexical analyzer (using a flag called okBlock) that a Block can follow.

In the case of a pointer to, or an array of id types, the okBlock flag is set to false, so that a curly brace stands again for a conventional C initializer, not related in any way to Blocks :

id array[2] = { [Point new],[Point new] };

In our experience, the relaxed rules, allowing Blocks as (1) receiver or argument in a message and (2) as initializer of an id variable, cover most cases using a natural syntax.
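The feedback from the parser to the lexical analyzer can be sketched as follows. This is a simplified illustration and not the compiler's actual source: the helper names braceToken, sawMessageOpen and sawIdInitializer are hypothetical, the numeric token code is arbitrary, and the assumption that the flag is cleared once a brace has been consumed is ours.

```c
#include <assert.h>

enum { BLOCK_TOKEN = 258 };   /* token code chosen arbitrarily for this sketch */

static int okBlock = 0;       /* set by parser actions, read by the lexer */

/* Called by the lexer when it sees '{': decides which token to hand back. */
static int braceToken(void)
{
    if (okBlock) {
        okBlock = 0;          /* assumption: permission covers one brace only */
        return BLOCK_TOKEN;
    }
    return '{';
}

/* Stand-in for the embedded action after '[' in MessageExpression. */
static void sawMessageOpen(void)
{
    okBlock = 1;
}

/* Stand-in for the action after the '=' of an id declarator:
   a plain id variable may be initialized to a Block, whereas
   an array of id or a pointer to id takes a C initializer. */
static void sawIdInitializer(int isArrayOrPointer)
{
    okBlock = !isArrayOrPointer;
}
```

With this arrangement the grammar itself never has to distinguish the two meanings of an opening brace; the lexer hands the parser a different token depending on the context that the parser has just recognized.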

Self Referencing Blocks

In an earlier implementation, we had introduced a variable, called thisBlock, which was to Blocks, what self is to object instances, and what _cmd is to methods : an implicit, hidden argument that referred to the Block itself that was being defined. Blocks could use the thisBlock variable to send messages to themselves.

In the current implementation, however, we believe that thisBlock is a redundant concept because a Block can easily refer to locals from the enclosing compound statement, and the C initializer syntax was augmented so that it's very convenient to initialize some variable to a Block. This means that a variable can easily be initialized to a Block that references itself :

id p = {:each | [each print];[[each subclasses] do:p];};

Given this definition of p, the following statement would print all subclasses of Object by recursively evaluating the Block p :

[p value:Object];

There is no longer a need for an implicit argument such as thisBlock to achieve the same thing.
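How a Block can reach itself through an ordinary variable can be sketched in C, assuming that the variable p lives in a heap allocated scope shared by the Block and its enclosing function. The types node, scope and Block and the function countNodes are hypothetical names for this illustration; a binary tree stands in for a class and its subclasses, and counting stands in for printing.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a class with (at most two) subclasses. */
struct node { const struct node *left, *right; };

struct scope;

typedef struct Block {
    void (*fn)(const struct node *each, struct scope *s);
    struct scope *scope;
} Block;

/* The local variable p, kept on the heap, plus a counter. */
struct scope { Block *p; int visited; };

/* Body of the Block: visits a node, then evaluates the Block again via p. */
static void body(const struct node *each, struct scope *s)
{
    if (each == NULL)
        return;
    s->visited++;                         /* stands in for [each print] */
    s->p->fn(each->left, s->p->scope);    /* stands in for [... do:p] */
    s->p->fn(each->right, s->p->scope);
}

static int countNodes(const struct node *root)
{
    struct scope *s = malloc(sizeof *s);
    Block *p = malloc(sizeof *p);
    int visited;

    assert(s != NULL && p != NULL);
    p->fn = body;
    p->scope = s;
    s->p = p;            /* corresponds to: id p = { ... p ... }; */
    s->visited = 0;
    p->fn(root, s);      /* corresponds to: [p value:root]; */
    visited = s->visited;
    free(p);
    free(s);
    return visited;
}
```

The Block needs no hidden argument here: it finds itself through the shared scope, in the same way as it would find any other variable of the enclosing function.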

Returning from a Block

Blocks can have a return value, which is the value that is returned by methods such as value, when the Block is evaluated.

On the other hand, it is also possible to return from the method (or function) in which the Block is being defined, from within a Block. This is called non-local return.

In Smalltalk, in the first case, one would simply write the expression that needs to be returned when the Block is evaluated. For example,

symbols <- vars collect: [ :x  | x asSymbol ].

On the other hand, in the second case, that of a non-local return, one would use the up-arrow in Smalltalk (usually entered from the keyboard as a caret) :

self do: [ :each | (each=anObject) ifTrue:[^true]].
^false

An Objective-C Block can consist of a single expression, and in this case, the return value of the Block, consists of that expression, as in:

symbols = [vars collect:{:x|[x asSymbol]}];

The above example shows a message expression, [x asSymbol], that is placed inside curly braces, to indicate that it's a Block.

An extension, that we did not implement, is that the type of the return value could be set to be the same as that of the expression. This allows for arbitrary C types, and there is no ambiguity, since a C compiler always associates a type to expressions.

The type associated to the expression of the example, [x asSymbol], is id, so the return value is an Object, which is the case that we implemented.

However, a function or a method, can also return from within an Objective-C Block, and in this case, an arbitrary type is possible as return value :

static BOOL
isElementOf(anObject,aCltn)
  id anObject,aCltn;
{
    [aCltn do:{ :eachCltn |
        [eachCltn do:{ :element |
           if ([element isEqual:anObject]) return YES;
        }];
    }];

    return NO;
}

The above example shows that we have redefined the C keyword return to be equivalent to the Smalltalk up-arrow (caret), for Objective-C Blocks.

Given the fact that the function isElementOf() is defined to have a BOOL type return value, this is indeed what the return statement returns, even from within the nested block.

The Portable Object Compiler supports this by translating return statements within Blocks, to a pair of setjmp() and longjmp(), and it uses Block variables to return the actual value (about which we say more in a section below).
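A translation of this kind can be sketched by hand in C. The code below is a minimal illustration under our own assumptions, not the output of the compiler: it searches an array of integers rather than a collection of objects, and the names scope, blockBody and forEach are hypothetical. The scope is allocated on the heap, so that the value stored just before longjmp() is well defined after the jump.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Variables shared between isElementOf() and its Block. */
struct scope {
    int target;        /* the value that is searched for */
    int returnValue;   /* slot that carries the value of the non-local return */
    jmp_buf home;      /* where a return inside the Block jumps to */
};

/* Body of the Block: "if (element == target) return 1;" */
static void blockBody(int element, struct scope *s)
{
    if (element == s->target) {
        s->returnValue = 1;
        longjmp(s->home, 1);   /* leave forEach() and go back to isElementOf() */
    }
}

/* Stand-in for the do: method: evaluates the Block for each element. */
static void forEach(const int *items, size_t n,
                    void (*body)(int, struct scope *), struct scope *s)
{
    size_t i;
    for (i = 0; i < n; i++)
        body(items[i], s);
}

static int isElementOf(int value, const int *items, size_t n)
{
    struct scope *s = malloc(sizeof *s);
    int result;

    assert(s != NULL);
    s->target = value;
    s->returnValue = 0;           /* value of the final "return NO" */
    if (setjmp(s->home) == 0)
        forEach(items, n, blockBody, s);
    /* Reached either normally or through longjmp() from the Block. */
    result = s->returnValue;
    free(s);
    return result;
}
```

The enclosing function stays in charge of actually returning: the Block only stores the value and jumps, which is why an arbitrary C return type can be supported.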

To summarize, returning from a Block differs from returning from a method or function, from within a Block. In the first case, it is possible to assign a return value to a Block by simply writing the expression (of type id) that it should return. In the second case, one uses an explicit return keyword, and it acts very much like return would in a regular C compound statement, supporting arbitrary C types as return value.

Objective-C Blocks are not LIFO

Objective-C blocks should not make the assumption that they are evaluated in LIFO (last-in first-out) order. An Objective-C Block may be evaluated after the function where it was created, has returned, as for example in :

[openMenu action:{:sender | doOpen();} ];

In the above example, an object called "openMenu", which could be an instance of the Menu class in some GUI library, saves the Block that is registered as the action Block.

When the user chooses an action (such as Open, or Quit) from the Menu, the Block is evaluated.

Clearly, evaluation of Blocks works in this case not in LIFO order : the function where the Block was created has already returned.
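The situation can be sketched in C as follows. The names installAction, menuChosen and savedAction are hypothetical, a global pointer stands in for the Menu object, and a counter stands in for the effect of doOpen().

```c
#include <assert.h>
#include <stdlib.h>

/* A variable of the creating function, kept on the heap. */
struct scope { int opened; };

typedef struct Block {
    void (*fn)(struct scope *);
    struct scope *scope;
} Block;

static Block *savedAction = NULL;   /* stands in for the Menu's stored action */

/* Body of the action Block. */
static void doOpen(struct scope *s)
{
    s->opened++;
}

/* Creates the Block and registers it, then returns
   before the Block has been evaluated even once. */
static struct scope *installAction(void)
{
    struct scope *s = malloc(sizeof *s);
    Block *b = malloc(sizeof *b);

    assert(s != NULL && b != NULL);
    s->opened = 0;
    b->fn = doOpen;
    b->scope = s;
    savedAction = b;
    return s;   /* returned only so that a caller can observe the effect */
}

/* Called later, for example when the user picks the menu item. */
static void menuChosen(void)
{
    if (savedAction != NULL)
        savedAction->fn(savedAction->scope);
}
```

If the scope were a stack variable of installAction(), the Block would be left with a dangling pointer by the time menuChosen() runs.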

Variables in a Block

Although Blocks can have return values, they are often used for side effects, where the return value is irrelevant. Typically the method do: is used to evaluate a Block that modifies a variable of the enclosing scope :

static int
totalSize(aCltn)
  id aCltn;
{
    int count = 0;
    [aCltn do: { :whatever | count += [whatever size]; }];
    return count;
}

The local (stack) variable count can be used and modified from within the Block, and when the method do: returns, the value of the variable count reflects the modifications by the Block.

The compiler makes a list of the local variables, class variables, instance variables, global variables, arguments etc. that blocks and their subblocks are referencing. It then promotes local (stack) variables to heap allocated variables, by generating a C struct for the variables, and by calling the C library function malloc(). The following semi-translated pseudo-code illustrates the point :

struct generated { int count; };

static void
blockFunction(id whatever,struct generated *scope)
{
    scope->count += [whatever size];
}

static int
totalSize(id aCltn)
{
    struct generated *scope = malloc(sizeof(struct generated));

    scope->count = 0;
    [aCltn do: newBlock(1,blockFunction,scope)];
    return scope->count;
}

In the above code, newBlock() is the function that creates an instance of the Block class. It takes the number of arguments, the function to evaluate when value is sent, and a malloc'ed pointer towards variables, as argument.

Instead of passing a pointer towards a stack variable, to the Block, the compiler automatically allocates some heap memory for such variables, and it translates references to these locals, in the calling function and in the Block, towards expressions that dereference a compiler generated C struct.

The use of a pointer into a stack-frame would only work for LIFO Blocks.

In practice, we use a double indirection, so that Blocks within Blocks work correctly, and such that variables with the same name, defined in different nested compound statements, are accessible from within Blocks, but the above pseudo-code illustrates the over-all approach.
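One possible reading of this scheme is that each generated scope holds a pointer to the scope that encloses it, so that a nested Block reaches outer variables through two dereferences. The C sketch below illustrates that reading only; it is an assumption on our part, not the layout used by the Portable Object Compiler, and the names outerScope, innerScope and weightedTotal are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Locals of the enclosing function. */
struct outerScope { int total; };

/* Locals of the outer Block, plus a link to the enclosing scope. */
struct innerScope {
    struct outerScope *outer;
    int weight;
};

/* Inner Block: reads a local of the outer Block,
   and updates a local of the function through the link. */
static void innerBody(int element, struct innerScope *s)
{
    s->outer->total += element * s->weight;
}

/* Outer Block: evaluated once per row, creates a scope for its own locals. */
static void outerBody(const int *row, size_t n, struct outerScope *o)
{
    struct innerScope *s = malloc(sizeof *s);
    size_t i;

    assert(s != NULL);
    s->outer = o;
    s->weight = 2;
    for (i = 0; i < n; i++)
        innerBody(row[i], s);
    free(s);   /* safe only because the inner Block does not outlive this call */
}

static int weightedTotal(const int *cells, size_t nrows, size_t ncols)
{
    struct outerScope *o = malloc(sizeof *o);
    size_t r;
    int result;

    assert(o != NULL);
    o->total = 0;
    for (r = 0; r < nrows; r++)
        outerBody(cells + r * ncols, ncols, o);
    result = o->total;
    free(o);
    return result;
}
```

Since every compound statement gets its own struct, two variables with the same name in different nested scopes end up as members of different structs and do not clash.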

We currently use a mark and sweep garbage collector for collecting the heap allocated memory that is used for communication between Blocks and the functions (or methods) where they are defined; it's also planned to implement a reference counting scheme for keeping track of this auxiliary memory.

Rules for Variables

The implementation of Blocks for the Portable Object Compiler treats various kinds of variables in the following way :

Global variables can be used in Blocks and they have their usual semantics. They are not promoted to heap allocated variables, when referenced from within a Block. They are accessed from within a Block in the same way as from within a function or method.

Instance variables (and class variables for those Objective-C compilers that have them) can be used in Blocks, within method definitions. Since references to instance variables are translated by the compiler to expressions that dereference the self pointer, the case of instance variables is in fact the case of sharing the self pointer between a method and a Block. Since self is a hidden argument for Objective-C methods, it falls under the next rule.

Local variables and arguments of methods, functions, or enclosing blocks of subblocks (in the case of nested blocks), are promoted by the compiler to heap allocated variables. The compiler generates code such that, when those variables are accessed from within the Block or from within the enclosing function or method, the value is accessed via a pointer.

Finally, in the case of an explicit return from a method or function, from within a Block, as discussed in one of the previous sections, the return value is treated as a local variable, and added as a dummy variable to the list of local variables that the method or function makes available to the Blocks that it contains.

Error Handling

The original Objective-C runtime used to define a method error: which could be used from within a method as follows :

- lookup:(STR)name
{
    id aClass = [self findClass:name];
    return (aClass)?aClass:[self error:"Class not found."];
}

The problem with this approach used to be that error: was abort()'ing the process : it was not possible to do error handling, i.e. to specify that some other action than aborting the process should be performed in case of an error.

An alternative was developed using C macros, DURING, HANDLER and ENDHANDLER. The message error: was replaced by :

- lookup:(STR)name
{
    id aClass = [self findClass:name];
    return (aClass)?aClass:RAISE("Class not found.");
}

Applications would then catch the exception, set by RAISE(), by specifying a handler as follows:

static id
findclass(STR name)
{
    id aClass;

    DURING
        aClass = [Object lookup:name];
    HANDLER
        aClass = Object; 
    ENDHANDLER

    return aClass;
}

which specified that, if the class was not found, then the function should return the root class, instead of simply aborting the process.

The handler was, in this approach, not a Block, but a compound statement within the function that was catching the exception. The pair of RAISE() and HANDLER macros would use setjmp() and longjmp() to do a non-local jump from the method that was setting the exception, to the method or function that was catching it.

This implies however, that it was not possible for a user to write a handler that would print for example, a stack backtrace, to see where the exception originated. Valuable information was lost by the longjmp to the handler, before the handler had a chance to execute.
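The mechanism can be sketched with a few lines of C. The macro definitions below are hypothetical and written for this illustration; they are not the definitions used by any particular Objective-C runtime. Class names are represented as C strings, and the names currentHandler, raisedMessage and raiseError are our own.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf *currentHandler = NULL;     /* innermost active handler, if any */
static const char *raisedMessage = NULL;   /* message of the last exception */

/* Hypothetical definitions of the three macros. */
#define DURING     { jmp_buf here_; jmp_buf *prev_ = currentHandler; \
                     currentHandler = &here_;                         \
                     if (setjmp(here_) == 0) {
#define HANDLER        currentHandler = prev_;                        \
                     } else {                                         \
                       currentHandler = prev_;
#define ENDHANDLER   } }

static void raiseError(const char *msg)
{
    raisedMessage = msg;
    if (currentHandler == NULL)
        abort();                  /* no handler installed */
    longjmp(*currentHandler, 1);  /* the raising frame is discarded here */
}

#define RAISE(msg) (raiseError(msg), (const char *)NULL)

/* Only "Point" is known in this sketch. */
static const char *lookup(const char *name)
{
    if (strcmp(name, "Point") == 0)
        return name;
    return RAISE("Class not found.");
}

static const char *findClass(const char *name)
{
    /* volatile: the variable is assigned around setjmp()/longjmp() */
    const char *volatile aClass = NULL;

    DURING
        aClass = lookup(name);
    HANDLER
        aClass = "Object";
    ENDHANDLER

    return aClass;
}
```

By the time the statements after HANDLER run, longjmp() has already discarded the frame of lookup(), which is exactly why a handler of this kind cannot inspect the place where the exception was raised.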

Error Handling using Blocks

With Blocks, we can implement an error handling approach based on the method error: and on a new method, called ifError:.

Also, a new instance method halt: was added to the root class, Object, and the old method error: was reimplemented in terms of halt:.

halt: takes an error message, a String object, as argument, and evaluates the current errorHandler. This error handler is a Block that takes two arguments, namely the error message and the receiver of the halt: message :

- halt:msg
{
    [errorHandler value:msg value:self];
    return self;
}

The default errorHandler is a Block that simply aborts the process :

id handler = { :msg :rcv | fprintf(stderr,"%s",[msg str]);abort(); };

Therefore, the default behavior of methods such as error: remains compatible with their old usage.

Of course, since the error handler is just a Block, evaluated from within error: or halt:, the user can substitute a different Block for the default handler, and thereby change the default action (which is to abort the process).

This can happen on a per process basis, using the factory method errorHandler:, to, for example, print the message in a modal error dialog box, instead of on the stderr.

[Block errorHandler:{ :msg :rcv | DialogBox([msg str]); }];

Or, as an alternative to changing the default handler per process, the Block instance method ifError: can be used.

ifError: first pushes its argument, the error handler (a Block that takes two objects as argument) onto a stack, and then evaluates the receiver (a Block without arguments) in the same way as value does :

static id
findclass(STR name)
{
    id aClass;

    [{
        aClass = [Object lookup:name];
    } ifError: { :msg :rcv |
        if (!strcmp([msg str],"Class not found.")) {
            return Object; 
        } else {
            [rcv halt:msg];
        }
    }];

    return aClass;
}

error: pops a handler from the stack, and evaluates it. Since Blocks support non-local returns, the handler can return directly to the method that caught the exception. The handler can also re-raise exceptions in which it is not interested, by calling halt: again, which gives control to the next handler on the stack, until the default handler (which aborts the process) is reached.
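The interplay between the handler stack and non-local return can be sketched in C. This is an illustration under our own assumptions, not the implementation in the Portable Object Compiler: the names pushHandler, haltWith and classNotFound are hypothetical, class names are C strings, and we assume that a handler which was never invoked is dropped again when the protected code finishes.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

typedef void (*Handler)(const char *msg, void *scope);

struct entry { Handler fn; void *scope; };

#define MAX_HANDLERS 16
static struct entry handlers[MAX_HANDLERS];
static int depth = 0;                 /* number of handlers on the stack */

static void pushHandler(Handler fn, void *scope)
{
    assert(depth < MAX_HANDLERS);
    handlers[depth].fn = fn;
    handlers[depth].scope = scope;
    depth++;
}

/* Counterpart of halt: pops the innermost handler and evaluates it
   while the frame of the raising function is still on the stack. */
static void haltWith(const char *msg)
{
    struct entry e;

    if (depth == 0)
        abort();                      /* stands in for the default handler */
    e = handlers[--depth];
    e.fn(msg, e.scope);
}

struct scope { jmp_buf home; const char *result; };

/* Handler Block of findClass(). */
static void classNotFound(const char *msg, void *scope)
{
    struct scope *s = scope;

    if (strcmp(msg, "Class not found.") == 0) {
        s->result = "Object";
        longjmp(s->home, 1);          /* non-local return to findClass() */
    }
    haltWith(msg);                    /* not ours: pass it to the next handler */
}

static const char *lookup(const char *name)
{
    if (strcmp(name, "Point") != 0)
        haltWith("Class not found.");
    return name;
}

static const char *findClass(const char *name)
{
    struct scope *s = malloc(sizeof *s);
    const char *result;
    int saved = depth;

    assert(s != NULL);
    s->result = NULL;
    if (setjmp(s->home) == 0) {
        pushHandler(classNotFound, s);   /* what ifError: does first */
        s->result = lookup(name);        /* then the receiver is evaluated */
    }
    depth = saved;                       /* drop our handler if still present */
    result = s->result;
    free(s);
    return result;
}
```

Note that classNotFound() runs as a callee of lookup(): the stack is unwound only if the handler itself decides to perform the non-local return.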

In a strategy where the error handler is a subroutine of the method or function that catches the exception, the stack has already been unwound by the time the handler runs. The approach that uses Blocks, by contrast, allows the error handler to be evaluated from within the method that raises the exception. No information, such as a stack backtrace towards the location where the exception was raised, is therefore lost to the handler.

Conclusion

Objective-C doesn't need Blocks for control statements, since the usual C statements are available for this. However, for error handling, it should be possible to pass a piece of code (a handler) as argument to the code that might raise an error. Since Smalltalk shows that Blocks can be used for many other purposes as well, adding support for Blocks is a valuable extension to Objective-C.

Appendix : Extensions to the Grammar

PrimaryExpression was extended to allow BlockExpressions, not just C expressions and Objective-C MessageExpressions :

PrimaryExpression : ...
 | MessageExpression
 | BlockExpression
;

The rules for BlockArguments and BlockExpressions are :

BlockArguments : ':' Identifier
 | BlockArguments ':' Identifier
;

The lexical analyzer returns BLOCK_TOKEN instead of '{' when a Block is allowed.

BlockExpression : BLOCK_TOKEN Expression '}'
 | BLOCK_TOKEN Statements '}'
 | BLOCK_TOKEN Declarators Statements '}'
 | BLOCK_TOKEN BlockArguments '|' Expression '}'
 | BLOCK_TOKEN BlockArguments '|' Statements '}'
 | BLOCK_TOKEN BlockArguments '|' Declarators Statements '}'
;



David Stes
1999-09-16