Disclaimer

This post provides an intentionally watered-down overview of mark sweep garbage collection algorithm and the tradeoffs involved around it. A lot of the gory implementation details aren’t mentioned.

Recap

At a high level, an automatic memory management system has two key responsibilities:

  1. Allocate space for new objects
  2. Reclaim space from dead objects

The sole responsibility of a garbage collector is to reclaim the space used by every object that will no longer be used by any of the execution paths in the program. In order to achieve that, it needs to categorize all objects as either dead or alive. As we mentioned in part 1, most garbage collectors treat reachability as liveness. An object is considered dead if none of the program’s execution paths can reach it. This post assumes that you’ve read part 1 and are familiar with mutator and collector concepts.

Categories Of Garbage Collection Algorithms

Most garbage collection schemes can be categorized as one of the four fundamental approaches :

It is common for GC implementations to combine multiple approaches (e.g. reference counting supplemented by mark-sweep) to meet performance goals. In this post, we’ll look into the details of mark sweep approach and its trade-offs.

How and when is GC invoked?

Before we jump into the details of mark sweep, let’s quickly understand how/when a GC is triggered. Typically, language runtimes trigger GC when they find that the heap is exhausted and there’s no more memory available to satisfy a memory allocation request by a mutator thread. Its something like:

def memory_allocator(n):
    # Allocate and return 'n' bytes on the heap.
    ref = allocate(n)
    if not ref:
        # Oops. We are out of memory. Invoke GC.
        gc()
        # Retry allocation
        ref = allocate(n)
        if not ref:
            # Let's give caller a chance to do something about it.
            # i.e. delete references, purge caches etc.
            raise OutOfMemory
    return ref

Mark Sweep

Mark Sweep algorithm is as simple as its name. It consists of a mark phase that is followed up by a sweep phase. During mark phase, the collector walks over all the roots (global variables, local variables, stack frames, virtual and hardware registers etc.) and marks every object that it encounters by setting a bit somewhere in/around that object. And during the sweep phase, it walks over the heap and reclaims memory from all the unmarked objects.

The outline of the basic algorithm is given below in python pseudo-code. Here we assume that the collector is single threaded but there could be multiple mutators. All mutator threads are stopped while the collector runs. This stop-the-world approach seems suboptimal but it greatly simplifies the implementation of collector because mutators can’t change the state under it.

def gc():
    stop_all_mutators()
    mark_roots()
    sweep()
    resume_all_mutators()

def mark_roots():
    candidates = Stack()
    for field in Roots:
        if field != nil && not is_marked(field):
            set_marked(field)
            candidates.push(field)
            mark(candidates)

def mark(candidates):
    while not candidates.empty():
        ref = candidates.pop()
        for field in pointers(ref):
            if field != nil && not is_marked(field):
                set_marked(field)
                candidates.push(field)

def sweep():
    scan = start_of_heap()
    end = end_of_heap()
    while scan < end:
        if is_marked(scan):
            unset_marked(scan)
        else:
            free(scan)
        scan = next_object(scan)

def next_object(address):
    # Parse the heap and return the next object.
    ...

Mark Phase

From the pseudo-code, it is clear that mark sweep doesn’t directly identify garbage. Instead it identifies all the objects that are not garbage i.e. alive and then concludes that anything else must be garbage. Marking is a recursive process. Upon finding a live reference, we recurse into its child fields and so on and so forth. Recursive procedure call isn’t a practical method for marking because of time overhead and stack overflow potential. That’s why we are using an explicit stack. This algorithm makes the space cost as well as the time overhead of the marking phase explicit. The maximum depth of the candidates stack depends on the size of the longest path that has to be traced through the object graph. A theoratical worst case might be equal to the number of nodes on the heap, but most real world workloads tend to produce stacks that are comparatively shallow. Nonetheless a safe GC implementation must handle such abnormal situations. In our implementation, we call mark( ) immediately after adding a new object to the candidates mainly to keep the size of the stack under control. The predicament of marking is that GC is needed precisely because of lack of available memory but auxiliary stacks require additional space. Large programs may cause garbage collector itself to run out of space. One benefit of explicit stack is that an overflow can be detected easily and a recovery action can be triggered. Overflow can be detected in many ways. A simple solution is to use an inline check in each push( ). A slightly more efficient method would be to use a guard page and trigger recovery after trapping the guard violation exception. The tradeoffs of both approaches must be understood in the context of the underlying operating system and the hardware. In first approach, the is-full test is likely to cost a couple of instructions (test followed by a branch) but will be repeated every time we examine an object. Second approach requires trapping access violation exception that is generally expensive but will be infrequent.

Sweep Phase

The implementation of sweep( ) is fairly straight-forward. It scans the heap in a linear fashion and frees any unmarked object. It does pose parsability constraints on our heap layout. The implementation of next_object(address) must be able to return the next object on the heap. Generally, it is sufficient for the heap to be parseable only in one direction. Most GC enabled language runtimes tag an object’s data with an object header. The header contains information like object’s metadata i.e. type, size, hashcode, mark bits, sync block etc. Typically an object’s header is placed immediately before an object’s data. Thus object’s reference doesn’t point to the first byte of the allocated heap cell, but into the middle of the cell, right after the object header. This facilitates upward parsing of the heap. A common implementation of free(address) will fill the freed cell with a pre-determined filler pattern that is recognized by the heap parsing logic.

Issues And Tradeoffs

Conclusion

This concludes a high level overview of mark sweep garbage collection approach. Mark sweep is a class of garbage collection algorithms and each of them involves subtle tradeoffs. Some notable algorithms include: