Better code decoupling via delegates

I work on a system called NetNORAD. NetNORAD is responsible for monitoring hosts across our fleet. We do that to measure the rate of packet loss happening in our clusters and determine if there’s a problem.

The system consists of three main components: a scheduler, an agent, and an aggregator. The scheduler decides what hosts the agents should monitor. The agents probe those hosts using UDP packets. The aggregators chew through all the data generated by the agents and try to come up with something useful.

The scheduler stores the lists of hosts it generates in Zookeeper. The system looks a bit like this:

A problem we had with our system until recently was the lack of backups. If something went wrong and the scheduler produced a bad list of hosts, we had no way of rolling back (only forward).

To address this, we decided to store a copy of the lists in an object store.

A teammate of mine went about implementing this in a straightforward way: We had a class called ZookeeperWriter that knew how to encode the data and write it to Zookeeper. My colleague’s approach was to just add a new method to support the new storage layer. Whenever we were done writing to Zookeeper, we’d write to the object store.

The code looked something like this:

class ZookeeperWriter:
    def writeShardedHostList(self, region, regionalHostList, globalHostList):
        blobs = {
            f"{region}_regional_host_list": self.encoder.encode(regionalHostList),
            f"{region}_global_host_list": self.encoder.encode(globalHostList)
        }

        try:
            self.writeBlobs(blobs)
        except ZookeeperError:
            raise
        else:
            self.writeShardedHostListToObjectStore(region, regionalHostList, globalHostList)

    def writeShardedHostListToObjectStore(self, region, regionalHostList, globalHostList):
        blobs = {
            f"{region}_regional_host_list": self.encoder.encode(regionalHostList),
            f"{region}_global_host_list": self.encoder.encode(globalHostList)
        }

        self.writeBlobsToObjectStore(blobs)

For historical reasons, we have multiple “flavours” of ZookeeperWriter classes: VanillaZookeeperWriter, ChocolateZookeeperWriter, etc., each taking different arguments to encode and store. This made the code very cumbersome to extend.

There was also another problem. When we encode data, we convert it to a binary format and compress it using bzip2. With the implementation we had, that conversion and compression ran twice: once to store the data in Zookeeper and once more to send it to the object store. Since the amount of data to encode was quite large, this repeated work slowed our system down by around 15 minutes.

To address the latter, we could just rearrange the code to avoid the double encoding, but that change would have to be replicated in every “flavour”.
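For a single flavour, that rearrangement might look roughly like the sketch below. The CountingEncoder and the two dict-backed write methods are stand-ins I made up so the snippet runs on its own (JSON is used in place of our binary format); the real ones talk to Zookeeper and the object store.

```python
import bz2
import json


class CountingEncoder:
    """Hypothetical stand-in for our encoder: serialise, then bzip2-compress."""

    def __init__(self):
        self.calls = 0

    def encode(self, hostList):
        self.calls += 1
        return bz2.compress(json.dumps(hostList).encode("utf-8"))


class ZookeeperWriter:
    def __init__(self, encoder):
        self.encoder = encoder
        self.zookeeper = {}    # stand-in for Zookeeper
        self.objectStore = {}  # stand-in for the object store

    def writeBlobs(self, blobs):
        self.zookeeper.update(blobs)

    def writeBlobsToObjectStore(self, blobs):
        self.objectStore.update(blobs)

    def writeShardedHostList(self, region, regionalHostList, globalHostList):
        # Encode once and hand the same blobs to both storage layers.
        blobs = {
            f"{region}_regional_host_list": self.encoder.encode(regionalHostList),
            f"{region}_global_host_list": self.encoder.encode(globalHostList),
        }
        # If the Zookeeper write raises, the backup write is skipped, as before.
        self.writeBlobs(blobs)
        self.writeBlobsToObjectStore(blobs)


encoder = CountingEncoder()
writer = ZookeeperWriter(encoder)
writer.writeShardedHostList("region1", ["host1", "host2"], ["host3"])
print(encoder.calls)  # 2: one encode per list, instead of four
```

It works, but encoding and both storage layers still live in one class, and every other flavour would need the same surgery.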

How could we solve that?

Delegates

Delegation is when an object handles a request (i.e., method call) by delegating the request to a different object.

I know. That’s quite abstract.
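A tiny, generic example may make it more concrete. Everything here (Report, Printer, Emailer) is made up purely for illustration and has nothing to do with NetNORAD:

```python
class Printer:
    """One object that knows how to do the actual work."""

    def render(self, text):
        return f"[printed] {text}"


class Emailer:
    """Another object that does the same job in a different way."""

    def render(self, text):
        return f"[emailed] {text}"


class Report:
    """Handles publish() by delegating the request to another object."""

    def __init__(self, rendererDelegate):
        self.rendererDelegate = rendererDelegate

    def publish(self, text):
        # Report decides *what* to publish; the delegate decides *how*.
        return self.rendererDelegate.render(text.upper())


print(Report(Printer()).publish("weekly summary"))
print(Report(Emailer()).publish("weekly summary"))
```

Report never changes when a new way of rendering shows up. We just hand it a different delegate.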

In the context of this problem, think of it this way: The logic to encode data shouldn’t be tied to the code that actually writes data to the storage layer. The code handling the storage layer should only care about moving bits from here to there.

In our case, we had multiple ZookeeperWriters that knew both how to encode and how to store data. This made it difficult for us to reuse the compressed data and extend the system.

To solve this, I came up with two different concepts: a BlobWriter and an AssetWriter. A BlobWriter is only concerned with writing bytes to a specific path in some kind of storage. It doesn’t care what the data is.

An AssetWriter, on the other hand, does care about the data. It knows how it should be handled and how the paths should be set. But it does not care how or where the data is written to. It delegates the writing to storage to a BlobWriter.

In this new setting, our classes looked like this:

from abc import ABC, abstractmethod
from typing import Dict, List

class BlobWriter(ABC):
    @abstractmethod
    def writeBlobs(self, blobs: Dict[str, bytes]):
        pass

class ZookeeperBlobWriter(BlobWriter):
    def writeBlobs(self, blobs: Dict[str, bytes]):
        # Do zookeeper stuff to write blobs
        ...

class ObjectStoreBlobWriter(BlobWriter):
    def writeBlobs(self, blobs: Dict[str, bytes]):
        # Do object store stuff
        ...

class AssetWriter:
    def __init__(self, blobWriterDelegates: List[BlobWriter]):
        self.writerDelegates = blobWriterDelegates

class ShardedListsAssetWriter(AssetWriter):
    def write(self, region: str, regionalHostList: HostList, globalHostList: HostList):
        blobs = {
            f"{region}_regional_host_list": self.encoder.encode(regionalHostList),
            f"{region}_global_host_list": self.encoder.encode(globalHostList)
        }
        
        for writer in self.writerDelegates:
            writer.writeBlobs(blobs)

And this is what the usage code looks like:

assetWriter = ShardedListsAssetWriter(blobWriterDelegates=[
    ZookeeperBlobWriter(...),
    ObjectStoreBlobWriter(...),
])

...

assetWriter.write(regionName, regionalHostList, globalHostList)

With this change, adding or modifying different asset writers is quite easy. We only need to specify how the data should be encoded and its storage will be handled by the blob writers.

Likewise, adding or modifying blob writers (i.e., supporting different storage layers) is a breeze. We just need to write enough code to push bytes to the storage, and all the encoding is taken care of by the asset writers.
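To give a feel for how little a new storage layer needs, here is a self-contained sketch with two blob writers that are not part of our system: InMemoryBlobWriter and LocalDirectoryBlobWriter are hypothetical, and the encode method is a stand-in (JSON plus bzip2) for our real encoder.

```python
import bz2
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class BlobWriter(ABC):
    @abstractmethod
    def writeBlobs(self, blobs: Dict[str, bytes]) -> None:
        """Write each blob under its name."""


class InMemoryBlobWriter(BlobWriter):
    """Hypothetical writer that keeps blobs in a dict (handy in tests)."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}

    def writeBlobs(self, blobs: Dict[str, bytes]) -> None:
        self.stored.update(blobs)


class LocalDirectoryBlobWriter(BlobWriter):
    """Hypothetical writer that stores one file per blob."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory

    def writeBlobs(self, blobs: Dict[str, bytes]) -> None:
        # Only moves bytes; it has no idea what they contain.
        for name, data in blobs.items():
            (self.directory / name).write_bytes(data)


class ShardedListsAssetWriter:
    def __init__(self, blobWriterDelegates: List[BlobWriter]) -> None:
        self.writerDelegates = blobWriterDelegates

    def encode(self, hostList: List[str]) -> bytes:
        # Stand-in for the real encoder: serialise, then bzip2-compress.
        return bz2.compress(json.dumps(hostList).encode("utf-8"))

    def write(self, region: str, regionalHostList: List[str],
              globalHostList: List[str]) -> None:
        blobs = {
            f"{region}_regional_host_list": self.encode(regionalHostList),
            f"{region}_global_host_list": self.encode(globalHostList),
        }
        for writer in self.writerDelegates:
            writer.writeBlobs(blobs)


memoryWriter = InMemoryBlobWriter()
backupDirectory = Path(tempfile.mkdtemp())
assetWriter = ShardedListsAssetWriter(
    blobWriterDelegates=[memoryWriter, LocalDirectoryBlobWriter(backupDirectory)]
)
assetWriter.write("region1", ["host1", "host2"], ["host3"])
print(sorted(path.name for path in backupDirectory.iterdir()))
```

Each new writer is a handful of lines, and neither of them needed to know anything about host lists or compression.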

In conclusion

Knowing how and when to decouple your code is an important skill. It can make extending and modifying a code base much easier.

Delegates are one tool we should keep in our arsenal to achieve that.

How about you? What do you think about delegates? Would you approach this problem differently?

I’d love to hear from you and learn different ways to solve the same problem.