Lazy Logging – a Performance Story

Guy Arad November 1, 2016 4 Comments

Why Lazy Logging and my solution

While debugging some issue I needed to get more details about an object. Generating this information meant combining some fields, getting length of some array etc. This resulted in a not-so-light method. I didn’t think that calling every time is smart so I came up with a way to pass an object that will wrap this method call, and will only actually invoke it when applying str on it.

But this isn’t the actual topic of this post 🙂 You can see the implementation here.

What I really want to talk about – the process!

You can find the source code for this post here.

What I would like to describe is the process. Usually, before attempting to implement something myself, I search for existing libraries, implementation or code snippets others have come up with. In this case, I just implemented what can be seen in the link above. It was too easy to bother looking further. But I changed my mind and wanted to look around to see what other people did. I found this SO answer that shows pretty much exactly what I did. Then I read further (as you should always do, because sometimes the accepted answer is simply wrong), and saw this answer. schnittstabil claims (and proves it by timeit here) that the accepted answer is 4 times less efficient than his. This could mean, of course, that taking the lazy path is the wrong choice, because the mere scaffolding of enabling it is less performant than simply generating the debugging information every time, even when debugging is not needed.

I went out to test this a little more.

The suggested lazy implementation of the accepted answer is:

class LazyJoin(object):
    def __init__(self, s=None, items=None):
        self.s = s
        self.items = items

    def __str__(self):
        return self.s.join(self.items)


def original_lazy():
    logger.debug('Stupid log message %s',
                 LazyJoin(' ', (str(i) for i in range(20))))

The alternative, more efficient implementation is:

def simple():
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug('Stupid log message ' + ' '.join([str(i) for i in range(20)]))

The benchmark indeed showed 3-4 times slower for the original_lazy one. Without consider laziness (no_lazy), this was about 8 times slower.
I speculated that because we create a new object every time, the original method is slow. So I modified LazyJoin a little so it is callable, thus needing to create the object only once. This what I came up with:

class LazyJoin(object):
    def __init__(self, s=None, items=None):
        self.__call__(s, items)

    def __call__(self, s, items):
        self.s = s
        self.items = items

    def __str__(self):
        return self.s.join(self.items)

def callable_lazy(lazy_log):
    logger.debug('Stupid log message %s', lazy_log(' ', (str(i) for i in range(


if __name__ == '__main__':
    callable_setup = "from __main__ import callable_lazy, LazyJoin; lazy_log = LazyJoin()"
    print(min(timeit.repeat("callable_lazy(lazy_log)", setup=callable_setup)))

The result was unsatisfactory and too close to the original lazy implementation (on my machine, with Python 2.7, 2.3s vs. 2.1s).
I then moved closer to my use-case, which wasn’t exactly the same as described in the SO post. The data I need to generate my debugging information exists as fields on the object product the debug log. I only needed to call a single method to generate that information. Applying that here means passing the LazyJoin as an object instead of creating or calling a method:

def no_arguments_lazy(lazy_log):
    logger.debug('Stupid log message %s', lazy_log)


if __name__ == '__main__':
    import timeit
    no_arguments_setup = "from __main__ import no_arguments_lazy, LazyJoin; "
                         "lazy_log = LazyJoin(' ', (str(i) for i in range(20)))"
    print(min(timeit.repeat("no_arguments_lazy(lazy_log)",
                            setup=no_arguments_setup)))

Success!

The last iteration resulted performance only slightly worse: 2.1s vs 2.3s (mine was slower).
This was a small price to pay for clearer code (in my opinion).

Conclusion

As it turns, the performance hit (since I didn’t investigate too thoroughly, please feel free to add a comment about that) was caused by the extraneous function calls, as Python known to be very slow at that.

Source Code

You can find the source code for this post here.

4 thoughts on “Lazy Logging – a Performance Story”

schnittstabil November 2, 2016 at 10:12 pm
That person claims (and proves it by timeit here) that the accepted answer is 4 times less efficient than his.

Just call me by my (nick) name, no problem.

Nicely written, it’s easy to follow your argumentation and why (micro) benchmarks are often misleading for real world problems. However, at the very end, I think things gone wrong.

The whole idea of calling logger.debug lazily is only important, if creating the message is very expensive. Zaar Hai gives an example at SO:
```
    while True:
        logger.debug('Stupid log message ' + ' '.join([str(i) for i in range(20)]) )
```
Hence, python has to evaluate (among other things):
1. range(20)
2. then [str(i) for i in range(20)]
3. then ‘ ‘.join([str(i) for i in range(20)]
4. then ‘Stupid log message ‘ + ‘ ‘.join([str(i) for i in range(20)])
5. then call the function logger.debug(…) with a string argument
6. then WITHIN logger.debug: if logger.isEnabledFor(logging.DEBUG)
This is of course stupid and expensive, at least if debug logging is disabled. Therefore, Zaar Hai asks for a more efficient way to do logging.

But, it’s just an example, never seen in the wild: e.g. the message to log is always the same!

You realized that your use case is different, and thus there is even more room for optimization. I fully agree, this is a good way to optimization, especially because you also benchmarked your optimization as well:
```
no_arguments_setup = "…"
                     "lazy_log = LazyJoin(' ', (str(i) for i in range(20)))"
print(min(timeit.repeat(…, setup=no_arguments_setup)))
```
However, in my opinion this benchmark has some flaws:
1. the generator (str(i) for i in range(20)) is created only once, that doesn’t match your use case
2. the other benchmarks create the generator multiple times, which is measured
3. the generator and the LazyJoin instance are created during the setup, which is expensive, but not measured – even if you have do it only once, you have to ensure equal opportunities
Despite that, personally, I wouldn’t use LazyJoin because the message creation is hard coded (self.s.join(self.items)), this is rather unusual for expensive debug messages in my opinion. Furthermore, both LazyJoin versions does not seem to be thread-safe (generators aren’t).

And last but not least, I would point out that the following is already lazy:
```
def simple():
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug('Stupid log message ' + ' '.join([str(i) for i in range(20)]))
```
ifs are one of the few places where all(!?) programming languages evaluate the then and else branch lazily. This means, the then branch is only evaluated if and only if the condition evaluates to True.
Reply ↓
1. Guy Arad Post authorNovember 3, 2016 at 8:44 am
  
  Thanks for your comment and feedback – greatly appreciated!
  I updated the post with your name, although I’m a little confused with your profile, which states your name is Michael.
  I have added markdown capabilities to comments, and updated your comment to reflect this. I didn’t touch the text itself of course.
  
  Allow me to explain my use-case a little better, and I hope it will be clearer why the no argument lazy is indeed valid. First off, please examine my real implementation and usage in my gist. What I needed in my use-case is a method to be called (that produces a string) in a lazy fashion. To achieve that, I need a single instance of the lazy object (for a specific instance that wishes to produce lazy debugging information, that is – not a program-wide singleton).
  So essentially, it’s totally sufficient for me to create the lazy object once, and then pass every time I need to log a message. Hence, the benchmark of the no_arguments_lazy is valid for my needs (and it doesn’t matter what’s the implementation of __str__ or when and I create the generator).
  The OP in SO showed an implementation in which he divides the “message production” into two. One is the generator creation and the second is string concatenation. If you need to create objects in the call site to generate your lazy object, I completely agree that a simple if is much better.
  
  Your point of the lazy nature of if is completely valid. I never claimed that use the simple method isn’t lazy, just that if I have to do it in several places, it will clutter my code.
  
  As for thread-safety – I love that you pointed that out. This is an important valid point, and I will edit my post to incorporate the details of it. Essentially, the logging module could take my lazy object, passing it around and end up evaluating it in another thread, and even worse, in a later time (which can either end up in an error, or with a log message that contains information that’s not relevant to the time of logging). Thanks for that!
  
  Feel free to share more of your thoughts on that!
  
  P.S. I later found out about Bolton’s DeferredValue, which did something very similar to my implementation (a little more generic though).
  
  Reply ↓
schnittstabil November 4, 2016 at 2:45 am
Sorry, my wording was a bit misleading: My name is Michael Mayer. My nickname, more precisely, my username is usually schnittstabil 😉

So essentially, it’s totally sufficient for me to create the lazy object once, and then pass every time I need to log a message.

From my point of view, that is not entirely true. You need to create LazyLogging objects for each object and for each message function (_debugging_info in your gist). Moreover you must create those message functions:
```
class MyClass(object):
    def __init__(self):
        self._lazy_debug_1 = LazyLogging(self._debug_1)
        self._lazy_debug_2 = LazyLogging(self._debug_2)
        self._lazy_debug_3 = LazyLogging(self._debug_3)

    def _debug_1(self):
        return "my debugging string 1"

    def _debug_2(self):
        return "my debugging string 2"

    def _debug_3(self):
        return "my debugging string 3"

    def m_1(self):
        logger.debug("Info: %s", self._lazy_debug_1)

    def m_2(self):
        logger.debug("Info: %s", self._lazy_debug_2)

    def m_3(self):
        logger.debug("Info: %s", self._lazy_debug_3)
```
Personally, I wouldn’t like to be forced to write _debug_* methods just for the sake of efficient debugging. But, if this fits your needs and coding style, why not.

However, even in your use case, I see no need for those LazyLogging object creations:
```
def lazy_debug(msg, fn):
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(msg, fn())

class MyClass(object):
    def __init__(self):
        pass

    def _debug_1(self):
        return "my debugging string 1"

    def _debug_2(self):
        return "my debugging string 2"

    def _debug_3(self):
        return "my debugging string 3"

    def m_1(self):
        lazy_debug("Info: %s", self._debug_1)

    def m_2(self):
        lazy_debug("Info: %s", self._debug_2)

    def m_3(self):
        lazy_debug("Info: %s", self._debug_3)
```
AFAIK, it’s possible, but hard, to extend the logger class. Hence, you may write logger.lazy_debug methods.
Reply ↓
1. Guy Arad Post authorNovember 11, 2016 at 7:54 pm
  
  To be honest, I pretty much agree with what you are saying.
  
  If I need many “heavy” debug methods, I will need to instantiate the objects as you suggested and create those methods, and if I opt to your suggested implementation there’s not much change in the amount of needed code because it can be wrapped as you suggested.
  
  Usually, I suppose that I won’t have more than one such method, so I can still opt for condition it with an “if” statement.
  
  All that being said, the main reason to use the lazy objects, is to have a convention. So this kind coding standard will be exercised in every other place in the code that needs this.
  
  The bottom line is that I didn’t end up using this at all, because it did seems like an overkill for my needs.
  
  Reply ↓

Guy Arad’s Blog

Software Development, Architecture and Technology