Why Lazy Logging and my solution
While debugging some issue I needed to get more details about an object. Generating this information meant combining some fields, getting length of some array etc. This resulted in a not-so-light method. I didn’t think that calling every time is smart so I came up with a way to pass an object that will wrap this method call, and will only actually invoke it when applying
str on it.
But this isn’t the actual topic of this post 🙂 You can see the implementation here.
What I really want to talk about – the process!
You can find the source code for this post here.
What I would like to describe is the process. Usually, before attempting to implement something myself, I search for existing libraries, implementation or code snippets others have come up with. In this case, I just implemented what can be seen in the link above. It was too easy to bother looking further. But I changed my mind and wanted to look around to see what other people did. I found this SO answer that shows pretty much exactly what I did. Then I read further (as you should always do, because sometimes the accepted answer is simply wrong), and saw this answer. schnittstabil claims (and proves it by
timeit here) that the accepted answer is 4 times less efficient than his. This could mean, of course, that taking the lazy path is the wrong choice, because the mere scaffolding of enabling it is less performant than simply generating the debugging information every time, even when debugging is not needed.
I went out to test this a little more.
The suggested lazy implementation of the accepted answer is:
class LazyJoin(object): def __init__(self, s=None, items=None): self.s = s self.items = items def __str__(self): return self.s.join(self.items) def original_lazy(): logger.debug('Stupid log message %s', LazyJoin(' ', (str(i) for i in range(20))))
The alternative, more efficient implementation is:
def simple(): if logger.isEnabledFor(logging.DEBUG): logger.debug('Stupid log message ' + ' '.join([str(i) for i in range(20)]))
The benchmark indeed showed 3-4 times slower for the
original_lazy one. Without consider laziness (
no_lazy), this was about 8 times slower.
I speculated that because we create a new object every time, the original method is slow. So I modified
LazyJoin a little so it is callable, thus needing to create the object only once. This what I came up with:
class LazyJoin(object): def __init__(self, s=None, items=None): self.__call__(s, items) def __call__(self, s, items): self.s = s self.items = items def __str__(self): return self.s.join(self.items) def callable_lazy(lazy_log): logger.debug('Stupid log message %s', lazy_log(' ', (str(i) for i in range( if __name__ == '__main__': callable_setup = "from __main__ import callable_lazy, LazyJoin; lazy_log = LazyJoin()" print(min(timeit.repeat("callable_lazy(lazy_log)", setup=callable_setup)))
The result was unsatisfactory and too close to the original lazy implementation (on my machine, with Python 2.7, 2.3s vs. 2.1s).
I then moved closer to my use-case, which wasn’t exactly the same as described in the SO post. The data I need to generate my debugging information exists as fields on the object product the debug log. I only needed to call a single method to generate that information. Applying that here means passing the
LazyJoin as an object instead of creating or calling a method:
def no_arguments_lazy(lazy_log): logger.debug('Stupid log message %s', lazy_log) if __name__ == '__main__': import timeit no_arguments_setup = "from __main__ import no_arguments_lazy, LazyJoin; " "lazy_log = LazyJoin(' ', (str(i) for i in range(20)))" print(min(timeit.repeat("no_arguments_lazy(lazy_log)", setup=no_arguments_setup)))
The last iteration resulted performance only slightly worse: 2.1s vs 2.3s (mine was slower).
This was a small price to pay for clearer code (in my opinion).
As it turns, the performance hit (since I didn’t investigate too thoroughly, please feel free to add a comment about that) was caused by the extraneous function calls, as Python known to be very slow at that.
You can find the source code for this post here.
Just call me by my (nick) name, no problem.
Nicely written, it’s easy to follow your argumentation and why (micro) benchmarks are often misleading for real world problems. However, at the very end, I think things gone wrong.
The whole idea of calling logger.debug lazily is only important, if creating the message is very expensive. Zaar Hai gives an example at SO:
Hence, python has to evaluate (among other things):
This is of course stupid and expensive, at least if debug logging is disabled. Therefore, Zaar Hai asks for a more efficient way to do logging.
But, it’s just an example, never seen in the wild: e.g. the message to log is always the same!
You realized that your use case is different, and thus there is even more room for optimization. I fully agree, this is a good way to optimization, especially because you also benchmarked your optimization as well:
However, in my opinion this benchmark has some flaws:
(str(i) for i in range(20))is created only once, that doesn’t match your use case
Despite that, personally, I wouldn’t use LazyJoin because the message creation is hard coded (
self.s.join(self.items)), this is rather unusual for expensive debug messages in my opinion. Furthermore, both LazyJoin versions does not seem to be thread-safe (generators aren’t).
And last but not least, I would point out that the following is already lazy:
ifs are one of the few places where all(!?) programming languages evaluate the then and else branch lazily. This means, the then branch is only evaluated if and only if the condition evaluates to True.
Thanks for your comment and feedback – greatly appreciated!
I updated the post with your name, although I’m a little confused with your profile, which states your name is Michael.
I have added markdown capabilities to comments, and updated your comment to reflect this. I didn’t touch the text itself of course.
Allow me to explain my use-case a little better, and I hope it will be clearer why the
no argument lazyis indeed valid. First off, please examine my real implementation and usage in my gist. What I needed in my use-case is a method to be called (that produces a string) in a lazy fashion. To achieve that, I need a single instance of the lazy object (for a specific instance that wishes to produce lazy debugging information, that is – not a program-wide singleton).
So essentially, it’s totally sufficient for me to create the lazy object once, and then pass every time I need to log a message. Hence, the benchmark of the
no_arguments_lazyis valid for my needs (and it doesn’t matter what’s the implementation of
__str__or when and I create the generator).
The OP in SO showed an implementation in which he divides the “message production” into two. One is the generator creation and the second is string concatenation. If you need to create objects in the call site to generate your lazy object, I completely agree that a simple
ifis much better.
Your point of the lazy nature of
ifis completely valid. I never claimed that use the
simplemethod isn’t lazy, just that if I have to do it in several places, it will clutter my code.
As for thread-safety – I love that you pointed that out. This is an important valid point, and I will edit my post to incorporate the details of it. Essentially, the logging module could take my lazy object, passing it around and end up evaluating it in another thread, and even worse, in a later time (which can either end up in an error, or with a log message that contains information that’s not relevant to the time of logging). Thanks for that!
Feel free to share more of your thoughts on that!
P.S. I later found out about Bolton’s DeferredValue, which did something very similar to my implementation (a little more generic though).
Sorry, my wording was a bit misleading: My name is Michael Mayer. My nickname, more precisely, my username is usually schnittstabil 😉
From my point of view, that is not entirely true. You need to create
LazyLoggingobjects for each object and for each message function (
_debugging_infoin your gist). Moreover you must create those message functions:
Personally, I wouldn’t like to be forced to write
_debug_*methods just for the sake of efficient debugging. But, if this fits your needs and coding style, why not.
However, even in your use case, I see no need for those
AFAIK, it’s possible, but hard, to extend the logger class. Hence, you may write
To be honest, I pretty much agree with what you are saying.
If I need many “heavy” debug methods, I will need to instantiate the objects as you suggested and create those methods, and if I opt to your suggested implementation there’s not much change in the amount of needed code because it can be wrapped as you suggested.
Usually, I suppose that I won’t have more than one such method, so I can still opt for condition it with an “if” statement.
All that being said, the main reason to use the lazy objects, is to have a convention. So this kind coding standard will be exercised in every other place in the code that needs this.
The bottom line is that I didn’t end up using this at all, because it did seems like an overkill for my needs.