Art of Clean Code - Logging ~ VedantaTree ~ The Tree of Knowledge

"In some ways, programming is like painting. You start with a blank canvas and certain basic raw materials. You use a combination of science, art, and craft to determine what to do with them." - Andrew Hunt

My first lesson for importance of logging was in 2001-02, from my manager and coach Puneet Agarwal. We were building a framework for Data Bound Swing Components which can be used to design any application using a custom Net Beans kind of IDE, to enable rapid application development.

Design was to extend Swing Components, attach metadata for data source/s in context and pass it across component hierarchy using component life cycle events like add/removeNotify, and extend paint/print functions to render the components using data. This seems like a simple design flow, but had complex implementation due to hierarchical nature of UI elements. Data needs to be passed (and cleaned too) from parent to children and vice versa. Which means, any break in this recursive flow or any small miss (ex: even mishandling of data loading in repeated rendering events trigger by Swing) could cause issue in data loading, in rendering, memory management, and on focus kind of user facing features also.

The only savior to debug in such complex recursive programming environment was 'art of logging'. This is when my manager taught me the importance of logs, including right level of logs with right level of information and easy to understand format. I still remember that lesson and follow it that,

Logging is an Art

Fast forward today, we have much advanced features in IDE, ex: breakpoints, debug mode etc. These are making debugging much easier with live runtime data and control on start, restart and stop of events. Having said that, when there is issue in server or distributed environments, logs are very useful to understand the issue in one go by looking at logged data. Even in local dev environment also, it is much easier to see the whole system state by reading the properly crafted logs in one go.

Hence an effective logging is very important for clean code, a code which can be maintained easily for long, a code which can run in production without causing distress in nights for developers / ops / users and is easy to debug and fix quickly if there is any issue.

Let's talk about few of the important basics to make logging effective, and hence to write a clean effective code.

Have a common understanding of 'Format of Logs'

Yes, it is important. Refer to previous blog here about clean code, where we discussed about having standard formatting for codebase in a team, as it makes reading code easy. Same principle applies to logs also. If there are different kinds of log formatting, and patterns, it takes time to understand, and makes sense of it.

We should be able to read logs like a Story . A story narrating the state of system in simple words

Hence, decide on a format together as a team, and follow it. Sample format:

A message telling stage, state of the system. state1[value1] state2[value2]
Error caused by <reason>. Expected behavior was <expectation>. errorMessage[message], exception stacktrace if available
and so on..

Format is not that important, however, having a common understanding of format is.

Have a common understand of 'What to Log'

Similar to format of logs, having common understanding of 'what to log' is equally important. It helps to instill habit of logging right information as part of team culture, irrespective of components or developers. Ultimately, it helps to have symmetrical logs, with similar level of information which anyone can read, makes sense of the system state/error and can try to fix it. It helps to remove dependency on one person or even one team sometime, at least to understand the system state or cause of failure. Ex:

Incoming calls to system with parameters, and response codes or states.
Outgoing calls, parameters, and responses from other services. (of course after keeping security guidelines for logging in consideration)
and so on..

Have a common understanding about 'Level of Log'

Logging levels are important. What if all the logs have been put in 'info', it will make system super slow in live environment. Logs, string operations have significant impact on performance. And what if most of the logs are in debug, and there is no way to understand the important states in prod environment without enabling the debug mode, which will slow down the whole system. Same is for warn, error, and fatal too.

Define what should go where. Do we want to log all incoming and outgoing calls in info to understand important system interaction data. Do we want to log all intermediate important events in info. Do we want to log all state changes in debug, so as to debug any issue by just switching these on. Do we want to log db access failure as error, or it should be logged as fatal as we dont have any fallback db to failover automatically.

Have a common understanding as Team and follow it.

Define Infra, Tools to read the Logs

Defining tools and required infra to ingest, digest, and present log information in easy to retrieve and easy to read format is very helpful. If setup correctly, it saves lot of efforts (especially in distributed environments) of finding log files from different machines, collating data and then of making sense from multiple sources.

There are lot many awesome log analysis tools available nowadays, both open source and commercial. Pick one which makes good sense for your product, team and organization and use it. It is time, efforts savior and save team from frustration too in case of production issues.

Few examples of tools and framework: ELK, Graylog, Splunk, Sumologic, Logmatic. Here is one of the good blogs briefing about these tools.

Use all of above hard work for Monitoring, Alerts, and Stats

Once logs are in place, format and level of information is defined, tools to digest and present logs are configured. We are all set to utilize this information for higher purpose, which is to make life even easier and to make product more rich.

Setup monitoring on specific logged data and keywords which can help to raise alerts in case of any bad and good both kind of events. Support teams, dev teams can be informed at first sign of issue in system, giving them chance to act proactively, manage the issue or users or both.

An effective log monitoring system could be life savior by monitoring Vital Health signs of System proactively

Data generated by logs can be analyzed further to understand the usage pattern. This data can further be used to enhance, improve system resilience or product features also. Ex:

Analysis of traffic volume for all APIs can help to scale system and services proactively.
Analysis of pattern for feature usage can help to define the strategy for future feature enhancements, by understanding user behavior proactively.

There are many good blogs and resources available online for logging best practices and for tools, which can help to research best strategy. However, The most important aspect is to have a common understanding of log patterns, and analysis need. Once these basics are defined, agreed and followed at team level, rest of the ecosystem can be built easily around that.

We shall discuss more about error handling and documentation in next blogs.

Till then, Happy Coding..!