I am pretty sure all of you have heard the term Performance Tuning, and some of you have actively worked on tuning your applications to make them faster and more scalable. Unfortunately or fortunately 🙂 I have had opportunities to work on projects where stabilizing and improving the scalability of the application was our primary goal.
From those experiences, I found that all of those projects tackled responsiveness and scalability issues in a reactive, rather than proactive, manner. In some cases there was no mention of performance needs in the requirements doc or stories, and at the end it turned out that a 2-second response time on most operations was mandatory for the software to be acceptable. Whose fault is this? The answers will be: developers, architects, managers, QA, outsourcing, process, blah blah… depending on whom you ask.
So I thought I'd outline a practical approach to tackling performance requirements for applications by answering three questions: when, how, and who can help?
- During the planning stage – including all performance requirements for an application is as important as including user login/authentication requirements. Still, we forget them, because NFRs (non-functional requirements) are treated as second-class candidates at the inception phase, and in the end they cause huge damage. We should allocate a percentage of the total time to all NFR items, and the total cost should include them as well. Having an SLA on performance requirements for every application you build always helps.
- Early and continuous – There are cases where EARLY end-to-end testing of the application becomes almost impossible because the different modules or external services are still in development and cannot be integrated. This can be overcome by putting contract-first, interface-based development in place and getting the entire application skeleton up and running with stubbed-out services. This allows early end-to-end testing and surfaces problems in the chosen infrastructure and communication protocols. Set up the load test environment as early as possible and keep using it until the application is deprecated.
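As a minimal sketch of that contract-first idea: the team agrees on the service interface early and codes against it, wiring in a stub until the real implementation is ready. `InventoryService`, its method, and the canned values below are all hypothetical names for illustration.

```java
// Hypothetical contract agreed early with the team that owns the real service.
interface InventoryService {
    int stockLevel(String sku);
}

// Stub used until the real implementation can be integrated. Canned data
// plus an artificial delay let the application skeleton be exercised
// end to end against realistic latencies.
class StubInventoryService implements InventoryService {
    @Override
    public int stockLevel(String sku) {
        try {
            Thread.sleep(5); // simulate expected service latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 42; // canned response
    }
}

public class SkeletonDemo {
    public static void main(String[] args) {
        // The rest of the application codes against the interface only,
        // so the stub can be swapped for the real service with no changes.
        InventoryService inventory = new StubInventoryService();
        System.out.println("stock: " + inventory.stockLevel("ABC-123"));
    }
}
```

Because callers depend only on the interface, swapping the stub for the real service later is a one-line change at the wiring point.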
- Load testing – Simulating the production load before actually turning the application ON is a must-do. There are functional tools available to automate and perform these tests, and, you know what, in one of my past projects we had an httpClient we wrote ourselves that spawned multiple threads and requested different services from the system simultaneously. It was not complete, but it was a trustworthy tool at that point in time. So use anything, but do it. Schedule these tests to run periodically so that any performance problem is noticed as soon as it comes up.
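In the spirit of that homegrown client, here is a hedged sketch of what such a minimal load driver can look like: a thread pool firing concurrent requests and reporting throughput. The `callService` body is a stand-in; a real version would issue HTTP requests (e.g. with `java.net.http.HttpClient`) against the system under test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MiniLoadTest {
    private static final AtomicInteger completed = new AtomicInteger();

    // Stand-in for a real request; a real client would call the actual
    // services of the system under test here.
    private static void callService() {
        try {
            Thread.sleep(10); // pretend network + server latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        completed.incrementAndGet();
    }

    // Spawns `threads` workers, each firing `requestsPerThread` calls,
    // then reports total completed requests and elapsed wall-clock time.
    public static int run(int threads, int requestsPerThread) {
        completed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    callService();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(completed.get() + " requests in " + elapsedMs + " ms");
        return completed.get();
    }

    public static void main(String[] args) {
        run(10, 50); // 10 concurrent clients, 50 requests each
    }
}
```

It is deliberately crude, just like the tool in my old project, but even this is enough to watch how response time degrades as you crank up the thread count.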
- Analyze and profile – To identify problem area(s) we need to do some analysis and profiling. In my previous projects we used to:
- take JVM thread dumps from time to time and analyze them.
- use profiling tools to find and fix potential problems. One must-do item, regardless of whether a profiling tool is in place, is measuring the execution time of service methods by logging the delta between method entry and exit times.
- check memory usage patterns on the TEST/STAGING servers, especially the GC statistics.
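The entry/exit timing idea above takes only a few lines. A minimal sketch, assuming plain `System.out` logging (a real application would route this through its logging framework, or an interceptor/AOP advice, instead of wrapping every call by hand):

```java
import java.util.function.Supplier;

public class Timed {
    // Wraps any service call and logs the delta between entry and exit time.
    public static <T> T timed(String name, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(name + " took " + ms + " ms");
        }
    }

    public static void main(String[] args) {
        int total = timed("computeTotal", () -> {
            int sum = 0;
            for (int i = 1; i <= 100; i++) sum += i;
            return sum;
        });
        System.out.println("result: " + total);
    }
}
```

Grepping these log lines over a day of traffic quickly shows which service methods dominate response time, even with no profiler attached.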
- One thing at a time – Based on the results of analysis and profiling, or simply on experience, changes should be made in a step-wise fashion so that we can measure how much throughput we gain from each change. These metrics will be useful for future decision making.
- In SWAT mode – this is the more common scenario. In most cases you will be asked to fix performance problems after implementation or deployment. So stay calm when you find yourself in that kind of situation. I would recommend the following:
- Setting expectations and chalking out a quick action plan is very important, so that everyone knows what we are going to do and when they should expect to hear results. Believe me, it matters: at this stage you will be on everyone's radar, and all the Big Bs will expect to hear something from you or the group.
- Prepare a quick note of all application server and database server configuration parameters (allocated memory, min and max heap size, max DB connection pool size, etc.).
- Take a thread dump and analyze it.
- Identify possible problem areas and start applying fixes in the following order (it's not a rigid rule, though):
- If you suspect server configuration parameter issues, you can win big with small changes, so go ahead and do it.
- If you think it's more of a hardware issue and the app needs more hardware rather than software changes, raise that concern.
- If neither is the case, you must have load testing set up on your TEST environment, so that you can test any code change before it actually goes to production.
- Freeze all code; don't deploy any new features at this stage.
- If there are resource utilization issues, deadlocks, contention, etc., and you need to change the code, do it in a tightly controlled and well-monitored way. Don't forget to run regression tests after any change; we should not break functionality while improving performance.
- Deploy and load test in the TEST environment.
- If satisfied, move to production.
- You might need quite a few fix-test-deploy cycles to reach the desired point.
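For the deadlock case mentioned above, the JVM can report deadlocked threads programmatically through `ThreadMXBean`, which is the same information a thread dump prints in its "Found one Java-level deadlock" section. A small self-contained sketch that deliberately manufactures a two-thread deadlock and then detects it (the lock names and delays are illustrative):

```java
import java.lang.management.ManagementFactory;

public class DeadlockCheck {
    // Number of threads the JVM currently reports as deadlocked.
    public static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    // Deliberately creates a two-thread deadlock (locks taken in opposite
    // order), then asks the JVM to detect it. Daemon threads are used so
    // the stuck threads don't keep the process alive.
    public static int demo() {
        final Object lockA = new Object();
        final Object lockB = new Object();
        Thread t1 = new Thread(() -> {
            synchronized (lockA) { pause(100); synchronized (lockB) { } }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lockB) { pause(100); synchronized (lockA) { } }
        });
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        pause(500); // give both threads time to block on each other
        return deadlockedThreadCount();
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + demo());
    }
}
```

Running a check like this periodically on a TEST server (or just taking thread dumps with `jstack`) catches lock-ordering bugs before they hang production.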
- Bottom line: there is no cure-all solution. Depending on your performance requirements and environment, different strategies can be used.
Performance tuning should not be the job of a special group or person. It should be a cumulative effort by the following roles at different stages of the SDLC.
- Architect: choose an architecture that can meet the performance needs, and help developers with capacity planning.
- Developers: design and implement so that the code is responsive and scalable.
- Manager: monitor whether the performance requirements are met at each project milestone.
- QA: run tests to verify that performance requirements are met throughout the project's life.
- If the performance requirements are very stringent, should we add a dedicated person (a Performance Engineer?) who would NOT take away the responsibilities of the people mentioned above, but WOULD spend full time collecting metrics, coordinating with the team, and raising concerns whenever a performance issue pops up? It can be a good option.
For post-implementation performance tuning you might need someone (a Performance Engineer/specialist; I am not sure what to name that role) who knows how to collect performance metrics and can use the right tool set to find possible problem areas and apply fixes or give suggestions.
Let me know how you are addressing it.