Written by Joseph Jones
It was Friday, September 14, 2001. We were still reeling emotionally from the tragic events in New York City and Washington DC three days earlier. Then we learned that the lab system that processed test results for a big chunk of the nation’s blood supply had failed. It was going to be an intense weekend, to say the least.
Who were we? We were the software engineering team responsible for support of that lab information software. The lab software had been a donation from a large pharmaceutical company, and was being used by a busy regional center for viral testing of blood donations. It was old legacy code – not our design and not well-documented. To further complicate matters, it was also a 510(k) medical device and subject to the strictest quality measures. That meant we had to treat the software as a black box and find an answer without modifying the source code or data structures.
We promptly gathered the facts and went into troubleshooting mode. Meanwhile, we scheduled an update with the stakeholders for first thing Saturday morning and lined up the project plan and schedule. Our database lead and software engineers isolated the cause of the glitch that night. A counter with a 3-digit field size was stuck at “999” and there was no direct way to invoke a reset.
On Saturday, we caucused with the business and IT operations staff to coordinate our findings, the fix options and the target schedule. We couldn’t take the direct route of a SQL data change or increasing the size of the field, but we had found a way to invoke a daemon – a background process – that would re-seed the counter and enable normal operations.
Now the race was on. We needed fully synchronized participation from all the right stakeholders on a Saturday, with people dialing in from multiple locations for each stage gate meeting to agree on readiness to proceed from one step of the development life cycle to the next. By midnight, the fix had been specified, coded, tested, reviewed, and wrapped up in an installation package with validated instructions for the IT operations team. They opened a maintenance window early Sunday morning, and the system was up and running again for the day shift at the regional lab. We monitored heavy catch-up traffic throughout the day and were thankfully able to confirm normal operation again.
How we were able to respond underscores the basic principles of Lean Project Management:
- Eliminate Waste: We avoided bottlenecks in the hand-offs among team members through teamwork founded on a strong WBS (work breakdown structure), estimates based on metrics from prior projects with similar parameters, standard processes, and a well-exercised project plan that made everybody’s responsibilities and time commitments crystal clear.
- Empowerment: We assigned clear responsibilities for deliverables and milestones, and our reviews and communications gave the project manager the visibility to see progress and keep all stakeholders focused and in sync.
- Respect and Integrity: The analysts, engineers, and testers who worked on this problem showed great dedication to the customer’s mission, needs, and requirements. It was a cohesive team effort all around, with customer users, program management, and IT operations in the loop and expressing their satisfaction once the job was done.
- Deliver Fast: Our proven project plan and estimating parameters, featuring a critical path used on 40 service packages per year, enabled the quick software turnaround and led to speedy resolution of the cause of the problem.
- Amplify Learning: With the right stakeholders involved from each discipline, we were able to apply our collective intelligence to confirming the cause of the problem and generating multiple options for solving it. By having a tight project script and drawing upon past metrics, we kept everybody focused and productive when they were needed.
- See the Whole: We used the Saturday morning meeting to align all the stakeholders on the problem, its cause, its impact, and the options, and to agree on the best plan for getting it done.
- Risk Management: We followed up on this exercise with a risk assessment of other counters in the system and when they were likely to reach capacity, based on their level of activity, so that we could put preventive measures in place.
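The counter risk assessment described in the last bullet comes down to simple arithmetic: remaining capacity divided by the rate of use. The following is a minimal sketch of that idea in Python, not the tooling we actually used. The class name `CounterField`, the counter names, and all of the numbers are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CounterField:
    """A fixed-width numeric counter (all values here are hypothetical)."""
    name: str
    digits: int              # field width, e.g. 3 means the maximum is 999
    current: int             # current counter value
    daily_increments: float  # observed average increments per day

    @property
    def capacity(self) -> int:
        # Largest value that fits in the field, e.g. 999 for 3 digits.
        return 10 ** self.digits - 1

    def days_until_full(self) -> float:
        # Remaining headroom divided by the observed rate of use.
        remaining = self.capacity - self.current
        if remaining <= 0:
            return 0.0       # already at capacity
        if self.daily_increments <= 0:
            return math.inf  # an idle counter never fills up
        return remaining / self.daily_increments


def rank_by_risk(fields):
    """Return the counters ordered from most to least urgent."""
    return sorted(fields, key=lambda f: f.days_until_full())


if __name__ == "__main__":
    # Illustrative inventory; names and figures are made up.
    inventory = [
        CounterField("report_no", digits=5, current=1200, daily_increments=2.0),
        CounterField("batch_id", digits=3, current=999, daily_increments=4.0),
        CounterField("run_seq", digits=4, current=9000, daily_increments=10.0),
    ]
    for field in rank_by_risk(inventory):
        print(f"{field.name}: {field.days_until_full():.1f} days of headroom")
```

Ranking the counters this way puts the ones closest to capacity at the top of the list, which is where preventive measures would be scheduled first. A real assessment would also need to account for uneven activity, such as catch-up traffic after an outage.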
These Lean principles are even more instructive when viewed in the context of Barry Boehm’s landmark book “Software Engineering Economics.” Boehm defines the factors that impact the technical quality, schedule, and cost of software projects. The top five parameters are listed below (in order of the least to the most risk and impact) with supporting examples from our lab system effort:
- Applications experience – we had started assembling the team to work with this code about 18 months earlier, and our understanding benefited from having to reverse engineer it to get the documentation up to date.
- Timing constraints – while Boehm’s original definition pertained to software execution time, the critical time factor in this case was “ASAP!”
- Required reliability – this software had to perform its intended functions without defect because, ultimately, lives were at stake.
- Product complexity – while this patch routine would be classified as low in complexity, the legacy software itself was a major plate of spaghetti.
- Personnel/team capability – the individuals who worked this problem would rate above average to high in analyst, applications, programmer, and language skills and experience, but the key to success was the mission dedication and cohesiveness shared across the entire team.
A few years after Boehm’s book, I saw a follow-up article that cited his software drivers and added an even bigger influence – Project Management. As today’s world demands higher quality, faster and cheaper, doesn’t it just make sense to apply Lean to the biggest impact driver of all?