I grew up in an, um, interesting household – both my parents were scientists. More specifically, they were food chemists, which led to frequent dinnertime conversations about why Polysorbate 80 was 12% better than Polysorbate 79, and how sweet Betty Crocker was actually Sweaty Larry, who ran a Buick-size industrial mixer in the factory. It’s fun to reminisce about our youth, but as I was writing this article, I was also forced to recall a notable childhood trauma – “The Cheesecake Years.”
At the time, my father was working on a new “recipe” (formula and production process) so that a different Sweaty Larry could mass-produce New York-style cheesecake and subsequently displace your grandmother. Nearly every night, FOR TWO YEARS STRAIGHT, Dad came home with samples for us to try, asked specific questions, and then carefully captured our feedback in his notebook.
While that might be considered child abuse today, I now appreciate that I was also witnessing well-constructed scientific experiments with controlled baselines, carefully managed independent variables (ingredients and process controls), and defined dependent variables like moisture, taste, and texture.
Ironically, this was one of my first exposures to good lean thinking.
Today it seems like you can’t throw a stick of butter without hitting someone who is “running a lean experiment” on one thing or another. As an engineer and scientist myself, I should find that heartwarming, but more often the feeling is heartburn, because frequently they are just trying “stuff,” and that rarely leads to good science or optimal improvements. “I thought we needed a new screw machine, so I bought one as an experiment” is about the level of experimental sophistication that many companies share with me. Sigh.
Our organizations need us to do much better – not only for the results, but also for the ongoing improvement of our lean skills. Like all good problem-solving, a real experiment starts with a purpose, a valid reason why the work is important, and defines the gap we want to bridge. Once we learn more about the situation, we can hypothesize about what is causing the gap and which countermeasures could possibly close it. Our experiments then either confirm our hypotheses or send us back to the drawing board to try again (see the article “Problem? What Problem?”).
Unfortunately, it seems like well-crafted experiments are becoming even fewer and farther between as more and more organizations “test” their solutions to give their favorite ideas legitimacy. Some less-than-savory (but all too common) clues that your organization isn’t running real experiments include:
- Inadequate measurement systems and/or missing success and failure criteria: “My new 6S process works great!” Really? How do you know without agreed-upon standards and an accurate, repeatable measurement system? You really need to define and validate your measurement system before you run the experiment.
- No control group: Website developers often run A/B comparisons by directing half of the traffic to the baseline configuration (A, the control) and the other half to the test configuration (B, the treatment), then monitoring which one leads to better conversion (users taking the desired action). Because both groups run at the same time, this removes time-based, urgency-based, economic, and other hard-to-identify-and-control variation, leaving a fair comparison of layout and content factors. If you cannot adequately control confounding variables during your experiment, you really need to run a parallel baseline case (a minimal sketch of this kind of comparison follows this list).
- Too many variables: If we simultaneously change 27 things in our new lean management system “pilot,” how do we know which were the most critical ones? It would be an inexcusable waste if we spent a lot of effort implementing the 24 things that did nothing or had a negative impact. And conversely, if our pilot failed, how could we ever know if we had a bad solution overall, or just too many confounding variables? As is done in software development, breaking large systems into smaller, testable features makes running valid, controllable experiments much easier than building the whole app or implementing the whole methodology and hoping for the best.
- Selection bias: Trying a new tool or methodology with an all-star team, with extra resources, under ideal, low-variation circumstances, or with extra management attention does not prove that it will work in the real world – especially if the sample size is one. When I facilitate value stream mapping for product development, I always ask the teams to capture information about multiple past projects so they can better understand and plan for “normal variation” when they run their new solution experiments. Engaging the whole problem-solving team (not to mention those who will be impacted by the change) in discussions about their own unquestioned assumptions and biases before an experiment is launched helps everyone design a more robust test.
- Confirmation bias: Did we interpret the results in a way that only confirms what we wanted the outcome to be? Related is “expectancy bias,” where we design our hypotheses and tests to all but guarantee success, such as “we are testing to see if people in our office will use the new software.” Many people will probably try something new if you ask them to, but how can you be sure it actually creates an important long-term benefit instead of short-term compliance? Confirmation and expectancy bias can usually be controlled by correctly implementing the previously mentioned countermeasures.
- Not capturing/analyzing/reflecting on unintended consequences (see article “The Chief Cause of Problems Is Solutions”): Complex systems, like organizations and markets, have complex problems and many unforeseeable outcomes. Trading one problem for another is not very lean, so make sure you capture, discuss, and actually do something about any unintended consequences before implementing that new solution.
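For readers who want to see what a control-versus-treatment comparison looks like in practice, here is a minimal sketch in Python (with made-up visitor and conversion counts, and a hypothetical helper name) of evaluating an A/B split with a pooled two-proportion z-test. It illustrates the idea of judging the treatment only against a parallel baseline; it is not any particular tool or a prescribed method.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of a control (A) and a treatment (B)
    with a pooled two-proportion z-test.
    Returns both rates, the z statistic, and a two-sided p-value."""
    p_a = conv_a / n_a                                  # control conversion rate
    p_b = conv_b / n_b                                  # treatment conversion rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under "no difference"
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # two-sided tail probability
    return p_a, p_b, z, p_value

# Hypothetical numbers: 5,000 visitors per arm, 400 vs. 460 conversions.
p_a, p_b, z, p = two_proportion_z_test(conv_a=400, n_a=5000, conv_b=460, n_b=5000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p:.3f}")
```

The point is not the statistics themselves but the discipline: the success threshold (for example, p below 0.05) is agreed on before the experiment runs, and both arms experience the same time period and traffic mix, so the only systematic difference is the change being tested.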
However, other forms of scientific thinking can be useful besides controlled experiments. Many product ideas, like Play-Doh (originally a wallpaper cleaner), were created by leveraging anthropology and ethnography (scientific observation) to uncover situational problems or unexpected benefits. In these cases, it is critical to capture all observations in an unbiased manner without tampering with the situation. Those observations can lead to insights that later become testable hypotheses in more conventional experiments – like “do children really enjoy playing with colorful, fake dough?” and “are kids really going to try eating large quantities of this stuff?” (Yes and yes, in case you are wondering).
Practicing good science is no doubt tough, but like any skill, repeated application in many different scenarios will eventually make it second nature.
I am not suggesting that you should never, ever try something new without a detailed A3 or a complex Design of Experiments. Sometimes a quick, broad trial helps us decide whether something is worth trying again with more rigor: “Nope, nothing caught fire and nobody died during that nine-minute stand-up meeting experiment, so maybe we should study the +’s and –’s further.”
And finally, don’t forget to socialize your experiments to get better and broader input, control more bias, learn faster from each other, and pave the way for implementation. For instance, I am running a long-term experiment right now. My hypothesis is that I can remain happy for the rest of my life without ever eating cheesecake again. Thirty-five years later and my theory is still holding up.