Fake Doors, A/B Testing, and Instincts

Jess Lee, one of the co-founders of Polyvore and its VP of Product, posted this great video about "Fake Doors" [link]. (Thanks to Rich Chen / Wesley Chan for the link.) Jess, by the way, was one of the most well-respected product managers at Google -- I don't know a single person who didn't think highly of her and her work.

Her video got me thinking about testing in general. This post is not a direct response to her video -- I broadly agree with everything she says -- it's more a set of ruminations prompted by watching it.

When I first became a product manager, one of the things that surprised me was the volume of decisions I had to make. Sometimes they were large decisions, like what we were building or how much revenue I projected a particular product or feature would generate -- other times they were smaller, like where a box should go or what color a particular object should be. But there were often hundreds of decisions -- maybe even more for something particularly complex.

A/B testing -- where you test two options, A vs. B (though I've run tests comparing as many as five) -- is done to answer a simple question: which is better, A or B?

So here's the question -- where do things like fake doors, A/B testing, or just pure iterating fit into the product development process and why are they so important?

When deciding on a new feature -- you have a handful of options:

1. Fake doors / A/B testing -- compare option A versus option B

2. Just implement it, get feedback, and iterate

3. Just implement it

Let's look at (1) -- essentially A/B testing. The general assumption is that testing is better than no testing -- after all, you tested something out, two groups of users tried the two versions simultaneously, so you now know the answer, right?

The first thing to consider with A/B testing is cost. Any form of testing is expensive -- it takes time and resources to run and analyze a test. If you're faced with hundreds of questions, you can't run a test for every single decision you need to make. I've run a lot of A/B tests in my life -- and many were unnecessary. A lot came back with statistically insignificant results. Only some questions clear the bar -- the potential impact is big enough and you're genuinely unsure of the answer -- to make a test worth running.
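To make "statistically insignificant" concrete, here's a minimal sketch of the standard two-proportion z-test often used to evaluate conversion-rate A/B tests. The conversion numbers are made up for illustration -- the point is that a plausible-looking lift can easily fall short of significance at a realistic sample size.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for comparing two conversion rates (A vs. B)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B are identical
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical test: 2.0% vs. 2.2% conversion, 5,000 users per arm.
z = two_proportion_z(100, 5000, 110, 5000)
print(round(z, 2))  # ~0.7 -- well below 1.96, so not significant at 95%
```

A 10% relative lift sounds like a clear win, yet with this much traffic the test is inconclusive -- which is exactly how so many tests end up in the "statistically insignificant" pile.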

Here's the second thing to consider. A/B testing sets up a construct where you're testing option A versus option B and the winner is, by definition, the best option. Here's why I disagree. I can't remember where I read this example, so apologies in advance -- but take Coca-Cola. Let's say you're in a meeting and someone comes in and says, "We've found a substitute for the sugar we use in Coke that will save us 0.5 cents per serving. We tested the two versions of Coke and customers can't tell the difference between them. Shall we implement the change?" Presumably the answer is yes. At the next meeting, someone comes back and says, "We've found a substitute for the food coloring we use that will save us 0.2 cents per serving. We tested the two versions and customers can't tell the difference. Shall we implement the change?" Presumably the answer is again yes. This goes on until 10+ changes are made and you're now saving, let's say, 15 cents per serving. A massive improvement.

Here's the problem. While I have no doubt that customers can't tell the difference between two versions of the product where a single incremental change was made, I'm sure they can tell the difference between the original version and the one with 10+ changes. This is the problem with A/B testing -- it gives you information about two options, but it doesn't necessarily tell you whether a change makes sense in the larger context of the company.
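The arithmetic behind the Coke example is worth making explicit. The numbers below are entirely hypothetical -- a made-up "perceptibility" scale where customers only notice differences of 10 points or more -- but they show how every pairwise test can pass while the cumulative drift is obvious.

```python
threshold = 10   # hypothetical: smallest difference customers reliably taste
per_change = 3   # hypothetical: drift introduced by one ingredient swap

# Each A/B test only compares version n against version n+1,
# so every individual test comes back "no detectable difference":
assert per_change < threshold

# But nobody ever tests the original against the final version.
total_drift = 10 * per_change
print(total_drift >= threshold)  # True -- original vs. final is obvious
```

Each step is invisible; the sum of the steps is not. The test answers only the question it was asked.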

Option (2) -- implementing, getting feedback, and iterating -- is another approach. I very much like this approach because of its speed and attitude. You create something, you put it out on the market, and you see what the market says. Do they use it? How do they use it? Is there a particular aspect of the feature that's appealing? This isn't necessarily better than A/B testing. What I like about it, though, is that it's built around the development of instincts. You're basically making bets. "I think X will be good, therefore we will launch X and see what happens." X may be good. X may be bad. You'll find out. But along the way you're honing your ability to make those bets, which is really key for speed and for good decision making in general.

Option (3) -- just implement it. I want to illustrate this with an example from Apple, but first a contrast. I have never seen a company that A/B tested things as much as Amazon. I often thought this was a poor use of resources, because we would test a lot of things that I felt strongly fell into the "common sense" category. That wasn't always the case -- but in a world of limited resources, you can't test everything. Testing is not a replacement for instincts. The development of good instincts is the most important thing.

One of the things that I love about buying Apple products is the packaging. I recently bought their battery charger. It comes in a very high quality box with an excellent finish. It's packaged very tightly yet it's easy to open and re-use. It's elegantly constructed and conveys a strong sense of quality and value. This comes from instincts and a fundamental sense of who Apple is. Detail-oriented. High quality. A re-imagining of something staid and boring.

I guarantee they didn't test this. They didn't ask, "If we use a regular box versus this high quality box, will we sell enough additional battery chargers to justify the increased cost of the box?" It comes from a place of, "We make really high quality products that our customers derive value from. Everything about our product should reflect this -- right down to the packaging." If they had tested it, it probably would have tested negatively. But a test only measures short-term impact -- it doesn't measure long-term brand impact. If Apple saved money across the board on packaging today, no doubt they would make more money -- short-term. But it would eventually have a corrosive effect on their brand, likely stripping away more in brand value (and eventually sales) than they saved on packaging.

The way I view all this is that however you implement a feature -- simultaneous testing, testing over time, or just shipping it -- it's all about one thing: honing your instincts. Having a fundamental understanding of who your company is, what it stands for, and how it delivers value to the customer, and then making decisions accordingly. You want testing to inform those decisions -- to give you more information about how your customers view you and interact with you -- but you don't want it to dictate them. You want decisions to come from a fundamental core of who you are and how you do business.