In my last post, I wrote about how to apply intuition from group dynamics to better understand machine learning. Today, I’d like to do the reverse, and use frameworks from machine learning to think through a question about group dynamics. Specifically, how should groups most efficiently iterate on ideas/products/processes? Or, more simply, what framework should groups use to learn?
As you might expect, machine learning has a lot to say about the process of learning by iteration! There are many different approaches one can take, but there are two key parameters which come up frequently: learning rate and batch size.
Learning rate is basically how much you should update your behavior based on a given lesson. If you lose your $10 bet at the craps table, then the difference between choosing to bet $10, $5, or $1 next round is determined by the size of your learning rate.
Batch size, on the other hand, is how many trials you need before you start learning from them. In the craps table example above, the batch size is 1. But, if you told yourself “losing might have been a fluke, let’s try four more times at $10 before we make any decisions,” that would mean you only change your behavior after 5 total trials – in other words, a batch size of 5.
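To make these two knobs concrete, here's a minimal sketch in Python – my own illustration, with made-up names like `learn` and `feedback_stream`, not code from any real ML library. The learner collects `batch_size` observations, averages them, and then moves its estimate a fraction `learning_rate` of the way toward that average:

```python
def learn(initial_estimate, feedback_stream, learning_rate, batch_size):
    """Collect `batch_size` observations, then make one update on their average."""
    estimate = initial_estimate
    batch = []
    for observed in feedback_stream:
        batch.append(observed)
        if len(batch) == batch_size:
            average = sum(batch) / len(batch)
            # Move a fraction `learning_rate` of the way toward the batch average.
            estimate += learning_rate * (average - estimate)
            batch = []
    return estimate

# The gambler who updates after every single craps round is batch_size=1;
# "let's try four more times before deciding" is batch_size=5.
payouts = [0, 0, 12, 0, 15]  # made-up payouts observed across rounds
print(learn(10, payouts, learning_rate=0.5, batch_size=1))  # swings round to round
print(learn(10, payouts, learning_rate=0.5, batch_size=5))  # one calmer move
```

With `batch_size=1` the learner reacts to every single round, swinging wildly between bets; with `batch_size=5` it makes one measured adjustment based on the average of all five.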
In machine learning, tuning these parameters is key not only to finding the optimal result faster, but also to ensuring that you find the right answer at all. Choosing the wrong values can result in your system getting stuck in “valleys” (what ML practitioners call local optima) – basically, answers that seem correct because there aren’t any small, obvious ways to improve them, even though a big change could result in something far better.
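As a toy illustration of getting stuck in a valley – the loss surface here is one I made up for the demo – watch plain gradient descent settle into the shallow valley simply because every small step away looks worse:

```python
def loss(x):
    # A made-up surface with two valleys: a shallow one near x = +1
    # (loss ~ +0.29) and a deeper one near x = -1 (loss ~ -0.31).
    return (x**2 - 1)**2 + 0.3 * x

def grad(x, eps=1e-6):
    # Numerical derivative - good enough for a demo.
    return (loss(x + eps) - loss(x - eps)) / (2 * eps)

def descend(x, learning_rate, steps=2000):
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

x = descend(1.5, learning_rate=0.01)
print(x, loss(x))  # settles near x = +1: every small step looks worse,
                   # even though the valley near x = -1 is far better
```

Starting from 1.5, small careful steps walk straight into the shallow valley and stay there; nothing short of a big leap would ever reach the better answer on the other side of the ridge.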
When thinking about how to learn as a group, picking the wrong value for these parameters has the same kinds of costs – basically, it results in you either wasting your time, or finding the wrong answers.
For instance, say you had a super high learning rate. If you tried to build an app and failed, you might update to “I should just never build apps.” But with a low learning rate, there are also issues, because in order to make a nuanced update you need to understand exactly what you’ve done wrong. For instance, “oh, our app failed because our UI was poor – so we can still keep doing most of what we were doing, but need to make sure we build out our UI team first.” While this might be a correct conclusion, it certainly will take a real time investment to confirm!
The tradeoff with batch size is a bit more straightforward – when groups learn, each completed “batch” means the group needs to agree on their next action – which usually happens in a meeting. So, with large batch sizes, you minimize the number of meetings – but you also run the risk of taking a long time to update on information that was available to you, in principle, some time ago.
Different learning rates and batch sizes make sense for different kinds of organizations. Large organizations tend to have low learning rates – if they’ve gotten this big, they’re probably already pretty effective (at least at this point in time). In contrast, startups have large learning rates, because they are still exploring both their industries and their own internal organization.
The batch size changes how smoothly these organizations develop. At a large organization, a low batch size means you’re learning smoothly – though slowly, because you’re taking tons of time in meetings to do so! Bridgewater Associates, where I worked for a time (or, at least, my division within Bridgewater while I was there) was a great example of this behavior. In fact, I sometimes heard people speak about “planning for the plan”, in the sense of having a meeting just to figure out how to most effectively run the following meeting! This behavior meant that Bridgewater could eventually squeeze all the useful information out of any particular event – but at the cost of a lot of time and effort, which might have been more efficiently spent applying lessons they’d already learned.

In contrast, large organizations with large batch sizes don’t learn continuously – they tweak their behavior only now and again, meaning they are slower to update on new information, but also have more time outside of meetings to act on their existing knowledge. Bridgewater, as a hedge fund, needed to react smoothly to new information, so their decision made sense – but a company working in a less rapidly changing field, such as a clothing manufacturer, would prefer a large batch size.
Startups, with our much higher learning rates, tend to behave pretty differently. One key difference is that the cost of small batch sizes is much less than at larger companies – since there are fewer decision makers, meetings are generally shorter and it’s easier to commit to a new direction.
This results in many startups feeling a strong drive towards small batch sizes to pair with our large learning rates. But looking at machine learning, we might immediately suspect something is wrong – in most machine learning systems, increasing the batch size averages away the noise of any individual example, which stabilizes learning considerably. And in fact, although small batches might seem natural in this case, it turns out the wisdom we’ve learned from machine learning does apply – increasing our batch size helps avoid many possible failures. These include:
- Transaction costs: small batches and large learning rates create chaos, which is challenging to work through, to say the least.
- Overindexing on training data: I can’t tell you how many potential investors tried to tell me the “obviously right” application for Modulate’s technology before even fully understanding it themselves. It’s easy for startups to convince themselves to believe these arguments (if I agree with this investor, they’ll fund me!), but a funded startup focused on the wrong thing is far, far worse than a poorly funded startup that knows the market it should pursue. Too low a batch size makes it too easy to be influenced by these one-off opinions.
- Finding local – but not global – maxima: If your learning rate is too high, it’s easy to find yourself shifting your focus too frequently. The result is that you land somewhere fairly randomly – guided by the data you happen to come across rather than first principles – and have no guarantee that you’re focusing on the right thing. In the best case, a small batch size with this high learning rate can mean you find the optimum more quickly – but then nothing guarantees you stay there. The only way to ensure you’ll end up at the right answer is systematic thinking, not jumping around – the toy simulation below makes this concrete.
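Here’s that simulation – entirely my own construction, where the hypothetical `TRUE_BEST` stands in for “the right focus” and `noisy_signal` for one noisy data point, like a single user interview or investor opinion:

```python
import random

random.seed(0)  # reproducible demo

TRUE_BEST = 5.0  # the right focus in this toy world (unknown to the learner)

def noisy_signal(x):
    # One data point gives a noisy read on which direction to move and how far.
    return (TRUE_BEST - x) + random.gauss(0, 4)

def iterate(x, learning_rate, batch_size, rounds=200):
    for _ in range(rounds):
        # Average a whole batch of signals before committing to a change.
        signal = sum(noisy_signal(x) for _ in range(batch_size)) / batch_size
        x += learning_rate * signal
    return x

print(iterate(0.0, learning_rate=0.9, batch_size=1))   # ends wherever the noise threw it
print(iterate(0.0, learning_rate=0.9, batch_size=20))  # reliably lands near TRUE_BEST
```

With a batch of one and a high learning rate, each individual signal yanks the learner around, and where it finally lands is largely luck. Averaging twenty signals per update shrinks the noise – roughly by the square root of the batch size – so the learner ends up near the target almost every time.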
Putting this together, I’ve found lately that it’s crucial for startups to keep their batch size large, even if it feels like there’s something new to immediately change to. (Props to Carter, my cofounder at Modulate, who is phenomenal at this sort of consistency and has been a great influence on me here.) Some founders come to this approach through reason, though some simply land there out of sheer unbridled confidence in their idea – which is why that sort of stubbornness, generally considered unwise, can be a real boon for founders. Though of course, too much stubbornness can still be dangerous – you want a large batch size, but not an infinite one. Ultimately, the best decisions can only be made if you strike a careful balance between knowing when to stick to your beliefs, always being willing to question what you think you know, and finding the right cadence to leave yourself time to actually apply your newfound wisdom.