It’s incredibly easy, in any field, to not realize how much of what you say is jargon. This may be particularly true in programming, which much of society continues to treat as magic rather than something it might be able to understand.
There are lots of resources out there which try to explain what programming is – and of course, a savvy nontechnical person can still learn certain things. Work alongside a programmer and you’ll probably realize that the stuff they write is “code,” the outputs are “programs,” and the stuff that’s fed in is “data” or maybe “parameters.”
When I first started work as a programmer, I relied on this kind of vocabulary-by-immersion process to help those around me understand what I was doing. As I saw them become comfortable with more vocab terms related to the engineering side, I became more confident they understood what I was doing.
That lasted until one of my coworkers came up to me one day and asked a simple question. “How long does it take to write a page of code?”
What a natural question! After all, we’re writing text, and text is generally measured in pages – it’s a very reasonable way to think about things. If all you knew was that code was instructions for computers, you might easily ask this question, the same way you’d ask about the length of instructions for building a table to understand how complex the task was.
But a programmer knows that when it comes to code, the far greater challenge is determining the most effective instructions for the job, not actually writing those final instructions down. In other words, this question reveals an enormous gap in intuition around what code is and how it works.
In an attempt to remedy this, I prepared an hour-long programming tutorial course, which I ran for several of my colleagues to help them gain more intuition for programming (specifically, for webserver development in this case). I approached this from two angles – outlining the building blocks in more detail, and drawing on intuition from other parts of life. The latter piece came from the glossary, which outlined a rather involved analogy between programming and cooking, and which almost universally proved to be the most valuable part of what I’d put together.
I can’t tell you how many times I’ve heard the equivalent of “page of code” in discussions about AI and machine learning. There are, at this point, thousands of articles out there attempting to explain the technical building blocks involved, but fairly few that dare to draw on real intuition from elsewhere in the world. (I suspect this comes from AI’s reputation as a “black box,” a fundamentally different world in the way quantum physics differs from classical physics, exacerbated by our tendency as technologists to recoil from all but the most precise descriptions of our work.) I could spend hours talking about other issues – like the difference between AI and ML – but many others have already done so.
If there was anything I could add to this conversation, I felt it would be to try to offer a new kind of intuition for how machine learning can work. Thus, without further ado, I present Mike’s Not Technically Correct But Nevertheless Useful AI Framework.
Our intuition in thinking about artificial intelligence will come from intelligence in the real world – but not, as many people might suggest, by comparing to the human brain. After all, most of us have nearly no intuition for how the human brain works! But we do have intuition for group intelligence – that is, how teams function together. And there’s a lot we can learn about AI by leveraging this.
To start, let’s talk about a couple of key terms.
As the articles I’ve linked above mention, AI is a loose, imprecise term for anything artificial which we feel appears to be…well, intelligent…at a given task. Machine learning (ML) is an approach towards achieving this outcome, by enabling the program to learn until it reaches the necessary level of intelligence.
Say you have a group of architects. They’ve been tasked with taking customer requests for new buildings, and ultimately producing full designs for those buildings. If you know exactly the right people you need for the job, and exactly the rules they can follow to incorporate any customer request into a viable building, then you can build your team perfectly right away – this is the equivalent of writing a normal, non-machine learning algorithm.
But designing houses is hard, and customer requests might be vague, nonsensical, or simply uniquely complicated. So you’ll need your group to be able to learn over time and continue to improve their skills at handling any kind of request.
Perhaps you first test this by giving them a few “mock” runs, where you play the role of the customer and give them some fake requests to make sure they produce reasonable designs. This is known in machine learning as training time, and the mock requests you give them are their training data. Then, once you deem them good enough, you let them loose to work with real customers – which is known as inference time.
In some machine learning algorithms, you continue improving even during inference time – these are called online algorithms, and they’re great when you want to remain extremely flexible to new customer requests, since the group keeps evolving. Alternately, offline algorithms don’t train during inference time – the equivalent of saying that the group never debriefs or learns from its mistakes while fulfilling customer requests, and only improves when running additional mock sessions. This probably sounds like a bad idea to you, and indeed, it provides less flexibility – but the tradeoff is that it produces much more consistent and predictable behavior, which can be extremely valuable. (For instance, perhaps you have a customer who briefly shows up and demands their house be built entirely out of straw. If the team can reorganize itself to handle this single crazy demand, perhaps it replaces its steel specialist with a straw specialist. This helps in the short term, but after this customer is gone, the team is working with one less specialist in useful materials than it had before.)
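To make the online/offline distinction concrete, here’s a toy sketch in Python. Nothing here comes from a real ML library – the “model” is just a running-average estimator I invented for illustration – but it shows the one structural difference: whether learning continues at inference time.

```python
# Toy sketch (not a real ML library): the "model" simply estimates the
# average size of the buildings customers request.

class AverageEstimator:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def train(self, example):
        # Learn from one example (e.g. a mock customer request).
        self.total += example
        self.count += 1

    def predict(self):
        # Best guess so far: the average of everything seen.
        return self.total / self.count

training_data = [100, 120, 110]  # mock requests (training time)
real_requests = [300, 320]       # real customers (inference time)

offline = AverageEstimator()
online = AverageEstimator()

for x in training_data:          # both learn during training time
    offline.train(x)
    online.train(x)

for x in real_requests:          # only the online model keeps learning
    online.train(x)

print(offline.predict())  # stuck at the training-time average: 110.0
print(online.predict())   # has adapted toward the real requests: 190.0
```

Notice the tradeoff from the straw-house story: the offline model is perfectly predictable (it will answer 110.0 forever), while the online model drifts with whatever the latest customers happen to ask for.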
So, let’s summarize the key vocabulary here.
| | In AI | In Groups |
| --- | --- | --- |
| The “Entity” that is trying to complete a kind of task | Algorithm | Group |
| Label for an Entity that seems pretty clever about how it handles the task | Artificial Intelligence (AI) | Effective / Successful Group |
| How the Entity attempts to improve at solving the tasks | Machine Learning (ML) | Strategic Plan |
| The kind of task the Entity is trying to do | Training Objective | Mission |
| When the Entity practices or plans how to do the tasks | Training Time | Practice / Preparation |
| When the Entity actually does the tasks for real | Inference Time | Working with real customers |
| Continuing to learn even while working on the real tasks | Online Algorithm | Learning on the job |
| Only learning during practice sessions, but not from doing the tasks for real | Offline Algorithm | Following exactly the previously determined plan |
Now that we have the basics out of the way, let’s talk in more detail about neural networks. Unlike “AI” and “ML,” which are imprecise labels for certain kinds of approaches, the phrase “neural network” is a precise technical term. It refers to a specific software structure – or, going back to groups, a specific way of organizing your group.
Let’s continue discussing your architecture firm. You’ll need a customer-facing team to collect the full set of requests and constraints from each customer (the input). This information might then be transmitted to an engineering team, which reviews it and puts together an additional set of constraints that the building must adhere to (the engineering specs). This new information is then conveyed to the design team, which produces yet more new information (the building designs). I could keep going, but to keep it simple, let’s say that these final building designs are our finished output.
Each of these teams – customer-facing, engineering, design – is a different layer of the neural network. Each is itself made of individual team members – or, in ML speak, nodes. And the interim forms the information takes – for instance, the engineering specs, which are neither the input nor the output but something in between – are known as latent space representations of the input data.
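The input-to-output flow above can be sketched in a few lines of Python. This is a toy forward pass, not a trained network – the layer sizes and the “team” names are invented, and the weights are random – but it shows how each layer transforms what it receives and hands the result to the next.

```python
# Toy forward pass through three "teams" (layers). Weights are random,
# so the output is meaningless; only the structure matters here.
import numpy as np

rng = np.random.default_rng(0)

def layer(weights, x):
    # Each node combines its inputs, then applies a simple
    # nonlinearity (ReLU) so the layer can do more than arithmetic.
    return np.maximum(0, weights @ x)

# Made-up layer sizes: 3 input numbers -> 4 nodes -> 5 nodes -> 2 outputs.
w_customer = rng.normal(size=(4, 3))      # customer-facing team
w_engineering = rng.normal(size=(5, 4))   # engineering team
w_design = rng.normal(size=(2, 5))        # design team

customer_request = np.array([1.0, 0.5, -0.2])   # the input
specs = layer(w_customer, customer_request)     # a latent representation
more_specs = layer(w_engineering, specs)        # another latent representation
final_design = layer(w_design, more_specs)      # the output

print(final_design.shape)  # two numbers come out the other end: (2,)
```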
How these layers and nodes (teams and team members) are organized tells you what kind of neural network you’re working with. For instance, deep neural networks (DNNs), the use of which is often just called deep learning, are simply neural networks with a large number of layers. Convolutional neural networks (CNNs) are organized to take into account the fact that the input data is spatial in nature, while recurrent neural networks (RNNs) are organized to better handle timeseries. You don’t strictly need a CNN to handle data that is laid out spatially (such as images), but as you might expect, it generally helps to let your teams / layers specialize in exactly what it is they are processing. (For instance, you might have one arm of your firm focus fully on industrial buildings, while another focuses on individual homes. The industrial arm might have a heavy-duty customer relations team, while the individual home arm might have a more informal setup in order to make the individual homeowners feel more at ease.)
Now that we’ve talked about how a neural network might be organized, we need to talk about how it’s trained – or in other words, how groups become better at doing their tasks. But first, let’s summarize this recent vocab.
| | In AI | In Groups |
| --- | --- | --- |
| The way the Entity is internally organized | Neural Network | Org Chart |
| The pieces the Entity is organized into | Layers | Teams |
| The participating components of the Entity | Nodes | Team Members |
| The data provided to the Entity to get it started on the task | Input | Customer Requests |
| The distilled version of the data a given team received that it ends up sharing with the next team | Latent Space Representation | Engineering Specs |
| The final data produced by the Entity | Output | Final Designs |
| An Entity with a large number of pieces | Deep Neural Network (DNN) | Large Corporation |
| Specific ways of organizing the Entity to perform better at certain kinds of tasks | Convolutional Neural Network (CNN) or Recurrent Neural Network (RNN) | Scrum or Waterfall; cultures which are specifically suited to work in a particular industry; etc. |
OK, so how do neural networks learn? Through feedback – or in ML parlance, backpropagation. Picture a designer being asked by the customer, “hey, why does this building have so few windows?” The designer confirms that the engineering specs requested only a few windows, so they route the question to the engineering team. The engineers discover that this came about because the customer relations team had reported it “wasn’t important” how many windows there were, and decreasing the number had led to improved structural integrity. So the lessons learned are that the customer relations team didn’t understand the customer request well enough, and that the engineering team might want to bias a bit less towards getting rid of windows not specifically asked for. In other words, each layer updates how it processes its information, becoming more useful thanks to this feedback.
Backpropagation is similar. When the neural network produces any attempt at an answer, that answer is judged by something known as the loss function, and any errors found are passed back to the last layer. That layer assesses how much of the error is its own fault as opposed to coming from the layers before it, updates itself based on its share of the errors, and passes the rest of the feedback back to the previous layer. This process continues until every layer has improved based on the feedback given.
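Here’s the smallest backpropagation example I can write: a “network” of two layers with one node each, learning to double its input. The numbers (learning rate, target, iteration count) are arbitrary choices for the sketch, but the flow is real backpropagation: forward pass, loss, then blame passed backward layer by layer.

```python
# Toy backpropagation: a two-layer, one-node-per-layer network
# learns the mapping x -> 2x by gradient descent.

w1, w2 = 0.5, 0.5          # one weight per layer, to keep the math visible
lr = 0.05                  # learning rate (how big each correction is)

for _ in range(200):
    x, y_true = 1.0, 2.0
    # Forward pass: each layer transforms the data and hands it on.
    h = w1 * x             # the latent representation
    y = w2 * h             # the network's attempt at an answer
    # The loss function judges the attempt.
    loss = (y - y_true) ** 2
    # Backward pass: the last layer works out its share of the blame...
    dy = 2 * (y - y_true)
    dw2 = dy * h           # the last layer's own responsibility
    dh = dy * w2           # ...and passes the rest of the feedback back.
    dw1 = dh * x           # the earlier layer's share
    # Each layer updates itself based on its share of the error.
    w2 -= lr * dw2
    w1 -= lr * dw1

print(round(w1 * w2, 3))   # the whole network now maps x to roughly 2x
```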
Of course, what the group learns is based on what it practices. If your group only ever practices building simple homes, they might struggle when asked to build a skyscraper instead. Similarly, neural networks learn based on the training data they are given, and can’t be expected to magically generalize to anything they haven’t seen before. This makes it crucial, especially with offline algorithms, to ensure the training data is statistically similar to the real distribution of data the algorithm will actually encounter – otherwise you will be creating input bias (we’ve only seen homes, so we can’t build skyscrapers). Similarly, if your neural network only ever sees images of men, it will make errors when confronted with pictures of women – not because neural nets are themselves less able to handle pictures of women, but because it simply wasn’t taught using the right data.
There’s a contrasting type of bias, which occurs when your loss function is not actually judging what you want it to judge. I call this objective bias. For instance, say your customers never intend to live in the houses you design, and only care about them looking impressive from the outside. In this case, it’s easy to imagine starting to cut corners when it comes to the inside of the house, even if you and your team have practiced designing complete houses many times before.
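A tiny sketch makes objective bias visible. The functions and numbers below are invented for illustration: one scores the whole design, while the “loss” the model actually trains against only judges the outside.

```python
# Toy illustration of objective bias: the loss function only judges
# part of what we actually care about.

def full_quality(design):
    # What the customer *should* care about: outside and inside.
    return design["outside"] + design["inside"]

def biased_loss(design, target):
    # A flawed loss function: it only judges the outside.
    return abs(design["outside"] - target)

corner_cut = {"outside": 10, "inside": 0}   # gorgeous facade, empty shell
complete = {"outside": 10, "inside": 10}    # the real deal

# Both designs look identical to the biased loss...
print(biased_loss(corner_cut, 10), biased_loss(complete, 10))  # 0 0
# ...even though their true quality differs.
print(full_quality(corner_cut), full_quality(complete))        # 10 20
```

A network trained against `biased_loss` has no incentive whatsoever to produce `complete` rather than `corner_cut` – it is being graded on the wrong rubric.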
Let’s summarize these new terms.
| | In AI | In Groups |
| --- | --- | --- |
| How the Entity learns whether or not it did a good job | Backpropagation | Feedback |
| How success / failure is measured | Loss Function | KPIs / Metrics |
| The example tasks the Entity practices on ahead of time | Training Data | Practice Runs |
| Limitations to the ability of the Entity to succeed at all tasks because of its practice runs being focused only on a subset of the tasks it should care about | Input Bias | Only seeing houses and then having to build a skyscraper |
| Limitations to the ability of the Entity to succeed at tasks fully because it’s only being judged on a portion of how it’s done the task | Objective Bias | Only being judged on the outside of the house when the inside also matters |
Hopefully this groups ↔ AI analogy makes sense – and, if nothing else, hasn’t hurt your understanding of machine learning concepts. But, as I mentioned at the beginning, the real power of this analogy is that it’s sufficiently close that you can start to use your intuition about groups to think about machine learning in new and unexpected ways.
For instance, a commonly understood problem in team building is what’s known as the key-person problem – the risk of one person being the only one who knows how to, say, build buildings more than five stories tall, leaving the team helpless if that one person leaves. If you asked whether this logic could be applied to neural networks, you would discover dropout. Dropout is a common training technique that only allows randomly selected subsets of nodes (team members) to help at any given moment, which forces the neural net as a whole to train all its nodes broadly enough that no single node is absolutely required. The equivalent is saying “that one person can’t be on all the skyscraper projects,” which forces the rest of the team to learn about tall buildings as well.
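Mechanically, dropout is just a random mask applied during training. The sketch below uses a made-up set of four “team member” activations; the rescaling step is the common “inverted dropout” convention, which keeps the overall signal strength comparable between training and inference.

```python
# Toy dropout: during each training step, a random subset of nodes
# is "benched," so no single node becomes indispensable.
import numpy as np

rng = np.random.default_rng(42)
drop_rate = 0.5
activations = np.array([0.8, 0.3, 0.5, 0.9])  # four team members' outputs

# Training time: randomly silence some nodes, and rescale the survivors
# so the layer's total output stays comparable ("inverted dropout").
mask = rng.random(activations.shape) >= drop_rate
trained = activations * mask / (1 - drop_rate)

# Inference time: everyone participates, no mask and no rescaling.
inference = activations

print(trained)  # benched nodes are zeroed; survivors are scaled up
```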
Similarly, you might imagine the team receiving feedback not simply by looking at metrics about how their customers tend to respond, but based on a whole other “quality assurance” group analyzing their work on behalf of their customers. This is the concept of generative adversarial networks, or GANs, in which one neural network (the generator) attempts to complete the task as normal, but receives its feedback from a second neural network (the discriminator) which can learn and change based on the kind of errors it sees the generator making. Of course, the quality assurance group needs to learn too – it needs to understand what your customers are actually looking for, as otherwise it might simply tell the generator – the original team – that their designs are bad, when in fact the customers would have loved them!
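The loop below is a drastically simplified caricature of that setup, not a real GAN: the “discriminator” just builds up its own sense of what real examples look like, and the “generator” improves using only the discriminator’s feedback, never touching the real data directly. All the numbers are invented.

```python
# Caricature of the GAN feedback loop (NOT a real adversarial setup):
# the generator learns only from the discriminator's judgments.

real_examples = [5.0, 5.2, 4.8, 5.1]   # what customers actually like

disc_estimate = 0.0    # discriminator's sense of "what real looks like"
gen_output = 0.0       # generator's current attempt

for _ in range(100):
    # The discriminator keeps studying real examples
    # (the QA group learning what the market wants).
    for r in real_examples:
        disc_estimate += 0.1 * (r - disc_estimate)
    # The generator never sees real data; it only gets the
    # discriminator's feedback on its latest attempt.
    feedback = disc_estimate - gen_output
    gen_output += 0.1 * feedback

print(round(gen_output, 1))  # ends up near the real examples' average
```

The failure mode from the paragraph above is visible here too: if the discriminator never studied `real_examples`, its feedback would steer the generator toward the wrong target entirely.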
Of course, I hope it goes without saying that this is a leaky abstraction. Please don’t make the mistake of assuming that, because something makes sense for groups, it must apply to machine learning. But hopefully, relying on this analogy will allow you to make more sense of the places where neural networks succeed and fail – and perhaps even contribute your own expertise and unique perspective to help push the field of AI forward!