Do You Need to Understand the Math Behind a System to Implement It?


A while ago, someone in a Discord server I’m in asked how much of the math behind a system you need to know to implement it. I thought it was an interesting question, and I felt qualified to answer it, so I ended up writing quite a lengthy response. It just occurred to me that it might also be useful to other people, so I thought I would clean it up a little bit and archive it here.

(I ended up cleaning it up a lot more than expected, actually: it turns out that writing messages on Discord, where you expect people to see your words more or less in real time and know they can ask follow-up questions or respond if they need clarification or want to continue the discussion, is very different from writing a stand-alone blog article, which should explain things in a self-contained way. I ended up adding quite a bit of detail and restructuring the answer a little bit.)

How completely/thoroughly do you need to understand the math behind a system in order to implement it?

I think there are a few levels of “implementation” and “understanding” at play here, and it really depends on what you want to do. Here’s how I’d frame it: is your implementation going to stick close to the original system? Are you working under the same assumptions the system was designed under? Is your implementation going to do exactly what the original system was intended to do, with only extremely minor changes, if any? Or are you going to customize and change your implementation (and by extension, the mathematical system) to better suit your needs? The short answer: implementing the system for a purpose somewhat different from the one it was originally intended for requires a much deeper understanding than simply replicating it.

I think there are two main ways to understand math, and while they overlap somewhat, I see them as mostly separate. Mathematical understanding can happen on a procedural level: understanding how to carry out a particular procedure. I typically associate high levels of procedural understanding with computational fluency, the ability to perform error-free calculations (or carry out the procedure) quickly and efficiently; it’s when doing the math becomes automatic. However, mathematical understanding can also happen on a conceptual level, which is all about understanding why the procedure works and the lineage of ideas it’s derived from.

You can have a procedural understanding of an area of math without also having a conceptual understanding of it. This is super common, actually: unless you specialized in math in university, your math courses were likely focused on carrying out calculations and attaining computational fluency. Conversely, though it is impossible to have conceptual understanding without also having some level of procedural understanding (it’s hard to understand why a procedure works without actually knowing what it is), someone with a lot of conceptual understanding may not always have a high level of computational fluency. For example, while I have studied linear algebra on a theoretical level, can write proofs involving linear algebra, and likely have a better intuition for using it to solve problems than most people, I am probably much worse at computational tasks (such as matrix diagonalization) than a first-year engineering student who has just finished a linear algebra course. I routinely need to remind myself of how to multiply matrices, for example.
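To make that split concrete, here’s a minimal sketch (my example, not something from the original discussion) of matrix multiplication written as a pure procedure in Python. You can execute it mechanically without knowing anything about linear maps:

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows.

    Pure procedure: entry (i, j) of the result is the dot product
    of row i of `a` with column j of `b`. You can carry this out
    mechanically without knowing why it corresponds to composing
    linear maps -- that's the conceptual layer.
    """
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]

# A 2x2 product you can check by hand.
print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Nothing in the procedure itself explains why this is the right definition of matrix multiplication; that question belongs entirely to the conceptual level.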

If your only goal is to implement a system, you really only need to understand the math deeply enough to gain a procedural understanding of it. Generally, it’s sufficient to understand it well enough to break it down into pseudocode and atomic instructions, which, depending on the specific thing you’re trying to implement and your background, may or may not involve learning a lot of new math.
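As an illustration of what that breakdown can look like (my example, assuming the “system” in question is something formula-driven), here’s Newton’s method for square roots. Writing the loop requires nothing beyond the update rule itself:

```python
def newton_sqrt(c, x0=1.0, tol=1e-12, max_iter=50):
    """Square root via Newton's method on f(x) = x^2 - c (assumes c > 0).

    The update rule x <- x - f(x)/f'(x) simplifies to
    x <- (x + c/x) / 2. Implementing it only takes the update
    formula; knowing *why* it converges (and when it doesn't)
    is a separate, much deeper question.
    """
    x = x0
    for _ in range(max_iter):
        nxt = 0.5 * (x + c / x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

print(newton_sqrt(2.0))  # ~1.4142135623730951
```

The procedural understanding is the update formula and the stopping condition; the conceptual understanding is knowing why the iteration converges so quickly, and for which starting points it fails.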

Quite a few approximation algorithms I’ve been learning about recently, for example, are very easy to understand procedurally once you get past the horrible notation used to describe them in theoretical CS papers. And when I say this, I mean that I could totally describe them to a first-year CS student, and they would be able to produce a reasonable implementation. They are extremely simple on a procedural level. However, a lot of the time, I’ve found that the math needed to conceptually understand an algorithm – that is, to understand why it works and what its limitations are – is a lot more involved than the knowledge needed to just implement it. This is true for pretty much all of the approximation algorithms I’ve been learning about.
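Morris’s approximate counter is a classic example of this flavour (my illustrative pick; it isn’t necessarily one of the algorithms I mean above). The whole procedure fits in a few lines, but proving that the estimate is unbiased and bounding its variance takes genuine probability theory:

```python
import random

def morris_counter(stream):
    """Morris's approximate counter.

    Procedure: keep a small register c; on each event, increment
    c with probability 2**-c; estimate the count as 2**c - 1.
    A first-year student could implement this from that sentence.
    Why the estimate is unbiased, and how its variance behaves,
    is where the real math lives.
    """
    c = 0
    for _ in stream:
        if random.random() < 2.0 ** -c:
            c += 1
    return 2 ** c - 1

# Estimate the length of a 100,000-item stream using a tiny register.
print(morris_counter(range(100_000)))
```

The implementation is just a probabilistic increment and a final exponentiation; nothing in the code hints at why 2**c − 1 is the right estimate, which is exactly the gap between procedural and conceptual understanding.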

If you only have the procedural understanding of a system and you want to start making changes to things (for example, maybe you want to use this procedure for a more general case than was originally intended), that’s when you start running into problems. If you want any chance at anticipating the behaviour of the modified system, you will probably need conceptual understanding. Making changes to something you don’t understand is a recipe for disaster.

I think conceptually understanding what you’re doing becomes especially relevant once you start working at a larger scale, or with much larger amounts of data. Things that don’t even register as issues at small scales get amplified and can become real problems at larger scales. In the process of scaling your implementation, things can and will go wrong, and you will need to spend time figuring out exactly where your problems are coming from. There are multiple moving parts that could be at fault – it could be the hardware, your implementation could be bad on a technical level, there could be limitations coming from your choice of programming language, or the algorithm itself could be ill-suited to what you’re trying to make it do – so understanding what the math really means helps you focus your energy in the right place when you’re trying to fix things.

Here’s an example: as part of the research project I did last summer, I was implementing some algorithms and running tests to compare them, but I didn’t really have a solid understanding of where they came from or why they worked. Because of time limitations (turns out 4 months is not a ton of time), my own inexperience, and the fact that the focus of my project was less on the theoretical side and more on the applied side, I wasn’t able to gain that theoretical understanding of what was going on with the algorithms. One of the algorithms I was implementing had really only been tested in one context beforehand, and we were using it in a context that was different enough to have major implications for the algorithm’s effectiveness, which I didn’t know at the time.

When I was testing this algorithm on larger datasets, it exhibited behaviour that seemed weird given my understanding of one of the earlier research papers I had referenced during my implementation. That paper had a lot of graphs and statistics, and my results (in this slightly different context) deviated from them pretty significantly. I assumed that the worse performance I was seeing was due to memory issues, bad implementation choices on my part, and the programming language we were using. I now think those assumptions were partly correct, but nowhere close to the whole story, and that because of this I drew some pretty incorrect conclusions about the results and what was actually going on.

Last term, for my data science course project, I decided to revisit those theoretical papers to try to figure out what was actually going on behind the scenes, and they were a total nightmare to get through (I still don’t really understand a good chunk of them). However, going through the process of reconstructing a bunch of the proofs and theoretical guarantees, almost four months later, really helped me understand why the code I wrote in the summer behaved the way it did. Having that understanding from the start would likely have saved me a lot of grief and speculation.

So I guess it depends on why you’re implementing the system. If you’re not planning on messing with it, then you probably don’t need to go all that deep.

If you’re planning on messing with it, I think going deeper is a great idea.