Research Reflections: On Reading (Math and Math-Adjacent) Academic Papers
Last summer, I spent a good chunk of my time on an undergraduate research project, working largely by myself under the supervision of a math professor. I then took a graduate-level course in a related area this fall, where I investigated the theoretical underpinnings of my summer project. I had no idea what I was doing or what I had gotten myself into. As a result, I learned a lot, and tried to keep notes on the various things I had learned. This is the first installment, on what I learned from trying to read math and theoretical CS papers.
1. You should figure out what your goals for reading a paper are before you actually read it.
When reading a paper, it helps to know exactly why you’re reading it, so that you can tailor your approach to what you’re trying to get out of it. For example, reading a paper to understand, use, and/or build upon a result is somewhat different from reading a paper to digest the techniques and lines of inquiry the authors used so that you can use them in your own work.
Looking through a paper for motivating examples or supporting evidence might look more like skimming the paper and discarding it if it’s useless than actually reading it in depth; even if you end up using it, you might only read a few sections or paragraphs of the paper. If you’re just quoting a well-known result, you might not even need to spend time trying to understand it. If you’re just casually reading the paper to learn about a certain field, you can probably afford to be more shallow in your reading – or, alternatively, you could spend more time going down the rabbit hole of citations in the paper and indulging tangents. The bottom line is that sometimes you want to understand most of the paper, and sometimes you want to quote only a small part of it. Sometimes a shallow understanding is enough; sometimes a deeper understanding is required.
Last semester, I did two course projects that required me to write reports about research papers. One course required me to more or less rewrite a section of a paper in my own words, with more details added to the proofs. The other course required me to give an overview of the paper. My approach to writing the reports and studying the papers was very different for each course. Giving an overview means you can get away with a shallower understanding of the details, but you have to put the research into context and help the reader get their feet wet. Reimplementing or reinterpreting a paper requires a much deeper understanding of its contents.
It took me months to understand the five pages of the paper I wrote the deeper report on. I spent less than two weeks on the overview report.
2. Reading math (or math-adjacent) papers is extremely slow.
In fact, it’s probably some of the slowest reading you will ever do. It once took me 2.5 hours to get through just the introduction of a very short paper. By the end, my brain was tired, all I felt like I’d gained was knowledge of the existence of new vocabulary (which I did not retain), and I still didn’t really understand what the paper was about.
The best way to deal with this is to break the paper down into chunks and read in multiple passes. Trying to understand everything in one shot is doomed to fail; it’s much better to try to read the paper once just to figure out the key terms you don’t understand, read up on those terms, come back to the paper, try to read some more of it, and continually iterate in this fashion. As you get a better and better understanding of the general idea, defining sections that can individually be read and digested is a good follow-up strategy.
I find that experienced academics will tell you to just read the paper without focusing on the details at first. While this is well-intentioned (and even good) advice, I think it’s useless to students who are new to reading papers because frankly, a paper from a field that’s very new to you just reads like it’s written in a foreign language. A person new to reading papers hasn’t yet built the context or skills to be able to differentiate between key points and auxiliary details or even just gain a general sense of what the paper is trying to say.
3. When reading a paper, typically less than 50% of it will be relevant to what you actually need.
This is something my research supervisor mentioned offhand in the summer, but I wasn’t really able to appreciate the truth of his statement until much later, when I learned it for myself the hard way. For example, one of the main papers I was reading last summer has five sections in it. It took me several weeks to understand enough of what was going on to realize that three of the five sections had close to zero relevance to the research I was doing.
Honestly, 50% is a really generous figure. Unless the research project is directly relying on that paper as a foundation, I feel like a much more accurate figure is that less than 20%, or even 10%, of a given paper will be relevant to you. The less experience you have with reading papers, the worse you are at figuring out exactly which 10% is going to be relevant to you, which is why I’ve spent so much time trying and failing to stumble through papers in full.
I think as I read (or attempted to read) more papers, more of what I was actually trying to understand came into focus, and I had more background knowledge, so I was able to more quickly skip the useless parts. But I think it’s worth noting that it’s totally possible to spend several days reading documents because you thought they might be relevant and then only mention them in passing, or not even end up citing them.
4. Come up with examples!
I spent some time reading a paper called Opinion Dynamics on Discourse Sheaves last summer, which applies techniques from algebraic geometry to study social networking problems via a reduction to networked vector spaces on graphs. Is that sentence confusing to you? Yeah, it was confusing to me too.
What really helped me while I was reading the paper was to shift my engagement from passive (just reading the paper) to active (applying what I was learning from the paper), and the best way to do that was to come up with some really small examples and start messing around with the definitions. (Yeah, I know I should have guessed – “math is not a spectator sport” and all that nonsense.) Even a super tiny example was enough to illustrate what the authors were trying to say, and places that were confusing when I was trying to hold all of the information in my head suddenly would become much more clear when I had a concrete example to work with.
Another case where I saw someone else do this was around two months ago, when I went to my professor’s office hours to ask about a section of a paper I was having a lot of trouble understanding. It was a very interesting interaction because usually, in office hours, you ask questions about things the professor is intimately familiar with (usually the course content) and they answer them with ease, and you forget that they, too, are humans who don’t know everything. But I was reading the paper for my course project, which was related to the course content, but (I think) slightly outside of the professor’s main area of interest, and he also found that section confusing when he read it. I got a rare opportunity to watch a professor work through a problem in real time. He pulled out a notepad, started writing stuff down, and drew a whole bunch of examples in order to understand what was going on. I found that experience incredibly enlightening.
I guess the question then becomes “how does one know what kind of examples to create?” or possibly, “where do examples become useful?” or possibly, “how do you create an example that’s the right size to be interesting but not unwieldy?” I can’t really help you with that; I guess it comes with time and experience. In the situation above that led me to seek my prof in office hours, I didn’t understand enough to know that an example would be helpful, and even if I had had the idea of trying examples, I definitely would not have come up with the correct kind of example. So I guess some of it is already having been exposed to the right kind of math, unfortunately.
(Seriously, what is wrong with mathematicians? They’ve managed to create a field that is a never-ending hole. It always feels like my problems in math are created by not knowing enough math from other disciplines of math, which makes me feel like if I ever want to get anywhere in math I will always end up doing more of it than intended. How inconsiderate of them.)
By the way, Opinion Dynamics on Discourse Sheaves is an extremely clear and well-written paper and I use it as my standard for quality and communication in mathematical writing. I have no background in algebraic geometry, limited background in graph theory, and some background in linear algebra, but reading through it and grasping the main ideas was totally doable! It is also very cool and worth reading purely for fun. (I read it purely for fun.)
5. Skimming “the literature” (a.k.a. related papers) for relevant terms and ideas can be a good idea.
I think one of the factors that influences how scary I find a paper to approach is the amount of terminology in it that I don’t understand, and I suspect lots of other people feel the same way. I recently did an in-class presentation about a quantum computing paper, and I kicked off the presentation with a screenshot of the abstract, where all of the words I didn’t understand were highlighted. The class (including the professor) laughed, so maybe I was hitting on something there.
I’m a firm believer in exposure reducing fear and expediting learning. When I’m learning something new, my brain first has to get over the feeling of novelty and get comfortable with the new environment before it unfreezes enough to grapple with and internalize the new concepts. I don’t think anyone is going to explicitly tell you to do this, but I think skimming through the abstracts and introductions of related papers (just look through some papers in the citations, lol) without worrying about whether or not you actually understand everything can be helpful.
Sometimes papers will have “preliminaries” or “background” sections; those are gold. The idea is to start getting familiar with the new terminology, and to get a sense of which new terms are the most important for background (hint: they will likely keep showing up across multiple papers), but in a low-key way before diving deep into the main paper. I find that going back to the original paper afterwards will be a lot less terrifying.
I guess this also goes with the standard advice of taking multiple passes to read a paper. The first passes are probably good for familiarizing yourself, and then the subsequent passes are where you go deeper. But I don’t know, I find that something about the different perspectives you get from wandering around and looking at different papers also makes things a bit less scary, though I guess it does really heighten the confusion levels at first.
6. You will go off on tangents during your research – they can be useful, so do indulge them somewhat, but don’t let them take over your research.
This also goes with the idea that being literate in adjacent topics can be very useful, but when you’re short on time (and you always are), gaining that literacy can quickly become a time sink.
Papers, unlike textbooks, usually don’t even try to pretend to be self-contained documents. They can’t be – how are you supposed to communicate advancements in knowledge if you keep having to spend time and space explaining what’s already known? So while papers might give some sort of introduction, I think that’s mainly done when the authors are introducing new ideas, or possibly when they are heavily borrowing from a different area of research. Otherwise, papers largely assume that you have the same knowledge base as the authors.
Of course, as a random undergrad, I did not come close to having even a fraction of the same background as the authors.
The problem with trying to understand new ideas is that sometimes you need to first understand some adjacent topics for anything to even make sense. Some of this isn’t strictly going to be used: it’s more “cultural” knowledge – but that sort of cultural knowledge can be the difference between actually understanding the idea behind what the author is saying and staring at the page in deep confusion for months. So at first, you might find yourself spending a disproportionate amount of time understanding basics that might not even come up later. For example, I spent quite a bit of time trying to understand approximation algorithms, heuristics, and optimization problems just so I could understand the terminology in the papers I was reading, but I never used the formal definitions or anything like that later.
The trick is figuring out when the tangent has become useless – when you’ve already learned more than enough “background” for what your original goal was and are now just studying the adjacent material for its own sake. Honestly, I am pretty terrible at this. Usually I persist in thinking I haven’t learned enough, but pull myself away from the tangent due to time constraints, and then only realize in hindsight that maybe I did go deep enough and could have saved time by switching gears earlier.
The problem is that sometimes it’s really hard to tell whether or not the extra knowledge was useful, and whether or not the extra content you didn’t explicitly need made your life easier in some way.
Ugh, research is messy.