Want to Write a Compiler? Just Read These Two Papers (2008)

(prog21.dadgum.com)

237 points | by downbad_ 4 hours ago

25 comments

  • aldousd666 2 minutes ago
    I learned from the Dragon Book, decades ago. I already knew a lot of programming at that point, but I think most people writing compilers do. I'm curious if there really is an audience of people whose first introduction to programming is writing a compiler... I would think not, actually.
  • jll29 3 hours ago
    *Donald Knute -> Donald Ervin Knuth is the author of the book "The Art of Computer Programming" (in progress for a couple of decades, currently volume 4c is being written). It is quite advanced, and it will likely not cover compilers anymore (Addison-Wesley had commissioned a compiler book from Knuth when he was a doctoral candidate, now he is retired and has stated his goal for the series has changed).

    I disagree with the author's point: the "Dragon book"'s ("Compilers: Principles, Techniques, and Tools" by Aho et al.) Chapter 2 is a self-sufficient introduction into compilers from end to end, and it can be read on its own, ignoring the rest of the excellent book.

    Another fantastic intro to compiler writing is the short little book "Compilers" by Niklaus Wirth, which explains and contains the surprisingly short source code of a complete compiler (the whole book is highly understandable - pristine clarity, really) and all in <100 pages total (99).

    (I learned enough from these two sources to write a compiler in high school.)

    • projektfu 47 minutes ago
      The dragon book almost convinced me never to try to write a compiler. I don't know why people recommend it. I guess you're a lot smarter than I am.

      There are some excellent books out there. In its own way, the dragon book is excellent, but it is a terrible starting place.

      Here are a bunch of references from the same vintage as OP. I recommend starting with a book that actually walks through the process of building a compiler and doesn't spend its time exclusively with theory.

      https://news.ycombinator.com/item?id=136875

      • saidnooneever 1 minute ago
        the dragon book is how to write a production grade thing i guess. it has all the interesting concepts very elaborated on which is great but it dives quickly into things that can clutter a project if its just for fun..
      • tovej 37 minutes ago
        I started with the dragon book, and I found it to be a good introductory text.

        A lot of people say the dragon book is difficult, so I suppose there must be something there. But I don't see what it is, I thought it was quite accessible.

        I'm curious, what parts/aspects of the dragon book make it difficult to start with?

        • hmry 6 minutes ago
          It's been a few years since I worked with the dragon book, but I think the most common complaint was that it starts with like 350 pages on parser theory: generating bottom up and top down parsers from context free grammars, optimizing lexers for systems that don't have enough RAM to store an entire source file, etc... before ever getting to what most people who want to write a compiler care about (implementing type inference, optimizing intermediate representations, generating assembly code). Of course parsing is important, and very interesting to some. But there's a reason most modern resources skip over all of that and just make the reader write a recursive descent parser.
    • Findecanor 37 minutes ago
      The "Dragon Book" contains a lot on parsing, but I wouldn't recommend it if you want to make many optimisation passes or a back-end.

      I had bought the first edition when I was in High School back in the '90s. It was my first CS textbook and as a young programmer I learned a lot from it. A couple years ago, I started on a modern compiler back-end, and found that I needed to update my knowledge quite a lot.

      The second edition covers data-flow analysis, which is very important. However, modern compilers (GCC, LLVM, Cranelift) are built around an intermediary representation in Static Single Assignment-form, and the second ed. has only a single page about it, but you'd need to also learn a lot of theory about its properties to actually use it.

      • aldousd666 0 minutes ago
        Parsing is the front end to a compiler. Can't get semantics without first recognizing syntax. I have a hard time thinking about programming languages without seeing them as a parsing exercise first, every time.
    • Hendrikto 2 hours ago
    • antiquark 2 hours ago
      There is still hope for a compiler book. From Knuth's website:

      > And after Volumes 1--5 are done, God willing, I plan to publish Volume 6 (the theory of context-free languages) and Volume 7 (Compiler techniques), but only if the things I want to say about those topics are still relevant and still haven't been said.

      https://www-cs-faculty.stanford.edu/~knuth/taocp.html

      • jcranmer 45 minutes ago
        I don't think there is hope if you look at actuarial tables and Knuth's age. It's not clear to me if he'll be able to finish volume 4. The outline he has seems to have enough material to fill volumes 4C-4G to my eyes, and he isn't exactly cranking out the volumes.

        Admittedly, volumes 5-7 wouldn't be as massive as volume 4 (it sort of turns out that almost all interesting algorithms ends up being categorized as being in volume 4), so you probably wouldn't have a half-dozen subvolumes per topic but, it's still too many books down the line, especially if he plans to revise volumes 1-3 before working on anything else.

      • mghackerlady 32 minutes ago
        I really hope he ends up completing the whole series. I started volume one recently and it is excellent
      • gdwatson 54 minutes ago
        I hope that God is indeed willing, but the man is 88 years old and he’s not done with the third tome of volume four. It would require a minor miracle for him to finish volume 7 within this lifetime.
  • stupefy 1 hour ago
    One nice piece of advice that I received is that books are like RAMs, you do not have to go through them sequentially, but can do random access to the parts of it you need. With this in mind I find it doable to get one the thick books and only read the part that I need for my task.

    But, to also be fair, the above random access method does not work when you don't know what you don't know. So I understand why having a light, but good introduction to the topic is important, and I believe that's what the author is pointing out.

    • commandlinefan 22 minutes ago
      I've seen people suggest that throughout the years, but it's never worked out for me. To get anything meaningful out of a printed book, I've had to read them cover to cover. There used to be worthwhile reference books, but those have moved on to the internet.
  • soegaard 3 hours ago
    An Incremental Approach to Compiler Construction

    Abdulaziz Ghuloum

    http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf

    Abstract

    Compilers are perceived to be magical artifacts, carefully crafted by the wizards, and unfathomable by the mere mortals. Books on compilers are better described as wizard-talk: written by and for a clique of all-knowing practitioners. Real-life compilers are too complex to serve as an educational tool. And the gap between real-life compilers and the educational toy compilers is too wide. The novice compiler writer stands puzzled facing an impenetrable barrier, “better write an interpreter instead.”

    The goal of this paper is to break that barrier. We show that building a compiler can be as easy as building an interpreter. The compiler we construct accepts a large subset of the Scheme programming language and produces assembly code for the Intel-x86 architecture, the dominant architecture of personal computing. The development of the compiler is broken into many small incremental steps. Every step yields a fully working compiler for a progressively expanding subset of Scheme. Every compiler step produces real assembly code that can be assembled then executed directly by the hardware. We assume that the reader is familiar with the basic computer architecture: its components and execution model. Detailed knowledge of the Intel-x86 architecture is not required.

    The development of the compiler is described in detail in an extended tutorial. Supporting material for the tutorial such as an automated testing facility coupled with a comprehensive test suite are provided with the tutorial. It is our hope that current and future implementors of Scheme find in this paper the motivation for developing high-performance compilers and the means for achieving that goal.

  • armchairhacker 3 hours ago
    Nowadays I’ve heard recommended Crafting Interpreters. (https://craftinginterpreters.com)

    The Nanopass paper link doesn’t work.

    • orthoxerox 1 hour ago
      Crafting Interpreters is great, I wish it had a companion book that covered:

        - types and typing
        - optimization passes
        - object files, executables, libraries and linking
      
      Then two of them would be sufficient for writing a compiler.
      • ux266478 15 minutes ago
        > types and typing

        This would be like asking for a book on designing grammar. It's just too disjoint of a field to have any kind of reasonable baseline, and it's drop dead easy to grok a basic one together. With those two things being equal, just like with grammar, the answer to this is any resource about implementing the language you're trying to ape.

        • orthoxerox 3 minutes ago
          It's drop dead easy to grok a basic one together until you get to hairy stuff like overloading, lambdas and generics.

          The reasonable baseline would be something like Java 1. Scalars, arrays and classes. If I remember correctly, Lox even skips arrays as an exercise for the user.

      • gavinray 24 minutes ago
        To your last point, "Linkers and Loaders" has no equal despite being a bit dated
      • duped 11 minutes ago
        > types and typing

        Types and Programming Languages, Benjamin C Pierce

        > object files, executables, libraries and linking

        Linkers and Loaders, John R Levine

        • orthoxerox 7 minutes ago
          I've read Pierce. It's not a bad book, but less grounded than CI, which has an explicit "workmanlike" approach.
    • gobdovan 2 hours ago
      Compilers are broad enough that when someone recommends a "compiler book", it's rarely exactly the slice you wanted.

      So this made me do a runnable cheat sheet for Crafting Interpreters. I keep parsing demonstrative, and the AST is a little more Lisp-y than the book's.

      Disclaimer: it's meant to convey the essence of what you'll learn, it is NOT by any means a replacement for the book. I'd also describe the book as more of an experience (including some things Nystrom clearly enjoyed, like the visitor pattern) than a compilers manual. If anyone's interested, I can do a separate visitor-pattern cheat sheet too, also in Python.

      I turned it into a 'public-facing artifact' from private scripts with an AI agent.

      [0] https://ouatu.ro/blog/crafting-interpreters-cheat-sheet/

    • tmountain 3 hours ago
      Incredible book for self guided learning!
    • ramon156 3 hours ago
      Awesome course! finished it while i was doing my final CS year because I had to wait on a bunch of classes (and frankly had no one to talk to before classes). I haven't tried nanopass, but there's other links that work, so I'll give it a go.
  • blueybingo 1 hour ago
    the article's framing around nanopass is undersold: the real insight isn't the number of passes but that each pass has an explicit input and output language, which forces you to think about what invariants hold at each stage. that discipline alone catches a suprising number of bugs before you even run the compiler. crenshaw is fantastic but this structural thinking is what separates toy compilers from ones you can actaully extend later.
  • morphle 3 hours ago
    Compiler writing has progressed a lot. Notably in meta compilers [1] written in a few hundred lines of code and adaptive compilation [3] and just in time compilers. Alan Kay's research group VPRi tackled the problems of complexity (in writing compilers) [4].

    [1] Ometa https://tinlizzie.org/VPRIPapers/tr2007003_ometa.pdf

    [2] Other ometa papers https://tinlizzie.org/IA/index.php/Papers_from_Viewpoints_Re...

    [3] Adaptive compilation https://youtu.be/CfYnzVxdwZE?t=4575

    the PhD thesis https://www.researchgate.net/publication/309254446_Adaptive_...

    [4] Is it really "Complex"? Or did we just make it "Complicated"? Alan Kay https://youtu.be/ubaX1Smg6pY?t=3605

  • omcnoe 3 hours ago
    Been working on a toy compiler for fun recently.

    I have ignored all the stuff about parsing theory, parser generators, custom DSL's, formal grammers etc. and instead have just been using the wonderful Megaparsec parser combinator library. I can easily follow the parsing logic, it's unambiguous (only one successful parse is possible, even if it might not be what you intended), it's easy to compose and re-use parser functions (was particularly helpful for whitespace sensitive parsing/line-fold handling), and it removes the tedious lexer/parser split you get with traditional parsing approaches.

    • armchairhacker 1 hour ago
      It seems to me LL and LR parser generators are overrated, and hand-written recursive descent is best in practice. I understand why academics teach them, but not why some spend so long on different parsing techniques, nor why hobbyists who just want to compile their toy language are directed to them.

      I work in PL, and from my first compiler to today, have always found recursive descent easiest, most effective (less bugs, better error diagnostics, fast enough), and intuitive. Many popular language compilers use recursive descent: I know at least C# (Roslyn) and Rust, but I believe most except Haskell (GHC) and ocaml.

      The LR algorithm was simple once I learned it, and yacc-like LR (and antlr-like LL) parser generators were straightforward once I learned how to resolve conflicts. But recursive descent (at least to me) is simpler and more straightforward.

      LR being more expressive than LL has never mattered. A hand-written recursive descent parser is most expressive: it has unlimited lookahead, and can modify parsed AST nodes (e.g. reordering for precedence, converting if into if-else).

      The only solution that comes close is tree-sitter, because it implements GLR, provides helpful conflict messages, and provides basic IDE support (e.g. syntax highlighting) almost for free. But it’s a build dependency, while recursive descent parsers can be written in most languages with zero dependencies and minimal boilerplate.

      • _old_dude_ 1 hour ago
        Parser generators are great in Python (Lark for me) so you can iterate fast and get a runtime spec of your grammar.

        A hand-written recursive descent parser is something you do later when you start to industrialize your code, to get better error messages, make the parser incremental, etc.

        Bison/ANTLR are code generators, they do not fit well in that model.

    • mrkeen 2 hours ago
      I'll push back and say that the lexer/parser split is well worth it.

      And the best thing about the parser combinator approach is that each is just a kind of parser, something like

        type Lexer = ParsecT e ByteString m [Token]
      
        type Parser = ParsecT e [Token] Expr
      
      All the usual helper functions like many or sepBy work equally well in the lexing and parsing phases.

      It really beats getting to the parentheses-interacting-with-ordering-of-division-operations stage and still having to think "have I already trimmed off the whitespace here or not?"

  • GCUMstlyHarmls 3 hours ago
    Nanopass paper seems to be dead but can be found here at least https://stanleymiracle.github.io/blogs/compiler/docs/extra/n...
  • itsmemattchung 4 hours ago
    It's been about 4 years since I took a compilers course (from OMSCS, graduate program) and still shutter ... it was, hands down, the most difficult (yet rewarding) classes I've taken.
    • nirvdrum 3 hours ago
      Based on another reply I can’t tell if there’s a clever window-based pun that I’m missing. If not, I think you want “shudder” and not “shutter” here. I’m sorry if I just ruined the joke.
    • tjarjoura 2 hours ago
      What did you find more painful about compilers than other forms of programming?
      • kuboble 2 hours ago
        I think there is a million ways to make a compilers course.

        The course I did was organized perfectly with big parts of compiler boiler plate already written, and I only had to implement parser/lexer rules and the translation of language structures into assembly instructions. Also it was a compiler for a language designed just for this course with the intention of it being specifically easy to write a compiler for it and not programming.

        Without this I can imagine it being a painful experience

    • msla 4 hours ago
      It made me drink myself Venetian blind.
    • Pay08 3 hours ago
      I would like to agree with this comment, but I certainly didn't find it rewarding. It was pure pain.
      • kuboble 3 hours ago
        10 years ago I took few coursera courses to fill the gaps in my computer science education.

        One of them was a compilers course done by karpathy. It was pure joy and a great learning experience.

        Also in my experience the joy of doing a course was much stronger correlated with the teacher's qualities rather than the subject itself.

        • jgwil2 29 minutes ago
          Do you have a link by by chance? A quick search didn't turn anything up.
  • fzeindl 3 hours ago
    A similarly scoped book series is „AI game programming wisdom“, which contains a multitude of chapters that focus on diverse, individual algorithms that can be practically used in games for a variety of usecases.
  • bradley13 3 hours ago
    Maybe I'm missing the point of this article? Writing a simple compiler is not difficult. It's not something for beginners, but towards the end of a serious CS degree program it is absolutely do-able. Parsing, transforming into some lower-level representation, even optimizations - it's all fun really not that difficult. I still have my copy of the "Dragon Book", which is where I originally learned about this stuff.

    In fact, inventing new programming languages and writing compilers for them used to be so much of a trend that people created YACC (Yet Another Compiler Compiler) to make it easier.

  • krtkush 4 hours ago
    I wonder if it makes sense to do the nand2tetris course for an absolute beginner since it too has compiler creation in it.
    • wartywhoa23 3 hours ago
      I highly recommend nand2tetris to everyone. For me, nothing ever explained the whole domain from logic gates and inner workings of a CPU to compilers better than this course.
      • atan2 54 minutes ago
        I think it's worth mentioning Gustavo Pezzi's lectures at pikuma.com. The one on "Digital Electronics" and the one on "Interpreters & Compilers" really helped me.
      • wartywhoa23 3 hours ago
        On a side note, why is imrozim's comment dead? What in the world is wrong with it? It's perfectly fine IMO.
    • imrozim 3 hours ago
      [dead]
  • petcat 3 hours ago
    And Nystrom's book
    • vlaaad 2 hours ago
      Yeah, I really enjoyed Crafting Interpreters, wholeheartedly recommend!
  • LiamPowell 3 hours ago
    See also, Andy Keep's dissertation [1] and his talk at Clojure/Conj 2013 [2].

    I think that the nanopass architecture is especially well suited for compilers implemented by LLMs as they're excellent at performing small and well defined pieces of work. I'd love to see Anthropic try their C compiler experiment again but with a Nanopass framework to build on.

    I've recently been looking in to adding Nanopass support to Langkit, which would allow for writing a Nanopass compiler in Ada, Java, Python, or a few other languages [3].

    [1]: https://andykeep.com/pubs/dissertation.pdf

    [2]: https://www.youtube.com/watch?v=Os7FE3J-U5Q

    [3]: https://github.com/AdaCore/langkit/issues/668

  • ape4 2 hours ago
    Would a practical approach be parsing the source into clang's AST format. Then let it make the actual executable.
  • downbad_ 4 hours ago
  • msla 4 hours ago
    fanf2 on Dec 25, 2015 [dead] | parent | prev | next [–]

    I quite like "understanding and writing compilers" by Richard Bornat - written in the 1970s using BCPL as the implementation language, so rather old-fashioned, but it gives a friendly gentle overview of how to do it, without excessive quantities of parsing theory.

  • voidUpdate 2 hours ago
    I've been having a look at the Crenshaw series, and it seems pretty good, but one thing that kinda annoys me is the baked-in line wrapping. Is there a way to unwrap the text so its not all in a small area on the left of my screen?
  • baarse 1 hour ago
    [dead]
  • zionpi 3 hours ago
    [dead]
  • consomida 3 hours ago
    [dead]
  • imrozim 3 hours ago
    [dead]
  • rdevilla 3 hours ago
    [flagged]
    • voidUpdate 2 hours ago
      I've been handwriting my website since forever. That way I don't have to also load megabytes of JS libraries, and it keeps it nice and lean (ish)
    • KnuthIsGod 3 hours ago
      Some of us enjoy intellectual challenges....

      The Bornat book looks great.

      The fact that it is in BCPL is ultra cool.

      • rdevilla 3 hours ago
        Liking intellectual challenges on Hacker News is very 2008. It's 2026, the AI will write a compiler in 5 minutes, no headache required.

        If you're still playing with Rubik's cubes you're going to get left behind.

        • guitarlimeo 3 hours ago
          > you're going to get left behind.

          What's wrong with that? Why do you fear getting left behind? This is just fearmongering.

          Mind you these are legitimate interests of people and in most of cases probably not related to professional work.

          And lol @ "It's 2026, the AI will write a compiler in 5 minutes, no headache required.", no it will not, have you seen Anthropic's post about Claude writing the "C compiler"?

          • weavie 2 hours ago
            I think this chap is just joking.
    • noosphr 3 hours ago
      lol
  • lateforwork 51 minutes ago
    These days there's an even easier way to learn to write a compiler. Just ask Claude to write a simple compiler. Here's a simple C compiler (under 1500 lines) written by Claude: https://github.com/Rajeev-K/c-compiler It can compile and run C programs for sorting and searching. The code is very readable and very easy to understand.
    • voidfunc 13 minutes ago
      For those of us that learn better by taking something and tinkering with it this is definitely the better approach.

      Ive never been a good book learner but I love taking apart and tinkering with something to learn. A small toy compiler is way better than any book and its not like the LLM didnt absorb the book anyways during training.

      • lateforwork 5 minutes ago
        Exactly! Writing a compiler is not rocket science if you know assembly language. You can pick up the gist in an hour or two by looking at a simple toy compiler.
    • angusturner 43 minutes ago
      why read that, vs an actually well-written compiler though?
      • lateforwork 25 minutes ago
        Because an actual compiler would be tens of thousands of lines and most of it is going to be perf optimization. If you want to get the big picture first, read a simple working compiler that has all the key parts, such as a lexer, abstract syntax tree, parser, code generator and so on.