Hey neverending knitters! Long time no blog (dev log? post? self-indulgent spam?)!
Where have I been? What have I been up to? What’s new with knit? Is there a God? Quiet, child; I know you have questions. The answers will come with time. Or LSD.
Hold up, what’s knit?
Are we even friends? Unsubscribe1.
On self-motivation
Honestly the last year or so of knit has been choppy. I ran into a wall with my first implementation2 of knit (in Rust). There wasn’t anything it couldn’t do, per se, and I still had lots of features3 ready to implement. But something just didn’t sit right. I felt malaise.
So I did what any self-respecting programmer without a deadline does and started from scratch. On the second rewrite (in C++), I was excited to start with a clean slate. This time I was going to Do It Right™. I wrote splendid interfaces, typesafe abstractions, testable modules. Happily, I wrote my first flow plan parser (more on that later). But this too stalled. This time I couldn’t blame the weight of a years-old codebase or unfamiliarity with implementing a novel design.
I foundered for a while. Found solace in other pursuits. It’s strangely alienating to feel burned out by your own hobby.
I recently started my third rewrite (in C) of knit. The fact you’re reading this is proof of things going better! I found something, and I’d like to share it.
Tell me more, tell me more, did you get very far?
I dunno, buddy. I said I found something, didn’t say I knew what it was!
It’s probably a lot of tiny lessons. Some technical. Some philosophical. I do know motivation can be an evanescent thing, so I’m striking the iron while it’s hot4. Welcome to my unrehearsed TED Talk.
On contrarianism
This is the first time I’ve truly worked on knit for myself. No pitches. No demos. No dreams of courting investors or employees or influential people. Even your thoughts, my dear reader, I have unburdened. It’s freedom through IDGAF.
While creativity loves freedom, where it really flourishes is within constraints. You don’t get more limited than vim, C, and the command line! But why? I hear you ask. I don’t know that I have a good answer yet. I just felt a desire to get back to my roots, like a paleo diet for software development. All this ultraprocessed programming has given me an upset stomach.
Turns out that even the most basic programming environments are pretty darn capable5. It’s thrilling to not have any distractions. What’s trending on Hacker News doesn’t matter. Best practices don’t matter6. Leaking memory doesn’t matter. Tests don’t matter7. Let’s rub off the varnish and resand the boards.
The only thing that matters is building. More writing. Less editing8.
Tower of Babel
But I know you didn’t sign up to hear my self-therapy talk (or did you?).
There are two things I’m particularly excited about in knit right now. One you’ll find out more about in the future. The other is a custom programming language (a DSL, if you will). This language describes what I call the flow plan. You can think of this like a Makefile9; it expresses dependencies between data processes.
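To make the Makefile comparison concrete, here’s a minimal sketch of the core idea: a flow plan as a graph of steps, put in an order where each data process runs only after everything it depends on. This is not knit’s code. `MAX_STEPS`, `plan_order`, and the adjacency-matrix representation are hypothetical, and the ordering is plain Kahn’s algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a flow plan: step i depends on step j when
 * depends_on[i][j] is true, i.e. i consumes j's output. */
#define MAX_STEPS 16

/* Kahn's algorithm: fills order[] so every step appears after all of
 * its dependencies. Returns how many steps were ordered; a result
 * smaller than n means the plan contains a cycle. */
int plan_order(int n, bool depends_on[][MAX_STEPS], int order[])
{
    int pending[MAX_STEPS]; /* unmet dependencies per step */
    int queue[MAX_STEPS];   /* steps whose inputs are all ready */
    int head = 0, tail = 0, count = 0;

    assert(n >= 0 && n <= MAX_STEPS);
    for (int i = 0; i < n; i++) {
        pending[i] = 0;
        for (int j = 0; j < n; j++)
            if (depends_on[i][j])
                pending[i]++;
        if (pending[i] == 0)
            queue[tail++] = i; /* no inputs: runnable right away */
    }

    while (head < tail) {
        int done = queue[head++];
        order[count++] = done;
        /* Each step that consumes `done` now has one fewer unmet input. */
        for (int i = 0; i < n; i++)
            if (depends_on[i][done] && --pending[i] == 0)
                queue[tail++] = i;
    }
    return count;
}
```

With steps 0 = fetch, 1 = clean, 2 = model, 3 = report (clean needs fetch, model needs clean, report needs clean and model), `plan_order` yields 0, 1, 2, 3. A return value smaller than the step count flags a cycle, which no sane flow plan should contain.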
Some other orchestration tools require a specific language, like dbt (Jinja-flavored SQL) and Airflow (Python). Others do not use a programming language at all, but just describe data processing steps as a data structure, like DVC (YAML).
Ideally you could have a polyglot data flow. That is, even the flow itself could be described in multiple ways. Imagine combining your Airflow and dbt pipelines into a single DAG.
But most importantly, YAML and Jinja do not spark joy. Thanks Marie.
Cowabunga, dude
Uh, how does having another programming language help express polyglot data flows?
One of the few things to come from the second rewrite was experience with the Ninja build system. Unlike build tools of yore, Ninja is ruthlessly simple (think: fast), even compared with Makefiles! The compromise is that Ninja build files are incredibly verbose10. In the vast majority of cases, humans aren’t intended to write them by hand. Instead they’ll use any number of other build languages (CMake, Meson) that generate Ninja build files.
This is the same idea behind Knit flow plans. You’re not intended to write these by hand; instead you can describe them in any number of ways11, and a higher-level tool would translate that into a Knit plan. Of course, the process of taking some higher-level build plan and translating that into a Knit plan is itself a data process.
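For a taste of why nobody hand-writes Ninja files, here’s a hypothetical sketch of the generator side: one “pattern” expanded into the explicit `rule` and `build` statements Ninja insists on. The `emit_ninja` function, the `out/` prefix, and the `./clean.sh` command are made up for illustration; the `rule`, `build`, `$in`, and `$out` syntax is Ninja’s own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: Ninja has no pattern rules, so "run `command`
 * on every input" must be spelled out as one explicit build statement
 * per file. Writes the result into buf and returns its length, or -1 if
 * buf is too small. */
int emit_ninja(char *buf, size_t cap, const char *rule, const char *command,
               const char *const inputs[], size_t n_inputs)
{
    size_t used;
    int w;

    assert(buf != NULL || cap == 0);
    /* $in and $out are expanded by Ninja for each build statement. */
    w = snprintf(buf, cap, "rule %s\n  command = %s\n", rule, command);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < n_inputs; i++) {
        /* One explicit edge per input; the out/ prefix is made up. */
        w = snprintf(buf + used, cap - used, "build out/%s: %s %s\n",
                     inputs[i], rule, inputs[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Called with rule `clean`, command `./clean.sh $in $out`, and inputs `a.csv` and `b.csv`, it emits the rule followed by `build out/a.csv: clean a.csv` and `build out/b.csv: clean b.csv`. Two inputs, two lines; a thousand inputs, a thousand lines. That verbosity is exactly what a generator (or a knit subflow) is for.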
Long, long ago, in a newsletter far, far away, you may remember that knit can invoke itself (so-called subflows). Well, with the flow plan language, the intention is for knit to run a data flow that outputs a flow plan, which is then used to run itself in a subflow12. Say that five times fast.
What’s it look like?
Wait, why is this a picture of text13, you ask? Because I made it in Figma. /shrug
Superman’s archnemesis
Excuse this section for largely being a wall of code. Enter at your own risk.
The nice thing about Ninja’s ruthless simplicity is that its implementation is also soberingly simple. That’s pretty impressive considering it’s used to build the Chrome browser you’re probably reading this in.
Inspired, I wrote a simple lexer using the same tool as Ninja. re2c is a preprocessor that lets you mix regexes (in specially crafted comments) into your code14. The main logic fits in a few screenfuls.
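re2c input has to go through the re2c tool before a C compiler will accept it, so instead of reproducing knit’s lexer, here’s a hand-written stand-in that does the same job: turn source text into a stream of tokens. The toy syntax (`output: inputs;` with `#` comments), the token kinds, and `next_token` are all hypothetical; with re2c you’d write the regexes and let it generate this matching loop.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token kinds for a hypothetical toy flow syntax such as
 *   report.pdf: clean.csv model.bin;
 * Illustrative only, not knit's actual token set. */
enum tok_kind { TOK_EOF, TOK_IDENT, TOK_COLON, TOK_SEMI, TOK_ERROR };

struct token {
    enum tok_kind kind;
    const char *start; /* points into the source; not NUL-terminated */
    size_t len;
};

static int is_ident_char(int c)
{
    return isalnum(c) || c == '_' || c == '.' || c == '/' || c == '-';
}

/* Returns the token starting at *cursor and advances *cursor past it.
 * Identifiers take the longest match, as an re2c scanner would. */
struct token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    struct token t = {TOK_EOF, p, 0};

    for (;;) { /* skip whitespace and '#' line comments */
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '#')
            break;
        while (*p != '\0' && *p != '\n')
            p++;
    }

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_EOF;
    } else if (*p == ':') {
        t.kind = TOK_COLON;
        p++;
    } else if (*p == ';') {
        t.kind = TOK_SEMI;
        p++;
    } else if (is_ident_char((unsigned char)*p)) {
        t.kind = TOK_IDENT;
        while (is_ident_char((unsigned char)*p))
            p++;
    } else {
        t.kind = TOK_ERROR; /* unexpected character; skip it */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

Each token is a slice of the source (a pointer plus a length), so scanning never copies or allocates.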
Now we have a fully customizable foundation of tokens. The next part is a simple parser15. The full code is a little too long to include but here are the main data structures representing the parse tree16.
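The actual structures appeared as a screenshot, so here’s a hypothetical miniature in the same spirit: plain structs for the tree and one C function per grammar rule, which is all handwritten recursive descent really means. The grammar in the comment, the struct names, and `parse_plan`/`free_plan` are invented for illustration, and the sketch scans characters directly instead of using a lexer so it stays self-contained.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy grammar (not knit's real one):
 *   plan := step*
 *   step := NAME ':' NAME* ';'
 * Names are slices of the source text, so the tree copies no strings. */
struct name {
    const char *start;
    size_t len;
};

struct step {
    struct name output;  /* what the data process produces */
    struct name *inputs; /* what it depends on */
    size_t n_inputs;
};

struct plan {
    struct step *steps;
    size_t n_steps;
};

static void skip_space(const char **cur)
{
    while (isspace((unsigned char)**cur))
        (*cur)++;
}

/* NAME := [A-Za-z0-9_./-]+ ; returns 0 if no name is present. */
static int parse_name(const char **cur, struct name *out)
{
    skip_space(cur);
    out->start = *cur;
    while (**cur != '\0' && (isalnum((unsigned char)**cur) || strchr("_./-", **cur)))
        (*cur)++;
    out->len = (size_t)(*cur - out->start);
    return out->len > 0;
}

/* Consumes the punctuation character c, or returns 0 if it's missing. */
static int expect(const char **cur, char c)
{
    skip_space(cur);
    if (**cur != c)
        return 0;
    (*cur)++;
    return 1;
}

/* step := NAME ':' NAME* ';' */
static int parse_step(const char **cur, struct step *st)
{
    struct name n;

    st->inputs = NULL;
    st->n_inputs = 0;
    if (!parse_name(cur, &st->output) || !expect(cur, ':'))
        return 0;
    while (parse_name(cur, &n)) {
        struct name *grown = realloc(st->inputs, (st->n_inputs + 1) * sizeof *grown);
        if (grown == NULL)
            return 0;
        st->inputs = grown;
        st->inputs[st->n_inputs++] = n;
    }
    return expect(cur, ';');
}

void free_plan(struct plan *pl)
{
    for (size_t i = 0; i < pl->n_steps; i++)
        free(pl->steps[i].inputs);
    free(pl->steps);
    pl->steps = NULL;
    pl->n_steps = 0;
}

/* plan := step* ; returns 1 on success, 0 on a syntax or memory error. */
int parse_plan(const char *src, struct plan *pl)
{
    assert(src != NULL && pl != NULL);
    pl->steps = NULL;
    pl->n_steps = 0;
    for (;;) {
        skip_space(&src);
        if (*src == '\0')
            return 1;
        struct step *grown = realloc(pl->steps, (pl->n_steps + 1) * sizeof *grown);
        if (grown == NULL) {
            free_plan(pl);
            return 0;
        }
        pl->steps = grown;
        /* Count the step before parsing it, so free_plan also releases
         * a half-built step's inputs on error. */
        if (!parse_step(&src, &pl->steps[pl->n_steps++])) {
            free_plan(pl);
            return 0;
        }
    }
}
```

Parsing `clean.csv: raw.csv;` followed by `report.pdf: clean.csv model.bin;` yields two steps, the second with two inputs. Leak tolerance notwithstanding, this one cleans up after itself.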
With this we can work on the next thing I’m excited about! But we’ll save that for…
Next time
While I seemingly dismissed your thoughts earlier, your influence helpfully remains. Perhaps one of these topics piques your gray matter’s appetite. Let me know!
- Stupendous serialized sessions
- Bodacious byte-based B-trees
- Worse is better: a creation case study
- git log git.c
K, knit ya’ lata’, alligata’.
-
You read a footnote. Resubscribe. ↩︎
-
The first implementation of knit was already a spiritual successor of Blocks from a previous job. ↩︎
-
Does your main branch ever become a dumping ground for WIP commits? ↩︎
-
Creativity is legion but expresses itself in spurts. ↩︎
-
Have you read your man pages? POSIX is a beast. ↩︎
-
I did wring my hands over whether to indent 2 or 4 spaces though. ↩︎
-
Obviously, tests do matter. The point is these things only matter insofar as they help to build more quickly. And sad tests make slow programmers. ↩︎
-
Case in point, I’ve found myself writing fewer comments and longer commit messages. I used to think the work product (e.g., comments that go into your code) was what mattered, but now I’m leaning into the process (e.g., commit messages that describe what I was thinking as of a particular time). ↩︎
-
Insistence on tabs notwithstanding, the Makefile syntax has a surprisingly delightful elegance. ↩︎
-
Basically in Ninja, every build rule has to be written explicitly. There are no pattern rules. ↩︎
-
One of the classic divides in expressing data flows is code vs notebooks. Spreadsheets, also, are really just a tabular design to express data dependencies. And other data flows are expressed visually, which is relatively uncommon among data analysts but a regular fixture of 3D modeling. ↩︎
-
This is another example of a feature implemented with dynamic flow composition. I expect this will be a recurring theme. ↩︎
-
While the syntax in the Figma picture matches up closely to what’s implemented, the flow itself is not representative of how I would expect one to be described in real life. ↩︎
-
Under the hood, re2c translates regexes into optimized DFAs. ↩︎
-
The parser uses handwritten recursive descent. Nothing fancy like Tree-sitter or a context-free grammar. Although, LALRPOP is fun to try to pronounce. It’s also the working title for Charli XCX’s next single. ↩︎
-
The screenshot leaves intact global variables and a comment about memory leaks. You would shudder to see my goto statements. ↩︎