Knit picks: Mar 12

2021/03/12

Hey Knit needlers! It feels like everywhere is unseasonably sunny right now.

Soft stuff

I had a terrible pitch yesterday. It was great! I got a lot of good feedback. My main takeaway is to ease into the big vision and the more technical concepts, which I’ve often been leading with. It’s a little tricky because the elevator pitch needs to deliver the payoff without much buildup.

Hard stuff

My brain was on fire untangling some conceptual knots. For those who know how I think, that meant lots of pacing.

There are two concepts I’ve been wrestling with. One is modularization. Zooming out a bit, a couple of issues back we talked about flow composition. I’m now calling that subflows to cut back on jargon. Subflows are kind of like Python modules (anything you can import); they are units of reuse.

Knit modules are a particular kind of reuse, a bit like Python packages. They can be “precompiled” and encapsulated in a wheel or whatnot. They’re units of distribution, and a prerequisite for an actual distribution system like PyPI. They also define a sort of administrative unit, and will have things like authorship, versions, and clearly defined interfaces (Knit doesn’t implement any of these yet).

I’ve dodged around modules for the most part by keeping all my flows in a single directory. But if the goal is to get to a point where a flow can be shared and run by someone else, that breaks down eventually.

The other concept is external resources. Knit never wants to touch your data directly. It just wants to hold onto some resource metadata that describes where the data is actually stored. This is obviously important for large datasets, but it also lets Knit coordinate resources that are not files. For example, Knit should be able to manage tables in a database; I’m pretty excited about that eventual use case!

Knit has had an external storage prototype for a while, so it’s conceptually possible. But it’s been hard to use until—(drumroll)—modules! Modules let us encapsulate a storage backend (or just a store), then drop it into a flow. The store can then be used to save and load external resources.
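
To make the store idea a bit more concrete, here is a rough Python sketch. None of this is Knit's actual API: ResourceRef, Store, LocalFileStore, and run_flow are made-up names for illustration, and a local directory stands in for whatever backend a real store would wrap. The point is only that the flow holds onto a small piece of metadata while the store owns the bytes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ResourceRef:
    """Metadata describing where data lives; it never contains the data."""
    store: str      # name of the store that owns the data
    location: str   # store-specific address, e.g. a file path or table name


class Store(Protocol):
    """Hypothetical interface a storage backend would implement."""
    name: str

    def save(self, key: str, data: bytes) -> ResourceRef: ...

    def load(self, ref: ResourceRef) -> bytes: ...


class LocalFileStore:
    """A toy store backed by a local directory (an assumption for this sketch)."""

    def __init__(self, name: str, root: Path) -> None:
        self.name = name
        self.root = root

    def save(self, key: str, data: bytes) -> ResourceRef:
        path = self.root / key
        path.write_bytes(data)
        # Hand back only the metadata needed to find the data again.
        return ResourceRef(store=self.name, location=str(path))

    def load(self, ref: ResourceRef) -> bytes:
        if ref.store != self.name:
            raise ValueError(f"{ref.store!r} is not managed by store {self.name!r}")
        return Path(ref.location).read_bytes()


def run_flow(store: Store) -> ResourceRef:
    """A toy flow: the store is dropped in, and the flow keeps only the reference."""
    return store.save("greeting.txt", b"hello")
```

Swapping LocalFileStore for something that writes to a database table would not change run_flow at all, which is the appeal of encapsulating the backend.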

I should point out that stores are an example of code as data. Unlike almost all other data orchestration tools, Knit does not formally distinguish between code and data. That means Knit can manipulate code like data, and data can be used to implement code. We end up being able to share a lot of mechanisms (like modules) when we treat code and data the same.
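
Here is a tiny Python illustration of what "code as data" can mean in practice. It is a sketch of the general idea and not how Knit implements it: the resources dictionary, apply, and make_scaler are hypothetical names. Code and data sit in the same namespace, and a plain description is turned into a runnable step.

```python
from typing import Any, Callable

# One namespace holds plain values and callables side by side.
resources: dict[str, Any] = {
    "raw": [3, 1, 2],
    "sort": sorted,  # code stored like any other value
}


def apply(resources: dict[str, Any], step: str, source: str) -> Any:
    """Look up a step and its input by name, then run the step on the input."""
    return resources[step](resources[source])


def make_scaler(spec: dict[str, float]) -> Callable[[list[float]], list[float]]:
    """Data used to implement code: build a step from a plain description."""
    factor = spec["factor"]
    return lambda values: [v * factor for v in values]


# The spec is ordinary data; the step derived from it is stored the same way.
resources["scale_spec"] = {"factor": 10}
resources["scale"] = make_scaler(resources["scale_spec"])

sorted_values = apply(resources, "sort", "raw")
scaled_values = apply(resources, "scale", "raw")
```

Because steps and inputs are looked up the same way, anything that works for sharing data (like modules) can in principle be reused for sharing code.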

The module and external storage stuff is still not quite where I want it to be; it will probably be a bit of an ongoing refinement.

Psyche stuff

This week was a lot more productive than last week. I think the main issue was prioritization. I didn’t feel like I was working on the thing that would unlock the next step.

I probably could have powered through it but was frustrated by some coding hurdles I couldn’t tease apart. Initially I was thinking of Marie Kondo-ing everything into C, but that didn’t seem like such a great idea the next morning.

Stepping back a bit helped. I wrote a lot of things down. Made some sketches. It’s frustrating not to be able to dive into things when you want, but it can really help to tie things back to the big picture.

I’ve mainly been using Notion for task management, but I’m always trying out a couple of other things. I’m not really that happy with any of them, because at an early stage it’s more of a creative process than a task-oriented one, and I haven’t found a good way to bridge between those. A hybrid I used this week was to dump a bunch of open-ended ideas into Notion and add a priority column to flag the top handful. It’s a small tweak that helped.

Other time hacks are setting up morning engagements and having a coffee routine. I’m not a natural early riser.

News stuff

The COVID Tracking Project released their final data update. The grassroots data collection and publication effort grew out of The Atlantic’s frustration with uncoordinated and patchwork data. Data collection was entirely manual, mostly from screenshots of state public health web pages.

Pretty stuff

Ryven is a flow-based visual programming language built on top of Python. Flow-based programming is already prevalent in games and digital art. This one seems like it has potential for more general software development. It’s very early but has already been used for data processing.