Knit picks: Feb 26

2021/02/26

Hello Knit neophytes! This came out pretty hefty. I think future issues of Knit picks will become briefer as more context is built up.

Soft stuff

Had a lot of high-quality conversations this week. It’s fun to see people get excited about the potential (and it feels like karmic balance for some dud conversations I had the week before). In terms of pitch refinement, some Knit points that seem to resonate are:

Maybelline

Knit is not “born with it.” How do we make it zing? Some thoughts I’ve had or heard:

Tech stuff

Did the hard parts of splitting Knit into frontend / backend layers. Quick recap: the backend manages dependencies and executes things, and the frontend can be a GUI, a notebook, templated code (à la dbt), etc. This took a little while; more on that in a sec.

Once that’s finished up, it should be straightforward to run pipelines that use multiple frontends, so I’ll be experimenting with that. I’m also planning some quality-of-life improvements for early users (error messages and the like).
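To make the frontend / backend split a bit more concrete, here is a minimal Python sketch. It assumes that a flow is plain data (a mapping from job names to their inputs and a function), that any frontend can produce that data, and that a single backend resolves dependencies and runs it. The names `Job`, `run_flow`, `flow_from_spec`, and `OPS` are hypothetical and are not Knit’s actual API.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class Job:
    """One step in a flow: the names of its inputs plus the function that computes it."""
    inputs: List[str]
    fn: Callable[..., Any]


# A flow is just data: job name -> Job. Frontends produce it; the backend runs it.
Flow = Dict[str, Job]


def run_flow(flow: Flow, sources: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical backend: order jobs by their dependencies, then execute them."""
    graph = {name: set(job.inputs) for name, job in flow.items()}
    results = dict(sources)
    for name in TopologicalSorter(graph).static_order():
        if name in flow:  # source data nodes have no job attached
            job = flow[name]
            results[name] = job.fn(*(results[i] for i in job.inputs))
    return results


# Frontend 1 (hypothetical): jobs written directly in code.
code_flow: Flow = {"total": Job(["sales"], sum)}

# Frontend 2 (hypothetical): a declarative spec, loosely in the spirit of templated code.
OPS = {"max": max, "min": min}


def flow_from_spec(spec: Dict[str, Dict[str, Any]]) -> Flow:
    """Translate a declarative spec into the same Flow data structure."""
    return {name: Job(s["inputs"], OPS[s["op"]]) for name, s in spec.items()}


spec_flow = flow_from_spec({"biggest": {"inputs": ["sales"], "op": "max"}})

# Both frontends yield the same structure, so one backend can run them together.
merged: Flow = {**code_flow, **spec_flow}
out = run_flow(merged, {"sales": [3, 9, 4]})
print(out["total"], out["biggest"])
```

The design point the sketch tries to capture is that the backend never needs to know which frontend produced a job, which is what makes mixing frontends in one pipeline cheap.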

Tech spew warning

Refactoring the frontend / backend split touches one of the more conceptually tricky parts of Knit. Let’s follow this digression! I think of this part as dynamic flow composition. It means any job in a Knit flow can run another Knit flow inside of it. Taking it a step further, the steps in that nested flow can differ depending on data from the parent flow. A meta way to think about it is that Knit flows are data that Knit itself can process.
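Here is a small Python sketch of that idea, assuming a flow is represented as a plain dict of jobs. Because the flow is just data, a parent job can build a child flow from the values it receives and hand it to the same runner. `run_flow`, `build_cleaning_flow`, `clean_job`, and the config keys are all hypothetical illustrations, not how Knit is actually implemented.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# A flow is plain data: job name -> (input names, function).
Flow = Dict[str, Tuple[List[str], Callable[..., Any]]]


def run_flow(flow: Flow, sources: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical runner: execute jobs in dependency order."""
    graph = {name: set(inputs) for name, (inputs, _) in flow.items()}
    results = dict(sources)
    for name in TopologicalSorter(graph).static_order():
        if name in flow:  # skip source data nodes
            inputs, fn = flow[name]
            results[name] = fn(*(results[i] for i in inputs))
    return results


def build_cleaning_flow(config: Dict[str, bool]) -> Flow:
    """Construct a child flow whose steps depend on data from the parent flow."""
    child: Flow = {}
    last = "raw"
    if config.get("drop_negative"):
        child["nonneg"] = ([last], lambda xs: [x for x in xs if x >= 0])
        last = "nonneg"
    if config.get("double"):
        child["doubled"] = ([last], lambda xs: [2 * x for x in xs])
        last = "doubled"
    child["cleaned"] = ([last], list)  # final step: copy whatever the last step produced
    return child


def clean_job(config: Dict[str, bool], raw: List[int]) -> List[int]:
    """A parent job that builds a flow (data) and runs it with the same runner."""
    child = build_cleaning_flow(config)
    return run_flow(child, {"raw": raw})["cleaned"]


parent: Flow = {"clean": (["config", "raw"], clean_job)}
out = run_flow(parent, {"config": {"drop_negative": True, "double": True}, "raw": [1, -2, 3]})
print(out["clean"])  # [2, 6]
```

With a different `config` value, the same parent flow runs a child flow with different steps, which is the "flows are data" point in miniature.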

Knit uses dynamic flow composition to implement a couple of features. The most straightforward is running multiple user-defined flows together, for example to combine different frontends. It is also used for partitioning/multiplexing, so you can run the same data analysis over subsets of the input. Some tools can compose flows statically, but a statically composed flow has a fixed set of partitions, so it won’t add new ones as your data grows. Other tools rely on partitioning at the data storage level (dbt, Snowflake, Spark), but these usually work with only one data storage system.
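A rough Python sketch of dynamic partitioning, under the same assumption that flows are plain dicts of jobs: the partitioning job looks at its input, groups rows by a key, and builds one child job per group at run time. A statically composed flow would fix the set of per-group jobs when the flow is defined; here the set is recomputed from the data on every run. The `region` key, `analysis`, and `partition_job` are hypothetical examples, not Knit’s API.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# A flow is plain data: job name -> (input names, function).
Flow = Dict[str, Tuple[List[str], Callable[..., Any]]]


def run_flow(flow: Flow, sources: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical runner: execute jobs in dependency order."""
    graph = {name: set(inputs) for name, (inputs, _) in flow.items()}
    results = dict(sources)
    for name in TopologicalSorter(graph).static_order():
        if name in flow:  # skip source data nodes
            inputs, fn = flow[name]
            results[name] = fn(*(results[i] for i in inputs))
    return results


def analysis(rows: List[Dict[str, Any]]) -> float:
    """The analysis the user writes once: average amount for one partition."""
    return sum(r["amount"] for r in rows) / len(rows)


def partition_job(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Group rows by region, then build one child job per group at run time."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row["region"]].append(row)
    child: Flow = {f"avg_{key}": ([key], analysis) for key in groups}
    results = run_flow(child, dict(groups))
    return {key: results[f"avg_{key}"] for key in groups}


parent: Flow = {"by_region": (["rows"], partition_job)}

rows = [
    {"region": "eu", "amount": 10},
    {"region": "us", "amount": 4},
    {"region": "eu", "amount": 20},
]
small = run_flow(parent, {"rows": rows})["by_region"]
print(small)  # {'eu': 15.0, 'us': 4.0}

# New data adds a new partition without editing the flow definition.
rows.append({"region": "apac", "amount": 7})
grown = run_flow(parent, {"rows": rows})["by_region"]
print(grown)  # {'eu': 15.0, 'us': 4.0, 'apac': 7.0}
```

Note that the partitioning here happens in the flow, not in the storage layer, which is why the same approach could in principle sit in front of any data source.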

In the future, I expect dynamic flow composition to be used for other features like packaging and using versioned flows.

I haven’t heard of anything that does dynamic flow composition like Knit. It might be a little crazy. It’s possible it will be hidden as an implementation detail.

Phew, that was a trip. I’d be curious whether this type of differentiation seems meaningful or just sounds like implementation gobbledygook. Or whether you just liked clicking the pictures.

OK that’s enough for this edition. Stay classy!