We need to talk about the next data build tool

Data build tools are nothing new; after all, we've had Makefiles since the '70s. Airflow entered in with the big data scene, followed by more recent successors like Prefect. And with the passing of the NoSQL fad, dbt has operationalized SQL.

Despite all of this apparent progress, does it sometimes feel like our tools aren’t keeping up?

What if we could actually track and manage data through its entire life, from being pulled into an organization¹, through analysis², even ML model development³, until it lands in a released product⁴ or analytic dashboard⁵?
What if we could share data the same way we share Git branches?
What if reviewing data was as simple as reviewing code?
What if our data tools could keep track of staleness and versioning, even as data moves across different databases?

Query anything

Knit is flexible enough to query databases, scrape APIs, handle unstructured data, or deal with whatever else might come your way.

Review and debug together

Work independently to get started and collaborate when you’re ready. Share your works in progress and combine efforts at any time.

Run anywhere

Execute on your workstation, on prem, or in the cloud. Mix-and-match deployment options however you choose.

Cache more, work faster

Knit shares data across multiple invocations, even from other users. Spend more time writing queries not waiting for them.

Knit is a new way to collaborate with data dependencies. Inspired by development tools like Make and Git, it supercharges your data tools with capabilities you've come to expect from a modern stack.

Knit creates a dependency graph of all the steps in a data flow. Metadata about this graph and every output that’s produced is tracked each time Knit is invoked. This metadata is lightweight and designed to be efficiently synchronized among workstations and servers. The end result is a flexible way to work collaboratively with data flow dependencies.

Many thanks to Carlos Aguilar, Rohit Parulkar, Brennan Moore, Mike Fotinakis, Sam Bail, Lee Kuczewski, Ben Birnbaum, Sankeerth Garapati, Sharang Phadke, and Michael Stratton for reviewing drafts of this post.

We need to talk about the next data build tool

2022/06/30

Query anything

Review and debug together

Run anywhere

Cache more, work faster