We need to talk about the next data build tool

2022/06/30

Data build tools are nothing new; after all, we've had Makefiles since the '70s. Airflow entered in with the big data scene, followed by more recent successors like Prefect. And with the passing of the NoSQL fad, dbt has operationalized SQL.

Despite all of this apparent progress, does it sometimes feel like our tools aren’t keeping up?

What if we could actually track and manage data through its entire life, from being pulled into an organization1, through analysis2, even ML model development3, until it lands in a released product4 or analytic dashboard5?
What if we could share data the same way we share Git branches?
What if reviewing data was as simple as reviewing code?
What if our data tools could keep track of staleness and versioning, even as data moves across different databases?

Excited about the future of data tooling? Stay in the loop!

Query anything

Knit is flexible enough to query databases, scrape APIs, handle unstructured data, or deal with whatever else might come your way.

Review and debug together

Work independently to get started and collaborate when you’re ready. Share your works in progress and combine efforts at any time.

Run anywhere

Execute on your workstation, on prem, or in the cloud. Mix-and-match deployment options however you choose.

Cache more, work faster

Knit shares data across multiple invocations, even from other users. Spend more time writing queries not waiting for them.

Knit is a new way to collaborate with data dependencies. Inspired by development tools like Make and Git, it supercharges your data tools with capabilities you've come to expect from a modern stack.

Knit creates a dependency graph of all the steps in a data flow. Metadata about this graph and every output that’s produced is tracked each time Knit is invoked. This metadata is lightweight and designed to be efficiently synchronized among workstations and servers. The end result is a flexible way to work collaboratively with data flow dependencies.

Excited about the future of data tooling? Stay in the loop!


Many thanks to Carlos Aguilar, Rohit Parulkar, Brennan Moore, Mike Fotinakis, Sam Bail, Lee Kuczewski, Ben Birnbaum, Sankeerth Garapati, Sharang Phadke, and Michael Stratton for reviewing drafts of this post.