[AIEWF Preview] Containing Agent Chaos — Solomon Hykes
Hey everyone, welcome to a Latent Space Lightning Pod. This is Alessio, partner and CTO at Decibel, and I’m joined by my co-host Swyx, founder of Smol AI. Hello, hello, and today I’m so happy to have Solomon Hykes join us. Solomon, you’re most famously the creator of Docker.
Hi, thanks for having me. You started Dagger six years ago, and I think originally it was pitched as some sort of infrastructure provisioning thing. I’m sorry, I’m probably totally mangling this in front of you. How do you introduce Dagger today?
So Dagger is, I think six years, I guess sounds right, yeah. It’s a workflow engine. It’s an automation tool for software teams that want to deliver software faster and more efficiently. It takes all these workflows that are usually semi-automated with artisanal scripts, you know, your builds, your tests, your kind of end-to-end pipelines. And it turns them into robust, modular workflows that you can drive with code, and it all runs in containers. So it’s highly portable, highly isolated. You can run them locally or in CI, which saves a lot of time.
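To make “workflows you can drive with code, running in containers” concrete, here is a minimal sketch of a test step written with the Dagger Go SDK. This is illustrative only, not the canonical pipeline; exact method names can vary slightly between SDK versions.

```go
package main

import (
	"context"
	"fmt"
	"os"

	"dagger.io/dagger"
)

func main() {
	ctx := context.Background()

	// Connect to the Dagger engine (started locally if not already running).
	client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Mount the project source from the host into the pipeline.
	src := client.Host().Directory(".")

	// Run the test step inside a pinned toolchain container. The same code
	// runs unchanged on a laptop or inside any CI runner.
	out, err := client.Container().
		From("golang:1.22").
		WithDirectory("/src", src).
		WithWorkdir("/src").
		WithExec([]string{"go", "test", "./..."}).
		Stdout(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```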
We’re an open source platform. We’ve got a very active and engaged open source community, mostly made of platform engineers. You know, those systems engineers that actually design the factory and run it and enable the developers on the team to be more productive. So that’s our core community.
Yeah. In some ways, yeah, sorry. I was going to say, just to make that clear, these are both pre-development, so spinning up an environment for people, and then also everything between when you’re done writing the code and getting it ready for production. Are those basically the two entry points?
We started mostly post-development. So anything that happens after you’ve saved and you’re ready to see what happens next, you know, build tests, you want to take that live. So there’s that delivery loop, right? We’ve been focusing on making that delivery loop more efficient because it’s really terrible in most places. And there are a lot of inefficiencies that could be cleaned up.
In a lot of ways, it’s a cobbler’s, you know, the cobbler has no shoes situation. Like those platform engineers, they spend all their energy and their significant experience helping developers get the best possible tooling and the best possible experience. But they themselves for their own tooling, it’s sort of like, okay, we got to cobble this together with Bash scripts and YAML. So we’ve been focusing on that post-development.
Although recently we’re getting pulled into the dev loop in part because of this crazy change that’s sweeping the whole market, right? With agents.
Yeah. So I think let’s kind of run right into that. Obviously, there’s a lot more context on Dagger and Docker that you’ve done prior, but we, you know, we are an AI podcast. So like, why not? Let’s just go right into it. A few months ago, you messaged me and you were like, I think this is the biggest thing I’ve had since Docker itself. And I was like, what? Like Docker is very big. What was your context going into the AI builder space?
In some ways, I’ve always said basically like to us DevTools people, it is just more of the same. Everything that you wanted, you just need 100x more. So how did you approach this?
Yeah, we didn’t think of ourselves as an AI company. We got pulled into it by our community, our users. Because although Dagger is primarily used to build CICD pipelines, historically, we’ve never thought of ourselves as a CICD company. We like to build platforms from first principles. And then we encourage our community to go and apply it.
So what we have is an engine for automating workflows and making them reliable, portable, and giving them very, very clean environments to run in. And the environment is key. And of course, we use containers because that’s what we know. And the container tech still today is underutilized, misunderstood. It can do so much more.
So our community pulled us into this AI space because of agents. Because these platform engineers start messing with agent loops. They want to insert LLMs into their workflows. And now it’s becoming more popular to run agents in the context of your CI, to automate more parts of your delivery. And then they started showing us that everybody wants to use these coding agents now.
You know, so if you’re a developer, increasingly, your job is not going to be to actually develop, but to manage and enable these coding agents. And we’re at the very beginning of this. You got this one agent in your IDE helping you. But now you want more than one, right? You want a team of agents sort of doing the work for you.
And that transition from a team of one to a team of multiple coders, that’s basically what our community deals with, these platform engineers. So what we’re witnessing is developers becoming platform engineers. They have to learn how to enable others to be productive. These others, of course, are AIs. And to do that, they have to give them environments to work in. You can see the problem when you see someone live streaming their vibe coding, right? Everyone’s vibe coding. But you can only make one set of changes at a time. Everyone’s sort of messing with this one dev environment that really isn’t cleanly isolated.
So what we’re doing is we’re taking this technology that we invented for CICD and bringing it into the coding agents environment and giving your agent basically a perfectly isolated, reusable, and portable environment so that you’re not completely locked into this one app connected to this one model running on this one cloud infra provider. You want the environment where the agent does its work to be decoupled, to be its own thing that you can manage and look at and then move to another platform if you want. That’s sort of what happened with Docker in the previous wave when everyone was adopting cloud technology. Everyone was building these big platforms. They had everything, but they were highly fragmented. They tried to kind of cobble everything together like a monolith. So you didn’t have this portable environment that you could carry around with you. You were trapped in one big platform. And the same thing’s happening now.
So we want to use our experience from the past to enable this new generation of developers to really unlock the potential of these coding agents. That’s what I’m excited about. I think the scope of what people do for coding agents today is maybe kind of like single VM, let’s call it. Especially the coding agents from the big labs, they don’t even have internet access. They have pre-installed libraries and it’s very, very limited. And it could be so much better if there were standards for it. I think obviously the standard needs to be open source. But there’s a question of the design constraints that you are coding for.
I think that, so for example, Gitpod is another market participant I’ve seen where they’re like, yeah, we really need to isolate this, almost like a mini VPC. Like you need the networking sorted out and you need storage, everything, right? Just all the fundamental units of compute. So I guess, what is the container here? What is that concept here, the unit of isolation? It’s still actual containers as the base layer. That’s my first insight here. You don’t have to reinvent the core technology that already exists, but you’ve got to rethink the tooling, because the tooling, as it stands, honestly, it’s a little frustrating for me, because we busted our ass on Docker and we built this whole ecosystem.
We invented the Dockerfile and Docker Compose and the format, and we kind of messed up on the follow-through. At some point, Docker basically stopped innovating and I left, and the ecosystem continued around containers, but it didn’t actually pick up where we left off, especially everything to do with applying container tech to development. There are still a lot of interesting experiments, but it never had the unity of purpose that the initial movement had. So you have fragmentation, right? No standard really emerged beyond the Dockerfile and Docker Compose, and that’s it. All the effort went into infrastructure, you know, Kubernetes, scalable storage, scalable networking. That kept moving a lot.
If you go to KubeCon or any of those events, you see all the infra people applying containers to solving that problem, but for development, for dev environments, it really hasn’t moved that much. Using containers would be my first recommendation, but then, yeah, I just think you got to design a solution from first principles. It’s just a difficult design problem. I’m very excited because there’s an opportunity to go through this design process again from first principles.
I’m aware of Gitpod, of course, you know, there’s Dev Containers as a standard in IDEs, and there’s a million other products out there, but none of them has convinced the majority of developers to develop in containers with their tool. It’s very fragmented. In fact, most people don’t develop in a container still. They just develop on their laptop. That’s just a failure of tooling. So yeah, I think it’s anyone’s game.
How do you apply container technology to the perfect developer experience? The criteria are: it has to be well isolated. You should be able to have a bunch of agents working in parallel, and they don’t mess with each other’s work. It has to be portable. It should not be locked to a model or a cloud provider. It should also not be locked to an IDE. That’s crazy. If you think a whole team is going to standardize on the same IDE forever, you’re deluded. It should be fully observable. You should be able to see everything that happens in that environment end-to-end, everything from what the model’s doing and thinking and saying, all the way to what the tools are actually running, and what’s the state of the environment. You should be able to see everything in one place.
And you need a strong multiplayer element. You need agents and humans to both be able to interact with that environment. So that you can tell an agent, do this. And when the agent says, I did it, you can go and verify. Okay, did you do it? Give me the keyboard for a second. You need all those things.
And right now, I’m not seeing us heading in that direction. I’m seeing us heading in the direction of highly integrated, very vertical, end-to-end monoliths. I don’t want to name names, but it’s definitely a market trend. If you’re selling an IDE right now, and people are asking for more customization for the coding agent’s environment, you’re going to add some proprietary way to customize the environment.
You’re going to add your own observability solution. And when people ask for a way to have the agent work in the background, you’re going to add your own hosting solution to run the agent in the background. That’s a monolith. That’s what fragmentation looks like. And that’s exactly what we were up against in the early days of cloud that led to the creation of Docker.
So we’re going to need some sort of a standard, not on everything, just on the environment. Just that little piece that connects all the other pieces. It’s not the most powerful piece, but it’s the linchpin that connects everything else. The environment in which the agent does work. That should be independent.
What do you think are the biggest limitations to this? So I use Docker to develop and I force the coding agent in the IDE to run commands in the Docker environment. I would say, if I had to give feedback on it: one, the agent cannot make changes while inside the container and propagate them back to the Dockerfile and Docker Compose. So it kind of spends all the cycles and then all that work is kind of lost.
And there’s no AI-native interface. I’m busy just doing docker compose exec and running the command in the container instead of having a more native way to do it. How do you think about that changing with the new Dagger approach? I mean, yeah, you just have to rethink it. There’s just a design process.
You’ve got to, I mean, the Dockerfile was something we designed as a stopgap prototype back in 2013, thinking, oh, we’ll clean this up later. It’s been more than 10 years. Compose was a clone of a clone that we acquired into the team and stitched on top. As soon as we launched it, you have to understand, there was so much excitement and demand, people didn’t want it to move. People would build platforms on top and then be like, don’t change the syntax.
And so that stuff has been frozen in time. I commend Docker for maintaining it and keeping it alive and just doing the hard work of maintaining it. But it’s not agent-native. It never will be. So you’ve got to go through a good engineering and design process of understanding how people develop with agents, understanding what’s the best UX they need, and trying to design that on top of technology and components that are as standard as possible.
You don’t want to reinvent the wheel where it’s not necessary, but we’ve got a bunch of standards to work with. We’ve got the container tech, I mean, it’s there, it’s universal. We’ve got Git. We’ve got the LLM APIs, the OpenAI API spec and its derivatives. And now we’ve got MCP. So that’s a pretty good set of standards. We can work with that.
But yeah, it’s got to be a new UX, in my opinion. And I mean, not to plug Dagger, but obviously Dagger is our vehicle for going through this design process. If you want to see my particular opinions and how things should be designed and how you want to balance, for example, simplicity versus modularity, then in my case, look at Dagger.
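As a rough sketch of how those standards can compose, here is a hypothetical helper that uses Git for the source and a container for isolated execution, with the model call (OpenAI-compatible API or MCP) left out of frame. It assumes the Dagger Go SDK; runAgentStep and its parameters are illustrative, not an established API.

```go
package main

import (
	"context"

	"dagger.io/dagger"
)

// runAgentStep executes one command proposed by a model against a fresh
// snapshot of the repository and returns the output, which can be fed back
// into the next turn of the agent loop. (Hypothetical helper for illustration.)
func runAgentStep(ctx context.Context, client *dagger.Client, repoURL, branch string, cmd []string) (string, error) {
	// Clone the repo inside the engine: the agent never touches the host
	// checkout, so parallel agents cannot step on each other’s work.
	src := client.Git(repoURL).Branch(branch).Tree()

	// Run the proposed command in an isolated, disposable container.
	return client.Container().
		From("ubuntu:24.04").
		WithDirectory("/work", src).
		WithWorkdir("/work").
		WithExec(cmd).
		Stdout(ctx)
}
```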
But yeah, it’s normal that you can’t just duct-tape existing tools as-is onto new workflows and hope it’ll be perfect. That’s totally normal. Something I bring up from my work history a lot: I also worked in workflow orchestration at Temporal. This migration from a custom language, like a Dockerfile or an AWS Step Functions type thing, into a real programming language, like TypeScript or Go, that is the exact same journey I took. Yeah, I mean, it’s really hard to find the right balance.
I think of it as Lego, because it’s a hard problem to solve, these workflows and these environments, because no two workflows are the same. No two dev environments are the same. It’s like factory design. Every great product has its own factory that’s unique. No one goes and buys a factory at the factory store. You design it, and you build it alongside your products. And that’s what these things are. It’s like a factory.
And it’s really hard to provide tooling for that space, because if you make it too complicated and too customizable for no reason, then you’re wasting people’s time. It’s just, why would I buy this? I’m just going to do everything myself from scratch. But if you try to streamline and simplify too much, then you’re restricting choice, and then it becomes useless. Well, this doesn’t fit in my factory because my factory uses this system, and I can’t reconfigure it. So it’s a really hard area of design and engineering.
And, I mean, to me, the GOAT in this space is Lego. You know, the actual Lego. And it’s used as an analogy so much that it loses its meaning. But if you really think about what Lego really is, it’s really hard to – it was, in reality, very difficult to design and engineer the Lego brick because it had to be just right. And it’s designed as one component, but it’s designed with a much larger system in mind, right? So there’s sort of these two layers of design. That’s what’s so hard. And that’s why Lego is genius and has stood the test of time.
That’s what everyone in this space should be trying to do, build a better Lego system that is worth adopting. It’s expensive to adopt a new standard in your stack. Like, it’s one more of everything to worry about, right? So it’s got to justify the cost by actually saving – you know, it’s got to save you something, effort, money, whatever. That’s what Lego does when I play. I like to play with Lego because I can assemble things quickly, et cetera. So, yeah, that’s the challenge.
In the case of Temporal, I think these systems are very well positioned also for running agentic systems, right? Because an agent always has some sort of loop that’s triggered by events. It’s asynchronous. You’ve got to run that somewhere. Yeah, I would say Temporal is focused more on the runtime applications, and then other systems are more focused on CI, CD. Honestly, you could use one for the other.
Yeah, but they’re converging because, like, your CI/CD will soon be nothing more than runtime infrastructure for your workflows, and all those workflows will become agentic, right? It’s going to be either workflows running LLMs or LLMs running workflows, all the way down. And CI is also events, a job dispatcher, and compute. So I think these things will converge, with coding agents being the domain of application where everything meets, right?
Because when you’re running a coding agent, you’re running an agent, so it’s a runtime problem, but it’s a very specific area of application. Like, it’s not real-time. You don’t have to worry about voice and video and things like that. You worry a lot about artifacts that are being produced, and are they repeatable? Can I trace how they were created? Is this binary created by an agent that – a model that went rogue? You know, things like that.
I think one element that I see a lot for these kinds of – like, we’ve talked to Bolt, and there’s all this universe of – let’s call it ephemeral apps that people are making, or vibe-coded apps, single-use apps even. Just because it’s so easy to create an app now that you’re like, I just – it doesn’t matter. So the speed and the setup and the teardown is pretty significant, and then obviously, like, the resource usage and cost. These are all things that I’m hearing the founders in these companies all trying to solve for and all finding the current set of tools lacking.
Because we just don’t have the – we cannot subdivide things that small or that cheaply. We can’t start things up that fast. But, like, the users always want this. That’s why they’re doing the shortcuts. That’s why they’re not using containers. Because they’re like, I don’t know how to do this with containers.
Yeah, and there’s a lot of duct tape. I mean, if you want to use containers yourself, you’ve got to duct tape a bunch of tools together. And it’s not just containers. It’s also, like, the file system isolation. You know, everyone’s playing with Git work trees to try and get multiple instances of the agent working in parallel. That’s the same problem, right? It’s not just containers. It’s something container-based. It’s containers for isolated execution, work trees for isolated files, and it all kind of – you’ve got to connect it together somehow. I do think what I’m seeing – I’m seeing a lot of vendors, of course, think about that and try to find the right solution for their customers.
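Here is a sketch of what that duct tape tends to look like today: one git worktree per agent for isolated files, one container per agent for isolated execution. It assumes the Dagger Go SDK for the container side; the task names and test command are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"sync"

	"dagger.io/dagger"
)

func main() {
	ctx := context.Background()
	client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Hypothetical tasks, one per agent.
	tasks := []string{"fix-auth-bug", "add-retry-logic"}

	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(task string) {
			defer wg.Done()

			// Isolated files: a dedicated git worktree per agent, so no two
			// agents share a checkout.
			dir := "../wt-" + task
			if err := exec.Command("git", "worktree", "add", dir, "-b", task).Run(); err != nil {
				fmt.Fprintf(os.Stderr, "[%s] worktree: %v\n", task, err)
				return
			}

			// Isolated execution: each agent’s commands run in its own container.
			out, err := client.Container().
				From("golang:1.22").
				WithDirectory("/src", client.Host().Directory(dir)).
				WithWorkdir("/src").
				WithExec([]string{"go", "test", "./..."}).
				Stdout(ctx)
			if err != nil {
				fmt.Fprintf(os.Stderr, "[%s] run: %v\n", task, err)
				return
			}
			fmt.Printf("[%s]\n%s\n", task, out)
		}(task)
	}
	wg.Wait()
}
```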
I think one mistake everyone’s making – again, I’m making a historical parallel with the rise of PaaS. Everyone has this very – only you can do this. Hopefully, I’m not the only one left.
But I think it’s becoming – it’s going to be very apparent very quickly that when someone sells a commercial cloud product, a cloud-centric product in the AI space, they’re hosting your stuff, right? You log in, and they have your data. They have your traces. They run the model, or they proxy to the model, whatever. Every solution they come up with will be very infra-centric, right? They’re going to think of another hosted feature, another hosted service.
So when they think environments, they think, how can I run these VMs on my infrastructure as fast and cheaply as possible? My scale is going to be X. I have this many customers. The pricing is X. And that’s excellent. But developers also want to run stuff themselves locally. And they don’t really want that to be an afterthought. But right now, it is an afterthought.
And it’s the same mistake CI/CD vendors made. CI/CD has no good local story. There are some open-source projects that try to simulate, say, GitHub Actions locally. It must be a nightmare to maintain a project like that because compatibility is just so hard.
But my point is, local execution is not everything, but it’s a good test. Whatever solution you’re imagining, does it support local execution? Will developers be able to run it locally and enjoy it? If the answer is no, you’re solving part of the problem. But you’re not fully solving the problem of standardizing dev environments for coding agents. It’s not going to stand the test of time because it cannot be ubiquitous. It’ll be a great commercial solution. You’ll make lots of money with it. But it’s not going to be ubiquitous.
So, I’m going to ask a really hard question, maybe. What does it take for one of the big clouds or big labs to adopt Dagger?
For example, right? I’ve had this exact same call, same problems with the Microsoft team. And they’re pushing Dev Containers. Obviously, Dev Containers is not enough, but whatever. They want to build around it. They want to extend it. And I’m like, okay, but, like, everyone’s working on their thing. When do we get some consolidation in this space?
Who knows? Honestly, everyone should just give it their best shot and design the best possible solution. Honestly, I think with open source, to some degree, it depends on the domain. But I think for this problem, scale doesn’t really help as much. If you’re going for foundational models, to take the extreme example, sure, let the best design win.
But also, I know who, you know, I know the 10 names of who’s going to win. It’s going to be one of those 10. Startups don’t really stand a chance. This is very different because it’s not about scale. It’s about developer experience. And it’s about designing the interfaces in a way that actually helps people be more productive.
There’s a lot of leverage for small teams to just design the best possible solution. And then you have to convince others to adopt it and build on it. Community and ecosystem are extremely important. If you’re large, like Microsoft, you have an advantage on that. Obviously, there’s a Microsoft ecosystem. It’s massive. But you still need to have a solution, right?
So, like, we’re talking to Microsoft. We’re talking to a whole bunch of people. But for this particular problem that we’re targeting, you know, standardizing the dev environments for agents, you don’t really need permission as a startup. You just go ahead and do it and build momentum. There’s less gatekeeping, I would say.
I think, like, the other thing that does come to mind sometimes is that there are different layers in infrastructure. I think you’re very focused on, I don’t know what to call it, like, virtual hardware. Is that, like, a term that resonates? Like, virtualization?
No. When I look at, let’s say, what I can do with Dagger, it doesn’t, for example, have auth or billing. There’s a higher level that’s still infrastructure, but it’s recombinations of things that are more business-facing than just bare-metal-facing.
So, I think that’s what you’re describing as a consequence of the Lego approach. We’re focusing on a platform problem. You know, how do you create a modular system that can run anywhere, that you can connect to anything? And allows you to compose the ideal environment, the ideal workflow, and it can be ubiquitous because it can be integrated into pretty much any existing system.
In order to do that, you can’t go ahead and design a complete end-to-end solution, right? That’s the price to pay. So, Dagger will always be a component of a bigger platform. It will never be the complete end-to-end platform, because in order to do that, you have to force your customer to adopt your authentication and your UI and your storage and your networking, etc. And we’re doing the opposite. We’re telling these platform teams, hey, we will adapt to the stack that you have.
And so, for example, we started this conversation with CI/CD because that’s our traditional use case. Dagger makes your CI/CD better, but it does not actually replace your legacy CI platform. It runs on top of it and allows you to simplify it and turn it into basically dumb runner infrastructure. But it’s still important infrastructure.
So we don’t come in and seek to replace what you have. We try to integrate with it and make the overall system better. As a result, it’s just a different shape of product, right? And we definitely don’t solve auth. You have to rely on others to solve auth. But whatever you do, we’ll integrate with it.
Yeah. Cool. I think that’s plenty for an intro to where Dagger’s at and the pull that you guys are seeing. I’m looking forward to your talk, to be honest. I don’t think I’ve seen a good Solomon Hykes AI talk. Well, I don’t know if anyone has.
Yeah. So hopefully we’ll see one next week. I know, I know. Feel free to lean on me to prep and all that. But thank you. I’m excited. I think when I saw the workshop submission come in, you guys had a really good plan, like, build coding agents, and here are all the infra pieces, and that really tied it together for me.
That was like, okay, this is where Dagger is going. This is how, ultimately, I need the LLMs to start generating their own infrastructure, right? Generative infrastructure almost. I feel like that’s prone to a lot of spend.
Yeah. But it’s going to happen. Controlling the environment is crucial. How much control, how much freedom do I want to give this agent? I mean, we’re all past the stage where these things are just running wild on the internet. So why not?
But to give them my AWS account is tricky. Depending on how things go, there’s a 50% chance that for my keynote, there will be some brand new original stuff to show and talk about that is not yet out, on top of what we’re showing in the workshop and everything. So it depends if it’s ready in time, but we may have even more fresh stuff to talk about.
All right. We’ll let you get back to it. Thanks so much for your time. Thank you. Yeah. Thanks, guys. I’ll see you next week.