Show HN: Smol machines – subsecond coldstart, portable virtual machines

(github.com)

134 points | by binsquare 3 hours ago

19 comments

binsquare 3 hours ago
Hello, I'm building a replacement for Docker containers: a virtual machine with the ergonomics of containers and subsecond start times.
I previously worked at AWS in the container space and with Firecracker. I realized the container is an unnecessary layer that slows things down, and that Firecracker was designed for AWS's org structure and use case.
So I ended up building a hybrid taking the best of containers with the best of firecracker.
Let me know your thoughts, thanks!
- PufPufPuf 1 hour ago
  Hey this is super cool. I've been researching tech like this for my AI sandboxing solution, ended up with Lima+Incus: https://github.com/JanPokorny/locki
  My problem with microVMs was that they usually won't run docker / kubernetes, I work on apps that consist of whole kubernetes clusters and want the sandbox to contain all that.
  Does your solution support running k3s for example?
  - fqiao 36 minutes ago
    we will evaluate. I created this issue to track this: https://github.com/smol-machines/smolvm/issues/150
    Really appreciate the feedback!
- topspin 1 hour ago
  What is the status of supporting live migration?
  That's the one feature of similar systems that always gets left out. I understand why: it's not a priority for "cloud native" workloads. The world, however, has workloads that are not cloud native, because that comes at a high cost, and it always will. So if you'd like a real value-add differentiator for your micro-VM platform (beyond what I believe you already have), there you go.
  Otherwise this looks pretty compelling.
  - genxy 45 minutes ago
    It helps if you offer a concrete use case: how large the heap is, what kind of blackout period you can handle, whether the app can handle all of its open connections being destroyed, etc. The more an app can handle resetting some of its own state, the easier LM is going to be to implement. If your workload jibes with CRIU https://github.com/checkpoint-restore/criu you could do this already.
    By what I assume is your definition, there are plenty of "non cloud native" workloads running on clouds that need live migration. Azure and GCP use LM behind the scenes to give the illusion of long uptime hosts. Guest VMs are moved around for host maintenance.
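    To make the heap-size and blackout questions concrete, here is a rough sketch of the usual pre-copy arithmetic: memory is copied while the guest keeps running, pages dirtied in the meantime are re-sent, and the guest is paused only once the remainder fits in the allowed blackout. The function name and all numbers are made up for illustration; this is not how smolvm or any particular hypervisor implements it.

```python
def precopy_rounds(mem_mb, dirty_mb_per_s, link_mb_per_s, max_blackout_s, max_rounds=30):
    """Toy model of iterative pre-copy live migration (all inputs are assumptions).

    Returns (live_rounds, blackout_seconds) once the data still to be sent
    fits within the allowed blackout, or None if it never converges.
    """
    to_send = float(mem_mb)  # the first round has to copy all guest memory
    for live_rounds in range(max_rounds):
        # Pause we would need if the guest were stopped right now.
        blackout = to_send / link_mb_per_s
        if blackout <= max_blackout_s:
            return live_rounds, blackout
        # The guest keeps running while this round is sent and dirties more
        # pages, which become the payload of the next round.
        send_time = to_send / link_mb_per_s
        to_send = min(float(mem_mb), dirty_mb_per_s * send_time)
    return None


# Hypothetical workload: 16 GiB guest, 200 MB/s dirty rate, 1000 MB/s link,
# clients tolerate a 0.3 s disconnect.
result = precopy_rounds(mem_mb=16384, dirty_mb_per_s=200,
                        link_mb_per_s=1000, max_blackout_s=0.3)
```

    With these made-up numbers the copy converges after three live rounds with a pause of roughly 0.13 s. Once the dirty rate reaches or exceeds the link speed it never converges, which is when a longer blackout has to be accepted.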
    - topspin 37 minutes ago
      "Azure and GCP use LM behind the scenes"
      As does OCI, and (relatively recently) AWS. That's a lot of votes.
      Use case: some legacy database VM needs to move because the host needs maintenance, the storage is on an iSCSI/NFS/NVMe-oF array somewhere, and clients are just smart enough to transparently handle a brief disconnect/reconnect (which is built into essentially every such database connection pool stack today).
      Use case: a web app platform (node/spring/django/rails/whatever) with a bunch of cached client state needs to move because the host needs maintenance. The developers haven't done all the legwork to make the state survive restart, and they'll likely never get time needed to do that. That's essentially the same use case as previous. It's also rampant.
      Use case: a long running batch process (training, etc.) needs to move because reasons, and ops can't wait for it to stop, and they can't kill it because time==money.
      "as in how large the heap is"
      That's an undecidable moving target, so let the user worry about it. Trust them to figure out what is feasible given the capabilities of their hardware and their talent. They'll do fine if you provide the mechanism.
      "CRIU"
      Dormant, and still containers. Also, it's re-solving solved problems, but with more steps.
  - fqiao 58 minutes ago
    Really appreciate the suggestion! By "live migration", do you mean keeping the existing files and migrating them elsewhere with the VM?
    Thanks
    - topspin 54 minutes ago
      I mean making any given VM stop on host A and appear on host B; e.g. standard Qemu/KVM:
      virsh migrate --live GuestName DestinationURL
      This is feasible when network storage is available and useful when a host needs to be drained for maintenance.
      - fqiao 33 minutes ago
        I see. So right now smolvm can be stopped, then "packed" (think of it as compressed), and restarted on a different host. Files on the disks are preserved, but memory snapshotting is still hard tbh
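        As an analogy only, the stop / pack / restart flow has the same shape as archiving a directory and unpacking it elsewhere. The sketch below uses Python's standard `shutil` with a made-up `disk.img` file and temp directories standing in for two hosts; it is not smolvm's pack format, and like the flow described above it carries disk contents but no memory state.

```python
import shutil
import tempfile
from pathlib import Path


def pack_machine(machine_dir: Path, out_base: Path) -> Path:
    """Compress a stopped machine's directory into one .tar.gz; return its path."""
    return Path(shutil.make_archive(str(out_base), "gztar", root_dir=machine_dir))


def rehydrate(archive: Path, dest: Path) -> Path:
    """Unpack the archive at the destination (a stand-in for the second host)."""
    dest.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(dest))
    return dest


def demo() -> bool:
    """Round-trip a fake disk image and report whether its bytes survived."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        machine = root / "host_a" / "m1"  # hypothetical on-disk layout
        machine.mkdir(parents=True)
        (machine / "disk.img").write_bytes(b"\x00" * 4096 + b"user data")

        archive = pack_machine(machine, root / "m1_packed")
        restored = rehydrate(archive, root / "host_b" / "m1")

        original = (machine / "disk.img").read_bytes()
        return (restored / "disk.img").read_bytes() == original
```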
- harshdoesdev 3 hours ago
  +1. i built something similar called shuru.run because i wanted an easy way to set up microVM sandboxes to run some of my AI apps, and firecracker wasn't available for macOS (and, as you said, it is just too heavy for normal user-level workloads).
  - sahil-shubham 2 hours ago
    Nice work on Shuru — I remember looking at it when I was researching this space. You went with a Rust wrapper on Apple’s Virtualization framework, right?
    I have been working on something similar but on top of firecracker, called it bhatti (https://github.com/sahil-shubham/bhatti).
    I believe anyone with a spare linux box should be able to carve it into isolated programmable machines, without having to worry about provisioning them or their lifecycle.
    The documentation’s still early but I have been using it for orchestrating parallel work (with deploy previews), offloading browser automation for my agents etc. An auction-bought Hetzner server is serving me quite well :)
    - harshdoesdev 1 hour ago
      bhatti's cli looks very ergonomic! great job!
      also, yes, shuru was (and still is) a wrapper over the Virtualization.framework, but it now supports Linux too (wrapper over KVM lol)
  - fqiao 2 hours ago
    Yes, having a light-weight solution for local devices as well is one primary goal of the design. Another one is to make it easy for hosting, self or managed
- thm 2 hours ago
  You could add OrbStack to the comp. table
  - fqiao 2 hours ago
    Will do. Thanks for the suggestion!
- sdrinf 2 hours ago
  hi, great project! Windows support is sorely lacking, though. As someone working a lot with sandboxed LLMs right now, the options-space on windows for sandboxing is _extremely lacking_. Any plans to support it?
  - fqiao 2 hours ago
    Hey, thanks so much! Yeah, we will definitely add Windows support later. We are exploring how to get this to work with WSL and will release it asap. Stay tuned and thanks!
  - binsquare 2 hours ago
    Yeah, it's on my mind.
    WSL2 runs a linux virtual machine. Need to take some time and care to wire that up, but definitely feasible.
gavinray 1 hour ago
The feature that lets you create self-contained binaries seems like a potentially simpler way to package JVM apps than GraalVM Native.
Probably a lot of other neat usecases for this, too
```
  smolvm pack create --image python:3.12-alpine -o ./python312
  ./python312 run -- python3 --version
  # Python 3.12.x — isolated, no pyenv/venv/conda needed
```
- binsquare 1 hour ago
  yeah, it's analogous to Electron.
  Electron ships your web app bundled with a browser.
  Smol machines ship your software packaged with a linux vm. No need for dependency management or compatibility issues because it is baked in.
  I think this is how Codex or Claude Code should be shipped by default, to avoid any isolation issues tbh
mrbluecoat 42 minutes ago
Can .smolmachine be digitally signed and self authenticate when run? Similar to https://docs.sylabs.io/guides/main/user-guide/signNverify.ht...
irickt 23 minutes ago
Is there a relation to the similarly-purposed and similarly-named https://github.com/CelestoAI/SmolVM
cr125rider 3 hours ago
Great job with the comparison table. Immediately I was like “neat sounds like firecracker” then saw your table to see where it was similar and different. Easy!
Nice job! This looks really cool
- fqiao 2 hours ago
  Thanks so much
akoenig 51 minutes ago
smolvm is awesome. The team is highly responsive and very experienced. They clearly know what they’re doing.
I’m currently evaluating smolvm for my project, https://withcave.ai, where I’m using Incus for isolation. The initial integration results look very promising!
- fqiao 41 minutes ago
  Cannot thank you enough for this! Let's work together to see how we can make this easier for cave!
lambdanodecore 1 hour ago
Basically any open source project nowadays runs its software stack in containers, often requiring docker compose. Unfortunately Smol machines do not support Docker inside the microVMs, and they also do not support nested VMs for things that use Vagrant. I think this is a big drawback.
- binsquare 1 hour ago
  I can support docker - will ship a compatible kernel with the necessary flags in the next release.
  - lambdanodecore 1 hour ago
    I tried something like this already, also including nested KVM. I think this will increase the boot time quite a bit.
    Also libkrun is not secure by default. From their README.md:
    > The libkrun security model is primarily defined by the consideration that both the guest and the VMM pertain to the same security context. For many operations, the VMM acts as a proxy for the guest within the host. Host resources that are accessible to the VMM can potentially be accessed by the guest through it.
    > While defining the security implementation of your environment, you should think about the guest and the VMM as a single entity. To prevent the guest from accessing host's resources, you need to use the host's OS security features to run the VMM inside an isolated context. On Linux, the primary mechanism to be used for this purpose is namespaces. Single-user systems may have a more relaxed security policy and just ensure the VMM runs with a particular UID/GID.
    > While most virtio devices allow the guest to access resources from the host, two of them require special consideration when used: virtio-fs and virtio-vsock+TSI.
    > When exposing a directory in a filesystem from the host to the guest through virtio-fs devices configured with krun_set_root and/or krun_add_virtiofs, libkrun does not provide any protection against the guest attempting to access other directories in the same filesystem, or even other filesystems in the host.
- genxy 41 minutes ago
  So Vagrant is launching the VM locally, is that why it needs nesting?
  Would you be ok with a trampoline that launched the VM as a sibling to the Vagrant VM?
isterin 1 hour ago
We’re using smolmachines to create environments for our agents to execute code. It’s been great so far and the team is super responsive. The dev ergonomics are also great.
- fqiao 56 minutes ago
  Really appreciate it! Would love to work together to make this easier to use.
ukuina 1 hour ago
Doesn't Docker's sbx do this?
https://docs.docker.com/reference/cli/sbx/
- binsquare 1 hour ago
  sandboxing is one of the features of virtual machines.
  I'm building a different virtual machine.
chrisweekly 35 minutes ago
This looks awesome. Thanks for sharing!
parasitid 1 hour ago
hi! congrats on your work, that's really nice.
question: why do you report that qemu is 15s<x<30s? for instance with katacontainers, you can run fast microvms, and even faster with unikernels. what was your setup?
thanks a lot
nonameiguess 1 hour ago
What are you actually doing on top of libkrun? Providing really small machine images that boot quickly? If I run the smolvm run --image alpine example, what is "alpine?" Where is that image coming from? Does this have some built-in default registry of machine images it pulls from? Does it need an Internet connection that allows outbound access to wherever this registry runs? Is it one of a default set of pre-built images that comes with the software itself and is stored on my own filesystem? Where are the builds for these images? Where do these machine images end up? ~/.local/share/smolvm/?
0cf8612b2e1e 2 hours ago
This looks very cool. Does the VM machinery still work if I run it in a bubblewrap? Can it talk to a GPU?
Can you pipe into one? It would be cute if I could wget in machine 1 and send that result to offline machine 2 for processing.
- binsquare 2 hours ago
  Haven't tried with bubblewrap - but it should.
  Yes! GPU passthrough is being actively worked on and will land in next major release: https://github.com/smol-machines/smolvm/pull/96
  Yea just tried piping, it works:
```
  smolvm machine exec --name m1 -- wget -qO- https://example.com/data.csv \
    | smolvm machine exec --name m2 -i -- python3 process.py
```
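  For anyone scripting this instead of using a shell, here is a minimal sketch of the same pipe with Python's `subprocess`. The two child commands are plain Python stand-ins (a fake CSV producer and a row counter), not real smolvm invocations; swapping in the `smolvm machine exec` argument lists from the command above is assumed to behave the same way but is untested here.

```python
import subprocess
import sys


def pipe(producer_cmd, consumer_cmd):
    """Run producer | consumer and return the consumer's stdout as text."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd,
        stdin=producer.stdout,   # consumer reads what the producer writes
        stdout=subprocess.PIPE,
        text=True,
    )
    # Drop our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    out, _ = consumer.communicate()
    producer.wait()
    return out


# Stand-in for "machine 1 downloads a CSV": prints a header and two rows.
fetch = [sys.executable, "-c", "print('a,b'); print('1,2'); print('3,4')"]
# Stand-in for "machine 2 processes it offline": counts the data rows.
process = [sys.executable, "-c",
           "import sys; print(sum(1 for _ in sys.stdin) - 1)"]

row_count = pipe(fetch, process).strip()
```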
bch 2 hours ago
See also [0][1] for projects in a similar* vein, incl. a historical account.
*yes, FreeBSD is specifically developed against Firecracker, which is specifically avoided with "Smol machines", but interesting nonetheless
[0] https://github.com/NetBSDfr/smolBSD
[1] https://www.usenix.org/publications/loginonline/freebsd-fire...
- binsquare 1 hour ago
  that was one of my inspirations but I don't think they went far enough in innovation.
  microvm space is still underserved.
  - bch 1 hour ago
    > that was one of my inspirations
    Colin's FreeBSD work or Emile's NetBSD work?
fqiao 3 hours ago
Give it a try, folks. Would really love to hear all the feedback!
Cheers!
- leetrout 2 hours ago
  why did you seemingly create two HN accounts?
  Edit: I see this appears to be a contributor to the project as well. It was not obvious to me.
  - fqiao 2 hours ago
    this is me: https://github.com/phooq
    @binsquare is this one: https://github.com/BinSquare
messh 1 hour ago
https://shellbox.dev is a hosted version of something very similar
harshdoesdev 3 hours ago
its a really innovative idea! very interested in the subsecond coldstart claim, how does it achieve that?
- fqiao 3 hours ago
  @binsquare basically brute-force trimmed out unnecessary Linux kernel modules and tried to get the VM started with just the bare minimum. There is more room for improvement for sure. We will keep trying!
  - deivid 2 hours ago
    With this approach I managed to get to sub-10ms start (to pid1), if you can accept a few constraints there's plenty of room!
    Though my version was only tested on Linux hosts
    - binsquare 1 hour ago
      would be interested to see how you do it, how can I connect with you - emotionally?
  - harshdoesdev 2 hours ago
    nice! for most local workloads, it is actually sufficient. so, do you ship a complete disk snapshot of the machines?
    - fqiao 2 hours ago
      Yes. Files on the disks are kept across stop and restart. We also have a pack command to compress the machine into a single file so that it can be shipped and rehydrated elsewhere
cperciva 1 hour ago
See also SmolBSD -- similar idea, similar name, using NetBSD.
- fqiao 31 minutes ago
  I came across SmolBSD before too. Cool project!