Unused base images in OpenStack Nova

Posted on Wednesday, 27 March 2013 at 16:36 UTC
— Filed in Fedora OpenStack

By default, unused base images are left behind in /var/lib/nova/instances/_base/ indefinitely. Some people like it that way, probably because additional instances (on the same node) will launch quicker. Some people don't like it, maybe because they are limited on space, even though storage is considered cheap nowadays. I'm still a bit torn: I have limited storage in my proof of concept cloud, but I value short spawn times.

But ever since (I learnt that) removal of unused base images was set to True by default, I wondered why they'd still remain for me. Did I do something wrong? Being new to this I had not the slightest clue, except there were no errors which is generally a good sign you're doing something right. But then Kashyap Chamarthy wrote an interesting article on the matter a few weeks ago. Still, something seemed either wrong or missing or just different in my installation (likely one of the latter two and probably a detail, but an important one). So I started to poke around the topic myself this morning, which included reading some code to help me understand things better. And to find the actual default values, because the documentation is often wrong about those.

Configuration File Options

There are five options that you can set in /etc/nova/nova.conf that matter here, most of them specific to Nova Compute. The following lists the options with their default values and adds a brief explanation (the first sentence always being based on the actual help text of the option).

remove_unused_base_images = True
Should unused base images be removed?

periodic_interval = 60
How many seconds to wait between running periodic tasks. This is the definition of ticks (see below) and used by several Nova services. I think changing this value requires you to restart all of them on the same node.

image_cache_manager_interval = 0
Number of periodic scheduler ticks to wait between runs of the image cache manager. Obviously that's the periodic task which will actually perform the image removal. If set to 0, the cache manager will not be started!

remove_unused_original_minimum_age_seconds = 86400
Unused unresized base images younger than this will not be removed. The age refers to the file's mtime.

remove_unused_resized_minimum_age_seconds = 3600
Unused resized base images younger than this will not be removed. The age refers to the file's mtime.

So if you want to disable automatic removal of base images, you should be fine because the Cache Manager is disabled by default. To be sure, you might want to change the first of those options to False.

But should you wish to enable it, you must define an interval for the Cache Manager. I'm not sure what good values are, but I guess that depends on how your cloud is used anyway.
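To get a feel for what an interval means in wall-clock time, the tick arithmetic described above can be worked through in a few lines of Python. This is only a sketch: the nova.conf fragment is a made-up example (40 ticks is an arbitrary value, not a recommendation), the helper name is hypothetical, and it uses Python's standard configparser rather than Nova's own option parsing.

```python
import configparser

# Hypothetical nova.conf fragment; 40 ticks is an arbitrary example value.
SAMPLE_NOVA_CONF = """
[DEFAULT]
remove_unused_base_images = True
periodic_interval = 60
image_cache_manager_interval = 40
"""


def cache_manager_period_seconds(conf_text):
    """Return seconds between cache manager runs, or None if it never runs."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Fall back to the Folsom defaults listed above when an option is absent.
    tick = parser.getint("DEFAULT", "periodic_interval", fallback=60)
    ticks = parser.getint("DEFAULT", "image_cache_manager_interval", fallback=0)
    if ticks == 0:
        return None  # 0 means the cache manager is not started
    return tick * ticks


print(cache_manager_period_seconds(SAMPLE_NOVA_CONF))
```

With these example values the cache manager would run every 60 x 40 = 2400 seconds, i.e. every 40 minutes. With the defaults (interval 0) the helper returns None, matching the "not started" case.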

Once the Cache Manager is enabled, you might want to tweak the last two options a bit, though the defaults are sane to start with. Basically you have two kinds of base images and one option for each: 1) original, or unresized: that's basically the uncompressed COW image. 2) resized: that's the original image, but extended to reflect the (chosen) flavor's disk size. Note: depending on your configuration, you might not have both, or they might be in different formats; I'm not sure. For an excellent and very complete treatment of this topic, see Pádraig Brady's discussion.

What's actually happening

Let me try to make the complete process a little bit clearer still. Once Nova Compute is started (and after a configurable random delay to prevent a stampede), the periodic task manager is started in order to count the seconds between ticks. Every X ticks (X being the interval you set), the cache manager is run. The first thing it checks is which images are no longer used by any instance. Those that are still in use are ignored; the others are checked for their age, which is calculated as the number of seconds since the base file's mtime (i.e. the time of the last modification of the file). If the age is smaller than what you configured (for this kind of base image: original or resized), the image is deemed "too young" and left alone. But if it's mature enough, it's deleted. That's it already, end of story. The process can easily be followed in the log file, which helps to understand it even better.
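The decision described above can be condensed into a small Python sketch. To be clear, this is not Nova's actual code: the function name and its parameters are made up for illustration, and the constants are simply the Folsom defaults from the option list.

```python
import time

# Folsom defaults as listed in the options above.
MIN_AGE_ORIGINAL = 86400  # remove_unused_original_minimum_age_seconds
MIN_AGE_RESIZED = 3600    # remove_unused_resized_minimum_age_seconds


def should_remove(mtime, in_use, resized, now=None):
    """Decide whether a base image would be removed, per the description above."""
    if in_use:
        return False  # images still backing an instance are ignored
    if now is None:
        now = time.time()
    age = now - mtime  # seconds since the file's last modification
    min_age = MIN_AGE_RESIZED if resized else MIN_AGE_ORIGINAL
    # Younger than the minimum age means "too young": the image is left alone.
    return age >= min_age
```

For example, an unused image last modified two hours ago would be removed if it is a resized image (minimum age one hour) but kept if it is an original one (minimum age one day).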

Shared instances Folder

If you have configured live migration of instances, all your compute nodes share one common /var/lib/nova/instances/ and you might wonder what's different in that case. The process is exactly the same, with one important difference: base images that are unused on one node could still be in use on a different node and should not be removed. But Nova Compute won't know an image is used somewhere else, and the Cache Manager will try to remove it nevertheless. Luckily, the mtime changes now and then if the image is in use on any node, so unless you set the age options to a very low value, you should be fine. In a test run I once had it down to 4 minutes and that still worked. With the idle Cirros 0.3.0 image I spawned, the mtime changed every ~2 minutes. That said, I don't yet completely understand why it changes at all.
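If you want to watch the mtime behaviour on your own shared folder before lowering the age options, something like the following Python sketch may help. It only reads file metadata and deletes nothing; the function name is hypothetical, and it assumes the base images are plain files directly inside the directory you pass in.

```python
import os
import time


def base_image_ages(base_dir, now=None):
    """Map each regular file in base_dir to its age in seconds (based on mtime)."""
    if now is None:
        now = time.time()
    ages = {}
    for name in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, name)
        if os.path.isfile(path):
            # Same notion of age as in the text: seconds since last modification.
            ages[name] = now - os.stat(path).st_mtime
    return ages
```

Calling base_image_ages("/var/lib/nova/instances/_base") every minute or so and comparing the results should show whether the ages of in-use images keep dropping back towards zero, as they did in my test.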

Grizzly

It's important to note that the above is probably valid for Folsom only.

Update: In Grizzly, there is no periodic_interval anymore; the replacement is more dynamic and can probably be ignored. That also means image_cache_manager_interval is now given in seconds, as there are no fixed ticks anymore. More importantly, though: its value is now set to 2400 by default, i.e. the cache manager is enabled. I don't think anything else changed.