Code

OS X will create sparse files across NFS, or does it?

KinMage:src thomas$ sudo mount -o vers=3,intr 172.16.1.129:/fooper /mnt
KinMage:src thomas$ ls -la /mnt
total 8
drwxrwxrwx   2 root  wheel  4096 Feb 13 21:50 .
drwxrwxr-t  24 root  admin  1224 Feb 13 21:54 ..

Where 172.16.1.129 is snakey, the Linux VM I am using for testing.

And then:

KinMage:mnt thomas$ python punch.py 
KinMage:mnt thomas$ ls -la p*out
-rw-r--r--  1 thomas  staff    1023 Feb 13 21:54 p1023.out
-rw-r--r--  1 thomas  staff    1024 Feb 13 21:54 p1024.out
-rw-r--r--  1 thomas  staff    1025 Feb 13 21:54 p1025.out
-rw-r--r--  1 thomas  staff   10250 Feb 13 21:54 p10250.out
-rw-r--r--  1 thomas  staff  102500 Feb 13 21:54 p102500.out
-rw-r--r--  1 thomas  staff      64 Feb 13 21:54 p64.out
KinMage:mnt thomas$ du -sh p*out
1.0K	p1023.out
1.0K	p1024.out
1.5K	p1025.out
 10K	p10250.out
100K	p102500.out
512B	p64.out

So it creates sparse files across NFS!

Well, yes and no. The client only sends a single block of data across
the wire; it is then up to the server OS to decide whether to create a
sparse file or not.

Another thing to note is that the size reported depends on where you
ask: the underlying file system interface determines how much space
is being reported. Here is the same directory as seen on the server:

[thomas@snakey fooper]$ du -sh p*out
4.0K	p1023.out
4.0K	p1024.out
4.0K	p102500.out
4.0K	p10250.out
4.0K	p1025.out
4.0K	p64.out

Ideally we would like the sizes to match, but since we are pulling a fast one, we get what we see.
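
To see both numbers for a file at once, something like the following would do. This is only a sketch: the helper name sizes is mine, and it assumes a POSIX system where st_blocks is counted in 512-byte units (which is what Linux and OS X report).

```python
import os

def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # st_size is the length that ls shows; st_blocks is what du is based on.
    # Assumption: st_blocks is in 512-byte units, as on Linux and OS X.
    return st.st_size, st.st_blocks * 512
```

Running sizes("p102500.out") on the client and on the server should make the mismatch above visible without having to eyeball two du listings.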

Sparse file creation not an option on OS X

We can see that when we try to create a sparse file under OS X (10.6), the OS writes out the intervening pages.

KinMage:src thomas$ uname -a
Darwin KinMage 10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386
KinMage:src thomas$ python punch.py 
KinMage:src thomas$ ls -la p*out
-rw-r--r--  1 thomas  staff    1023 Feb 13 21:44 p1023.out
-rw-r--r--  1 thomas  staff    1024 Feb 13 21:44 p1024.out
-rw-r--r--  1 thomas  staff    1025 Feb 13 21:44 p1025.out
-rw-r--r--  1 thomas  staff   10250 Feb 13 21:44 p10250.out
-rw-r--r--  1 thomas  staff  102500 Feb 13 21:44 p102500.out
-rw-r--r--  1 thomas  staff      64 Feb 13 21:44 p64.out
KinMage:src thomas$ du -sh p*out
4.0K	p1023.out
4.0K	p1024.out
4.0K	p1025.out
 12K	p10250.out
104K	p102500.out
4.0K	p64.out

If these were sparse files, each one would occupy a single block no matter
how long it is. Since the larger ones take 12K and 104K, we can conclude no
sparse files were created.

Hmm, I wonder what happens over NFS?

Sparse file creation, not zero fill

If all we wanted to do was create a file of size N, then we can simply seek to the last byte (offset N-1) and write a single byte there. This can save a significant amount of I/O and is called sparse file creation.

Here is a simple Python script which does just that:

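#
# Does not account for size < 1
#
def punch(file, size):
    # "with" closes the file even if seek or write fails, and never
    # touches f if open() itself raises.
    with open(file, "wb") as f:
        # Seek to the last byte and write it; everything before it is a hole.
        f.seek(size - 1)
        f.write(b'\x00')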

punch("p1024.out", 1024)
punch("p1023.out", 1023)
punch("p64.out", 64)
punch("p1025.out", 1025)
punch("p10250.out", 10250)

And it yields:

[thomas@snakey src]$ ./punch.py 
[thomas@snakey src]$ ls -la p*out
-rw-r--r-- 1 thomas wheel  1023 Feb 13 16:05 p1023.out
-rw-r--r-- 1 thomas wheel  1024 Feb 13 16:05 p1024.out
-rw-r--r-- 1 thomas wheel 10250 Feb 13 16:05 p10250.out
-rw-r--r-- 1 thomas wheel  1025 Feb 13 16:05 p1025.out
-rw-r--r-- 1 thomas wheel    64 Feb 13 16:05 p64.out

Can we detect a difference between the two outputs?

[thomas@snakey src]$ uname -a
Linux snakey 2.6.35.11-83.fc14.x86_64 #1 SMP Mon Feb 7 07:06:44 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
[thomas@snakey src]$ cmp p1023.out h1023.out 
[thomas@snakey src]$ cmp p10250.out h10250.out 

That raises the question of whether the OS is dumping data down there for me or not. Hmm, let's try this:

[thomas@snakey src]$ du -sh p10250.out h10250.out 
4.0K	p10250.out
12K	h10250.out

So the sparse one is 1/3rd the size. If we do an additional file 10x the size of this one, we see:

[thomas@snakey src]$ cmp p102500.out h102500.out 
[thomas@snakey src]$ du -sh p102500.out h102500.out 
4.0K	p102500.out
104K	h102500.out

So the sparse file creation is working in the sense that we are seeing only a single block being written. (Which must perforce be 4k.)

So why then does cmp not squawk? Well, when this OS reads the missing pages, it reports them as 0s to the caller. We can also see that this OS is zero filling the page for us.
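
The "holes read back as zeros" behavior can be checked from Python as well. A rough sketch, with a helper name of my own invention (hole_reads_as_zeros); it writes to a temp file, so whether a hole is actually created depends on the file system behind the temp directory, but the bytes read back should be zeros either way:

```python
import os
import tempfile

def hole_reads_as_zeros(size=102500):
    """Create a file by seeking past the start, then read it all back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            # Same trick as punch(): only the last byte is ever written.
            f.seek(size - 1)
            f.write(b"\x00")
        with open(path, "rb") as f:
            data = f.read()
        # Every byte we never wrote should come back as 0x00.
        return data == b"\x00" * size
    finally:
        os.remove(path)
```

This is the same thing cmp told us, just without needing the zero-filled h*.out file to compare against.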

Learning Python – mkfile

I’m trying to learn Python, again, and the problem is that things which would be simple to code in C/Perl/etc. are hard in Python.

I talked about how cool mkfile was in Creating a small zpool for testing and so I thought I would try to code that up in Python. Remember, this is my first whack at Python in 2 years, so it is all pretty much fresh and I’m in a learning mode.

I struggled with strings, bytearrays, and bufferedreaders. I still don’t know how to open a bufferedreader.

But I finally got to a working piece of code:

def mkfile(file, size):
    chunk = 1024
    loopto = size // chunk   # number of full chunks
    filler = size % chunk    # bytes left over after the full chunks

    bite = bytearray(chunk)  # one chunk of zero bytes

    # "with" closes the file even if a write fails.
    with open(file, "wb") as f:
        for n in range(loopto):
            f.write(bite)

        if filler > 0:
            f.write(bytearray(filler))

mkfile("h1024.out", 1024)
mkfile("h1023.out", 1023)
mkfile("h64.out", 64)
mkfile("h1025.out", 1025)
mkfile("h10250.out", 10250)

And it actually yields appropriately sized files:

[thomas@snakey src]$ ls -la h*.out
-rw-r--r-- 1 thomas wheel  1023 Feb 13 15:47 h1023.out
-rw-r--r-- 1 thomas wheel  1024 Feb 13 15:47 h1024.out
-rw-r--r-- 1 thomas wheel 10250 Feb 13 15:47 h10250.out
-rw-r--r-- 1 thomas wheel  1025 Feb 13 15:47 h1025.out
-rw-r--r-- 1 thomas wheel    64 Feb 13 15:47 h64.out

Now I need to check to see if they are all zeros:

[thomas@snakey src]$ dd if=/dev/zero of=z10250.out bs=10250 count=1
1+0 records in
1+0 records out
10250 bytes (10 kB) copied, 4.6629e-05 s, 220 MB/s
[thomas@snakey src]$ cmp z10250.out h10250.out 

So yeah, there are really easy ways to accomplish mkfile.
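
The all-zeros check itself can also stay in Python instead of going through dd and cmp. A minimal sketch; is_all_zeros is a name I made up, and the 1024-byte chunk is just borrowed from mkfile above:

```python
def is_all_zeros(path, chunk=1024):
    """Return True if every byte in the file is 0x00."""
    zeros = b"\x00" * chunk
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                # Hit end of file without seeing a non-zero byte.
                return True
            # The last block may be short, so compare against a slice.
            if block != zeros[:len(block)]:
                return False
```

Note that an empty file counts as all zeros here, which seemed like the least surprising answer.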

Speaking of ugly code

The following violates many coding principles that I normally hold true to, but it was actually cool to code using the string variables as a stack.


#!/usr/bin/perl

# print hostlets
#
# 192.168.$i.0
$titan = "";
for ($i = 0; $i < 11; $i++) {
        $j = 0;
        $l = 0;
        $m = 0;
        $n = 0;
        $monster = "";
        $groupzilla = "";
        while ($j < 256) {
                $monster .= " hostlet_" . $i . "_" . $l;

                print "hostlet_" . $i . "_" . $l . " ";
                for ($k = 0; $k < 10 && $j < 256; $k++) {
                        print "(192.168.$i.$j,,) ";
                        $j++;
                }
                print "\n";
                $l++;

                if (($l % 5 == 0) || $j >= 255) {
                        $groupzilla .= " monster_" . $i . "_" . $m;
                        print "monster_" . $i . "_" . $m . $monster . "\n";
                        $monster = "";
                        $m++;
                }
        }

        $titan .= " groupzilla_" . $i;
        print "groupzilla_" . $i . $groupzilla . "\n";
}

print "titan" . $titan . "\n";
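
For comparison, here is roughly how the same accumulate-and-flush idea might look in Python, with lists standing in for the strings. This is a sketch rather than a port: the function name group_lines and its parameters are my own, it returns the lines instead of printing them, and unlike the Perl it does not leave a trailing space on the hostlet lines.

```python
def group_lines(subnets=11, hosts=256, per_hostlet=10, per_monster=5):
    """Build the hostlet/monster/groupzilla/titan lines as a list of strings."""
    lines = []
    titan = []
    for i in range(subnets):
        groupzilla = []   # monsters collected for this subnet
        monster = []      # hostlets collected since the last flush
        hostlet_no = 0
        for start in range(0, hosts, per_hostlet):
            stop = min(start + per_hostlet, hosts)
            name = "hostlet_%d_%d" % (i, hostlet_no)
            hostlet_no += 1
            triples = ["(192.168.%d.%d,,)" % (i, j) for j in range(start, stop)]
            lines.append(" ".join([name] + triples))
            monster.append(name)
            # Flush every per_monster hostlets, and once more at the end.
            if len(monster) == per_monster or stop == hosts:
                mname = "monster_%d_%d" % (i, len(groupzilla))
                lines.append(" ".join([mname] + monster))
                groupzilla.append(mname)
                monster = []
        gname = "groupzilla_%d" % i
        lines.append(" ".join([gname] + groupzilla))
        titan.append(gname)
    lines.append(" ".join(["titan"] + titan))
    return lines
```

Printing "\n".join(group_lines()) gives the same grouping as the Perl: 26 hostlets and 6 monsters per subnet, one groupzilla per subnet, and a single titan line at the end.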