Sparse file creation, not zero fill

If all we want is a file of size N, we can simply seek to the last byte (offset N-1) and write a single byte there. The filesystem does not have to allocate the blocks we skipped over, which can be a significant savings in IO. This is called sparse file creation.

Here is a simple Python script which does just that:

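#
# Create a file of `size` bytes by writing only its last byte.
# Rejects size < 1, since seek(size - 1) would fail or truncate to nothing.
#
def punch(file, size):
    if size < 1:
        raise ValueError("size must be at least 1")
    # The with block closes the file even if seek() or write() fails.
    with open(file, "wb") as f:
        f.seek(size - 1)
        f.write(b'\x00')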

punch("p1024.out", 1024)
punch("p1023.out", 1023)
punch("p64.out", 64)
punch("p1025.out", 1025)
punch("p10250.out", 10250)

And it yields:

[thomas@snakey src]$ ./punch.py 
[thomas@snakey src]$ ls -la p*out
-rw-r--r-- 1 thomas wheel  1023 Feb 13 16:05 p1023.out
-rw-r--r-- 1 thomas wheel  1024 Feb 13 16:05 p1024.out
-rw-r--r-- 1 thomas wheel 10250 Feb 13 16:05 p10250.out
-rw-r--r-- 1 thomas wheel  1025 Feb 13 16:05 p1025.out
-rw-r--r-- 1 thomas wheel    64 Feb 13 16:05 p64.out

Can we detect a difference between these sparse files and the fully written h*.out files of the same sizes?

[thomas@snakey src]$ uname -a
Linux snakey 2.6.35.11-83.fc14.x86_64 #1 SMP Mon Feb 7 07:06:44 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
[thomas@snakey src]$ cmp p1023.out h1023.out 
[thomas@snakey src]$ cmp p10250.out h10250.out

That raises the question of whether the OS is actually writing all of that data to disk for me. Hmm, let's try this:

[thomas@snakey src]$ du -sh p10250.out h10250.out 
4.0K	p10250.out
12K	h10250.out

So the sparse one takes a third of the space on disk. If we create another pair of files 10x the size of these, we see:

[thomas@snakey src]$ cmp p102500.out h102500.out 
[thomas@snakey src]$ du -sh p102500.out h102500.out 
4.0K	p102500.out
104K	h102500.out

So sparse file creation is working: only a single block is allocated, no matter how large the file is. (On this filesystem that block must perforce be 4K.)
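The du comparison above can also be made from Python. The following is a minimal sketch, not part of the original script: it assumes a POSIX system where os.stat() exposes st_blocks (counted in 512-byte units on Linux), and the helper names sizes() and fill() are made up for illustration. fill() is only a stand-in for however the h*.out files were produced.

```python
import os
import tempfile

def punch(path, size):
    """Create a sparse file by writing only the last byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\x00")

def fill(path, size):
    """Hypothetical stand-in for the zero-fill approach: write every byte."""
    with open(path, "wb") as f:
        f.write(b"\x00" * size)

def sizes(path):
    """Return (apparent size, allocated bytes) for path."""
    st = os.stat(path)
    # st_size is what ls reports; st_blocks * 512 is roughly what du reports.
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as d:
    sparse = os.path.join(d, "p102500.out")
    dense = os.path.join(d, "h102500.out")
    punch(sparse, 102500)
    fill(dense, 102500)
    for path in (sparse, dense):
        apparent, allocated = sizes(path)
        print(os.path.basename(path), apparent, allocated)
```

Both files should report the same apparent size. How many bytes are actually allocated depends on the filesystem, so the second number may not match the du output shown here.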

So why then does cmp not squawk? Well, when this OS reads the unallocated regions (the holes), it reports them as 0s to the caller. We can also see that the OS zero fills the rest of the one block it did allocate for us.
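The read-back behaviour can be checked directly instead of going through cmp. This is a small sketch under the same assumptions as the script above; is_all_zeros() is a hypothetical helper name, and the file is created in a temporary directory so nothing is left behind.

```python
import os
import tempfile

def punch(path, size):
    """Create a sparse file by writing only the last byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\x00")

def is_all_zeros(path, chunk_size=4096):
    """Read the file in chunks and report whether every byte is zero."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True   # reached end of file without a non-zero byte
            if any(chunk):    # iterating bytes yields ints; non-zero is truthy
                return False

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "p10250.out")
    punch(path, 10250)
    print(os.path.getsize(path), is_all_zeros(path))
```

If the holes read back as zeros, as they do in the cmp runs here, the check returns True even though only one byte was ever written.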

Comments

pozycjonowanie says:

March 2, 2011 at 5:30 pm

Great article, I’ve just added it to my bookmark list.
opal says:

April 28, 2011 at 3:52 pm

Hello,

Check what can be done with “Sparse Files” on NTFS: http://www.opalapps.com/sparse_checker/sparse_checker.html

Although I would love for the sparse regions to be allocated on NTFS automatically. Linux (ext3 I believe) is one step ahead here.

Regards
tdh says:

May 8, 2011 at 3:04 pm

opal, thanks for the link – it helped me with something else I was looking at!
opal says:

January 29, 2012 at 2:44 pm

Hello tdh,

Good to hear it helped.

Share your experience; it would be interesting to know how people use “SparseChecker” and what else could be added or improved there.

Regards
