(2015-01-15)
Problem
Running du -s (many somethings) is especially slow when you do it across NFS.
Solution
#!/bin/bash

function jobs_limiter
{
    if [[ $# -eq 0 ]] ; then
        echo "Usage: jobs_limiter NUM_PROCS. Will wait until the number of background (&)"
        echo " bash processes (as determined by 'jobs -p') falls below NUM_PROCS"
        return
    fi
    local max_number=$((0 + ${1:-0}))
    while true; do
        local current_number=$(jobs -p | wc -l)
        if [[ $current_number -lt $max_number ]]; then
            break
        fi
        sleep 1
    done
}

for i in "$@" ; do
    jobs_limiter 20 # this blocks if 20 children are already running
    (
        # Skip .snapshot, round each file size up to whole 4K blocks,
        # then sum the per-file KB figures and print "total name".
        echo "$(find "$i" -xdev -name .snapshot -prune -o -type f -printf '%s\n' | awk '{n=int(($1+4095)/4096) ; print n*4}' | paste -sd+ | bc)" "$i"
    ) &
done
wait
Basically: for each item passed, spin up a find in a child process which records the size of every file it encounters in the tree, converts each size to 4K allocation units, adds them up, then prints the total (in KB) followed by the item's name.
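To make the 4K arithmetic concrete, here is a minimal sketch of just that step. It assumes every file occupies a whole number of 4096-byte blocks (rounded up), and it does the adding inside awk rather than with paste and bc. The function name sum_4k_kb is made up for illustration and is not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: read byte counts, one per line, and print the total
# in KB after rounding each count up to a whole number of 4096-byte blocks.
sum_4k_kb() {
    awk '{ total += int(($1 + 4095) / 4096) * 4 } END { print total + 0 }'
}

# 1 byte -> 1 block (4 KB), 4096 bytes -> 1 block (4 KB), 5000 bytes -> 2 blocks (8 KB)
printf '%s\n' 1 4096 5000 | sum_4k_kb   # prints 16
```

In the real pipeline the byte counts would come from find's -printf '%s\n' instead of printf.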
The magic is the jobs_limiter which sleep-blocks as long as the number of child processes is equal to or more than the passed number (in this case, 20). And I'll confess, I got it from an answer on Stack Overflow or somewhere.
This means that while one find is in an NFS wait for whatever reason, there's always another (or several) ready to process in the meantime.
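If you want to watch the limiter on its own, something like the following should do. It is only a sketch: sleep stands in for the NFS-bound find, the names limit_jobs and demo are invented here, the limit is 2 instead of 20, and the fractional sleep assumes a sleep that accepts decimals (GNU coreutils does).

```shell
#!/bin/bash
# Same polling idea as jobs_limiter, reduced to its core.
limit_jobs() {
    local max=$1
    # jobs -rp prints the PIDs of running background jobs, one per line
    while (( $(jobs -rp | wc -l) >= max )); do
        sleep 0.1
    done
}

demo() {
    local n
    for n in 1 2 3 4 5 6; do
        limit_jobs 2                              # block while 2 children are running
        ( sleep 0.3; echo "tree$n finished" ) &   # stand-in for the find pipeline
    done
    wait                                          # collect the stragglers
}

demo
```

The six stand-in jobs go through two at a time, so the whole thing should take roughly three rounds of sleeping rather than six.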
Notes
This is sub-optimal in all kinds of ways:
- sleeping for a second in the jobs_limiter means time gets lost in dispatching the next tree
- if you have a few deep trees you won't see the improvement you would if you had many shallow trees
- it doesn't count directories as using space
- the 20 is, at this point, arbitrary
- if all your data is in large trees at the end of your spec, you are still stuck traversing through them at slow speed; ideally you'd have an idea what is biggest and start those trees off first
- it will probably melt your computer if you try to run it against local disk
This scratches a highly-specific-to-me itch and is probably not suitable for general use.
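On the first point in that list (time lost to the one-second sleep), one possible way around it is bash's wait -n, which returns as soon as any one child exits. That needs bash 4.3 or newer. What follows is a sketch and not what the script above does: run_limited and slow_echo are made-up names, and slow_echo is just a stand-in for the find pipeline.

```shell
#!/bin/bash
# Hypothetical alternative to polling: run CMD once per item, keeping at
# most MAX children alive, using `wait -n` (bash 4.3+) instead of sleep.
run_limited() {
    local max=$1 cmd=$2
    shift 2
    local running=0 item
    for item in "$@"; do
        if (( running >= max )); then
            wait -n                      # returns when any one child exits
            running=$((running - 1))
        fi
        "$cmd" "$item" &
        running=$((running + 1))
    done
    wait                                 # wait for whatever is still running
}

# Stand-in worker; the real one would run the find pipeline on "$1".
slow_echo() {
    sleep 0.2
    echo "$1"
}

run_limited 2 slow_echo a b c d
```

The counter can only overestimate how many children are alive, so this never goes above the limit, and the next tree gets dispatched the moment a slot opens.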
The story is that I have a /homes volume on a NetApp filer, and the du -s that was running against it was taking in excess of three days. (This nifty script does it in an hour forty, just sayin'.) It was also not filtering out the .snapshot tree, and it was running the trees one at a time, which was taking for. ever. So a little cold medication later, I cranked this out.