For When You Can't Have The Real Thing
[ start | index | login ]
start > dave > experiments > Net Flows > 2009-04-23 > 1

2009-04-23 #1

Created by dave. Last edited by dave, 11 years and 151 days ago. Viewed 1,854 times. #3
[diff] [history] [edit] [rdf]

Inserts got slow

So I made two mistakes:

First I turned on indexing around the 65 million record mark. Indexing took 45 minutes, during which time the inserts totally stopped. Inserts after indexing was turned on took much longer due to wait-on-IO issues on commit, probably due to index updating.

Second, I didn't optimize Postgres. This is the information I got second hand regarding a mail/posgres server with 2GB of RAM:

  1. Set effective_cache_size to about 1GB.
  2. Increase shared_buffers from 48MB to 96MB.
  3. Set checkpoint_segments to 16.
  4. Set wal_buffers to 512KB.
  5. Set default_statistics_target to 50.
So I made those changes:
effective_cache_size = 131072           # typically 8KB each 131072x8KB=1GB
shared_buffers = 40000                  # min 16 or max_connections*2, 8KB each
checkpoint_segments = 16                # in logfile segments, min 1, 16MB each
wal_buffers = 64                        # min 4, 8KB each
default_statistics_target = 50          # range 1-1000

Rerunning the import now, we'll see what happens.

Later: More Changes

  • changed the running loop so we insert flow files in chronological order instead of random
  • changed the running loop so that we read 12 flow files at a time
  • changed the internal logic so we only commit once per run (ie once per 12 flow files read)
Put those together, and right now I am on day 5 of 25, inserting at a rate of one hour (337000 flows/hour) every 3.5 minutes.
no comments | post comment
This is a collection of techical information, much of it learned the hard way. Consider it a lab book or a /info directory. I doubt much of it will be of use to anyone else.

Useful: | Copyright 2000-2002 Matthias L. Jugel and Stephan J. Schmidt