For When You Can't Have The Real Thing
[ start | index | login ]
start > dave > experiments > Net Flows > 2007-11-08 > 1

2007-11-08 #1

Created by dave. Last edited by dave, 11 years and 38 days ago. Viewed 1,468 times. #1
[edit] [rdf]

Progress today

Changed the way the script works: now the call to commit the inserts to the database is outside the loop. This approximately halves execution time:
foreach $ip (keys %LocalTraffic)
    $statement="insert into LocalIP values ($timeStamp,'$ip',$LocalTraffic{$ip});";
    $sth = $database->prepare($statement);
    die "$DBI::errstr" if (!defined sth);

I was getting some errors talking about the parameter being too large for an integer field, so I changed the table structure for LocalIP to use bigint instead:

flowdb=# drop table LocalIP;
flowdb=# create table LocalIP ( timeslot int, ip varchar(16), bytes bigint );
flowdb=# grant all on table LocalIP to flowdb;

...although using the bigint now means the script time has quadrupled or so. Interesting.

Dry-run import of the data takes a long time. Six or seven seconds per 15-minute file.

Changed the way flow-capture is invoked:

flow-capture -w /var/spool/flows -n 288 -S 5 0/0/999
The parameter -n 288 tells the capture system to break up the capture into five-minute long files instead of the default, fifteen-minute long files. (288 is the number of five-minute intervals in 24 hours.)

This will mean that the old data is practically useless, unless we want to re-parse the existing 15-minute interval data down into five-minute intervals. Or maybe what we should be doing is just round every datapoint down to the appropriate five-minute mark… although that would mean keeping three times the amount of tables in memory while parsing the files. Hmmm. On the other hand, we really only need five-minute-interval data from the last day as everything else will be averaged out by the rrdb.

I think that parsing it out into five-minute intervals is the "right thing to do", even though it increases the amount of data we hold in memory, and it means we effectively have to do a read on the database for each record to ensure we are not clobbering a record from a previous data file and will instead add to it.

That's for another day.

no comments | post comment
This is a collection of techical information, much of it learned the hard way. Consider it a lab book or a /info directory. I doubt much of it will be of use to anyone else.

Useful: | Copyright 2000-2002 Matthias L. Jugel and Stephan J. Schmidt