Progress today 
Changed the way the script works: now the call to commit the inserts to the database is outside the loop. This approximately halves execution time:
foreach $ip (keys %LocalTraffic)
{
    # one INSERT per local IP; the commit now happens once, after the loop
    $statement = "insert into LocalIP values ($timeStamp,'$ip',$LocalTraffic{$ip});";
    $sth = $database->prepare($statement);
    die "$DBI::errstr" if (!defined $sth);
    $sth->execute;
    $sth->finish;
}
$database->commit;
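For that explicit commit to actually batch the inserts, the database handle has to be opened with AutoCommit turned off. The script's connect call isn't shown above, so this is only a sketch of how I assume it looks (the DSN, user and password are stand-ins, not the real values):

use DBI;

# connect with AutoCommit disabled so the inserts are held until the explicit commit
# (DSN and credentials below are placeholders)
$database = DBI->connect("dbi:Pg:dbname=flowdb", "flowdb", "",
                         { AutoCommit => 0 })
    or die $DBI::errstr;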
I was getting errors about the parameter being too large for an integer field (presumably some per-IP byte totals exceed the 2,147,483,647 limit of a 32-bit int), so I changed the table structure for LocalIP to use bigint instead:
flowdb=# drop table LocalIP;
flowdb=# create table LocalIP ( timeslot int, ip varchar(16), bytes bigint );
flowdb=# grant all on table LocalIP to flowdb;
...although using bigint means the script's run time has now roughly quadrupled. Interesting.
A dry-run import of the data takes a long time: six or seven seconds per 15-minute file.
Changed the way flow-capture is invoked:
flow-capture -w /var/spool/flows -n 288 -S 5 0/0/999
The -n 288 parameter tells the capture system to break the capture into five-minute files instead of the default fifteen-minute files (288 is the number of five-minute intervals in 24 hours).
This will mean that the old data is practically useless, unless we want to re-parse the existing 15-minute-interval data down into five-minute intervals. Or maybe what we should be doing is just rounding every datapoint down to the appropriate five-minute mark, although that would mean keeping three times the number of tables in memory while parsing the files. Hmmm. On the other hand, we really only need five-minute-interval data from the last day, as everything else will be averaged out by the rrdb.
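The rounding itself is the easy part; something along these lines, assuming $timeStamp is a Unix epoch time, would do it:

# round an epoch timestamp down to the start of its five-minute slot
$slot = $timeStamp - ($timeStamp % 300);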
I think that parsing it out into five-minute intervals is the "right thing to do", even though it increases the amount of data we hold in memory, and it means we effectively have to do a read against the database for each record to make sure we are not clobbering a record from a previous data file but adding to it instead.
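That check-then-add step might look something like this. It's only a sketch: it assumes the same $database handle and LocalIP layout as above, reuses the $slot value from the rounding step, and uses DBI placeholders rather than interpolating values into the SQL:

# look for an existing row for this timeslot/IP pair
$check = $database->prepare("select bytes from LocalIP where timeslot = ? and ip = ?");
$check->execute($slot, $ip);
($existing) = $check->fetchrow_array;
$check->finish;

if (defined $existing) {
    # a previous data file already contributed to this slot: add to it
    $update = $database->prepare("update LocalIP set bytes = bytes + ? where timeslot = ? and ip = ?");
    $update->execute($LocalTraffic{$ip}, $slot, $ip);
    $update->finish;
} else {
    # first time we have seen this slot/IP pair: insert a fresh row
    $insert = $database->prepare("insert into LocalIP values (?, ?, ?)");
    $insert->execute($slot, $ip, $LocalTraffic{$ip});
    $insert->finish;
}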
That's for another day.