
2007-11-20 #1

Created by dave. Last edited by dave, 16 years and 159 days ago. Viewed 2,292 times. #3


Continuing

The data load gets slow with all this data in the database: 11 days of data took over 730 minutes to load. Running the cleanup insert (i.e. covering the time between the run time and now) now takes a long time per file. The detection/removal/correction process takes almost a whole second per record.

This is probably because, for every record we want to insert, we have to do a lookup to make sure we are not clobbering an existing one. There are a couple of ways around this:

just write the new record and make postgres2rrd deal with multiple records per time period/IP combination
create a pre-load table that contains only the time periods likely to have clobbers (i.e. the last time period written from the previous file) so lookups are fast; then repopulate that table with the last time period from the current file so it's available for the next file
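A rough sketch of the second option, using plain Python dicts in place of the real tables. Everything here is an assumption for illustration: the function name `load_file`, the record layout `(timeperiod, ip, bytes_in, bytes_out)`, and in particular the idea that "correcting" a clobber means summing the counters. The point is only that the lookup touches the small carry-over set instead of the whole database.

```python
def load_file(records, carry, store):
    """Merge one flow file into `store` without a full-table lookup.

    records: iterable of (timeperiod, ip, bytes_in, bytes_out) tuples
    carry:   {(timeperiod, ip): (bytes_in, bytes_out)} holding only the last
             time period written by the previous file (the "pre-load table")
    store:   dict standing in for the main database table
    Returns the carry dict to pass in with the next file.
    """
    merged = {}
    for period, ip, b_in, b_out in records:
        key = (period, ip)
        if key not in merged:
            # Only the carry-over rows can clash, so this is the only lookup.
            merged[key] = carry.get(key, (0, 0))
        old_in, old_out = merged[key]
        # Assumption: a clash is resolved by adding the counters together.
        merged[key] = (old_in + b_in, old_out + b_out)

    store.update(merged)
    if not merged:
        return carry

    # Repopulate the carry-over set with this file's last time period.
    last = max(period for period, _ in merged)
    new_carry = {k: v for k, v in carry.items() if k[0] == last}
    new_carry.update({k: v for k, v in merged.items() if k[0] == last})
    return new_carry
```

Called once per flow file, in order, with the return value fed back in as `carry` for the next call.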

Creation of the RRD files is also pretty slow: about three time periods per second.

I'm probably totally I/O bound here:

reading the flows and writing the database is all disk
reading the database and writing the rrd files is all disk

...so there won't be much to be gained by throwing more CPU at the problem.

Wrote a shell script to do the graph generation. It's pretty speedy: less than four seconds. So on-demand generation isn't going to be unreasonable.

Changed the table layout:

removed bytes
added bytesIn, bytesOut
added a new table which tracks which flow files have been inserted

postgres2rrd now knows to skip the last time period it's offered, record it, and then exclude all time periods prior to that one the next time it's run. This is stored in the table rrdload. It also knows not to delete the RRDs.
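The skip-and-record behaviour might look something like this. It is a sketch, not the actual postgres2rrd code: `periods_to_write` and the `watermark` argument (standing in for the value kept in rrdload) are made-up names, and time periods are assumed to be sortable values such as epoch seconds.

```python
def periods_to_write(offered, watermark=None):
    """Decide which time periods to push into the RRDs on this run.

    offered:   iterable of time periods currently in the database
    watermark: the period skipped and recorded on the previous run
               (None on the first run)
    Returns (periods to write now, new watermark to record).
    """
    periods = sorted(set(offered))
    if not periods:
        return [], watermark

    # The newest period may still be incomplete, so hold it back and record it.
    last = periods[-1]

    # Everything before the old watermark was written on an earlier run;
    # the old watermark itself is now complete, so it is included.
    to_write = [p for p in periods[:-1] if watermark is None or p >= watermark]
    return to_write, last
```

For example, a first run offered periods 1, 2, 3 writes 1 and 2 and records 3; a later run offered 1 through 5 writes 3 and 4 and records 5.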

To do:

add a switch to tell postgres2rrd to generate the RRDs from scratch
figure out if this is robust enough for the collector to go back to 5-minute interval files (probably not)
generate a simple web page with the IPs and graphs on it
figure out how to generate the graphs dynamically
generate tables based on top talkers, top listeners (this is probably simple SQL query voodoo)
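For the last item, the query voodoo is probably just a GROUP BY with a SUM. A sketch under several assumptions: the table is called `flows` and has an `ip` column (both names invented here; only bytesIn and bytesOut come from the layout change above), "talkers" means most bytesOut and "listeners" most bytesIn, and sqlite3 is used as a stand-in so the example runs without a Postgres server.

```python
import sqlite3

def top_hosts(conn, column, limit=10):
    """Return [(ip, total), ...] ranked by the summed counter in `column`.

    column must be "bytesOut" (talkers) or "bytesIn" (listeners); it is
    checked against a whitelist because it is interpolated into the SQL.
    """
    if column not in ("bytesIn", "bytesOut"):
        raise ValueError("column must be 'bytesIn' or 'bytesOut'")
    sql = (
        f"SELECT ip, SUM({column}) AS total "
        "FROM flows GROUP BY ip ORDER BY total DESC LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()
```

The same SELECT should carry over to Postgres more or less unchanged, apart from the parameter placeholder style of whichever driver is used.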

I think at this point I can load the data and then pretty much forget about having to re-load it from scratch. Again.

Oh yeah, forgot to mention: once you convert the values to bits per second, the graphs make much more sense.
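The conversion is just bytes times eight, divided by the length of the time period in seconds. A small sketch; the function name and the 300-second default (a 5-minute period) are assumptions, so substitute whatever interval the collector actually uses.

```python
def to_bits_per_second(byte_count, interval_seconds=300):
    """Convert a byte counter for one time period into an average bit rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return byte_count * 8 / interval_seconds
```

So 37,500 bytes counted over a 300-second period averages out to 1,000 bits per second.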
