[personal profile] reddragdiva

We need a system and network monitoring tool that generates graphs that are (a) useful to us the drones but (b) pretty to show The Mgt. I know there are any number of monitoring things that generate PNGs in real time. What do you use?

Wikimedia uses Ganglia, which generates just the sort of thing we're after, but the description of the application looks a little heavyweight for under ten Solaris boxes. Of course, I'd be happy to hear that this was not the case.

Update: We also need to check stuff like number of users on Oracle, but that should be a simple check every five minutes, assuming it can log arbitrary data.

(no subject)

Date: 2005-10-11 03:05 pm (UTC)
From: [identity profile] mr-tom.livejournal.com
I use Nagios. It works, and makes graphs.

(no subject)

Date: 2005-10-11 03:21 pm (UTC)
From: [identity profile] wyrdrune.livejournal.com
Nagios is good.

If you're looking for traffic monitoring, MRTG gets a major vote here.

(no subject)

Date: 2005-10-11 03:51 pm (UTC)
From: [identity profile] simonb.livejournal.com
MRTG is fairly good, but it does have some sucky aspects. For example, if you get it to auto-generate a configuration, it's quite likely to set the maximum speed for a port (on which MRTG bases the maximum number of bytes in a single data point within the RRD file) far too low if nothing is plugged into that port. We've got GigE ports configured with a maximum speed more suitable for 10BaseT ports.

At some point I'll be updating one of my own scripts to do a lot of the work MRTG already does on our network and relegate MRTG to just monitoring specific ports on the router and firewall.
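
The undersized-maximum problem described above can be caught with a quick sanity check. This is only a sketch: the port names and configured values are made up for illustration, and it assumes you already know what speed each port is supposed to run at. MRTG's MaxBytes is in bytes per second, so a GigE port should come out at 125,000,000.

```python
# Sketch only: port names and configured values are hypothetical.

def max_bytes_for(speed_bps: int) -> int:
    """MaxBytes is bytes per second; link speeds are quoted in bits per second."""
    return speed_bps // 8

def undersized_ports(configured, expected_speed_bps):
    """Return the ports whose configured MaxBytes is below what the link can carry."""
    return sorted(
        port
        for port, max_bytes in configured.items()
        if max_bytes < max_bytes_for(expected_speed_bps[port])
    )

GIGE = 1_000_000_000
# Gi1/1 was auto-configured while unplugged and got a 10BaseT-sized maximum.
configured = {"Gi1/1": 1_250_000, "Gi1/2": 125_000_000}
expected = {"Gi1/1": GIGE, "Gi1/2": GIGE}

print(undersized_ports(configured, expected))
```

Anything this flags would need its maximum raised by hand in the generated config.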

(no subject)

Date: 2005-10-11 05:22 pm (UTC)
From: [identity profile] wyrdrune.livejournal.com
I tend to hand roll my own configs, by taking an existing one and just tweaking it - but then I'm not talking about a server farm or big network. :-)

(no subject)

Date: 2005-10-11 05:53 pm (UTC)
From: [identity profile] simonb.livejournal.com
Ah; I use MRTG across every port on our network; currently that works out to be 732 ports :)

(no subject)

Date: 2005-10-11 07:17 pm (UTC)
From: [identity profile] wyrdrune.livejournal.com
Meep!

I've got [fx: counts on fingers] 6 ports directly queried via SNMP, 4 more queried via NeTraMet monitoring and a bunch of things like disk space etc. on a couple of Linux boxen. Nowhere near the scale of things you're looking at.

(no subject)

Date: 2005-10-12 09:17 am (UTC)
From: [identity profile] simonb.livejournal.com
*smile* it is quite a bit of work for MRTG to do, it has to be said.

I really need to update the switch status script I wrote to handle the stuff MRTG does for the switch ports - basically I've written a script which I can point at our network and it'll do a gaggle of SNMP queries and from those generate web pages which show representations of the switches of the network with which ports are active, which ports have high usage, which ports are blocked, etc. Even better, it also handles modular switches so that module slots have the correct bits shown in them (ie blank module, 24x10/100BaseT module, 16x10/100/1000BaseT, etc).
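
As a rough illustration of the classification step in a script like that (not the actual script, which isn't shown here), each port can be labelled from a handful of SNMP values. The sketch assumes the values have already been fetched; the status codes are the standard IF-MIB ifOperStatus and BRIDGE-MIB dot1dStpPortState numbers, while the 80% threshold, the `Port` class and the sample ports are invented for the example.

```python
from dataclasses import dataclass

OPER_UP = 1          # IF-MIB ifOperStatus: up(1), down(2)
STP_BLOCKING = 2     # BRIDGE-MIB dot1dStpPortState: blocking(2), forwarding(5)
HIGH_USAGE = 0.8     # assumed threshold: 80% of link speed

@dataclass
class Port:
    name: str
    oper_status: int
    stp_state: int
    bits_per_sec: float
    speed_bps: float

def classify(port: Port) -> str:
    """Pick the label a status page would show for one port."""
    if port.oper_status != OPER_UP:
        return "inactive"
    if port.stp_state == STP_BLOCKING:
        return "blocked"
    if port.speed_bps > 0 and port.bits_per_sec / port.speed_bps >= HIGH_USAGE:
        return "high usage"
    return "active"

# Hypothetical sample data standing in for real SNMP results.
ports = [
    Port("2/1", 1, 5, 90e6, 100e6),
    Port("2/2", 1, 2, 0, 100e6),
    Port("2/3", 2, 5, 0, 100e6),
    Port("2/4", 1, 5, 5e6, 100e6),
]
for p in ports:
    print(f"{p.name}: {classify(p)}")
```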

In a way, I almost don't need to monitor the router or PIX ports as I already monitor our external bandwidth usage in a different way (by using libpcap to snarf packets and then generating a cute graph broken down by who is using what).

(no subject)

Date: 2005-10-12 09:24 am (UTC)
From: [identity profile] wyrdrune.livejournal.com
already monitor our external bandwidth usage in a different way

That's a part of what I use MRTG to do - I've got a very low budget, so I slung a hub between the firewall and the rest of the network (poor man's vampire tap I suppose); one of the ports on that hub goes to a Linux box which sits in promisc. mode, using NeTraMet to log packets to/from each of the branches, thus producing figures that MRTG can read to give me branch specific traffic figures. :-)

From the sound of it, you deal in the sort of area that I want to move to.

I've managed to wangle a 5K training budget for myself for next year, which should see me through my Cisco exams. Then, after the 18 month lock-in period has expired, I'll be zapping my freshly re-written CV around various places.

(no subject)

Date: 2005-10-12 10:38 am (UTC)
From: [identity profile] simonb.livejournal.com
I've not encountered NeTraMet before; however I coded up my own system which seems to work pretty well. I've got a daemon which monitors an interface which sees all of the traffic between the firewall and router - personally I use a mirror port to do this on the switch which sits between the firewall and the router, however you can use a hub as well. Personally I dislike using hubs as I've encountered various problems with them in the past; if nothing else their dumbness means that diagnosing problems can be a right pain! The monitoring daemon uses the Perl interface to libpcap (same library used by tcpdump) to snarf packets off the network. Once it's got them it categorises them into different groups like web servers, hosted services, anything not in a specific group, etc; it does this based on source or destination IP addresses. Every five minutes it writes out a file containing this data; that is then taken by another script which uses RRDTool to generate lots of graphs; you can see an example of the overview graph generated here.
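
The grouping step described above might look roughly like this. It is a sketch in Python rather than the Perl daemon itself, it skips packet capture and the five-minute file dump entirely, and the group names, networks (documentation ranges) and sample packets are all invented for the example.

```python
import ipaddress
from collections import Counter

# Hypothetical groups; the networks are documentation ranges, not real ones.
GROUPS = {
    "web servers": ipaddress.ip_network("192.0.2.0/28"),
    "hosted services": ipaddress.ip_network("192.0.2.16/28"),
}

def group_for(src: str, dst: str) -> str:
    """Return the first group containing either address, else 'other'."""
    for name, net in GROUPS.items():
        if ipaddress.ip_address(src) in net or ipaddress.ip_address(dst) in net:
            return name
    return "other"

def tally(packets):
    """Sum bytes per group for an iterable of (src, dst, length) tuples."""
    totals = Counter()
    for src, dst, length in packets:
        totals[group_for(src, dst)] += length
    return totals

# Stand-in for what a capture loop would hand over in one interval.
packets = [
    ("198.51.100.7", "192.0.2.5", 1500),
    ("192.0.2.20", "198.51.100.9", 400),
    ("198.51.100.7", "203.0.113.1", 60),
    ("192.0.2.5", "198.51.100.7", 500),
]
totals = tally(packets)
for name, count in sorted(totals.items()):
    print(f"{name}: {count} bytes")
```

The per-group totals are the sort of thing that would then be written out every five minutes for the graphing script to pick up.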

From the sound of it, you deal in the sort of area that I want to move to

Well, my official title is "Unix and Network manager" for an organisation called UNEP-WCMC - a conservation charity which provides data to governments, companies, other scientists, etc. For a small organisation (less than 100 people) we've got unusual IT requirements - for starters we've got more servers than people in the building. Personally I blame the high overheads required by the interactive mapping work we do - we've got a fairly good GIS team who tend to eat servers and storage at rather a high rate; eg a processed satellite image can eat up 2GB for a 200x200 mile tile!

When I arrived our network was in a shocking state - router joined by tin cans and string (aka a couple of hubs) to a single Cisco Catalyst 5509 which was configured with 13 VLANs. I've since pulled the network into this millennium and now every desktop port can run at 100BaseT (even if the Cat3 cabling on the ground floor can't - we've upgraded the other floors to Cat5e, but we're currently on a purchase freeze) and all servers have the ability to run at GigE speeds. We've also put in firewalls, got a proper network layout, etc. You can see an overview of our current network here.

I should probably note that I've not had any Cisco training at all - I've picked all of this up as I went along at this job. I'd like some Cisco training - at least my CCNA as it would be useful on my CV. However I doubt that work would spring for that :(

(no subject)

Date: 2005-10-12 11:40 am (UTC)
From: [identity profile] wyrdrune.livejournal.com
NeTraMet

I found it a pig to deal with, so now that it's working, I've taken the attitude that "it ain't broke, so don't fix it"! :-)

Personally I dislike using hubs

Oh don't get me wrong, a mirror port would be my preferred method - however I don't have a single IP addressable switch in the place. I'm loath to mention what type of company I work for, but let's just say that the Holy Grail is "Gross Profit" and if I can't justify a purchase in terms of a direct effect on GP then it ain't gonna happen. The only reason I got the training budget was by hinting that I might start looking elsewhere if I didn't get it.

example of the overview graph

403

When I arrived our network was in a shocking state

AOL - IP address ranges plucked from $deity alone knows where (all RFC1918 space fortunately) with no consistency, servers configured so badly I had to blow the whole Windows setup away and rebuild it from the ground up, the solution for "user can't do X" had been "give user full admin privs.", all traffic to/from all branches through a single ADSL line - no local servers, that sort of thing. It's taken me two years, and we're about 80% of the way to where I want to be.

my official title

I got IT Manager in the last round - it's just another way of spelling "scapegoat".

UNEP-WCMC

Had a look around - looks like interesting stuff to be involved in.

an overview of our current network

Also 403

I've picked all of this up as I went along at this job

*nod* In many respects I'm the same, but in order to get to where I want to be (at least before I hit retirement age), I'm going to need those pieces of paper. Part of my Master Plan is to move back down towards the south - most of our close friends are based in Nottingham and below, and I'm not going to get the sort of salary I'd want (in order to keep the wife in the style etc.) without those pieces of paper.

(no subject)

Date: 2005-10-12 11:50 am (UTC)
From: [identity profile] simonb.livejournal.com
I found it a pig to deal with, so now that it's working, I've taken the attitude that "it ain't broke, so don't fix it"! :-)

*smile* one of the downsides to being a developer is that when you come across something you need, but which is crap, you end up coding your own! I'm about to write code which handles the parsing of snort and PIX logs as I've not yet found something which fits what I need.

[The holy grail of Gross Profit]

Sounds as fun as our situation - we're in yet another purchasing freeze to the point that instead of getting the ten or so new machines we need this year, we're recycling PCs which are at least four years old!

403's

The images are locked to friends-only on the LJ Pics stuff - you'll need to be logged in to LJ to view those specific images.

UNEP-WCMC does do some interesting work; it's kind of fun to be working somewhere which is named in international treaties (eg CITES - Convention on International Trade in Endangered Species - states that copies of all permits issued under CITES for moving endangered fluffy critters from country A to country B will be sent to a central organisation for the generation of reports; we are that organisation) and which is also doing quite a bit of interesting work. However we're also an academic charity, with everything which both of those things imply.

(no subject)

Date: 2005-10-13 06:39 am (UTC)
From: [identity profile] wyrdrune.livejournal.com
[Network Diagram]
Looks interesting - I use Network Notepad to do pretty piccies for Manglement - if I can dig out the latest I'll stick it somewhere and drop you a note with the URL.

one of the downsides to being a developer

I notice from your LJ userinfo that you're a Perl hacker? I'm starting to learn Perl - just basic stuff like reordering text files, accessing DBI and a little bit of CGI. I'm finding it fun (which is worrying I suppose). I used to be a DB developer before moving to networks, so I understand the compulsion to "roll it yourself". :-)

purchasing freeze

Oh don't. I have been trying for two years to get some sort of UPS backup beyond an old, and out of date, APC desktop unit, for our servers. "It's not cost effective", "we can't justify it"... We had a power outage late last week. That made the point. I've got a chap coming round on Friday to talk generator & UPS solutions! :-)

Air con? For the server room? Why? "Because it's so damn hot in there that things have stopped pushing packets" to which I get told "so open a window"... In the height of summer... I now *have* air con and find myself popular in the summer! :-)

But enough of this ranting in RDD's journal! :-)

One of these days I'll have a rant in the SDM again.

(I'll add you to my work ranting journal BTW).

pretty large cacti installation

Date: 2005-10-12 11:35 am (UTC)
From: [personal profile] pvaneynd
mysql> select count(*) from host_snmp_cache;
+----------+
| count(*) |
+----------+
|    16172 |
+----------+
1 row in set (0.00 sec)

mysql> Bye

Re: pretty large cacti installation

Date: 2005-10-12 11:41 am (UTC)
From: [identity profile] wyrdrune.livejournal.com
I think you win that one! :-)

(no subject)

Date: 2005-10-11 03:07 pm (UTC)
From: [personal profile] bob
we use nagios and cacti
nagios for alerting and monitoring uptimes.
cacti for drawing pretty graphs of usage

(no subject)

Date: 2005-10-11 03:14 pm (UTC)
From: [identity profile] bramsmits.livejournal.com
Nagios, and rrd to make pretty pictures.

(no subject)

Date: 2005-10-11 03:27 pm (UTC)
From: [identity profile] syringavulgaris.livejournal.com
We use Cricket. I don't know whether it will be managerially shiny enough, but.

(no subject)

Date: 2005-10-11 04:17 pm (UTC)
From: [identity profile] zenmonkeykstop.livejournal.com
Vote #2 for Nagios+rrd. Works well enough here collecting custom data from 40+ servers.

(no subject)

Date: 2005-10-11 04:26 pm (UTC)
From: [identity profile] dan-lane.livejournal.com
I use Nagios & cacti... nagios is good for monitoring and alerting and cacti generates purty pictchoors.

(no subject)

Date: 2005-10-11 04:34 pm (UTC)
From: [personal profile] bob
you can write your own scripts for nagios
or indeed use ones people have already written.
http://www.nagiosexchange.org/ is useful for that.

also if you use net-snmp you can write scripts to get stuff.
for instance we have one which works out the number of smb processes
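
a rough sketch of what a custom nagios check along those lines could look like (not the script mentioned above, which isn't shown). it follows the usual plugin convention of one status line plus an exit code of 0, 1 or 2; the thresholds, the "SMB" label and the stand-in count of 12 are assumptions for the example, and a real check would get the count from ps or an SNMP query.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # exit codes Nagios expects from a plugin

def check_process_count(count: int, warn: int, crit: int):
    """Return (exit_code, status_line) for a process count against two thresholds."""
    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after the pipe is performance data: label=value;warn;crit
    line = f"SMB {label} - {count} smbd processes | smbd={count};{warn};{crit}"
    return code, line

# 12 is a stand-in value; thresholds are made up.
code, line = check_process_count(12, warn=50, crit=100)
print(line)  # a real plugin would finish with sys.exit(code)
```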

(no subject)

Date: 2005-10-11 05:29 pm (UTC)
From: [identity profile] shamus9999.livejournal.com
Almost completely off-topic but if you need to monitor disk usage I recommend Sequoiaview. I don't know if there are versions for anything other than Redmondware though. That site's loading too slowly here on this side of the pond so I'll leave it to you to research it.

(no subject)

Date: 2005-10-11 07:36 pm (UTC)
From: [identity profile] muddledslate.livejournal.com
I'm really liking this Uncyclopedia thing. I wrote some entries already.

(no subject)

Date: 2005-10-15 04:32 am (UTC)
From: [identity profile] lithiana.livejournal.com
i recommend ganglia. it's really easy to set up, however many machines you have (edit two lines in the gmetad config on the monitoring host, then copy the same gmond.conf everywhere).

and yes, you can graph arbitrary metrics with it.
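
a hedged sketch of feeding an arbitrary metric to ganglia, using the Oracle user count from the original post as the example. it only builds a gmetric command line; the metric name, the value 37 and the helper function are made up, and the real number would have to come from a query against Oracle, run from cron every five minutes or so.

```python
def gmetric_command(name, value, units, metric_type="uint32"):
    """Build the argument list for a gmetric call that publishes one metric."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        f"--type={metric_type}",
        f"--units={units}",
    ]

# Hypothetical metric name and value, for illustration only.
cmd = gmetric_command("oracle_users", 37, "users")
print(" ".join(cmd))
# To actually send it, pass cmd to subprocess.run on a host running gmond.
```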

mark likes cricket, but it's complicated and doesn't really aggregate data in an easy-to-read way (esp. if you want to show it to management...)