Friday, 17 July 2015

10 Bits of Logging and Monitoring for Architectural Success

I've been involved in a logging and monitoring project recently, and realised how close to their chests most vendors and other companies doing this type of work tend to keep their methodologies. And although a lot of people have done L&M projects, I wonder how much of the knowledge is retained, improved upon or disseminated?

With that in mind, I wanted to give a quick round-up of what I think makes a successful L&M architecture, completely generically, and without reference to tools:

1. Know your assets

If you don't have a CMDB, go and get one and postpone your project by a year. You will need AT LEAST your business-critical assets listed, preferably with a quantifiable measure of criticality assigned to each.
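
For illustration only (the field names and scores here are invented, not taken from any particular CMDB product), the bare minimum is an asset list you can sort and filter by criticality:

    # A minimal sketch of the asset data an L&M project needs up front.
    assets = [
        {"hostname": "erp-db-01", "owner": "Finance", "criticality": 5},
        {"hostname": "web-proxy-02", "owner": "IT Ops", "criticality": 3},
        {"hostname": "dev-build-07", "owner": "Engineering", "criticality": 1},
    ]

    # Start log collection with the most critical assets and work downwards.
    for asset in sorted(assets, key=lambda a: a["criticality"], reverse=True):
        print(asset["hostname"], "criticality", asset["criticality"])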

2. Map Business Risks to Technology Use Cases

You do not want to collect every log your infrastructure creates. If you know what risks your business faces, create use cases that reflect them - do they make sense? Can you collect logs that represent these use cases? This is not a quick process - it took specialists many months to create this information.
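
A rough sketch of what that mapping might look like in practice - the risk, use cases and log sources below are invented examples, not a recommended set:

    # Illustrative only: one business risk mapped to monitoring use cases and
    # the log sources that would have to be collected to support them.
    risk_to_use_cases = {
        "Theft of customer data": [
            {
                "use_case": "Bulk export from the customer database",
                "log_sources": ["database audit logs", "DLP alerts"],
            },
            {
                "use_case": "Privileged account used out of hours",
                "log_sources": ["AD security logs", "VPN logs"],
            },
        ],
    }

    for risk, use_cases in risk_to_use_cases.items():
        for uc in use_cases:
            print(risk, "->", uc["use_case"], "| needs:", ", ".join(uc["log_sources"]))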

3. Implement the Use Cases as rules

The Use Cases need to be implemented as rules, so you'd better be able to describe them in terms of collected and collated/correlated data...
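
Here's one way a use case might end up expressed as a correlation rule - the event fields, threshold and window are assumptions made for the sake of the example:

    # Sketch of a correlation rule: repeated failed logins followed by a
    # success from the same source within a time window.
    from collections import defaultdict

    WINDOW_SECONDS = 300
    FAILURE_THRESHOLD = 5

    def correlate(events):
        """events: dicts with 'time' (seconds), 'source_ip' and 'outcome'."""
        failures = defaultdict(list)
        alerts = []
        for e in sorted(events, key=lambda e: e["time"]):
            ip = e["source_ip"]
            if e["outcome"] == "failure":
                failures[ip].append(e["time"])
            elif e["outcome"] == "success":
                recent = [t for t in failures[ip] if e["time"] - t <= WINDOW_SECONDS]
                if len(recent) >= FAILURE_THRESHOLD:
                    alerts.append({"rule": "brute force then success", "source_ip": ip})
        return alerts

    # Five failures then a success from the same address trips the rule.
    print(correlate(
        [{"time": t, "source_ip": "198.51.100.7", "outcome": "failure"} for t in range(5)]
        + [{"time": 10, "source_ip": "198.51.100.7", "outcome": "success"}]
    ))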

4. Log for compliance as close to the source as you can

Don't waste bandwidth shipping 100% of your logs over the network. Keep the last month or so online in the whizzy tech - indexed and expensive - but once archived, logs can live on your bog-standard cheapo SAN storage.
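
Something like this toy retention job captures the idea - the paths and the 30-day cut-off are assumptions, not recommendations:

    # Keep the last 30 days hot and indexed; shunt anything older off to
    # cheap archive storage. Directories are invented for illustration.
    import os
    import shutil
    import time

    HOT_DIR = "/var/log/hot"
    ARCHIVE_DIR = "/mnt/archive/logs"
    HOT_RETENTION_DAYS = 30

    def archive_old_logs():
        cutoff = time.time() - HOT_RETENTION_DAYS * 86400
        for name in os.listdir(HOT_DIR):
            path = os.path.join(HOT_DIR, name)
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                shutil.move(path, os.path.join(ARCHIVE_DIR, name))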

5. Reactive monitoring only tells you what's already happened...

Obvious really, but for a really useful log monitoring solution, you should find something that can look for unknown signals in the noise - anomalies that might indicate an attack, even if they are only minor.
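
The principle, stripped right down, is flagging outliers rather than known signatures - a toy baseline comparison with made-up numbers:

    # Compare a new hourly event count against a historical baseline and
    # flag it if it sits well outside the norm.
    import statistics

    def is_anomalous(baseline_counts, new_count, threshold=3.0):
        mean = statistics.mean(baseline_counts)
        stdev = statistics.stdev(baseline_counts)
        return stdev > 0 and abs(new_count - mean) / stdev > threshold

    # A quiet baseline followed by a sudden spike in log volume.
    print(is_anomalous([120, 115, 130, 125, 118, 122], 980))  # True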

6. ...but you still need it.

Once you've found a signal, create a rule for it, so you can see it happening!
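
A toy rule registry shows the idea - the "impossible travel" logic and its fields are invented purely for illustration:

    # Pin a signal you've found down as a named rule so it fires every time.
    rules = []

    def rule(name):
        def register(fn):
            rules.append((name, fn))
            return fn
        return register

    @rule("impossible travel: logins from distant locations within an hour")
    def impossible_travel(event):
        return (event.get("geo_distance_km", 0) > 2000
                and event.get("minutes_since_last_login", 9999) < 60)

    def evaluate(event):
        return [name for name, fn in rules if fn(event)]

    print(evaluate({"geo_distance_km": 7500, "minutes_since_last_login": 20}))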

7. Incident Response and Forensics teams need the same data as Compliance

Those log stores I mentioned - make sure you can get them back online, indexed and searchable, pretty quickly when required. In an emergency, you don't want to wait a month whilst Johnny Forensics searches for an IP address or username....
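
Something along these lines - the archive layout and file naming are assumptions:

    # Pull an archived day's logs back and search them for an indicator.
    import gzip
    import os

    ARCHIVE_DIR = "/mnt/archive/logs"

    def search_archive(day, indicator):
        """day e.g. '2015-07-01'; returns matching lines from that day's archives."""
        hits = []
        for name in os.listdir(ARCHIVE_DIR):
            if day in name and name.endswith(".gz"):
                path = os.path.join(ARCHIVE_DIR, name)
                with gzip.open(path, "rt", errors="replace") as f:
                    hits.extend(line for line in f if indicator in line)
        return hits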

8. Threat Data

Get some. There are a lot of technical feeds out there, but Threat Data and Threat Intel are not the same thing. Threat Intel needs people power and brains, not tools... as ever, the tools just help the processing of data sets.
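
At its simplest, matching a feed of indicators against your events looks like this - the feed format (one IP per line) is an assumption, and deciding which indicators actually matter is where the intelligence lives:

    # Load a plain-text indicator feed and match it against log events.
    def load_indicators(feed_path):
        with open(feed_path) as f:
            return {line.strip() for line in f if line.strip() and not line.startswith("#")}

    def match_events(events, indicators):
        return [e for e in events
                if e.get("dest_ip") in indicators or e.get("source_ip") in indicators]

    print(match_events(
        [{"source_ip": "10.0.0.5", "dest_ip": "203.0.113.9"}],
        {"203.0.113.9"},
    ))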

9. Workflow/Case Management/Incident Management/Ticket Handling

Are all basically the same thing, just seen from different angles. SOC staff need to pass tickets; their management need a workflow to be followed. When an incident occurs, that ticket needs to have sensitive data added, which turns it into a case. This may involve a separate tool, although the monitoring platform itself can often be used for it.
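
A toy model of that ticket-to-case promotion - the workflow states and fields are invented:

    # The same record moves through a workflow; attaching sensitive
    # investigation data promotes it to a case with restricted access.
    WORKFLOW = ["new", "triaged", "investigating", "resolved", "closed"]

    class Ticket:
        def __init__(self, summary):
            self.summary = summary
            self.state = "new"
            self.is_case = False
            self.restricted_notes = []

        def advance(self):
            idx = WORKFLOW.index(self.state)
            if idx < len(WORKFLOW) - 1:
                self.state = WORKFLOW[idx + 1]

        def add_sensitive(self, note):
            # Sensitive material narrows who should be able to read the record.
            self.is_case = True
            self.restricted_notes.append(note)

    t = Ticket("Brute force against erp-db-01")
    t.advance()
    t.add_sensitive("Affected account details")
    print(t.state, t.is_case)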

10. Automation

The nirvana... once everything runs, the tools and processes are a mass of moving parts. These inevitably suffer bottlenecks, usually waiting for humans to process data. Where the human is just moving data between systems, that step can often be automated outside of the workflow technology itself.
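
The sort of step worth automating looks like this - every lookup here is stubbed for illustration:

    # Enrich an alert with asset and threat context before a human sees it,
    # instead of an analyst copying and pasting between consoles.
    def enrich(alert, asset_db, indicators):
        asset = asset_db.get(alert.get("hostname"), {})
        alert["criticality"] = asset.get("criticality", "unknown")
        alert["owner"] = asset.get("owner", "unknown")
        alert["known_bad_ip"] = alert.get("source_ip") in indicators
        return alert

    print(enrich(
        {"hostname": "erp-db-01", "source_ip": "203.0.113.9"},
        {"erp-db-01": {"criticality": 5, "owner": "Finance"}},
        {"203.0.113.9"},
    ))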

I won't be there for a while, and some of this will need updating before I get there, but from what I've learnt so far, I hope this is useful to someone else out there embarking on an L&M project. You're welcome.

