Friday, 27 July 2007

E-discovery and data-classification

I talked earlier about de-duplication, and I've already had some feedback suggesting that it isn't as widely known as I'd assumed in the US. It seems like such a good idea: it saves money, so the CFO will like it; it cuts down on storage space, so the storage monkeys will like it; and it gives the security guy more control, so he'll like it. I expect the sysadmin will get a bit narked, but hey, he only gets one day a year when people listen to him (damn, it's today).

I started off this little series of posts with a reference to e-discovery, and this is where the real benefits of data-classification come in. E-discovery is the process of investigating issues after a breach, during an audit, or for compliance/legal purposes. It can be expensive, not only in terms of the initial breach or audit itself, but also in terms of the man hours, equipment and consultancy spent trying to catch the culprit, prove compliance or back up claims. There are a few e-discovery companies out there (Kazeon, Guidance Software, Archivas), and each one claims to save its customers thousands if not millions of dollars doing what they do. So what is that?

In a nutshell, it is the process of collecting, searching, preserving and analysing digital information. Each of these processes is simple enough on its own, but keeping them all managed together is a real problem. Imagine for a moment, however, that your data is properly classified. The data will already be in a state where these processes become simpler. The real issue then is the gaps, not the processes. I find this very interesting, because it feels like proper security at last.
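As a rough illustration of why classification makes the searching step simpler (the file paths and labels below are invented for the example), a pre-built classification index turns discovery from "scan everything" into "filter by label first, then search the much smaller subset":

```python
# Toy sketch: with data already classified, a discovery request only has
# to touch the files whose label matches the scope of the request.
records = [
    {"path": "/data/hr/pay2006.xls",   "label": "confidential"},
    {"path": "/data/pub/brochure.pdf", "label": "public"},
    {"path": "/data/legal/nda.doc",    "label": "confidential"},
]

def discovery_scope(index, label):
    """Return only the files whose classification is in scope."""
    return [r["path"] for r in index if r["label"] == label]

print(discovery_scope(records, "confidential"))
# ['/data/hr/pay2006.xls', '/data/legal/nda.doc']
```

In an unclassified estate, the equivalent request means indexing and searching every file on every share, which is exactly where the man hours and consultancy costs pile up.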

And there are some real security issues here:

  1. If I have collected information from a system, how do I know that information hasn't already changed en route to collection?
  2. How do I know it hasn't been seen and manipulated, or copied?
  3. Between collection and searching, how do I know the index hasn't changed, and therefore the information I am now looking at is redundant?
  4. How can I preserve information without it becoming prohibitively expensive?
  5. When I want to analyse this information, how do I know I'm analysing the right things?
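Questions 1 and 3 are, at heart, integrity problems: you need to be able to show that what you are analysing is what you collected. A minimal sketch of one common approach, hashing the evidence at collection time and re-checking before analysis (SHA-256 here; the log line is invented for the example), looks like this:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Hash taken at collection time and recorded out-of-band,
    so later tampering with the copy is detectable."""
    return hashlib.sha256(data).hexdigest()

# At collection:
evidence = b"2007-07-20 10:14 user=jsmith action=copy file=payroll.xls"
collected_hash = fingerprint(evidence)

# Later, before analysis, re-hash and compare:
def verify(data: bytes, expected: str) -> bool:
    return fingerprint(data) == expected

print(verify(evidence, collected_hash))                  # unchanged -> True
print(verify(evidence + b" tampered", collected_hash))   # modified  -> False
```

The hash only helps if it is stored and transmitted more securely than the evidence itself (which is where the agent-based integrity approach mentioned below comes in); a hash kept next to the data can be regenerated by the same attacker who altered it.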

So who's interested in this? Well, apparently not the real security guys. I asked Network Intelligence this last question six months ago, and it got as far as the product management meeting internally at RSA/EMC. We were trying to get them to look at our integrity software at the time; they said "no" because it would stop them selling as much WORM storage, even though our approach is 100 times more efficient. RSA liked it because it was agent-based, NI liked it because it was a differentiator, but EMC had the final say, which is a bit sad. So I went to SBN's own Mr. Anton Chuvakin at LogLogic with the pitch. He said it was a great idea, but guess what? No-one else is doing it, so there's not really a need as yet. "We'll keep your application on file, have a nice day"... Same story at NetForensics, ExaProtect ("bring business to bear"), SenSage, etc. etc. You name it: if they analyse logs or report on events, I've spoken to them at the highest level and been turned down. Perhaps it's because I spell analyse with an "s"?

I guess e-discovery isn't big business in the US yet either? Odd, seeing as how the savings claims are in the millions of dollars. The first company to produce a truly secure e-discovery platform will be raking it in. I just hope it isn't MS or Google.

The other questions have yet to be asked and answered, but I'm going to be asking them in the next few weeks and months. I'd be interested to hear other people's views on this.
