As part of an on-going attempt to spread love throughout the blogging community, I will attempt to explain data-centric security, a love in my life only secondary to my wife (by quite a long way you understand).
So, what is data-centric security?
Quite simply, data-centric security is any security applied to data as opposed to the network, or to users. Most people will be familiar with user security as authentication, authorisation and accounting. The rest of security normally seems to be network based, so you would be forgiven for thinking that this is what security is about. Certainly that's been the argument of a thousand vendors, but apart from securing applications and hosts so that data access cannot be manipulated, what else is really needed on the network?
Consider what you are protecting, and from whom you are protecting it. These are the 2 places you need to protect from manipulation. If you can trust the start of a transaction (user -> data), you can trust the end. That is to say, if you know that you have the correct user identified, you can then protect the data from access by anything other than that user or set of users which are authorised to access it. This of course takes perimeter controls around the data, as discussed with DRM in the previous post, which on a network are still best applied by a device between user and data.
What it does not mean is that devices which currently exist to protect the network are the right ones.
Without revealing my bias about firewalls, consider their purpose for a second. They are to protect networks from intrusion over certain ports. They even now inspect traffic coming over legitimate ports, in an attempt to prove their worth (oops, almost revealed my hatred of them for a moment then!)
Why is this an issue? Well, if an attacker gets onto an unprotected port on a PC or server in your network, he can traverse various controls, take ownership, forge access and steal data.
I'll ask the question again, why is this an issue? In reality it's because applications have holes. OSs have holes, databases have holes, etc. All that firewalls have achieved is to drive the threat inside the firewall. Nowadays 80-90% of attacks come from inside the firewall. Or if you're TJX, the guy outside with the Pringle's can pointing at your wireless network (which is inside the firewall).
Why not just put host based firewalls on each of these vulnerable points and save thousands, plus increase the security of your network from insider threat?
I can't think of one other device inside a network which could not be eliminated with an upgrade of hardware or securing of software. Reporting and management tools are just an additional piece of software, which could be incorporated into an upgraded piece of hardware. Proxies, load balancers, anything. There is no need for all of this network security if the host is properly configured.
For years more and more capabilities have been appearing on devices on the network, firewalls, IDS, IDP, AV, proxy type filters. These have begun to be placed on a single perimeter device which can deal with all of this functionality, much simplifying network security, and also proving a point. This can all be put on a host. If you put all your applications onto a single host as well, you could shove it all in the same box. What about your data? Why not? If the hardware had the performance capabilities, what's to stop you, apart from catastrophic systems failure? Get 2!
This seems unlikely, I admit, but where is convergence most likely to end up? Will we end up with the return to the mainframe as above, or diverge back out for convenience? I don't know. I wouldn't predict the mainframe model. It is more likely to end up somewhere in the middle.
In my opinion we will end up with a single device at the perimeter, then our web/app tier, then another single device for data control, then the data environment.
The network will be used for transmission of data only, and in the shortest routes possible between these 4 areas. The data device will apply access controls based on user attributes applied at the perimeter device. The data will be classified as to it's security level, and the users will be permitted access to the data based on their security clearance level.
The logic problem
The more secure the data, the more controls will need to be applied in storage. This is where data-centric security really begins, at the end. This is why I love it, because it starts at the end-point and works it way backwards, like a proper logic problem.
Basic level data does not need much protection. Confidential data requires encryption. Data required for legal proof or record retention requires integrity controls. All of it can be compressed to save space. This can all be done in different levels of physical storage, WORM for integrity, locked away for confidentiality, on high performance optical disks for high availability.
What this solution cannot do is protect the data once it is at the user. At this point, when the user has the data under their control, DRM comes into play. As discussed previously, on a network we have different issues than off a network. In this case we are on a network, but if our data is classified we can more easily apply controls at the application level once we have decided HOW we are going to apply them. A device between application and data can apply per-user application controls on a network which cannot be applied off the network. At this point I have to tip my hat to Rory McCune again, and admit that he is right that applications will have to recognise the data tags to apply the security levels off network, and this is where our original misunderstanding arose from.
On a network therefore, DRM effectively becomes a physical security problem. Stopping shoulder surfing, photographing screens, copying data with a pencil and paper. Off the network, the problems still exist, and this is why DRM is such a big issue.
Addressing this properly, as Rory correctly says again, will require standards, which are already being worked on for the transmission and protection of data on the Semantic Web. Applying the classification is the first step. I'll talk about this tomorrow.