The Evolution of Data Leak Prevention Technology
The first generation
In the beginning, companies did not think about data leakage. They thought about document control, a natural outgrowth of authoring tools and document management systems. The key concepts of this phase were password protection and document repositories. Nobody was worried about sensitive information moving to unapproved internal or external users.
Customer information was kept in separate databases, with database administrators holding password control. Financial information was kept in separate databases as well. The thinking was that a few people would hold password control over collections of company-confidential data.
The second generation
Once electronic payment processing, email and online transactions became common, a new problem emerged. Vulnerabilities in common web, email and transaction processing software opened new doors for the misuse of sensitive personal and financial data. Structured data such as credit card numbers, Social Security numbers and driver's license numbers was now at risk. The solution was signature matching technology that inspected transactions for the presence of sensitive structured data. However, that technology could not detect breaches in unstructured data. It could only match pre-defined categories of information.
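As a rough illustration of signature matching, the sketch below scans text for card-like and SSN-like patterns and confirms card candidates with the Luhn checksum. The patterns and the function names (`luhn_valid`, `scan`) are assumptions made for this example and do not describe any particular DLP product, whose signatures would be far richer.

```python
import re

# Hypothetical signatures for illustration only.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # NNN-NN-NNNN


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def scan(text: str) -> list[tuple[str, str]]:
    """Return (category, matched_text) pairs for pre-defined structured data."""
    findings = []
    for m in CARD_RE.finditer(text):
        if luhn_valid(m.group()):  # the checksum filters out random digit runs
            findings.append(("credit_card", m.group()))
    for m in SSN_RE.finditer(text):
        findings.append(("ssn", m.group()))
    return findings


print(scan("Card 4111 1111 1111 1111, SSN 123-45-6789"))
print(scan("The customer's card ends in four ones"))
```

The second call finds nothing because the same information appears as free text, which is the limitation described above: matching works only on pre-defined formats.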
The third generation
Email, web downloads, ecommerce and a host of new applications moved the area of greatest risk to unstructured data. Spreadsheets, presentations, emails and text documents all vary in length and content type. Most DLP providers have solutions for unstructured data, but they cannot consistently identify transmissions of sensitive data. The very nature of unstructured data makes it difficult to determine the contents of those transmissions.
The fourth generation
Fourth-generation DLP will address the critical issues of the previous generations as well as solve the problem of unstructured data. To do so, it will have to automate classification, analysis and enforcement: automated analysis of data, automated generation of policies and automated enforcement of those policies.
Identifying a fourth-generation DLP system
Four characteristics separate the fourth generation of DLP from the third.
- Automatic classification of content – the system will be able to recognize sensitive data in any format
- Automated identity- and role-based control – the system will be able to recognize data breaches based on who should have access to what
- Automatic generation of policy – the system will be able to create comprehensive policies based on automatic classification of content and role-based content control
- Real-time enforcement – the system will be able to classify content, apply policy and delay or stop exfiltration attempts in real time
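The four characteristics above can be pictured as one pipeline: classify the content, derive policy from role assignments, then decide on each transfer as it happens. The following is a minimal sketch under heavy assumptions. The keyword lists stand in for a real classifier, and every name here (`SENSITIVE_TERMS`, `ROLE_ACCESS`, `Transfer`, `classify`, `generate_policy`, `enforce`) is hypothetical.

```python
from dataclasses import dataclass

# Assumption: keyword lookup stands in for automatic content classification.
SENSITIVE_TERMS = {
    "financial": ("revenue forecast", "balance sheet"),
    "customer": ("customer list", "account number"),
}

# Assumption: which content categories each role is entitled to handle.
ROLE_ACCESS = {"finance": {"financial"}, "sales": {"customer"}}


@dataclass(frozen=True)
class Transfer:
    sender_role: str
    destination: str  # "internal" or "external"
    content: str


def classify(content: str) -> set[str]:
    """Label content with every category whose terms appear in it."""
    text = content.lower()
    return {label for label, terms in SENSITIVE_TERMS.items()
            if any(term in text for term in terms)}


def generate_policy(role_access: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert role entitlements into a policy: category -> roles allowed to send it."""
    policy: dict[str, set[str]] = {}
    for role, labels in role_access.items():
        for label in labels:
            policy.setdefault(label, set()).add(role)
    return policy


def enforce(transfer: Transfer, policy: dict[str, set[str]]) -> str:
    """Return 'allow', 'delay' or 'block' for a single transfer."""
    labels = classify(transfer.content)
    if not labels:
        return "allow"
    if any(transfer.sender_role not in policy.get(label, set()) for label in labels):
        return "block"  # sender's role is not entitled to this category
    # Entitled sender: hold external transfers for review, allow internal ones.
    return "delay" if transfer.destination == "external" else "allow"


policy = generate_policy(ROLE_ACCESS)
print(enforce(Transfer("sales", "external", "Attached is the Q3 revenue forecast"), policy))
print(enforce(Transfer("finance", "external", "Q3 revenue forecast attached"), policy))
```

Deriving the policy from the role table, rather than writing rules by hand, is the point of the sketch: when entitlements or classifications change, the policy follows without manual editing.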