Analytics, big data, automation, and machine learning are all terms we use when talking about the future of cybersecurity. As the volume of security data increases, data science will become an important weapon to disrupt adversaries.
Too often, these terms are used as synonyms, but they refer to different parts of the domain of data science. To stay ahead of threats and predict vulnerabilities, we should all have a basic understanding of the fundamental building blocks of data science.
What is data science?
Data science is the confluence of math, statistics, hardware, software, and data management. Data scientists apply mathematical algorithms and models to solve problems—such as detecting an attack before it happens or stopping ransomware before it takes over a computer or network. Data management covers the processes of gathering data throughout software and hardware environments, as well as governance, policies, security, storage, and mathematical boundary conditions. Effective data management is as important as the algorithms themselves.
What is big data?
Big is the essential part of big data. Security tools can collect massive quantities of data, which are necessary to develop sustainable patterns of normal and anomalous behavior. The quantities are mind-boggling: data scientists often work in yottabytes (10^24 bytes) of data.
What are analytics?
Analytics is the scientific process of transforming data into business insight. This involves mining big data to identify patterns, build models, test those models against real scenarios, and iterate through the process to improve the ultimate effectiveness. There are four basic types of analytics: descriptive (what happened?), diagnostic (why did it happen?), predictive (what will happen?), and prescriptive (what should we do, given what will happen?).
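The four types of analytics can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the failed-login counts are invented, the function names are hypothetical, and the straight-line forecast and fixed threshold stand in for whatever models and policies a real security team would use.

```python
# Hypothetical data: failed logins per day, split by source (all values invented).
daily_failed_logins = [
    {"web": 10, "vpn": 5},
    {"web": 12, "vpn": 6},
    {"web": 11, "vpn": 9},
    {"web": 13, "vpn": 12},
    {"web": 12, "vpn": 18},
]
totals = [sum(day.values()) for day in daily_failed_logins]


def descriptive(series):
    """What happened? Summarize the observed daily totals."""
    return {"mean": sum(series) / len(series), "max": max(series)}


def diagnostic(days):
    """Why did it happen? Name the source that grew most from first to last day."""
    first, last = days[0], days[-1]
    return max(last, key=lambda source: last[source] - first.get(source, 0))


def predictive(series):
    """What will happen? Extend a least-squares line one day ahead.

    Needs at least two data points.
    """
    n = len(series)
    x_bar = (n - 1) / 2
    y_bar = sum(series) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(series))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    slope = sxy / sxx
    return y_bar + slope * (n - x_bar)


def prescriptive(forecast, source, threshold=30):
    """What should we do? Apply an assumed policy threshold to the forecast."""
    if forecast > threshold:
        return f"Review lockout policy for {source}"
    return "No action needed"


summary = descriptive(totals)
cause = diagnostic(daily_failed_logins)
forecast = predictive(totals)
recommendation = prescriptive(forecast, cause)
```

Each step consumes the output of the one before it, which mirrors how the four types build on one another: a recommendation is only as good as the forecast and diagnosis behind it.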
What is automation?
Automation (as it pertains to machine learning) is simply the process of having computers execute analytic models. Automation can be applied to many parts of cybersecurity and data science by removing repetitive tasks, summarizing datasets that are larger than humans can handle, identifying patterns, and performing mitigation functions, among others.
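As a rough illustration of automating a repetitive task, the Python sketch below summarizes a log and derives a mitigation list from it. The log records, the event names, the function names, and the failure limit are all hypothetical; a real deployment would read from actual log sources and feed a firewall or access-control system.

```python
from collections import Counter

# Hypothetical log records: (source_ip, event_type). Values are invented.
events = [
    ("10.0.0.5", "login_failed"),
    ("10.0.0.5", "login_failed"),
    ("10.0.0.7", "login_ok"),
    ("10.0.0.5", "login_failed"),
    ("10.0.0.9", "login_failed"),
]


def summarize_failures(records):
    """Summarize the dataset: count failed logins per source address."""
    return Counter(ip for ip, event in records if event == "login_failed")


def hosts_to_block(failure_counts, max_failures=2):
    """Mitigation step: list sources that exceed the assumed failure limit."""
    return sorted(ip for ip, count in failure_counts.items() if count > max_failures)


failure_counts = summarize_failures(events)
blocklist = hosts_to_block(failure_counts)
```

The same two functions work unchanged on five records or five million, which is the point of automation: the analyst defines the rule once and the computer applies it at a scale no person could review by hand.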
What is machine learning?
Machine learning is the action of automating analytics to the point that the computer builds on and enhances the model over time, identifying new patterns and relationships to which it can apply rules and policies. When working at the predictive or prescriptive levels, the machine will calculate the expected future value of a particular variable.
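To make the idea of a model that improves over time concrete, here is a minimal Python sketch of a baseline that updates itself with every new observation and reports the expected value of a metric. It uses Welford's incremental mean and variance, a simple statistical technique chosen here for illustration rather than a full machine learning model. The class name, the sample values, and the z-score threshold of 3 are assumptions.

```python
import math


class RunningBaseline:
    """Learns the mean and variance of a metric one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold a new observation into the model (Welford's method)."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def expected(self):
        """Expected future value of the metric under this simple model."""
        return self.mean

    def is_anomalous(self, value, z_threshold=3.0):
        """Flag values far from the learned baseline (assumed z-score rule)."""
        if self.n < 2:
            return False  # not enough history to judge
        std = math.sqrt(self._m2 / (self.n - 1))
        if std == 0:
            return value != self.mean
        return abs(value - self.mean) / std > z_threshold


# Hypothetical metric: outbound connections per minute from one host.
baseline = RunningBaseline()
for observation in [100, 102, 98, 101, 99]:
    baseline.update(observation)

spike_flagged = baseline.is_anomalous(250)
normal_flagged = baseline.is_anomalous(101)
```

Because the baseline keeps only a count, a mean, and one running sum, it can keep learning from a stream of data without storing the history, though a production model would also need to handle drift and deliberate poisoning.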
What are some common myths?
Big data, analytics, and machine learning are very powerful, but they cannot solve every problem.
Some key myths of analytics:
- They can be done quickly.
- The results are always right.
- You don’t have to know any math or statistics.
Some key myths of machine learning:
- Human involvement is not required.
- You can just pick a model and apply it to your data.
- It is hack-proof.
Analytics, big data, automation, and machine learning can be applied to a wide range of business challenges. For cybersecurity, the opportunity is to identify anomalous behavior and other indicators of attack sooner, and even predict future attacks based on context, learned patterns, and shared threat intelligence. Understanding the basics of data science is essential to applying these tools effectively to current and future business and security needs.
For the full crash course in security data science, analytics, and machine learning, download the McAfee Labs Threats Report: September 2016.