This is the second part of the IR Metrics blog series, in which we divide the incident and remediation timeline into smaller stages and then look at how we can optimise each one to reduce the overall duration, and therefore the impact, of an incident.
The initial investigation stage can be quite short, provided we are prepared and have maximised the access provided to the team. At this point the team is simply trying to determine whether the alerts they are seeing are false positives or true positives, what those alerts indicate, and how urgently we need to start the next steps of the incident.
To recap: the attacker has made an initial breach and has compromised a system. As a result of the attacker's actions, logs have been generated, alerts have fired or perhaps a user has called something in, and now the analyst has been called into action to answer the big question:
Have we been hacked?
So, if we want to improve this stage’s timings we need to look at the factors that influence it and see whether these are mainly technology and access based. Some of these factors could be:
- Staff who are already looking at or working in the logging system or SIEM console will respond faster than those who are away from the office/desk/network.
- To investigate an alert, staff will need access to the SIEM, the logs that generated it and potentially the end system itself.
- If they understand the affected system, they will know what it should normally be doing and can spot deviations more quickly.
- If they know the network and the activity baseline, they will be able to make a decision faster (see the sketch just after this list).
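To make that last point concrete, here is a minimal sketch of what “knowing the baseline” can look like in practice. It assumes a hypothetical hourly count of outbound DNS queries per host held in a plain Python dict, and simply flags hosts whose current activity sits well above their historical average; the host names, numbers and threshold are illustrative, not a prescription.

```python
# Minimal sketch: flag hosts whose current activity is far above their usual baseline.
# The data, host names and threshold are purely illustrative.

from statistics import mean, stdev

# Hypothetical hourly counts of outbound DNS queries per host over the last week
baseline = {
    "ws-finance-01": [120, 130, 110, 140, 125, 135, 128],
    "ws-hr-02":      [60, 55, 70, 65, 58, 62, 61],
}

# What the SIEM reports for the current hour
current = {
    "ws-finance-01": 132,
    "ws-hr-02": 540,   # sudden spike worth a closer look
}

def is_abnormal(history, value, sigmas=3):
    """Return True if value sits more than `sigmas` standard deviations above the mean."""
    return value > mean(history) + sigmas * stdev(history)

for host, value in current.items():
    if is_abnormal(baseline[host], value):
        print(f"{host}: {value} queries this hour is well outside its baseline - investigate")
    else:
        print(f"{host}: within normal range")
```

The point is not the statistics but the principle: an analyst who already knows what “normal” looks like for each host can turn an alert into a decision far faster.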
Remember, in this stage we are looking either to give the all clear, to declare an incident, or to request more staff, access or data to continue the investigation.
Therefore, once again, visibility is crucial to the speed of analysis, and automation really helps here. But if IR staff are not able to get the host and network logs relating to the initial investigation, the response will be delayed, and every delay gives the attacker more time to gain ground in the network, move laterally or cause further damage.
To minimise the delay organisations should work to do the following:
- Ensure there are enough staff to cover training, holidays and other tasks as well as day-to-day analysis.
- Ensure that there is mealtime cover, or you will be blind for at least 3 hours a day (2 teams x 1 hour for lunch plus 2 x 15 minute coffee breaks each), and that’s 1/8th of the day.
- Reduce the clutter in the analyst’s desktop environment by centralising SIEMs/logging systems:
- If you have different networks with different security levels there is a temptation to split the SIEMs and logging systems, but by doing so you lose any AI-, rule- or logic-based correlation of events across them. To get around this, pull logs from the lower systems to the higher ones via one-way gateways.
- Reduce the volume of data going into the SIEM
- This sounds odd, but we often see SIEMs and logging servers choked with high-volume, low-quality events. This in turn slows the queries performed by the analyst and extends the time needed to complete the investigation (see the filtering sketch after this list).
- Ensure analysts have fast access to additional logs from the targeted systems, as the centralised logs may not give the full picture (especially if you are filtering logs at the lower level).
- Provide analysts with access to network logs (DNS, NetFlow, WebProxy and FW logs)
- Provide the analyst with access to the live system if possible (noting that in the EU, getting approval for this in every country will take time [we will cover this in a later post]).
- Give the analyst the tools to slice and dice these logs in a safe and secure workstation (see the triage sketch after this list; we will cover building your IR workstation network in a later post).
- Get the team briefed on how the organisation is currently connected, as this will vastly improve their understanding of how data moves around, what is normal, and how an attacker would move about. They should have documents (ideally live access to IT’s planning and documents) that detail:
- Connectivity to the internet (all hops, technologies, locations of the systems and detection capabilities)
- How the various internal organisational sites are linked – both the speed of the links and the features of the routers/FWs are ace to understand
- The key systems on each site, even if only a list of DCs, LDAP and other security servers, as well as the location of what the business considers important data (PII, PCI, HIPAA, SOX and all those things 😀 )
- Key roles of users/departments to understand normal/abnormal user access/activity
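As a concrete illustration of the “reduce the volume” point above, here is a minimal sketch of pre-filtering noisy events before they ever reach the SIEM. The event structure, the “noisy” event IDs and the `forward_to_siem()` function are assumptions for the example; substitute whatever your collector or forwarder actually supports.

```python
# Minimal sketch: drop known high-volume, low-value events before forwarding to the SIEM.
# The event structure, the "noisy" IDs and forward_to_siem() are illustrative assumptions.

NOISY_EVENT_IDS = {
    4658,  # e.g. Windows "handle to an object was closed" - high volume, rarely useful alone
    5156,  # e.g. Windows Filtering Platform "permitted a connection" - very chatty
}

def should_forward(event: dict) -> bool:
    """Keep everything except events we have explicitly judged to be noise."""
    return event.get("event_id") not in NOISY_EVENT_IDS

def forward_to_siem(event: dict) -> None:
    # Placeholder for whatever transport your environment uses (syslog, HTTPS, agent, ...).
    print(f"forwarding: {event}")

incoming = [
    {"event_id": 4624, "host": "ws-finance-01", "msg": "An account was successfully logged on"},
    {"event_id": 5156, "host": "ws-finance-01", "msg": "WFP permitted a connection"},
    {"event_id": 4658, "host": "ws-hr-02",      "msg": "The handle to an object was closed"},
]

for event in incoming:
    if should_forward(event):
        forward_to_siem(event)
```

Filtering at the collection tier like this keeps analyst queries fast, but remember the earlier point about keeping raw logs available at the lower level: anything you drop here you may still want during a deeper investigation.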
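And for the “slice and dice” point, here is a minimal sketch of the kind of ad hoc triage an analyst might run on the IR workstation. It assumes a hypothetical CSV export of web proxy logs with `timestamp`, `user`, `src_host`, `url` and `bytes_out` columns, and uses pandas purely as an example tooling choice.

```python
# Minimal sketch: quick triage of an exported web proxy log on the IR workstation.
# The file name and column names are illustrative assumptions about the export format.

import pandas as pd

# Hypothetical export from the proxy: one row per request
logs = pd.read_csv("proxy_export.csv", parse_dates=["timestamp"])

# Which users sent the most data out of the network in the period?
top_uploaders = (
    logs.groupby("user")["bytes_out"]
        .sum()
        .sort_values(ascending=False)
        .head(10)
)
print(top_uploaders)

# Which destinations did a suspect workstation talk to, and how often?
suspect = logs[logs["src_host"] == "ws-finance-01"]
print(suspect["url"].value_counts().head(20))
```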
With these in place, your staff should be able to provide an accurate and expeditious answer to the question:
“Are we hacked or not?”
Then you can move on to the equally big question:
“How bad is it?”
We’ll cover that in the next post (due Friday 22nd June).