Wednesday, May 20, 2015

Data Driven Security, Part 3 - Finally some answers!

I am continuing my exploration inspired by Data Driven Security.

In Part One, I exported some data on SSH attacks from Outlook and used AWK to get it into R.

In Part Two, I converted some basic numbers into graphs, which helped visualize some strange items.  The final graph was most interesting:

It was strange that there were twice as many attacks at the Dev SSH service as the DR service.  What is going on here?

Well, we've got over 37,000 entries in here over a couple of years.  Let's break them out and get monthly totals based on target.  First, I'm going to convert those date entries to a number I can add up.
Remember the data looks like this:




Each entry has a date and which target location was hit.  So I'm going to use an ifelse() to add a new column with a "1" in it for every matching target.
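Roughly like this (a sketch; your exact target labels will differ, mine are the cryptic names from Part 1):

# one indicator column per target -- a 1 if this alert hit that target, otherwise 0
alertlist$DevHit <- ifelse(alertlist$Target == "Dev", 1, 0)
alertlist$DRHit  <- ifelse(alertlist$Target == "DR",  1, 0)
alertlist$TstHit <- ifelse(alertlist$Target == "Tst", 1, 0)
# ...and one more of these for the fourth target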

So now I have an indicator column for each target.

Now I add another vector to the dataframe, breaking the dates down by month.
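One way to do it is to snap every date to the first day of its month (a sketch, not my exact code):

alertlist$Month <- as.Date(format(alertlist$Eventdate, "%Y-%m-01"))
# each event now carries the first day of its month, which sums and plots nicely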

There might be an easier way to do this, but I'm still an R noob and this seemed the easiest way forward for me. Anyway, I can plot these and compare.
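For a single target, the aggregate-and-plot step looks something like this sketch:

library(ggplot2)
# monthly totals for the Dev target, using the indicator column added above
dev.monthly <- aggregate(DevHit ~ Month, data = alertlist, FUN = sum)
ggplot(dev.monthly, aes(x = Month, y = DevHit)) +
  geom_bar(stat = "identity") +
  labs(x = "Month", y = "Alerts", title = "Dev SSH alerts per month")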


You can do this for all four and compare:
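One compact way to line all four up (a sketch, not exactly what I ran) is to count alerts per month and target, then facet on the target:

library(ggplot2)
# a column of 1s makes counting easy: sum it by month and target
alertlist$Alerts <- 1
monthly <- aggregate(Alerts ~ Month + Target, data = alertlist, FUN = sum)
ggplot(monthly, aes(x = Month, y = Alerts)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ Target)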


This gives me two insights.  First, the different SSH systems began generating alerts at different times.  There are twice as many Dev alerts as DR alerts simply because the DR system didn't generate alarms until the middle of 2014, and the same goes for the Tst SSH system.  So there is a long tail of Dev alarms skewing the data.  In fact, the Dev system was the first SSH system to go live.  Pretty much any system on the Internet is going to generate some alarms, so a zero in the graph means that service wasn't live yet.


Second, we can confirm and graphically show what IT originally raised back in Part 1.  To refresh your memory: "The IT Operations team complained about the ramp-up of SSH attacks recently."

Here we can see that ramp-up visually, beginning at the end of 2014 and spiking sometime in the first quarter of 2015.

The next step in our analysis would be to see who was in the spike: are these new attackers?  Where are they from?  The traffic to the targets seems to peak at different times, so there might be something worth investigating there.

And why did the traffic die down?  Were the IPs associated with a major botnet that got taken down sometime in April 2015?  A quick Googling says yes: "A criminal group whose actions have at times been responsible for one-third of the Internet’s SSH traffic—most of it in the form of SSH brute force attacks—has been cut off from a portion of the Internet."


I hope this series was as informative to you as it was to me.  I was pleasantly surprised to find some tangible answers about current threats simply by graphing our intrusion data.  Now it's your turn!




Data Driven Security, Part: the Second

In Part 1, we loaded up two years' worth of SSH attacks (about 37,000 entries) into R for analysis. A quick summary command gave us some interesting highlights:

But we can also make some pretty pictures for upper management; they like graphs, and it's easier to show differences that way.

First up, let's look at the column on the far right: the top countries banging on our door.  A quick query builds us an object with the top ten countries in it.  Then, after making sure the ggplot2 library is loaded with library(ggplot2), we graph it:
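Mine looked more or less like this (a sketch; the table/sort approach and object names here are illustrative, not a copy of my original code):

# count alerts per country, keep the ten biggest, and turn that into a dataframe
topcountries <- head(sort(table(alertlist$Country), decreasing = TRUE), 10)
topcountries <- data.frame(Country = names(topcountries), Alerts = as.numeric(topcountries))
library(ggplot2)
ggplot(topcountries, aes(x = reorder(Country, -Alerts), y = Alerts)) +
  geom_bar(stat = "identity") +
  labs(x = "Country", y = "Alerts")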



And we get a bar chart of the top attacking countries.

If we wanted to go a little deeper, we can look at our top IP addresses, and maybe make the graph easier to read while we're at it.
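Same trick, different column; flipping the axes keeps the long IP labels readable (again, just a sketch):

topips <- head(sort(table(alertlist$AttackerIP), decreasing = TRUE), 10)
topips <- data.frame(IP = names(topips), Alerts = as.numeric(topips))
ggplot(topips, aes(x = reorder(IP, Alerts), y = Alerts)) +
  geom_bar(stat = "identity") +
  coord_flip() +   # horizontal bars so the IP addresses don't overlap
  labs(x = "Attacker IP", y = "Alerts")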

The other interesting thing in the original summary was the total for each different service ("target") that was attacked.  We can pop out a graph of those in pretty much the same way.
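Same pattern as before; a one-line count plus a bar chart does it (sketch):

targets <- as.data.frame(table(Target = alertlist$Target))
ggplot(targets, aes(x = Target, y = Freq)) +
  geom_bar(stat = "identity") +
  labs(y = "Alerts")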




Hmm... the Dev SSH server has been attacked more than twice as much as the DR server.  What is going on here?

We'll find out in Part 3.



Monday, May 18, 2015

Getting the Data for Data Driven Security, Part One


Continuing to build on my earlier post, "Questioning your security data", I'm now going to dive into looking at active attacks.

The IT Operations team complained about the ramp-up of SSH attacks recently.  These particular services were protected by a variety of controls (which I won't get into). One of these is a detective control that generates alerts when the service is attacked.

Historically, I've found this particular control has a very low false positive rate, which makes me happy. The bad news is that the alert goes via email, which makes it hard to suck into my ELK stack. The good news is that I've got a nice historical archive of these alerts. I've got about 37k logs of attacks against 4 different SSH servers in 3 different locations from Jan 2013 to present.

Well, I've had Jay & Bob's book sitting around, so I figured now was the time to do some Data Driven Security.

Note: the purpose of this post is to give you ideas and pointers to learn some new techniques.  I'm not going to define an exact step-by-step on how to do this.  If you want to learn more about specific commands, follow the links or drop a comment.  Also, I know my code is crude and lame; if you've got pointers to improve it, comment away.

First off, the data exists as email alerts.  I was able to simply do an export from Outlook to text (file->save-as) into one large msg file. So the contents of the file look like:





What's nice is that we've got a real-time GeoIP lookup, which is handy when reviewing data that's a few years old, when IP ownership might have shifted.  The question is how to get this into a form usable for analysis.  The answer for me was AWK.  This quick and ugly AWK script tears through the text file:

BEGIN { FS = " " }
# Grab the date and time fields from the "Sent:" header and strip the comma from the day
$1 == "Sent:"  { month=$3; day=$4; year=$5; sub(/,/, "", day); time1=$6; time2=$7 }
# Turn the month name into a number
month == "January"   { month="1" }
month == "February"  { month="2" }
month == "March"     { month="3" }
month == "April"     { month="4" }
month == "May"       { month="5" }
month == "June"      { month="6" }
month == "July"      { month="7" }
month == "August"    { month="8" }
month == "September" { month="9" }
month == "October"   { month="10" }
month == "November"  { month="11" }
month == "December"  { month="12" }
# Pull the target name and attacker IP out of the "Subject:" header
$1 == "Subject:" { who=$5; where=$2 }
# When the GeoIP line appears, emit one CSV row; if the lookup came back missing, print the IP again instead
$1 == "country:" { print where "," who "," month "/" day "/" year " " time1 " " time2 ", " toupper($2) }
$1 == "missing"  { print where "," who "," month "/" day "/" year " " time1 " " time2 ", " who }
END { print " " }


You can see that my script works through the email headers and mostly converts the date into a format that's easier for a machine to read. I just shove my file through this script and it gives me a nice CSV that looks like this:




Column one is the targeted SSH system (I'm giving them some cryptic names here based on asset classification), followed by the attacker's IP, the date/time of the alarm, and the GeoIP lookup.   Nice.  I could pull this into Excel and bang away at it, but instead I'll yank it into R for some deep analysis.

A quick aside: I'm using RStudio for this example.  At the R command prompt, I pull the CSV file into a dataframe with:
alertlist <- read.csv(file="alerts.csv", header=FALSE)

And name columns:
colnames(alertlist) <- c("Target", "AttackerIP", "Eventdate", "Country")


Oh, and I better convert the dates into R format:
alertlist$Eventdate <- as.Date(alertlist$Eventdate, format="%m/%d/%Y")
which yields proper R Date objects.
Now I'm all set to do some analysis.  A quickie summary already tells me lots of interesting things:
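The command itself is nothing fancy:

summary(alertlist)
# the text columns come in as factors, so summary() shows the most common targets,
# attacker IPs and countries, plus the date range of the alerts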


Stay tuned for Part 2, where I do some of that.

Wednesday, May 6, 2015

Assuming you're breached, what do you look for?

Building on my earlier post, "Questioning your security data", I thought I would share some details on how I'm querying my SIEM. Right now, I'm using ELK to correlate security event data from a variety of sources (firewalls, IDS, HIDS, antivirus, load balancers).

The first question that I care most about is "Have I been pwned?", calling back to the title of this blog.   So here's what one of my dashboards looks like:


The big donut chart on the left is a breakdown of all malicious activity, with the inner pie chart showing country of origin and the outer ring showing which port the alert was detected on.  A few interesting tidbits here:  China likes to send a lot of attacks via email, and Australia is tracerouting us... but I want to know more about what's really going on inside my network.  So let's dig deeper.


Here's another pie-chart just showing virus alarms and what port they were detected coming in on. 

Hmmm... lots of email malware, but a fair bit of drive-by and possible botnet C&C activity as well.  Good to know.

Meanwhile, the graphs on the dashboard are serving up visualizations of network incidents detected by internal firewalls and IDS.  (You do segregate your internal network with firewalls, don't you?)  Here's a blow-up of one of those:



The query driving these graphs is of the form "show all IDS alarms and firewall blocks where the source IP is an RFC 1918 address."


The big spike on the left is damned suspicious... but on closer inspection, I see it's my vulnerability scanning box.  Ah, that's cool.  The next highest box is an IT inventory tool, which also does some active discovery.   Nice to know I can quickly spot who's scanning on my inside network.

If this is useful or interesting (or way off), let me know.  I can share more of these as I build out my dashboards and queries.

Wednesday, April 29, 2015

Prioritizing patching

Last night, I was discussing vulnerability risk prioritization with the very bright folks who put on a very interesting talk at RSA on vulnerability management, and I'm working on a lecture for a class tonight on impact and risk analysis.  All of this got me thinking about the problem of vulnerability management and patching at my organization… and why risk is a minor factor compared to operational resources and compliance regimes.
  
Up until now, my primary prioritizer builds on the premise of “if the vuln is easily sploitable, it’s across the Mendoza line for security” and needs to be fixed.   I can filter with Nessus like this:
This is simple and does better than plain ole CVSS.  And now we have some great new tools to examine vulnerabilities in terms of real-world risk of exploitation.  So naturally, folks would want to use this to prioritize patching.  But not me. For me and my organization, patching priority depends less on the risk of exploitation of the vulnerability than on operational factors.  It's counter-intuitive and backwards, but it's also my reality.

Why?  Lemme break this down - I have basically three classes of vulnerabilities to deal with on an ongoing basis: externally facing services, internal boxes, and internally-developed applications.

Patching externally facing services
These are servers that are customer-facing and subject to change control and maintenance windows.  Getting these patched, whether the vulnerability is high or low, is more a question of sysadmin resources during a limited change window than anything else. Since they're subject to compliance requirements and are auditor-visible, anything higher than a low vulnerability must be patched within 30 days, end of discussion. I know the CVSS ratings are lame, so I prefer to clear nearly everything when we patch, and compliance is the wedge that gets me access to a change window.  So this is kind of on autopilot, driven by IT operations team availability more than anything else.  It's as easy for them to patch all the lows and highs as it is to patch just the highs.  In theory, I can scream and yell to get things to go faster in times of remotely exploitable horror (a case where risk is actually a major factor) or Times of Great Panic.

Patching internal boxes
This breaks down further into workstations and servers.  Workstations we patch automatically every Microsoft patch cycle; the only complaint I have here is that you need a tool to deal with the non-Windows stuff (Java, Adobe, Firefox), but that's pretty much a solved problem.   And most of the time we roll up and patch everything; it's easier than picking and choosing individual patches.

Internal servers are pretty much the same deal as external services: subject to maintenance windows and sysadmin resources.  The problem is usually that there are more of these than external services (you hope!) and, for some reason, they seem to need to be MORE available than customer-facing systems.  These boxes usually have less high-availability capability because upper management perceives no need for 24x7 availability… right up until we need to reboot them. But no one but the users cares about internal productivity, so really, no one with money cares.

Patch all or pick patches?
The nature of the patching usually means it's harder to cherry-pick patches than to just catch all my vulnerabilities in one big swoop, so most of the time we don't do much prioritizing. It's hard enough to get a patch window once a month that it's easier as an all-or-nothing affair, especially if we're dealing with operating system patches (which are the most frequent). If we have to reboot the box (and fight for time and resources for that reboot), then we might as well "yum update all" and test that.  There are a handful of problem children that aren't getting patched fast enough, and that's often because the box needs replacing or fixing, or it's just too hard to patch.  Again, it usually doesn't matter what the patch is; it's more about IT operational resources and the nature of the box than the risk/vulnerability rating.
 
Patching internally-developed applications
This is the only category where risk of exploitation plays a major role in remediation priority, but even here it's not the only factor. To patch, we gotta apply developer time and, worse, bump off desired money-making features to fix something. However, the priority of fixing is also driven heavily by the customer's perception of risk. I may be able to calculate and show that a risk is at level X, but if the user of that application says the risk is Y and I can't convince 'em otherwise, then priority Y trumps and that drives the schedule.  Sometimes the internally-developed application is in use by an outside customer who is strictly following a compliance regime that says patch based on CVSS.  Then I'm even more at the mercy of the winds.
 
So this is where I land... I appreciate a great tool to help me get better data on vulnerabilities, and I realize that vulnerability prioritization has uses beyond patching.  But because my IT operational resources are so constrained, it's only marginally useful for me.

Paper plating 
Well, if we'd virtualized/cloudified/automated all our critical 24x7 servers, then patching itself would get easier with "paper plating". Paper plating (I heard this term somewhere) refers to not washing the dish (a.k.a. patching the live box) but throwing it away and getting a fresh one. With virtualization, automation, data segregation and the teensiest bit of high availability, you can clone+patch+swap the running instance with minimal interruption, so patching becomes a minor housekeeping chore.  In that case, we could afford to be choosy about patches and even roll them out continuously based on vulnerability priority. Unfortunately for us, most of the operationally critical (but not politically critical) internal boxes are still physical, so we are subject to porcelain plating... and usually a patch-everything-at-once mode, since it's easier.

 

 

Friday, April 10, 2015

Questioning your security data

Everyone's getting excited about big data for security.  The tools get fancier and give you a lot more data, but I barely see folks using them for much more than producing pretty graphs for management.  Some security pros use this data to figure out where to focus their remediation.  Some seem to think that our magic SIEM boxes will detect the RFC 3514 tagged packets.  No such luck.

As Lévi-Strauss said, “The scientist is not a person who gives the right answers, he's one who asks the right questions.”  So what am I trying to get out of my Big Pile of Security data? 

Have I been pwned?

I want to know which of my boxes have been hacked, what the effects are, and how certain we can be about that.  The effects I'm concerned about are: did they steal data or credentials?  Did they plant malware?  Or is it just ransomware or click fraud?

Am I under active targeted attack?

Just being on the Internet means I'm under attack.  I want to know if someone has singled my organization out and knows where to hit me and how hard.  And how long has this been going on?  Can I identify the attackers, and what else have they done that might be slipping under my radar?


What is changing on my critical systems?

Change management is a critical piece of good IT hygiene and operational sanity.  The key is detecting unauthorized changes.   When things start getting modified that you don't know about, it almost always means something bad is going on.


What is the state of the background radiation of the Internet?

I also want to know about the non-directed attacks that are just sweeping across the Internet.  What is being targeted now?  What ports are getting a lot of attention?  Does what I'm seeing match up to what everyone else is seeing?  How visible is my Internet footprint?  What kinds of exploits are in use?  What malware is popular this week?

What can you tell me about this particular IP address?

Is it a customer?  Is it one of my road warriors?  Is it a security researcher or a vulnerability tester?  Is it someone I've never seen before? How suspicious has it been?  What do other organizations think about it?

What are you seeing that you didn't see yesterday?

As Marcus Ranum said, "By definition, something we have never seen before is anomalous."  It may be evil, or it may be that Operations changed something.  But it's always useful to know.

So that's what I'm looking for in my great collections of security data.  What are you doing with yours?