Building on my earlier post, "Questioning your security data",
I'm now going to dive into looking at active attacks.
The IT Operations team complained about the recent ramp-up of SSH
attacks. These particular services were protected by a variety of
controls (which I won't get into). One of these is a detective control
that generates alerts when the service is attacked.
Historically, I've found this particular control has a very low false positive rate,
which makes me happy. The bad news is that the alert goes via email,
which makes it hard to suck into my ELK stack. The good news is that
I've got a nice historical archive of these alerts. I've got about 37k
logs of attacks against 4 different SSH servers in 3 different locations
from Jan 2013 to present.
Well, I've had
Jay & Bob's book sitting around so now I figured it was time to do some
Data Driven Security.
Note: the purpose of this post is to give you ideas and pointers to
learn some new techniques. I'm not going to lay out an exact
step-by-step procedure. If you want to learn more about specific
commands, follow the links or drop a comment. Also, I know my code is
crude and lame; if you've got pointers for improving it, comment away.
First off, the data exists as email alerts. I was able to simply do an
export from Outlook to text (file->save-as) into one large
msg file. So the contents of the file look like:
What's nice is that each alert includes a real-time GeoIP lookup, which is handy
when reviewing data that's a few years old, since IP ownership might have
shifted. The question is how to get this into a form usable for
analysis. The answer for me was
AWK. This quick and ugly AWK script tears through the text file:
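# Turn the exported alert emails into CSV rows: target,attacker,date time,country
BEGIN { FS = " " }   # split fields on whitespace (awk's default)
# "Sent:" header: $3 = month name, $4 = day with a trailing comma, $5 = year, $6 and $7 = the time fields
$1 == "Sent:" { month=$3; day=$4; year=$5; sub(/,/, "", day); time1=$6; time2=$7 }
# Map the month name to its number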
month == "January" { month="1" }
month == "February" { month="2" }
month == "March" { month="3" }
month == "April" { month="4" }
month == "May" { month="5" }
month == "June" { month="6" }
month == "July" { month="7" }
month == "August" { month="8" }
month == "September" { month="9" }
month == "October" { month="10" }
month == "November" { month="11" }
month == "December" { month="12" }
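# "Subject:" header: $2 = targeted system, $5 = attacker IP
$1 == "Subject:" { who=$5; where=$2 }
# "country:" line carries the GeoIP result: emit one CSV row per alert
$1 == "country:" { print where "," who "," month "/" day "/" year " " time1 " " time2 ", " toupper($2) }
# Lines starting with "missing" (presumably no GeoIP result): repeat the attacker IP in the country column
$1 == "missing" { print where "," who "," month "/" day "/" year " " time1 " " time2 ", " who }
END { print " " }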
You can see that my script works through the email headers and mostly
just converts the date into a format that's easier for a machine to
read. I just shove my file through this script and it gives me a nice
CSV that looks like this:
Column one is the targeted SSH system (I'm giving them some
cryptic names here based on asset classification), followed by the
attacker's IP, the date/time of the alarm, and the GeoIP lookup. Nice.
I could pull this into Excel and bang away at it, but instead I'll yank
it into R for some deep analysis.
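If you want to try the parsing step without a real mail export, here is a minimal sketch. It assumes an alert shaped the way the AWK rules above expect; the sample file itself is made up (the subject wording, the target name "ssh-a", and the documentation address 203.0.113.7 are all invented), and the file names are hypothetical. It also swaps the twelve month rules for a lookup table, which is just one possible tidy-up and not how my script does it.

```shell
#!/bin/sh
# Build a one-alert sample file. Everything in it is invented for illustration:
# the subject wording, the target name "ssh-a", and the address 203.0.113.7
# (a reserved documentation address).
cat > sample_alert.txt <<'EOF'
Sent: Monday, January 7, 2013 3:45 PM
Subject: ssh-a attacked by 203.0.113.7
country: us
EOF

# Same field logic as the script above, with the twelve month rules
# replaced by a name-to-number lookup table built once in BEGIN.
awk '
BEGIN {
    n = split("January February March April May June July August September October November December", names, " ")
    for (i = 1; i <= n; i++) monthnum[names[i]] = i
}
$1 == "Sent:"    { month = monthnum[$3]; day = $4; sub(/,/, "", day); year = $5; time1 = $6; time2 = $7 }
$1 == "Subject:" { where = $2; who = $5 }
$1 == "country:" { print where "," who "," month "/" day "/" year " " time1 " " time2 ", " toupper($2) }
' sample_alert.txt > sample_alerts.csv

cat sample_alerts.csv
# prints: ssh-a,203.0.113.7,1/7/2013 3:45 PM, US
```

Against a real export, the equivalent would be something like `awk -f parse_alerts.awk alerts.txt > alerts.csv`, where `parse_alerts.awk` and `alerts.txt` are placeholder names for the script and the Outlook text export.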
A quick aside: I'm using RStudio for this example.
At the R command prompt, I pull the CSV file into a data frame with:
alertlist <- read.csv(file="alerts.csv", header=FALSE)  # the AWK output has no header row
And name columns:
colnames(alertlist) <- c("Target","AttackerIP","Eventdate","Country")
Oh, and I better convert the dates into R format:
alertlist$Eventdate <- as.Date(alertlist$Eventdate, "%m/%d/%Y")
which yields
Now I'm all set to do some analysis. A quickie summary already tells me lots of interesting things:
Stay tuned for Part 2, where I do some of that.