This is a tutorial I posted on Anti-Online back in 2006; I thought I’d update it and pass it along. It makes me laugh to see some of the old scripting “Kung Fu” I had to do with Grep, Awk and Sed to accomplish something that takes seconds with a good CLM or SIEM tool!
DISCLAIMER: This is a tutorial of sorts that walks through a day-to-day problem, and its solution, that I often faced in my Security Planning / Operations role at a large telecommunications company. I make no assumptions about where readers sit on the learning curve, and I don’t even guarantee this will be a good read. In fact, given my level of exposure to and expertise with the tools used in this article, I may be missing the plot, and some may find an easier, softer way of doing what I was tasked to do. Having said all that: to those I’ve confused, sorry; I tried to provide links for further reading. To those I’ve disgusted with my simplicity or seemingly Lamer approach: like you, I’m always learning, and I’m open to criticism and advice.
Why is it that when you Google for something you absolutely need, you can never find it? Case in point: I had a Squid proxy server, left over from a decommissioning project, that was still seeing tons of traffic when it shouldn’t have been seeing any! The Linux server was locked down with sudo and no one knew the root password, so we had very few choices as to which programs we could run to view activity. The server was flaky, and Netstat would never finish printing the current activity. So the server folks approached me and asked whether there was any way to find out which unique internal IP addresses were connecting to the five preconfigured proxy ports (8080, 8082, 8084, 8086 and 8888).
As it turns out, the Squid admin user had access to Tcpdump and could run it against eth0. I got him to run Tcpdump and save its output to a dump file covering three hours’ worth of activity during the lunch-hour web traffic spike. This produced a 470MB text file that I had to SFTP from his server to my Linux box.
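For reference, a capture along these lines could be sketched as follows. This is an assumption, not the command the admin actually ran: it uses tcpdump’s -i (interface) and -nn (print addresses and ports as numbers, so 8080 isn’t translated into a service name) options, plus a standard pcap-filter expression that keeps only TCP packets addressed to the five proxy ports. The proxy_filter helper and the proxy_dump.txt file name are hypothetical. Filtering at capture time like this is an alternative to capturing everything on the interface, and it should keep the proxy’s replies and its upstream web traffic out of the dump.

```shell
#!/bin/sh
# Hypothetical helper: build a pcap-filter expression matching TCP packets
# sent to any of the five proxy ports named in the article.
proxy_filter() {
  f=""
  for p in 8080 8082 8084 8086 8888; do
    # Join clauses with " or "; ${f:+...} expands only when f is non-empty
    f="${f:+$f or }dst port $p"
  done
  printf 'tcp and (%s)\n' "$f"
}

# Example capture (needs tcpdump privileges; file name is hypothetical):
#   tcpdump -i eth0 -nn "$(proxy_filter)" > proxy_dump.txt
```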
Alrighty then! What do I do with a honkin’ text file that repeats the same info endlessly? It contained hits from employees and internal servers on the proxy ports, the proxy itself establishing connections to the web, the foreign sites replying to the proxy and, finally, the proxy returning the data to the corporate host. So one internal host loading the homepage of a favorite security tutorial site could produce four flows where we only cared about one: client to proxy, proxy to site, site to proxy and proxy back to client. I needed to strip out the extraneous information and narrow the million-plus lines of data down to something sensible. So I started thinking about the commands that would be required, so that eventually I could write a shell script.
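Here is a minimal sketch of that narrowing step in the same Grep/Awk/Sed spirit; it is not necessarily the approach taken later in this tutorial. It assumes the capture was taken with tcpdump -nn, so each IPv4 line looks like TIME IP SRC.SPORT > DST.DPORT: ... with numeric ports. Keying on the destination port keeps only the client-to-proxy leg: the proxy’s replies go to the clients’ ephemeral ports, and its own outbound requests go to the web servers’ ports (typically 80 or 443). The function name unique_proxy_clients and the input file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "tcpdump -nn" text on stdin and print each
# unique source IPv4 address that sent a packet to one of the proxy ports.
unique_proxy_clients() {
  awk '
    BEGIN {
      # The five proxy ports named in the article
      n = split("8080 8082 8084 8086 8888", p, " ")
      for (i = 1; i <= n; i++) ports[p[i]] = 1
    }
    # Assumed line layout: TIME IP SRC.SPORT > DST.DPORT: ...
    $2 == "IP" && $4 == ">" {
      dst = $5
      sub(/:$/, "", dst)          # drop the colon after the port
      dport = dst
      sub(/.*\./, "", dport)      # keep only the text after the last dot
      if (dport in ports) {
        src = $3
        sub(/\.[0-9]+$/, "", src) # strip the source port, leaving the IP
        print src
      }
    }
  ' | LC_ALL=C sort -u            # byte-wise sort with duplicates removed
}
```

Usage would be something like unique_proxy_clients < proxy_dump.txt. Setting LC_ALL=C makes sort -u compare raw bytes, so ordering and de-duplication don’t depend on the locale’s collation rules. One caveat: if the proxy ever fetched an upstream site on one of those same ports, its own address would show up too; also matching the destination against the proxy’s IP would exclude that.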