Data leakage has become a hot topic in information security. But what if you can't afford the tools that are specifically designed to keep employees from intentionally or mistakenly leaking private or valuable corporate data to the outside? It turns out there are some creative ways to use what you have (or can easily get) to tackle the problem.
Data can leak from a network in many ways, and the focus here will be on understanding what exits your network and how it can best be protected. Other leakage issues, such as lost, unencrypted USB drives or laptops, will not be discussed.
The Beginning: browsing
The place to start is with the fundamentals. When was the last time you sat down and had a good look at your network? This sounds like a huge, unorganized waste of time, but you might be surprised.
Open up whatever tool fits your needs (Windows Explorer, Finder on a Mac, or a Samba client on Linux) and spend some unstructured time checking out what you have out there. You probably have an asset inventory that is always up to date and captures everything, but sometimes it takes looking at the same thing in a different way to truly understand what is out there.
Manual browsing also lets you find things that you wouldn't normally pick up in an asset management system, such as open file shares or other resources that may expose data. Multi-purpose scan/fax/print machines are one such repository, and they often aren't treated like normal repositories with appropriate permissions.
If browsing isn't your thing, then how about nmap? The Network Mapper, written by Gordon "Fyodor" Lyon, has been around for a long time. It has many uses, one of which is literally mapping out what you have on your network. So, for example, say you wanted to see what systems were available on a particular class B-sized range using just ping. You might issue the following command:
nmap -v -sP 10.150.1-255.1-255 -oN scan_results
This would then report back the list of hosts, up and down, into a file called "scan_results". The entries would look something like this excerpt:
Host 10.150.9.153 appears to be down.
Host monkey (10.150.9.154) appears to be up.
MAC Address: 00:13:21:60:17:28 (Hewlett Packard)
Any IP address that doesn't have a live host associated with it reports as down. The second entry, monkey, did respond to the probe, so nmap reports its IP address and, since we were on the same subnet as the one scanned, its MAC address along with the card manufacturer. Already, with just this little scan, we have learned that we have a lot more printers than we thought, and we have some other interesting responses to investigate later. This is just the start.
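If the results file is long, a few lines of script can pull out just the live hosts. The following is a minimal Python sketch, assuming the file contains lines in the format shown in the excerpt above; newer nmap releases word these lines differently, so the patterns would need adjusting. The function name parse_ping_scan is made up for illustration.

```python
import re

# Patterns assume the line format shown in the excerpt above.
# Other nmap versions may word these lines differently.
HOST_LINE = re.compile(
    r"^Host (?:(?P<name>\S+) \((?P<ip>[\d.]+)\)|(?P<bare_ip>[\d.]+)) "
    r"appears to be (?P<state>up|down)"
)
MAC_LINE = re.compile(r"^MAC Address: (?P<mac>[0-9A-Fa-f:]{17}) \((?P<vendor>.*)\)")


def parse_ping_scan(lines):
    """Return one dict per host reported as up (hypothetical helper)."""
    hosts = []
    last_up = None  # the most recent "up" host, so a MAC line can attach to it
    for line in lines:
        line = line.strip()
        match = HOST_LINE.match(line)
        if match:
            last_up = None
            if match.group("state") == "up":
                last_up = {
                    "name": match.group("name"),
                    "ip": match.group("ip") or match.group("bare_ip"),
                    "mac": None,
                    "vendor": None,
                }
                hosts.append(last_up)
            continue
        match = MAC_LINE.match(line)
        if match and last_up is not None:
            last_up["mac"] = match.group("mac")
            last_up["vendor"] = match.group("vendor")
    return hosts


# The three lines from the excerpt above, used as sample input.
sample_output = """Host 10.150.9.153 appears to be down.
Host monkey (10.150.9.154) appears to be up.
MAC Address: 00:13:21:60:17:28 (Hewlett Packard)"""

up_hosts = parse_ping_scan(sample_output.splitlines())
for host in up_hosts:
    print(host["name"], host["ip"], host["vendor"])
```

In practice you would pass it the lines of the scan_results file instead of the sample string. Grouping the output by vendor is a quick way to spot all of those printers.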
If for some reason the ping option doesn't work, experiment with the various options that nmap offers by reviewing the information on the Web site or by running nmap -h to get the help file. Nmap runs on *nix boxes or on Windows.
We've browsed and scanned; let's try to get a picture of what we just did. Visual representation of all of this information sometimes also helps narrow in on things that might not have been obvious before.
A great tool is zenmap, a GUI version of nmap. Zenmap has a useful feature called Fisheye that helps you visually manage the hosts that you have found in your nmap scan. At times, seeing the hosts all in a picture helps you see trends that you might not be able to see otherwise.
Zenmap also helps you scope out services and open ports in a summary fashion that you might not otherwise be able to see compiled. Lastly, you can use the services of vulnerability scanners such as MBSA or Nessus to start understanding what types of services are available, which should coincide nicely with your regular vulnerability scanning processes.
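For a quick spot check of one or two machines, such as a multi-purpose printer you stumbled across while browsing, you don't necessarily need a full scan. Here is a rough Python sketch that tries a TCP connection to a handful of ports commonly associated with file sharing and printing. The port list and the function name open_ports are my own choices for illustration, not something nmap or zenmap provides.

```python
import socket

# Ports commonly associated with file shares and network printing.
# This list is an assumption for illustration; adjust it for your environment.
PORTS = {
    139: "NetBIOS session",
    445: "SMB",
    515: "LPD",
    631: "IPP",
    9100: "raw printing",
}


def open_ports(host, ports=PORTS, timeout=1.0):
    """Return the ports on host that accept a TCP connection (hypothetical helper)."""
    found = []
    for port in sorted(ports):
        try:
            # A completed connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Refused, filtered, or timed out: treat the port as not open.
            pass
    return found
```

Calling open_ports("10.150.9.154"), for example, would return whichever of those ports answered on the host from the earlier scan. An open port only tells you where to look next; it says nothing yet about what data sits behind it.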
Getting close to your applications
As you continue on your hunt for potential data leaks, one major area is often overlooked. Do you really know what applications are installed on your network? The nmap and vulnerability scanning can help identify open ports which, in turn, should help you identify applications on your network you may not have known about.
Once you know an application is there, its log files are worth a read. For example, here's a sample of an application log for a web application:
3/2/2008 20:26:58 W3SVC1 WEB01 192.168.1.12 GET /organization/blah/legacy/VendorDirectory/default.aspx - 80 - 71.x.x.56 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) __utma=178966772.771491592.1178202784.1202835103.1204482274.80;+__utmz=178966772.1194020027.58.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none);+__utmb=178966772;+__utmc=178966772;+ASPSESSIONIDQQQDACSA=XXKBBHXXKJDCBBBIPCGIKILEM https://www.sample.com/organization/blah/legacy/VendorDirectory/(cuuoki452rllrb3nzlpk3j)/frame_Main.aspx www.sample.com:443 200 0 0 7298 825 3328
3/2/2008 20:36:44 W3SVC1 WEB01 192.168.1.12 POST /organization/blah/Legacy/VendorDirectory/ABC/frm_BillingDetail.aspx - 80 - 71.x.x.56 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322) __utma=178966772.771491592.1178202784.1202835103.1204482274.80;+__utmz=178966772.1194020027.58.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none);+__utmb=178966772;+__utmc=178966772;+ASPSESSIONIDQQQDACSA=XXKBBHXXKJDCBBBIPCGIKILEM https://www.sample.com/organization/blah/Legacy/VendorDirectory/(iid89cv9kpvx4m45iculg145)/ABC/frm_BillingDetail.aspx www.sample.com:443 200 0 0 15409 5819 4265
This is not enjoyable to read, but it might turn up something you didn't know, like the fact that IP address 71.x.x.56 is accessing this page around 8:30 p.m. and, even more interesting, posting to a form called "frm_BillingDetail.aspx". In this case, you may not have data leakage (perhaps the reverse, an integrity issue), but you can see that something is going on, and maybe you ought to have a closer look at the account that logged in and posted the data.
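Since nobody wants to read these by eye, it helps to let a script pick out the lines worth a second look. Below is a minimal Python sketch that flags POST requests and anything outside business hours. The field positions are assumptions read off the sample lines above (your log may be configured with different fields), the month/day/year date order and the 8-to-6 business hours are guesses, and the function name flag_requests is made up for illustration.

```python
from datetime import datetime

# Field positions assumed from the sample lines above:
# date time sitename computername server-ip method uri-stem uri-query port username client-ip ...
DATE, TIME, METHOD, URI, CLIENT_IP = 0, 1, 5, 6, 10


def flag_requests(lines, start_hour=8, end_hour=18):
    """Yield (timestamp, client_ip, method, uri) for POSTs or after-hours requests."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.split()
        if len(fields) <= CLIENT_IP:
            continue  # too short to be a request line
        # Assumes month/day/year order, as the sample appears to use.
        when = datetime.strptime(fields[DATE] + " " + fields[TIME], "%m/%d/%Y %H:%M:%S")
        after_hours = not (start_hour <= when.hour < end_hour)
        if fields[METHOD] == "POST" or after_hours:
            yield when, fields[CLIENT_IP], fields[METHOD], fields[URI]


# The leading fields of the two sample lines above (trimmed for brevity).
sample_lines = [
    "3/2/2008 20:26:58 W3SVC1 WEB01 192.168.1.12 GET "
    "/organization/blah/legacy/VendorDirectory/default.aspx - 80 - 71.x.x.56 HTTP/1.1",
    "3/2/2008 20:36:44 W3SVC1 WEB01 192.168.1.12 POST "
    "/organization/blah/Legacy/VendorDirectory/ABC/frm_BillingDetail.aspx - 80 - 71.x.x.56 HTTP/1.1",
]

flagged = list(flag_requests(sample_lines))
for when, client_ip, method, uri in flagged:
    print(when, client_ip, method, uri)
```

Both sample requests are flagged because they fall after hours. Widen or narrow the hours to match when your users actually work, and remember that a flag is a reason to look, not proof of a leak.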
How about that e-mail?
While looking at log files can provide a treasure trove of information about data leakage, e-mail is another area that needs special attention. Have you ever actually monitored what is sent out over e-mail? The answer is likely no unless you have a specific device to do this. If you don't have that device, then an open source intrusion-detection system can become an e-mail filtering tool as well.
Let's take Snort, for example. Snort, started by Martin Roesch, is one of the oldest open source, signature-based IDS/IPS tools available today, and it can be used for much more than intrusion detection. Over time, the open source community has come up with a variety of rules to look for data that shouldn't be traversing your network in clear text. For example, do you want to know if Social Security numbers or credit card numbers are leaving your network? The community has a signature for you. Here is one that looks for a data label:
# The word "private"
#alert tcp $SMTP_SERVERS any -> $EXTERNAL_NET 25 (msg:"ET POLICY SMTP Private"; flow:to_server,established; content:"Subject|3A|"; pcre:"/\Wprivate\W(?!/(25)?X[1-9])/ism"; classtype:policy-violation; reference:url,doc.emergingthreats.net/bin/view/Main/2002458; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Classified_Information; sid:2002458; rev:3;)
This particular rule looks for the data label "private" in e-mail, but there is a range of other labels that can be used as well, like confidential, restricted and top secret, and there are rules that look at HTTP traffic too. These of course assume you label your data, but you can get the gist here of how these rules can apply to other areas.
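To get a feel for what a label rule does before you deploy a sensor, you can try the same idea offline against saved messages. The following Python sketch is not the Snort rule itself, just a simplified analogue: it checks only the Subject header for the labels mentioned above, using plain word boundaries rather than the rule's exact pattern. The label list and the function name subject_labels are my own for illustration.

```python
import re
from email import message_from_string

# Labels taken from the text above; extend this to match your own classification scheme.
LABELS = ("private", "confidential", "restricted", "top secret")
LABEL_RE = re.compile(
    r"\b(" + "|".join(re.escape(label) for label in LABELS) + r")\b",
    re.IGNORECASE,
)


def subject_labels(raw_message):
    """Return the labels found in the Subject header of a raw message (hypothetical helper)."""
    subject = message_from_string(raw_message)["Subject"]
    if subject is None:
        return []
    return [found.lower() for found in LABEL_RE.findall(subject)]


# A made-up message for illustration only.
example = (
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Subject: Q3 numbers - Confidential\n"
    "\n"
    "The word private in the body is ignored by this sketch.\n"
)
print(subject_labels(example))
```

Running it over a sample of outbound mail gives you an idea of how noisy a given label would be, which is useful to know before the alerts start arriving.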
Placing Snort sensors with adequate network cards at choke points around your network can provide just what you need without a lot of extra cost. The nice part about this technique is that the logs tell you exactly where the leak came from, which is very helpful. Snort runs on *nix and Windows.
OK, you found all this stuff, what now?
You went on a treasure hunt and found ... stuff. Besides sounding all the alarms and running around the building, you can now set up a plan to start securing and monitoring the stuff.
First things first: does the data actually belong where you found it? Once found and validated, you can look at the processes around the data (e.g., is there a development or QA environment? Do you keep copies of live data there? Where else are the copies and reports of this data?) and tighten down the controls around where the data exists.
This is the hard part, where we impose rules that make people uncomfortable. So roll up your sleeves and let's get started.
One important note as you start your endeavor: remember that when you use free tools (OK, any tool), you need to validate what you see. Running off and accusing the network guys of having an MP3 server plugged directly into the Internet is not a good thing to do without validating first. Also, it really is crucial that proper data classification be done before you start on your quest. Otherwise, what are you looking for? How do you know what to tackle first? Prioritization will only come after you know what is important.
Take the time, meet with the business owners and understand what they are doing before you make any further recommendations.
Once you have validated that there really is a problem, you will need to prioritize your plan of attack because you can't do everything at once. Take a look at your data classification policy and tackle the most sensitive data first. Make a project plan and share this with stakeholders so that everyone is on the same page.
The majority of your work entails relocating the data to a secure location. Some of the tasks will include ensuring that copies of confidential data are either deleted or moved to a more secure location where they can be protected. In addition, you need to make plans to educate the people who saved the information in the wrong place on how to properly store the data so the problem does not proliferate.
Once you have determined where the data belongs, you need to ensure that only the people who should have access do have access. It is time to review the current permissions on the repositories and the memberships of the groups that have these permissions. You need to ask these questions:
* Who owns the data?
* Have they reviewed who currently has access?
* Should this user or group have permission?
* What permissions should they have?
An in-depth review will help show whether what is in place is adequate. Then test it, and continue to do so periodically. Making sure the controls really work on a continual basis is critical to ongoing assurance that the data is protected.
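Part of that periodic testing can be scripted. As one example, here is a small Python sketch that walks a directory tree and lists files that anyone on the system can read or write. It assumes a POSIX-style file server (a Linux box sharing files through Samba, say); Windows ACLs need a different approach. The function name world_accessible is hypothetical.

```python
import os
import stat


def world_accessible(root):
    """Yield (path, flags) for regular files readable ("r") or writable ("w") by everyone.

    Assumes POSIX permission bits; this does not inspect ACLs or share-level settings.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or cannot be examined
            if not stat.S_ISREG(mode):
                continue  # skip symlinks, sockets and other special files
            flags = ""
            if mode & stat.S_IROTH:
                flags += "r"
            if mode & stat.S_IWOTH:
                flags += "w"
            if flags:
                yield path, flags
```

Point it at the repository you just cleaned up, for example with list(world_accessible("/srv/shares/finance")), where the path is only a placeholder. An empty result is what you are hoping for; anything else goes back to the data owner with the questions above.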
Finding the nooks and crannies of your network from which data can leak isn't an easy task. You'll need some patience and determination to do it properly. You also probably won't uncover all the data that is available for mishandling. But you can make an effort to clean up the random data that is lying around waiting to be leaked.
Please also note these are just some suggestions about a creative and proactive approach to finding potential sources of data leakage on your network. There are many other ways to do this as well. Let me know what works for you.