Mark Twain once said, “The reports of my death are greatly exaggerated.” A thousand apologies for not posting in a while. My only excuse is that I have been buried in work. Even today’s post will be brief. I wanted a few moments to describe some of the work I am doing and provide a few pointers. I hope to follow this post with more details later.
First, a little about some of the work. I have had to evaluate IPs for an indication of their security threat. One method of evaluation is to compare the IPs to known bad actors. In this post, we will discuss a few data sources that are freely available, a few software packages that might prove useful, and finish by pointing to some sources for further evaluation.
You can use various data feeds. Misbehaving IPs that are identified by your IDS/IPS, honeypots, firewall logs, router logs, syslog servers, etc. will be of particular interest, being specific to your organization. For the sake of discussion, I wanted to point out some freely available sources of IPs that are blacklisted by the Internet community.
- The Harimau Watchlist – Mel Mudin (spoonfork) provides this valuable source of information. Please read his post, “The Harimau Watchlist” for additional information. The information is updated daily.
- Malware Domain Blocklist – this information is maintained as part of the DNS-BH project and represents a list of domains that are known to be used to propagate malware and spyware.
The sources for the Harimau Watchlist include:
- Dshield Top IPs
- Dshield Top Blocks
- ShadowServer’s Known Russian Business Network
- ShadowServer’s Known Bot Command & Control IPs/Blocks
- EmergingThreats Known Compromised IPs/Blocks
- Spamhaus Top IPs
- Atlas (Arbor Networks) Top Threat Source
- TrustedSource.org Top Email Senders
- TrustedSource.org Most Active Storm Web Proxies
- TrustedSource.org Most Newly Activated Storm Web Proxies
- TrustedSource.org Most Recently Seen Storm Web Proxies
- Projecthoneypot.org’s Most Recent Email Harvesters
- Projecthoneypot.org’s Most Recent Spam Servers
- Projecthoneypot.org’s Most Recent Comment Spammers
- Projecthoneypot.org’s Most Recent Dictionary Attackers
- Senderbase.org Top 100 Spammers
- Senderbase.org Top 100 Virus Senders
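Several of these sources publish whole blocks as well as single IPs, so comparing an address against known bad actors has to handle both. Here is a minimal sketch in Python of that comparison. The one-entry-per-line format with `#` comments is my assumption, not the actual layout of any feed listed above, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress


def load_watchlist(lines):
    """Parse watchlist lines into networks; a single IP becomes a /32 (or /128)."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        try:
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue  # skip anything that is neither an IP nor a CIDR block
    return networks


def is_listed(ip, networks):
    """Return True if the address falls inside any watchlist entry."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# Hypothetical sample entries, for illustration only.
sample = ["# sample watchlist", "192.0.2.15", "198.51.100.0/24"]
watchlist = load_watchlist(sample)
print(is_listed("198.51.100.77", watchlist))
print(is_listed("203.0.113.9", watchlist))
```

A linear scan like this is fine for a few thousand entries. For larger lists, sorting the ranges and searching them would be the next step.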
I will not go into details now, but it is easy enough to set up a cron job to pull the information down and add the IPs to a database. If you decide to do this in Perl, a few modules will come in handy:
- LWP::UserAgent – can be used to dispatch web requests.
- DBI – Perl database interface.
- Net::Abuse::Utils – provides functions to look up information about an IP or ASN. This includes the country code for an IP or ASN, the ASN announcing an IP via BGP, the CIDR network an IP is announced in, contact email addresses based on IP whois info, contact email addresses for a domain based on abuse.net data, the contact email address from the SOA record for the rDNS zone for an IP, and listing information for an IP in a specific DNSBL.
- Geo::IP – provides a simple file-based database. The GeoIP database simply contains IP blocks as keys, and countries as values. The data contains all public IP addresses and should be more complete and accurate than reverse DNS lookups.
- Net::DNS – allows the programmer to perform nearly any type of DNS query.
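To give a feel for the pull-and-store job, here is a minimal sketch. It is in Python rather than Perl, and it uses SQLite instead of MySQL so that it runs with the standard library alone. The table layout, the function names, and the assumption that each feed line starts with an IP are all mine, not something the feeds or modules above prescribe.

```python
import ipaddress
import sqlite3
import urllib.request
from datetime import datetime, timezone


def fetch_feed(url, timeout=30):
    """Download a feed and return its body as text (the caller supplies the URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_ips(text):
    """Return the valid IPs found in the first field of each non-comment line."""
    ips = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        try:
            ips.append(str(ipaddress.ip_address(fields[0])))
        except ValueError:
            continue  # not an IP address, ignore the line
    return ips


def store_ips(conn, source, ips):
    """Insert IPs for a source; the primary key silently drops duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watchlist ("
        "ip TEXT NOT NULL, source TEXT NOT NULL, first_seen TEXT NOT NULL, "
        "PRIMARY KEY (ip, source))"
    )
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT OR IGNORE INTO watchlist (ip, source, first_seen) VALUES (?, ?, ?)",
        [(ip, source, now) for ip in ips],
    )
    conn.commit()
```

A script would call `store_ips(conn, name, extract_ips(fetch_feed(url)))` once per feed, and cron would run that script on whatever schedule matches how often the feed is updated.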
A few other software packages you will likely use:
- MySQL – a multi-threaded and multi-user SQL (Structured Query Language) database server.
- GeoLite Country – similar to the GeoIP Country database, but slightly less accurate. Please review MaxMind’s instructions on how to use the CSV databases with a SQL database.
- GeoLite City – similar to the GeoIP City database, but less accurate.
- Geo::IPfree – Perl module for looking up the country of an IP address.
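Under the hood, a country lookup is a range search: convert the IP to an integer and find the block that contains it. The sketch below shows the idea in Python. I am assuming CSV rows shaped like start IP, end IP, start integer, end integer, country code, country name. The two sample rows are made up for illustration, using documentation address ranges and placeholder country codes.

```python
import bisect
import csv
import io
import ipaddress

# Hypothetical rows: start IP, end IP, start int, end int, code, name.
SAMPLE_CSV = '''"192.0.2.0","192.0.2.255","3221225984","3221226239","AA","Placeholder A"
"198.51.100.0","198.51.100.255","3325256704","3325256959","ZZ","Placeholder Z"
'''


def load_ranges(csv_text):
    """Return (starts, ranges), both sorted by the first integer of each block."""
    ranges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5:
            continue  # skip blank or malformed rows
        ranges.append((int(row[2]), int(row[3]), row[4]))
    ranges.sort()
    starts = [start for start, _end, _code in ranges]
    return starts, ranges


def lookup_country(ip, starts, ranges):
    """Binary-search for the block containing ip; return its code or None."""
    value = int(ipaddress.IPv4Address(ip))
    index = bisect.bisect_right(starts, value) - 1  # last block starting <= value
    if index >= 0 and value <= ranges[index][1]:
        return ranges[index][2]
    return None


starts, ranges = load_ranges(SAMPLE_CSV)
print(lookup_country("192.0.2.10", starts, ranges))
print(lookup_country("203.0.113.1", starts, ranges))
```

The same comparison works as a SQL query once the integer columns are loaded into a table, which is the approach the CSV instructions describe.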
A Few Interesting Possibilities
One thing that can be done with the IPs is to map them using Google Earth. This will require you to create KML files, which are not difficult once you have the IPs along with their DNS and GeoIP data. Two scripts that help generate KML files from security data are:
- Cosight – the security log file visualization tool used by the Colorado ISOC. Cosight parses logfiles looking for connections to or from internet addresses. It then uses the geolocation database from Maxmind to convert those addresses to coordinates for output as a KML overlay file.
- KisGearth – a small Perl script to convert Kismet XML and GPS logfiles to Google Earth KML files.
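If you would rather roll your own, a KML file is just XML with one Placemark per point. Here is a minimal sketch in Python, assuming you already have a latitude and longitude for each IP. The dictionary keys and the sample coordinates are hypothetical. Note that KML lists longitude before latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(points, document_name="Flagged IPs"):
    """Build a KML document with one Placemark per point and return it as text."""
    ET.register_namespace("", KML_NS)  # serialize with a default xmlns
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    ET.SubElement(document, f"{{{KML_NS}}}name").text = document_name
    for point in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = point["ip"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = point.get("note", "")
        geometry = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude,altitude
        coordinates = f"{point['lon']},{point['lat']},0"
        ET.SubElement(geometry, f"{{{KML_NS}}}coordinates").text = coordinates
    body = ET.tostring(kml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical sample point, for illustration only.
sample_points = [
    {"ip": "192.0.2.15", "lat": 39.74, "lon": -104.99, "note": "sample entry"},
]
print(build_kml(sample_points))
```

Write the returned text to a `.kml` file encoded as UTF-8 and open it in Google Earth.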
A few months ago, I did a post “Unclear and Present Danger.” The post outlined some of the electronic dangers facing an organization on the Internet. Thanks to the fantastic work done by the Shadowserver Foundation, we have a nice collection of some very interesting statistics mapped by country. Those examples can be very useful when mapping misbehaving IPs. Rather than repeat what has previously been posted, I’ll leave it to the reader to visit that entry.
While searching for an interesting way to represent and drill down from continents to countries and so on, I came across GeoTree, a hierarchical toponym browser for GeoNames. GeoNames is part of the Linked Data project, which brings together data from public sources and builds a web of open and free data where data sets are interlinked with each other. The Linked Data project represents a great wealth of information. Below is a mapping done by Richard Cyganiak of the projects involved in the Linked Data project:
Walter Rafelsberger provides two interesting examples that can be adapted for security representation and interpretation. Both examples make use of Processing, a programming language for data visualization. Read more about Processing on Ben Fry’s or Casey Reas’s blog.
- Geosketch of world cities with a population of more than 1000, labeling those cities with more than 5 million:
- The second example visualizes conversations of about 1500 users from Twitter. The arcs link positions of people who talk to each other:
Nathan Yau of FlowingData posted about “40 Essential Tools and Resources to Visualize Data.” The post contains valuable information with additional resource links. I came across Nathan’s post while checking out FlowingData’s graphic post “Watching the Growth of Walmart Across America.” I was not able to embed the object. You will need to click on the image to view the growth of Walmart.
What is really nice is that you can download the code, including the ActionScript and the openings data, from FlowingData’s site. With that code, other types of growth can be illustrated in a similar manner. Modest Maps, a BSD-licensed display and interaction library for tile-based maps in Flash (ActionScript 2.0 and ActionScript 3.0) and Python, was used to map the data. This reminds me of code_swarm:
If you have never watched the code_swarm video, you have to check it out. It was done by Michael Ogawa. The example above shows the commit history of the Eclipse open source project. To quote Michael:
code_swarm, shows the history of commits in a software project. A commit happens when a developer makes changes to the code or documents and transfers them into the central project repository. Both developers and files are represented as moving elements. When a developer commits a file, it lights up and flies towards that developer. Files are colored according to their purpose, such as whether they are source code or a document. If files or developers have not been active for a while, they will fade away. A histogram at the bottom keeps a reminder of what has come before.
It is a great example of visualizing something we traditionally would not think of presenting outside of run-of-the-mill reports and numbers.
Take a look at Jamie Wilkinson’s post “Obama Wikipedia page edits,” which is a visualization of people who contributed to the Barack Obama page on Wikipedia between October 2005 and November 2008. Users who edit a lot drift toward the center. It was visualized using code_swarm (Processing) and Jamie’s Wikipedia page history parser, Wikiswarm (Ruby). Code and instructions on how Jamie created this visualization can be found in his post “Wikiswarm: visualize Wikipedia page histories.”
Most important, the code_swarm source is freely available.
Today we explored a few interesting paths for representing data. Three excellent books to help guide us further on the visualization paths are:
- Security Data Visualization by Greg Conti.
- Applied Security Visualization by Raffael Marty.
- Processing: A Programming Handbook for Visual Designers and Artists by Casey Reas and Ben Fry (foreword by John Maeda).
We have all heard the proverb, “A picture is worth a thousand words.” Another famous quote states, “The devil is in the details.” Or, if you prefer, “God is in the details.” If life were a Star Trek episode, Kirk could have used those two quotes to cause a computer to explode. Both statements are true and false, depending on the circumstances.
It is wise to remember the words of Siddhartha Gautama: “These blind men, every one honest in his contentions and certain of having the truth, formed schools and sects and factions.” Geocoding and data visualization simply provide tools to help interpret information. Interpretations are not absolute. If you are looking for a silver bullet that will make the blind see and the ignorant smart, I am afraid your search must continue. The author A. L. Linall, Jr. once wrote, “Visualization and belief in a pattern of reality, activates the creative power of realization.” The best solutions will come from using a combination of tools to explore the possibilities, discover insights, view the results from different perspectives (which helps with realization), and communicate results effectively.