“Don’t lower your expectations to meet your performance. Raise your level of performance to meet your expectations. Expect the best of yourself, and then do what is necessary to make it a reality.” — Ralph Marston
While one should not rush to be an early adopter, it is good to keep in mind what is coming down the pike. If you are doing work with a network intrusion detection system such as Bro, FreeBSD 7 looks to have some key performance improvements that make it a solid choice. The post titled “FreeBSD 7 will be revolutionary” made this observation:
Also of importance are the improvements to the networking stack. With gigabit (or faster) network cards being the norm these days, FreeBSD’s support for TCP/IP Segmentation Offload (TSO) and Large Receive Offload (LRO) will no doubt prove to be very useful. Along with the new sendfile() implementation, and the improved sosend() functionality, we will likely see some large networking performance boosts.
The latest developments in FreeBSD can be found on the “What’s cooking for FreeBSD 7?” page. Kris Kennaway of the FreeBSD Project outlines several performance improvements in his presentation titled “Introducing FreeBSD 7.0.” The presentation compares the performance gains shown by PostgreSQL and MySQL. These database packages exercise complex operations that demonstrate the improvements under FreeBSD 7.
How does that help Bro? That requires some examination and explanation. The “Hardware and Software Requirements” section of the Bro Wiki states:
Operating System Recommended: FreeBSD. Bro works with many Unix systems, including Linux and Solaris, but has been primarily tuned for FreeBSD. We currently recommend using FreeBSD version 4.10 for Bro. If your site has a large number of packets or connections per second you should look at the section on Hardware and OS Tuning. FreeBSD 5.x should work, but is not quite as fast as 4.10.
Before you start looking for a copy of FreeBSD 4.10 on eBay, note that significant improvements have been made in later versions of FreeBSD. There is no date associated with this Wiki entry, but the lack of any mention of FreeBSD 6.x indicates the entry was written before 6.x was released, so it appears to be outdated. To give a better sense of performance across the various FreeBSD versions, Chris Buechler wrote the blog entry titled “Network Performance Update.” Writing about the pfSense firewall/server platform, Chris observes:
m0n0wall 1.2 still makes us look silly (1.5 times as fast), but that’s to be expected with its FreeBSD 4.x base. FreeBSD 6.2 has closed that gap considerably from the disaster that was FreeBSD 5.x, and FreeBSD 7 looks to draw nearer to 4.x performance. Note that I’m strictly talking about single processor machines, SMP systems are a much different story, but I won’t comment on those until I get a chance to do some testing.
Limiting the comparisons to single-processor machines is an important point. There have been discussions on the Bro mailing list about whether multiple processors help. How FreeBSD performs with multiple processors depends on which version is being used; we will examine that a little later. Robin Sommer, one of the Bro developers, stated, “All of the main analysis is done in a single process and not able to make use of multiple CPUs.” It was reported that top-of-the-line dual Xeon CPUs (>$4,000 of CPU) performed only ~5% better than a single Pentium D costing under $500.
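To put that mailing-list report in perspective, the short Python sketch below works out throughput per dollar. It assumes the rounded figures quoted above ($4,000 for the dual Xeons, $500 for the Pentium D, and a 5% throughput advantage for the Xeons) and treats the Pentium D as the 1.0 baseline; these are illustrative inputs, not new measurements.

```python
def perf_per_dollar(relative_throughput: float, cost_usd: float) -> float:
    """Relative throughput delivered per dollar of CPU spend."""
    return relative_throughput / cost_usd


# Assumed inputs, rounded from the mailing-list report.
pentium_d = perf_per_dollar(1.00, 500.0)   # baseline: single Pentium D
dual_xeon = perf_per_dollar(1.05, 4000.0)  # ~5% faster, >$4,000 of CPU

# How many times more throughput per dollar the Pentium D delivers.
advantage = pentium_d / dual_xeon
print(f"Pentium D delivers {advantage:.1f}x more throughput per dollar")
```

With these inputs the Pentium D comes out roughly 7.6 times better per dollar. Because the Xeon price is a lower bound (“>$4,000”), the real gap is at least that large, which is consistent with Bro's main analysis running in a single process.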
What version and hardware setup are the Bro developers using, and what would they recommend for tuning the operating system? The paper “Operational Experiences with High Volume Network Intrusion Detection” by Holger Dreger, Anja Feldmann, Vern Paxson, and Robin Sommer describes the system they used for high-volume network intrusion detection: “the primary NIDS monitor is a Dual Athlon MP 1800+ with 2 GB Memory, currently running FreeBSD 5.2.1. It is connected via a Gigabit Ethernet.” They tested it against other machines: “The others are separate Athlon XP 2600+ based systems with 1 GB of RAM running Linux 2.4.” Their goal was to compare tuning parameters. In Robin’s thesis, he states that to get the performance they desired, they “patched the kernel to increase the NIC driver’s internal receive buffers. Moreover, we patched the packet-capture sub-system to increase its buffers by three orders of magnitude.”
Kris Kennaway gave a very interesting presentation comparing FreeBSD 4.x, 5.x, and 6.x titled “Filesystem Performance on FreeBSD.” On his 4-CPU AMD64 system he could only compare 5.4 vs 6.0 (the latest release at that time). The results depend on what is being tested; for full details, please view the presentation. One interesting result was that FreeBSD 6.0 performed 30% faster than 4.11 for concurrent writes, and 6.0 was 15% faster than 5.4 for concurrent reads. Kris has also run performance tests with BIND and found that FreeBSD 7 had 60% higher peak performance than version 6.1. The point is that later versions of FreeBSD appear to have made significant performance improvements over FreeBSD 5.2.1 (the version used by the Bro development team in their paper). FreeBSD 5.x and 6.x were, to some degree, transitional releases on the path from 4.x to 7.x.
To help make sense of all these performance reports, let’s begin with an overview and history of the versions, drawing liberally from Kris’s presentation “Introducing FreeBSD 7.0”:
FreeBSD 4.x is a single-threaded kernel with limited multiprocessor support.
- Able to run user code on multiple processors
- Only one process at a time can execute in the kernel (“Giant lock” around entire kernel)
- Device interrupts may be processed in parallel, subject to some constraints
The historical BSD kernel architecture worked very well for single-processor systems. It fundamentally does not scale to multi-processor systems, which are now becoming universal.
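The difference between the 4.x “Giant lock” and the finer-grained locking that SMPng introduced can be illustrated with a toy model. The Python sketch below is not FreeBSD code: it simulates “system calls” into three hypothetical subsystems (network, vfs, vm) and records how many are inside the “kernel” at the same time. The sleep stands in for kernel work (it releases Python's own interpreter lock, so it does not distort the result). With one shared lock the peak is always 1; with a lock per subsystem the calls can overlap.

```python
import threading
import time


class ConcurrencyProbe:
    """Tracks how many threads are inside simulated kernel code at once."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        with self._guard:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self) -> None:
        with self._guard:
            self.current -= 1


def run(locks_for_subsystem: dict, subsystems: list) -> int:
    """Run one simulated system call per subsystem in parallel; return peak concurrency."""
    probe = ConcurrencyProbe()

    def syscall(name: str) -> None:
        # Acquire whichever lock guards this subsystem.
        with locks_for_subsystem[name]:
            probe.enter()
            time.sleep(0.05)  # simulated work inside the kernel
            probe.leave()

    threads = [threading.Thread(target=syscall, args=(s,)) for s in subsystems]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return probe.peak


subsystems = ["network", "vfs", "vm"]  # hypothetical subsystem names

# 4.x model: a single Giant lock shared by every subsystem.
giant = threading.Lock()
giant_peak = run({s: giant for s in subsystems}, subsystems)

# SMPng-style model: each subsystem has its own lock.
fine_peak = run({s: threading.Lock() for s in subsystems}, subsystems)

print("Giant lock peak concurrency:", giant_peak)
print("Per-subsystem locks peak concurrency:", fine_peak)
```

A real kernel uses many more locks and several lock types (mutexes, sx and rw locks), but the principle is the same: finer-grained locking lets independent work proceed in parallel on multiprocessor systems.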
Performance is a frequent topic on the Bro mailing list. The FreeBSD developers started working on multiprocessor support with the SMPng project. Basically, that development unfolded across FreeBSD 5.x and 6.x:
FreeBSD 5.0-5.2.1 (2003-01-17 – 2004-02-22)
Debut of the new architectural model for symmetric multiprocessor support in FreeBSD.
FreeBSD 5.3 (2004-11-06), 5.4 (2005-05-09)
- The fundamental architectural changes were largely in place
- Some initial progress with kernel parallelism by 5.3 and 5.4 (network stack, virtual memory, …)
SMPng was improved in 6.x:
FreeBSD 6.0 (2005-11-01), 6.1 (2006-05-08), 6.2 (2007-01-15)
- Stabilized the work of the 5.x branch
- Performance benefits from subsequent development work
- e.g. Virtual File System (VFS) and Unix File System (UFS) now allow parallel access
- Large parts of the kernel may now operate in parallel, with significant performance gains on many common workloads
With FreeBSD 7.x, the kernel will be a fully parallel system. The “Giant lock” is no longer used on almost all workloads, and the development focus has shifted from correctness to optimization. The presentation mentioned above shows impressive results and lists the following improvements:
New filesystems
- ZFS
- Sun’s amazing new filesystem moves the goalposts. Stay tuned for more in the presentation from Pawel.
- unionfs: overlay multiple filesystem hierarchies into one. Broken for many years but now usable again.
- XFS support (read-only)
- CODA distributed filesystem support fixed
- UFS quotas are now parallelized
- NFS client and server parallelized
- Performance improvements for NFS client
- SCSI layer (CAM) is now parallelized, including many drivers. Performance benefits for SCSI device access.
- iSCSI initiator (in base system) and target (in ports), allowing remote exporting and local mounting of SCSI devices over TCP/IP
New GEOM (pluggable storage layer) modules
- gjournal; block level journalling provider (can be used with UFS for journalling support)
- gvirstor; virtualized storage provider (create a huge disk image sparsely populated with disks, add more later)
- gcache; read cache for storage layers with small request sizes
- gmultipath; support for multiple paths to the same storage provider (Fibre Channel, etc)
- gpart; virtualized partitioning support (GPT, APM, …)
Network Stack Changes
- Complete elimination of giant lock from network stack
- On-going cleanup and development work
- Socket buffer automatic sizing; dynamically responds to network conditions for improved throughput
- SCTP (Stream Control Transmission Protocol)
- Migration from KAME IPSec to Fast IPSec
- Improved performance
- Hardware acceleration with cryptographic accelerators
- Both IPv4 and IPv6
- Direct dispatch of inbound network traffic
- Avoids context switching, improves CPU cache locality, allows concurrency
- Significant performance benefits on many workloads
- Optional in-kernel Just-In-Time compiler for Berkeley Packet Filter (BPF) programs (tcpdump, etc)
- In-kernel Network Address Translation (NAT) modules for natd(8)
- Link aggregation (create virtual interfaces for fault tolerance and higher capacity)
- Rapid spanning tree protocol support
Network Drivers
- Support for commonly encountered 10 gigabit ethernet drivers: Chelsio (cxgb), Intel (ixgbe), Myricom (mxge), Neterion (nxge)
- Transmit Segmentation Off-load (TSO)/Large Receive Off-load (LRO); off-load send/receive into the ethernet driver
- New devices supported
Wireless
- Wireless 802.11 layer is stable
- high power ath cards (Senao, Ubiquiti, Wistron)
- 900MHz ath cards (Ubiquiti, Zcomax)
- ath (Atheros), iwi, ral (Ralink), ural (RT2500USB) drivers are high quality
- New drivers
- rum (Ralink RT2500USB, RT2601USB)
- Intel wireless drivers: ipw (Intel PRO/Wireless 2100), iwi (2200BG/2225BG/2915ABG) works out of the box
- ZyDAS ZD1211/ZD1211B
- WPA (Wifi Protected Access) support stable
- New scanning support (background scanning, roaming)
- Atheros protocol extensions 802.11n support (forthcoming standard)
- higher performance: up to 135 Mb/sec, channel bonding, improved range, etc
- drivers not yet committed
- Preparation for future changes (virtual access points, etc)
New CPU Architectures
- Improved support for ARM architecture
- Improved AT91RM9200 (Atmel) support
- support for Avila Gateworks Xscale boards was added, including a rewrite of the Intel code
- permission from Intel to bundle u-code
- Boot loader can load from Secure Digital (SD) flash cards
- FreeBSD/ARM used as the basis for growing number of embedded devices
- Sun Ultrasparc T1 (preliminary)
- 8 cores, 4 threads per core = 32 logical CPUs per package
- A very interesting new CPU architecture, and one to watch in the future
- T2: 8 threads * 8 cores = 64 logical CPUs per package!
- X-box!
Security Subsystems: Audit subsystem
- Fine-grained, configurable logging of security-relevant events: System calls, application and user space activities
- Now available by default in GENERIC kernel
- Originally developed for Mac OS X, ported and enhanced. A nice example of code-sharing in the Apple → FreeBSD direction
- Builds on the other advanced security features developed by the TrustedBSD project for FreeBSD
Security Subsystems: priv(9) API
- common interface for kernel privilege checking
- Privilege model can be modified by Mandatory Access Control (MAC) modules
User-level Changes
- Many updates to system applications and utilities
- cached: caches queries to nsswitch (“name service switch”: user/group/host lookups) for improved performance
- Ports collection
- Currently contains 17692 ported third-party applications (1774 more than 6.2)
- Major changes since 6.2:
- X.org 7.3 (many improvements, e.g. working composite support for improved visual effects)
- KDE 3.5.7
- GNOME 2.18.3
- More than 24000 other changes and updates
Performance
- Performance optimizations throughout the system
- The ULE scheduler is now recommended instead of the historical 4BSD scheduler
- Better interactive performance on desktop systems
- Significantly better performance on SMP systems
- 4BSD will remain the default scheduler in 7.0 to be conservative, but likely to switch for 7.1
- If you find a workload that FreeBSD 7.0 performs poorly on, we want to hear about it!
Other Kernel Changes
- Partial Linux 2.6.16 emulation support (not enabled by default)
- Support for Message Signaled Interrupts (MSI) and Extended Message Signaled Interrupts (MSI-X)
- IPMI (Intelligent Platform Management Interface); monitoring system hardware.
- Improved support for legacy-free hardware (e.g. MacBook Pro)
- FireWire support for the boot loader
- Asynchronous I/O (AIO) support is parallelized: used by e.g. qemu
- New pseudo-tty system, allocates on demand without built in limits, and without requiring root privilege
Development Tools and Internals
- GCC 4.2.1
- Improvements to hwpmc (CPU performance counters)
- symbol versioning added to several libraries
- New scalable malloc(3) (jemalloc)
- Optimized kernel locking primitives (sx, rwlocks)
- POSIX message queues
- contigmalloc(9) with buddy allocator
- kernel malloc(9) red zone debugging support
- Improved kernel lock profiling infrastructure
- mini-dumps
It looks like FreeBSD has some nice changes that should be released shortly. These changes should position FreeBSD well to handle the high network load that a Bro box may encounter.