Deployment
When deploying AKIPS, the following information can help guide your deployment approach (VM or bare metal) and determine the hardware specifications required for your network.
Platform
Recommendation
Specifying a VM or bare metal platform is difficult because every network is different (e.g. number of users, devices, polled MIB objects, syslog/trap/NetFlow rates). AKIPS recommends starting with a VM installation to determine the resource baseline required to monitor your infrastructure, then increasing the CPU/RAM/storage resources as needed.
As a general rule, we recommend:
a. VM Deployment
- Commercial grade VM (e.g. VMware)
- Dedicated CPU cores
- Ample RAM (50% free)
- SAN with thick provisioned preallocated storage
OR
b. Physical / Bare Metal
- Off-the-shelf server (e.g. Cisco, Dell, HP, IBM)
- Ample RAM (50% free)
- RAID 1 or 10
NOTE: Before purchasing physical hardware, contact AKIPS support with your intended vendor/model/spec so we can confirm the operating system has the appropriate disk and Ethernet controller driver support.
AKIPS is known to work on the following virtual machine platforms:
VMware
VirtualBox
Hyper-V
KVM
Minimum recommended platform
Network Size | Minimum Platform |
---|---|
Small (50,000 interfaces) | Virtual machine, 2+ CPU cores, 8 GB RAM, 200 GB disk space |
Medium (100,000 interfaces) | Virtual machine, 4+ CPU cores, 16 GB RAM, 500 GB disk space |
Large (250,000 interfaces) | Virtual machine, 8+ CPU cores, 32 GB RAM, 1 TB disk space |
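As a rough illustration, the table above can be expressed as a simple lookup. This is a sketch only: it assumes the interface counts in the table are upper bounds for each tier, which the table does not state, and the names `Platform`, `TIERS` and `minimum_platform` are hypothetical rather than part of AKIPS.

```python
from typing import NamedTuple, Optional


class Platform(NamedTuple):
    size: str
    max_interfaces: int  # assumption: treated as the upper bound of the tier
    cpu_cores: int       # minimum core count ("2+" in the table means 2)
    ram_gb: int
    disk: str


# Values copied from the "Minimum recommended platform" table.
TIERS = [
    Platform("Small", 50_000, 2, 8, "200 GB"),
    Platform("Medium", 100_000, 4, 16, "500 GB"),
    Platform("Large", 250_000, 8, 32, "1 TB"),
]


def minimum_platform(interfaces: int) -> Optional[Platform]:
    """Return the smallest tier that covers the interface count.

    Returns None when the network is larger than the table covers.
    """
    for tier in TIERS:
        if interfaces <= tier.max_interfaces:
            return tier
    return None
```

For example, `minimum_platform(60_000)` returns the Medium tier. A result of `None` means the table gives no guidance for a network of that size. These are minimums, so the real requirement still has to be measured from a running installation.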
System resources
Ping and SNMP polling
Syslog and SNMP trap
NetFlow
The AKIPS flow collector and meters were engineered with the expectation of a large number of flow records (e.g. 1 million flows per second) from a small number of flow exporters (e.g. 50 to 100). The software performs as expected in that environment when ample CPU cores and RAM are available.
What was unexpected was customers wanting to send flows from thousands of flow exporters. A flow meter process is started for each flow exporter, which means thousands of concurrent meter processes. This issue is being investigated and will be rectified by allowing a meter process to handle data from many flow exporters, significantly reducing the number of running processes.
Increasing the specs of a VM
The procedure for increasing CPU/RAM/storage sizes in a VM is simple:
- Shut down the VM using the Admin -> System -> System Shutdown menu.
- Wait for the VM to shut down completely.
- Increase the number of CPU cores, memory size or disk space.
- Start up the VM.
The AKIPS startup script will automatically detect the expanded disk space and run the appropriate partition and file system commands.
System performance graphs
AKIPS provides internal system and application performance information under the Admin -> Performance menus. The important things to note are:
System graphs
- Memory usage should be fairly static over a day. Assign enough memory so the memory usage graph generally shows less than 50% usage. Lots of free memory is always useful because the operating system will consume as much memory as you give it (e.g. for disk caching).
- CPU load, System Calls, Context Switches, Interrupts and Disk I/O will spike every 80 minutes when the background data processing occurs. This is normal.
Poller graphs
- Ping rate should be constant.
- SNMP requests should start at second 5 and complete by second 45 (i.e. a 40 second polling window each minute).
- Poller memory should be constant.
- Poller CPU usage should be under 50%. In most cases it will be below 10%.
- Poller Context Switches should be mostly Voluntary. If there are a lot of Involuntary context switches, then additional CPU cores may be required. High levels of involuntary context switches are a sign of process CPU contention.
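The poller guidelines above can be turned into a small checklist function. This is a minimal sketch, not an AKIPS interface: the `PollerSample` type and `poller_warnings` function are hypothetical, the readings are assumed to be copied by hand from the poller graphs, and "mostly Voluntary" is interpreted here as voluntary switches outnumbering involuntary ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PollerSample:
    """One set of readings taken from the poller graphs (hypothetical type)."""
    snmp_end_second: int        # second of the minute when SNMP requests finish
    cpu_percent: float          # poller CPU usage
    voluntary_switches: int
    involuntary_switches: int


def poller_warnings(sample: PollerSample) -> List[str]:
    """Return a warning for each guideline the sample does not meet."""
    warnings = []
    if sample.snmp_end_second > 45:
        warnings.append("SNMP requests finish after second 45")
    if sample.cpu_percent >= 50:
        warnings.append("poller CPU usage is 50% or higher")
    # Assumption: "mostly voluntary" is read as voluntary > involuntary.
    total = sample.voluntary_switches + sample.involuntary_switches
    if total > 0 and sample.involuntary_switches >= sample.voluntary_switches:
        warnings.append("context switches are not mostly voluntary; "
                        "additional CPU cores may be required")
    return warnings
```

An empty list means the sample meets all three guidelines. The constant ping rate and constant poller memory guidelines need readings over time, so they are left out of this single-sample check.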
Database graphs
- Compression Runtime should be less than 20 minutes. The database compression works on 30-day data blocks. At the end of 30 days, a new block will be created and the compression runtime will drop. The compression runtime is usually CPU limited. Adding CPU cores to the system should decrease the compression runtime linearly, unless the limiting factor is the back-end storage speed.
- Rotation Runtime should be less than Compression Runtime. The limiting factor will be the storage speed. A database file rotation occurs when the database files become more than 1% fragmented.
CPU
General notes
- The number of required CPU cores depends entirely on the size of your configuration (i.e. number of monitored devices, MIB objects, syslog/trap rate, NetFlow exporters and flows/sec).
- Hyperthreading on modern Xeon, Core i3/5/7 CPUs works fine. Leave it turned on.
- In a VM environment, always assign dedicated CPU cores. Do NOT over provision CPU cores. Over provisioning CPU cores will lead to significant pauses in real-time data polling and processing, and large jumps in time.
Number of CPU cores
CPU clock speed
Comparing raw CPU core clock speeds is fairly meaningless due to differences in core architectures (e.g. number of on-die cores, L1/2/3 cache sizes and speeds). AKIPS performs various CPU speed tests for gzip/md5/sha, which can be viewed in the Admin -> System -> System Info menu.
The following are some examples:
CPU Model | GZIP (seconds) | MD5 (seconds) | SHA (seconds) |
---|---|---|---|
Xeon E5-2683 v3 2.00GHz | 1.7 | 2.9 | 3.6 |
Xeon E5-2660 2.20GHz | 1.9 | 4.0 | 3.7 |
Xeon E5-2670 v3 2.30GHz | 1.4 | 2.8 | 2.9 |
Xeon E5-2630L v2 2.40GHz | 1.5 | 3.0 | 3.3 |
Xeon E5-2670 2.60GHz | 1.2 | 2.7 | 3.3 |
Xeon E5-4650 2.70GHz | 1.3 | 2.8 | 3.4 |
Xeon X5660 2.80GHz | 2.6 | 3.7 | 4.8 |
Core i5-2500K 3.30GHz | 1.1 | 2.4 | 2.9 |
Core i7-5820K 3.30GHz | 1.1 | 2.2 | 2.3 |
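To get a feel for what this kind of test measures, the following sketch times gzip compression and MD5/SHA hashing of a fixed buffer using only the Python standard library. It is not the AKIPS test: the buffer size, the input data and the choice of SHA-256 are all assumptions, so its numbers cannot be compared with the table above. It is only useful for comparing one machine against another when both run the same script.

```python
import gzip
import hashlib
import time
from typing import Callable, Dict


def _seconds(work: Callable[[], object]) -> float:
    """Run work() once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    work()
    return time.perf_counter() - start


def time_cpu_tests(size_bytes: int = 32 * 1024 * 1024) -> Dict[str, float]:
    """Time gzip, MD5 and SHA over the same in-memory buffer."""
    # Deterministic, highly repetitive input so that runs are repeatable.
    data = bytes(range(256)) * (size_bytes // 256)
    return {
        "gzip": _seconds(lambda: gzip.compress(data)),
        "md5": _seconds(lambda: hashlib.md5(data).digest()),
        # Assumption: SHA-256 is used here; the source only says "sha".
        "sha": _seconds(lambda: hashlib.sha256(data).digest()),
    }


if __name__ == "__main__":
    for name, seconds in time_cpu_tests().items():
        print(f"{name}: {seconds:.2f} s")
```

Each test is single-threaded, which is why a lower-clocked CPU with a newer core architecture can finish sooner than a higher-clocked older one.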
Memory
Memory speed is fairly critical for performance. The Admin -> System -> System Info menu will display the memory speed of your system. A value of 8 Gigabytes/sec or greater is recommended. Older/legacy systems appear to have poor memory speed (e.g. 5 Gigabytes/sec or less).
Over provisioning memory in a VMware VM works fine because AKIPS loads the necessary kernel module that performs memory ballooning. Memory ballooning allows the guest VM to gracefully hand back unused free memory to the host machine.
Storage
Storage size
UNIX file systems require plenty of spare space so they can write files out sequentially. In a VM it isn’t such an issue because increasing the storage size is trivial. When deploying on physical hardware, it’s best to install enough disk space up front for the entire life cycle of the box (e.g. several terabytes). Disks are cheap. Contact AKIPS support if unsure on disk space requirements.
Sequential read / write performance
Databases typically access storage in a random order, but AKIPS databases are arranged in a manner so the majority of read/write I/O is performed sequentially. The large databases are repacked if they become more than 1% fragmented. Good sequential I/O performance is important in large installations.
Spindles vs SSD
A modern 2 TB SATA disk typically achieves sequential transfer rates of over 200 MB/s, whereas an SSD typically achieves ~400 MB/s for reads. SSD write performance is somewhat slower because SSDs use a copy-on-write mechanism in which every write operation has to go to a zeroed disk block. The painfully slow part is zeroing disk blocks: if no zeroed blocks are available for a write operation, write performance falls off a cliff while unused blocks are zeroed.
Having a large pool of pre-zeroed blocks greatly enhances consistent write performance. The SSD trim feature (turned on by default in AKIPS) allows the operating system to inform the SSD when a disk block can be zeroed. Some SSDs also have a hidden pool of pre-zeroed blocks.
DAS vs SAN vs NAS
Storage types:
- DAS – Direct Attached Storage
  Physical disks presented as:
  - JBOD (Just a Bunch Of Disks)
  - RAID 0 (stripe)
  - RAID 1 (mirror)
  - RAID 5/6 (parity)
  - RAID 10 (stripe/mirror)
- SAN – Storage Area Network
- NAS – Network Attached Storage (e.g. 10G over NFS)
AKIPS preferred order of storage:
- SAN
- DAS RAID 10
- DAS RAID 1
- DAS RAID 0
- DAS JBOD
- NAS (thick provisioned)
DAS and SAN provide efficient “block level” storage to the operating system, whereas a NAS is just a “file store” accessed over 10G Ethernet/IP/NFS. A NAS will have significantly higher latency and fragmentation performance issues compared to a SAN/DAS.
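The preference order above can be applied mechanically when several storage types are on offer. The sketch below is illustrative only; `PREFERRED_STORAGE` and `best_available` are hypothetical names, and the strings are simply the list entries copied as written.

```python
from typing import Iterable

# Preference order copied from the list above; earlier entries are preferred.
PREFERRED_STORAGE = [
    "SAN",
    "DAS RAID 10",
    "DAS RAID 1",
    "DAS RAID 0",
    "DAS JBOD",
    "NAS (thick provisioned)",
]


def best_available(options: Iterable[str]) -> str:
    """Return the most preferred storage type among the options offered.

    Raises ValueError if none of the options appears in the preferred list.
    """
    offered = set(options)
    for storage in PREFERRED_STORAGE:
        if storage in offered:
            return storage
    raise ValueError("no listed storage type among the options")
```

For example, `best_available(["NAS (thick provisioned)", "DAS RAID 1"])` returns `"DAS RAID 1"`. RAID 5/6 does not appear in the preferred order, so this sketch rejects it rather than guessing where it would rank.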
Thick vs thin provisioning
- Thick provisioning – storage is preallocated when created (preferred)
- Thin provisioning – storage is allocated on-the-fly (slow, poor performance)
Do NOT use thin provisioned dynamically allocated storage. It ALWAYS leads to massive database performance problems due to fragmentation. AKIPS reads/writes large sequential database files and expects minimal underlying block level fragmentation and latency.
Using thin provisioned storage is also pointless because AKIPS uses a copy-on-write file system, therefore all disk blocks on the virtual storage will be quickly allocated and consumed, but in a highly fragmented order.