Basic Log Storage Calculations

Sizing a log management system requires knowing the number of devices being monitored and the anticipated event rate for each class of system. In many customer engagements, Professional Services time may be required to measure the event rates from all of the monitored devices. This is important because there are too many variables to predict the average or peak Events Per Second (EPS) of any given system. I would caution any customer that if the vendor they are working with gives them “magic” calculations and pricing without gathering the necessary information about customer-specific speeds and feeds, they can expect to spend a lot more money later once the vendor gets its foot in the door. Basically, poor planning will result in unavoidable operating expenses (OpEx) later.

EPS is one metric used by many log management and SIEM vendors to determine such factors as licensing, storage and peak system loads. Another variable used could be Events Per Day (EPD), especially when it relates to storage sizing and license enforcement. This is why it’s imperative that accurate device counts and product types are audited when planning a centralized log management or SIEM solution.

As an example, a PIX firewall logging via Syslog at the notification level could generate anywhere from 1-20 EPS, depending on location, exposure to untrusted networks, number of filters, 3DES services (SSH or IPsec), proper configuration and many other factors specific to each configuration and the networks it is intended to protect. If that same firewall were logging at the informational or debugging level, it would generate between 3-5 times the events that notification-level logging would generate.
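To put rough numbers on that range, the multiplier can be applied directly. This is only a minimal sketch: the function name is hypothetical, and the 1-20 EPS range and 3-5x multiplier are simply the figures quoted above, not measurements from any particular firewall.

```python
def verbose_eps_range(base_low, base_high, mult_low=3, mult_high=5):
    """Return the (low, high) EPS range expected at a more verbose logging level."""
    return base_low * mult_low, base_high * mult_high


# Notification-level range quoted above: 1-20 EPS
low, high = verbose_eps_range(1, 20)
print(f"Informational/debugging level: roughly {low}-{high} EPS")
```

The spread is wide, which is exactly why measuring the real event rate beats guessing.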

Next, event size is crucial in properly designing log management, as every device vendor will have a different log format, event size, transport mechanism, logging levels, etc. The difference between a 300 byte message and a 700 byte message is significant when you are capturing 1000 EPS or more (~26 GB/day vs ~60 GB/day). Syslog messages in accordance with RFC 3164 may not be larger than 1024 bytes, but structured or “normalized” event data can reach upwards of 5,000 bytes (with padding and fragmentation). In some cases, when a vendor tells you they normalize all of the event data, this simplifies the sizing and capacity planning because every event message, regardless of vendor, will be a consistent size (not to mention easier to read, search, index, etc). Some vendors even allow customers to normalize the event and reserve a field within the normalization schema to attach the original “raw” event. This is great for litigation and forensics but almost doubles the storage requirements! As an example, if the normalized event becomes 1500 bytes (regardless of how small the raw event was), the final event size, with a raw event of 500-600 bytes attached, would be somewhere around 2 KB.
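The effect of event size on daily volume is easy to check. The sketch below assumes a steady 1000 EPS over a full day and decimal gigabytes (1 GB = 1,000,000,000 bytes); the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def daily_bytes(eps, event_size_bytes):
    """Uncompressed bytes generated per day at a steady event rate."""
    return eps * SECONDS_PER_DAY * event_size_bytes


for size in (300, 700):
    gb = daily_bytes(1000, size) / 1e9
    print(f"{size}-byte events at 1000 EPS: {gb:.2f} GB/day")

# A 1500-byte normalized event with a 500-byte raw event attached
print(f"Normalized + raw: {1500 + 500} bytes per stored event")
```

At 1000 EPS this works out to about 25.92 GB/day for 300-byte events and 60.48 GB/day for 700-byte events.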

One way of measuring the event rates and event sizes for Syslog is to use a protocol analysis tool such as Wireshark, EtherPeek or tcpdump to capture the events on the sending or receiving host or off of a spanned port on a layer 2 switch. Filter the capture for only UDP 514. The capture does not need to include any payload and can be run for 24 hours. Once the capture is complete, take the total count of UDP 514 packets and divide that number by 86,400 (the number of seconds in a day); that should give you a rough average of the total events per second (EPS).
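Once the capture tool reports its totals, the arithmetic can be sketched as follows. This assumes one Syslog message per UDP datagram and that the tool reports both a packet count and a total byte count; the sample figures are made up for illustration, not taken from a real capture.

```python
SECONDS_PER_DAY = 86_400


def average_eps(packet_count, capture_seconds=SECONDS_PER_DAY):
    """Average events per second over the capture window."""
    return packet_count / capture_seconds


def average_event_size(total_bytes, packet_count):
    """Average bytes per event; returns 0.0 for an empty capture."""
    return total_bytes / packet_count if packet_count else 0.0


# Hypothetical 24-hour capture totals
packets = 4_320_000
total_bytes = 2_592_000_000

print(f"Average EPS: {average_eps(packets):.1f}")
print(f"Average event size: {average_event_size(total_bytes, packets):.0f} bytes")
```

Keep in mind that this is an average. A 24-hour total hides peaks, so a shorter capture during the busiest hour is worth running as well if peak load matters.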

Additionally, calculating the EPS generated by a log file is much easier, since you just need to count the number of lines written to the log file in a 24 hour period and, again, divide that number by the number of seconds in a day (86,400). Then multiply the EPD (or the EPS, for a per-second figure) by the average message size to determine storage.
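For log files, the counting can be scripted. A minimal sketch, assuming one event per line and a file that covers exactly the period being measured; the function name and the tiny sample file are hypothetical and only there to make the example runnable.

```python
import os
import tempfile

SECONDS_PER_DAY = 86_400


def log_file_stats(path, period_seconds=SECONDS_PER_DAY):
    """Return (event count, average EPS, average event size in bytes) for a log file."""
    lines = 0
    total_bytes = 0
    with open(path, "rb") as fh:
        for line in fh:
            lines += 1
            total_bytes += len(line)
    eps = lines / period_seconds
    avg_size = total_bytes / lines if lines else 0.0
    return lines, eps, avg_size


# Demonstration with a throwaway three-line file treated as a 3-second window
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("event one\nevent two\nevent three\n")
    sample_path = tmp.name

count, eps, avg_size = log_file_stats(sample_path, period_seconds=3)
os.remove(sample_path)
print(f"{count} events, {eps:.1f} EPS, {avg_size:.1f} bytes/event on average")
```

For a real 24-hour log, leave `period_seconds` at its default and the first return value is the EPD.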

My explanation of sizing log management has always been:

Variables:

RAW event = ~600 bytes

NORM event = ~1500 bytes

DAY = 86,400 seconds

EPS = Events Per Second

EPD = Events Per Day

SIZE = Amount in bytes

DISK = Disk space requirements

COMPRESS = Assume 10:1 ratio


First, we must determine the EPD, therefore:

EPS x DAY = EPD

(e.g. 1000 EPS x 86,400 seconds = 86,400,000 EPD or 86.4 MEPD)

Then we must determine how much disk space that will yield depending on whether they are RAW events or normalized events:

EPD x RAW = SIZE

(e.g. 86.4 MEPD x 600 = 51,840,000,000 bytes)

~ or ~

EPD x NORM= SIZE

(e.g. 86.4 MEPD x 1500 = 129,600,000,000 bytes)

Then we need to compress those events with 10:1 compression to get an approximate daily disk requirement. To do this we divide the maximum daily size allocation for events by 10:

SIZE / COMPRESS = DISK (RAW)

(e.g. 51.84 GB / 10 = 5,184,000,000 bytes)

~ or ~

SIZE / COMPRESS = DISK (NORM)

(e.g. 129.6 GB / 10 = 12,960,000,000 bytes)

Finally, we determine the annual required disk space by multiplying the daily disk requirements by 365:

DISK (RAW) x 365 = YEAR

(e.g. 5,184,000,000 x 365 = 1,892,160,000,000 or roughly 1.9 Terabytes)

~ or ~

DISK (NORM) x 365 = YEAR

(e.g. 12,960,000,000 x 365 = 4,730,400,000,000 or 4.7 Terabytes)
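The steps above can be sketched as a single function that keeps each intermediate value. This is just a restatement of the arithmetic under the same assumptions (a steady EPS, fixed event sizes, 10:1 compression); the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def size_storage(eps, event_bytes, compression_ratio=10):
    """Walk through the sizing steps and return each intermediate value in bytes."""
    epd = eps * SECONDS_PER_DAY        # EPS x DAY = EPD
    size = epd * event_bytes           # EPD x RAW (or NORM) = SIZE
    disk = size / compression_ratio    # SIZE / COMPRESS = DISK
    year = disk * DAYS_PER_YEAR        # DISK x 365 = YEAR
    return {"EPD": epd, "SIZE": size, "DISK": disk, "YEAR": year}


for label, event_bytes in (("RAW", 600), ("NORM", 1500)):
    result = size_storage(1000, event_bytes)
    print(f"{label}: {result['YEAR'] / 1e12:.2f} TB per year (compressed)")
```

Changing `compression_ratio` is a quick way to see how sensitive the yearly figure is to the 10:1 assumption, which will vary with the data and the product.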

Once the average EPS has been determined for each category of device, multiply it by the number of monitored devices in that category and sum the results across all categories to get an estimated total average EPS.
To determine the storage requirements for this total EPS, use the formula described above, which is:

EPD * RAW / 10 * 365 = YEAR (compressed)

~ or ~

EPD * NORM / 10 * 365 = YEAR (compressed)
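Rolling the per-category rates up into one total and applying the formula might look like the sketch below. The device inventory is entirely hypothetical (the counts and per-device EPS values are placeholders for your own measurements), and the 10:1 compression ratio is the same assumption used above.

```python
SECONDS_PER_DAY = 86_400

# Hypothetical inventory: category -> (device count, measured average EPS per device)
inventory = {
    "firewalls": (10, 20),
    "servers": (100, 5),
    "switches": (50, 6),
}


def total_eps(devices):
    """Sum count x per-device EPS across all device categories."""
    return sum(count * eps for count, eps in devices.values())


def yearly_bytes(eps, event_bytes, compression_ratio=10, days=365):
    """EPD * event size / compression * days = compressed bytes per year."""
    epd = eps * SECONDS_PER_DAY
    return epd * event_bytes / compression_ratio * days


eps = total_eps(inventory)
print(f"Total average EPS: {eps}")
print(f"RAW:  {yearly_bytes(eps, 600) / 1e12:.2f} TB/year")
print(f"NORM: {yearly_bytes(eps, 1500) / 1e12:.2f} TB/year")
```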

I hope this is useful in determining the amount of storage required. I have a handy calculator you can use once you determine the EPS for all the different event source types at http://www.netcerebral.com/?p=125#more-125

 


2 Responses to Basic Log Storage Calculations

  • Awesome!… Good to know and great explanation! 😀

  • Very great post. I simply stumbled upon your blog and wanted to say that I’ve truly
    enjoyed browsing your blog posts. In any case I will be subscribing to your rss feed and I hope
    you write once more soon!
