
Good day!
Today we will look at a question that everyone who handles logs faces sooner or later, usually while evaluating processing and storage solutions: what volume of logs per day / week / month will we receive from different systems, and what storage resources will they require?
It is difficult to say for sure, but we will try to help you work out the estimated volumes based on our experience.
Our assessment method relies on statistics about the number of logs produced by various sources; all the values presented below are averages of the results from our various logging projects.
As an example, let's take a few common sources:
- Windows Event Logs
- Windows domain
- Cisco ASA
- Cisco ESA
- Cisco IPS
- Cisco IOS
- Palo Alto
- *nix syslog
- MSExchange-mail
Collecting logs
First, we measured the average number of bytes in a single event for each source. We then estimated the approximate number of events per day generated by one device of each type and calculated how many gigabytes of logs would be collected per day from a single device:
| Source | Bytes per event (avg.) | Events per day per device (avg.) | GB per day per device |
| --- | --- | --- | --- |
| WinEventLog | 1150 | 25,000 | ≈ 0.03 |
| Windows domain | 1150 | 250,000 | ≈ 0.3 |
| Cisco ASA | 240 | 1,600,000 | ≈ 0.35 |
| Cisco ESA | 100 | 200,000 | ≈ 0.02 |
| Cisco IPS | 1200 | 500,000 | ≈ 0.6 |
| Cisco IOS | 150 | 20,000 | ≈ 0.003 |
| Palo Alto | 400 | 500,000 | ≈ 0.2 |
| *nix syslog | 100 | 50,000 | ≈ 0.005 |
| MSExchange-mail | 300 | 100,000 | ≈ 0.03 |
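Each row is the same arithmetic: bytes per event × events per day ÷ 1024³. A minimal Python sketch of that calculation, using the averages from the table above:

```python
# GB/day per device for each source: bytes_per_event * events_per_day / 1024**3
sources = {
    # name: (avg. bytes per event, avg. events per day per device)
    "WinEventLog":     (1150, 25_000),
    "Windows domain":  (1150, 250_000),
    "Cisco ASA":       (240, 1_600_000),
    "Cisco ESA":       (100, 200_000),
    "Cisco IPS":       (1200, 500_000),
    "Cisco IOS":       (150, 20_000),
    "Palo Alto":       (400, 500_000),
    "*nix syslog":     (100, 50_000),
    "MSExchange-mail": (300, 100_000),
}

for name, (bytes_per_event, events_per_day) in sources.items():
    gb_per_day = bytes_per_event * events_per_day / 1024**3
    print(f"{name:16} ≈ {gb_per_day:.3f} GB/day per device")
```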
Next, in order to determine the total volume of logs, we need to decide how many devices we want to collect and store information from. For example, suppose we have 30 devices generating WinEventLog and one device each of Windows Domain, Cisco ESA, Cisco IPS, and Palo Alto.
1150 × 25,000 × 30 + 1150 × 250,000 + 100 × 200,000 + 1200 × 500,000 + 400 × 500,000 = 1,970,000,000 bytes/day ≈ 1.8347 GB/day ≈ 12.8 GB/week ≈ 55 GB/month (the sketch after the list below reproduces this estimate).

Of course, this calculation method can carry a significant error, since the number of logs per day depends on many factors, for example:
- Number of users and their roles
- Enabled audit services
- Required severity level
- And much more
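To make the napkin math reproducible, here is a short sketch of the fleet-level estimate above (the same averages as in the table; the device counts match the worked example):

```python
# Fleet-level estimate: 30 WinEventLog devices, one each of
# Windows Domain, Cisco ESA, Cisco IPS and Palo Alto.
fleet = {
    # name: (bytes per event, events per day per device, device count)
    "WinEventLog":    (1150, 25_000, 30),
    "Windows domain": (1150, 250_000, 1),
    "Cisco ESA":      (100, 200_000, 1),
    "Cisco IPS":      (1200, 500_000, 1),
    "Palo Alto":      (400, 500_000, 1),
}

bytes_per_day = sum(b * e * n for b, e, n in fleet.values())
gb_per_day = bytes_per_day / 1024**3
print(f"{bytes_per_day:,} bytes/day ≈ {gb_per_day:.2f} GB/day")
print(f"≈ {gb_per_day * 7:.1f} GB/week ≈ {gb_per_day * 30:.0f} GB/month")
```

Running it prints 1,970,000,000 bytes/day ≈ 1.83 GB/day, 12.8 GB/week and 55 GB/month, matching the hand calculation.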
The essential plus of this method is that, given the statistical data, the approximate volume of logs can be estimated literally on a napkin. The minus is a potentially large error. If significant deviations are unacceptable, you can instead feed data from all sources into a test system; Splunk, for example, provides a trial license with enough capacity to test a large number of sources. This approach gives an accurate result, but deploying any test system takes time, labor, and technical resources.
Data storage
Let us briefly touch upon another question on the subject of logs: how much storage will be required to keep them.
To answer this question, you first need to understand in what form your log-processing tool stores data. For example, ELK stores information about extracted fields alongside the logs, which can inflate the volume of a single event up to 3 times, while Splunk stores the data raw, compresses it, and keeps metadata separately from the events.
Then you need to understand what period of historical data you need to keep, the "temperature" of the data (hot, warm, cold), the RAID level, and so on. A convenient calculator can be found at this link.
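As a rough illustration, a retention estimate can be sketched as below; the multipliers are illustrative assumptions, not measured values (the 3× field overhead echoes the ELK remark above, the 0.5 compression ratio is purely hypothetical):

```python
# Rough storage estimate: daily volume * retention, adjusted by a
# tool-specific factor. Both factors are assumptions -- substitute
# figures measured on your own data.
daily_gb = 1.83          # from the fleet estimate above
retention_days = 90      # how long history must be kept

# >1 if the tool stores extracted fields alongside raw events
# (as ELK can); <1 if it compresses raw data (as Splunk does).
elk_index_factor = 3.0
splunk_compression_factor = 0.5

print(f"ELK-like:    ≈ {daily_gb * retention_days * elk_index_factor:.0f} GB")
print(f"Splunk-like: ≈ {daily_gb * retention_days * splunk_compression_factor:.0f} GB")
```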
Conclusion
One of the topical issues that prompted us to discuss log volumes is that the Splunk license depends on the amount of data indexed per day. If you want to use Splunk for processing your logs, then after calculating the approximate volume you can estimate the cost of the required license. The license calculator can be found here.
How do you estimate the volume of your logs? Share your experience, tools, and interesting cases in the comments.