Setting Your Data Retention Policy

An important consideration in performance management and capacity planning is your data retention policy. Some data, such as process data, consumes a tremendous amount of space and loses its value quickly. Fine-grained data also loses its value over time. For example, detailed performance and process data are quite useful for analyzing bottlenecks. Service level reporting requires only a small subset of the data and no process data. Trending and capacity planning need only highly summarized workload and resource utilization data representing peak periods.

A sound data retention policy stores different types of data for different periods. It may be advantageous to keep detailed performance and process data for two weeks to one month, while keeping hourly summaries indefinitely. It may also be desirable to store daily capsules of peak demand periods at fine granularity, including process information. For example, month-end processing demand may be worth storing permanently as a baseline of peak demand. Government regulations may also specify data retention rules.

Collected data can be stored directly on the monitored system in a host trace file (HTF). This temporary store allows flexible strategies for transferring data to the central Sightline server. Sightline Enterprise Data Manager (EDM) or Expert Advisor/Vision (EA/V) acts as the central management console within the Sightline deployment. It aggregates, manages, displays and analyzes heterogeneous information from networks, platforms, operating systems, databases and applications. Your data storage strategy is implemented at this level of the Sightline deployment, which offloads a significant portion of performance management overhead.

By default, individual HTFs are transferred to the Sightline server, where data is maintained at the raw data collection interval. Summarization can then be applied, allowing for long-term storage of data. In Sightline EDM, the connection template lets you set your data retention policy once and then apply it to all of your new and existing connections. What’s our recommendation? We start with these settings:

  • Live data: 14 days at the data collection interval, including process-level data
  • Archive data: 32 days of daily archives, including process-level data
  • 10-minute summary: 90 days, excluding process data. In EDM, this is often used for visualization.
  • 30-minute summary: 13 months, excluding process data. This long-term data is used for visualization, trending and forecasting.
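
The starting settings above can be written down as a small lookup table, which makes it easy to check which tiers still hold data of a given age. This is only a sketch in Python, not Sightline's configuration format: the names RetentionTier, RECOMMENDED_TIERS and tiers_covering are hypothetical, and 13 months is approximated as 396 days.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetentionTier:
    """One retention tier (hypothetical structure, not a Sightline API)."""
    name: str
    retention_days: int
    includes_process_data: bool


# The four recommended starting settings.
# Assumption: 13 months is approximated as 396 days.
RECOMMENDED_TIERS = [
    RetentionTier("live", 14, True),
    RetentionTier("archive", 32, True),
    RetentionTier("10-minute summary", 90, False),
    RetentionTier("30-minute summary", 396, False),
]


def tiers_covering(age_days: int, need_process_data: bool = False) -> List[str]:
    """Return the names of tiers that still retain data of the given age."""
    return [
        tier.name
        for tier in RECOMMENDED_TIERS
        if age_days <= tier.retention_days
        and (tier.includes_process_data or not need_process_data)
    ]


if __name__ == "__main__":
    # Data from 20 days ago that must include process detail.
    print(tiers_covering(20, need_process_data=True))
    # Data from 60 days ago, process detail not required.
    print(tiers_covering(60))
```

Under these settings, process-level detail older than 32 days is no longer available in any tier, which is worth keeping in mind when deciding how far back bottleneck analysis needs to reach.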

Keep in mind that forecasting requires at least twice as much input data as the period you intend to forecast. For example, a one-month forecast requires at least two months of input data. As with many things, more is better! The more historical data you can put into the forecast, the more accurate the forecast will generally be.
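
The two-to-one rule can be expressed as a pair of small helper functions. This is a minimal Python sketch; the function names and the ratio parameter are hypothetical and are not part of any Sightline product.

```python
import math


def min_history_days(forecast_days: int, ratio: float = 2.0) -> int:
    """Minimum days of input data needed for a forecast of the given length."""
    return math.ceil(forecast_days * ratio)


def max_forecast_days(history_days: int, ratio: float = 2.0) -> int:
    """Longest forecast, in days, that the available history supports."""
    return math.floor(history_days / ratio)


if __name__ == "__main__":
    # A 30-day forecast needs at least 60 days of input data.
    print(min_history_days(30))
    # 90 days of retained data supports a forecast of at most 45 days.
    print(max_forecast_days(90))
```

Treat the result as a floor rather than a target: the rule gives the least history you should supply, and longer histories are preferable.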