Sightline's Comments on the Agent vs Agent-less Monitoring Debate

If the Agent-based or Agent-less debate has got you down, why not use both? For over fifteen years, Sightline Systems has delivered industry-leading agent-based data collection capabilities for mission-critical IT systems. This detailed data is critical for forensic root cause analysis and detailed capacity planning activities. But what about when this level of data collection is not required? Sightline Systems has the answer — our Agent-less data collection product allows you to collect data from systems without the need to deploy local agents.

Need high-level performance data quickly from a farm of servers? No problem!

Want to gather performance metrics from network or storage devices? We have you covered!

Our new framework offers the ability to take advantage of a variety of collection options without additional software installations. Agents are and will continue to be the industry standard mechanism for the collection of performance data on mission-critical systems. They offer security and availability that cannot be matched by Agent-less collection products. So end the debate — you can have your cake and eat it, too! Contact Sightline Systems today and enjoy the best of both worlds. (703) 563-3000

Installing and Updating the Power Agent on Windows Systems

Sept 30

Creating a Response File

We have had several inquiries about silent installs and updates for the Sightline Power Agents. Silent installs and updates for Windows Power Agents are accomplished by creating a response file, documented here.

The Problem With Ghost Servers

Ghost servers are a growing problem for enterprise infrastructures. The term, which denotes unused or underutilized servers, has gained currency recently as companies review the cost, security and performance of their large networks. Back in 2013, a BizTech article entitled The IT Monster in the Closet: Ghost Servers cited an industry estimate that ghost servers may make up as much as 15 percent of the enterprise marketplace. With the growth of global enterprise networks over the last two years, 15 percent might be a conservative estimate in 2015.

In terms of cost, ghost servers consume power and require cooling. Typically, these systems have been deprecated but not removed from the network because they might still run one or a few VMs or other applications. Time and power equal money, and while ghost servers might not break the bank, finding, removing or optimizing them can help IT teams gain visibility when companies are looking to lower IT costs.

Having any unmonitored systems attached to a secure network always increases risk. Ghost servers might not directly contribute to a significant rise in security breaches but they could, at least in theory without monitoring, supply hackers with a potential backdoor into your secure network. Back in 2006, Computer World reported that such a server was used to hack into Ohio University’s alumni database and obtain 170,000 social security numbers and personal information. How’d it happen? The IT team thought the server was offline.

Now for the good news. Ghost servers might also provide a vital and inexpensive way to increase infrastructure performance for teams looking for more processing power on-premises. Applications that consume significant CPU or memory resources on multi-VM servers might be moved to a ghost server, or teams can increase performance by adding such servers to their load balancing activities.

So where do you start? First, teams need to find the servers. Use a monitoring solution such as Sightline’s EDM to look at all the servers on a network and see which ones are powered on, which are operational and what applications are running. For systems that are powered on but idle, teams can use EDM to create groups of potential ghost servers to watch.
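As a rough illustration of the "powered on but idle" test, the sketch below flags servers whose average CPU utilization stays under a cutoff. It is not EDM functionality: the server names, the sample values and the 2 percent `idle_threshold` are all hypothetical, and a real review would likely also consider memory, network and disk activity over a longer window.

```python
from statistics import mean

# Hypothetical CPU utilization samples (percent) per server; in practice
# these would come from whatever monitoring data you already collect.
SAMPLES = {
    "app-01": [42.0, 55.5, 61.2, 38.9],
    "db-02": [1.2, 0.8, 1.5, 0.9],
    "legacy-07": [0.3, 0.2, 0.4, 0.1],
}


def ghost_candidates(samples, idle_threshold=2.0):
    """Return servers whose average CPU utilization is below the threshold."""
    return sorted(
        name
        for name, values in samples.items()
        if values and mean(values) < idle_threshold
    )


if __name__ == "__main__":
    # Servers returned here are only candidates to watch, not servers
    # that are safe to shut down.
    print(ghost_candidates(SAMPLES))
```

The output is a watch list, which matches the next step of confirming usage and ownership before anything is switched off.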

Next, teams need to discover when and how those servers have been used and resolve ownership of them. This might be the hardest part since teams might not have access to that data. Yet, it’s a vital way to ensure IT tranquility by avoiding the potential for shutting down a seldom used but vital server or VM.

Once the servers are identified, teams will need to figure out if each one is needed for another task or should be removed from the network. Regardless of the decision, whether to optimize the server or take it offline, identifying ghost servers should be a priority for any company looking to increase performance or ensure additional security for their infrastructure.

VMware released vCenter 6.0 in April 2015. Like many IT professionals, we were interested in seeing what changes were made. After we upgraded to vCenter 6.0, we discovered that while it was more locked down, its shell could still give us more access.

While Sightline can monitor vCenter, ESX hosts and VMs agentlessly, our Power Agents offer a lot more data about what’s going on inside VMs (mainly process-level information), including the vCenter appliance. In fact, Power Agents included with Enterprise Data Management provide you with the real-time data you need to make smarter, more cost-effective decisions. EDM is an award-winning platform for managing the continuous stream of time series data that is being produced, and it will help you:

  • Monitor systems
  • Analyze trends and patterns
  • Diagnose costly issues quickly
  • Reduce cost
  • Conduct root cause analysis
  • Automate capacity planning

The steps below show how to access the appliance shell and add a port exception to the appliance’s built-in firewall.

VMware, of course, provides instructions on how to manipulate the firewall. But its interface only allows adding an IP or IP range to the list of systems allowed to communicate with vCenter.

In short, it doesn’t allow you to open a port. That was a problem since our Power Agent uses port 1645 for communicating and sending detailed performance data back to our analytics engine. We needed to open that port and that proved to be harder than we thought.

Adding a Port to vCenter:
1) First, you’ll need console access. This presents a familiar screen for admins who have accessed the ESX server consoles before. This is new for the vCenter 6.0 appliance.

[Screenshot: vm1]

2) Here, you’ll want to navigate to a hidden screen by pressing ALT+F1. Then, you’ll get this login screen:

[Screenshot: vm2]

3) Here, login with admin credentials and you’ll get a list of help commands.

4) Now, run “shell.set --enabled True” followed by “shell”. You’ll get a standard Linux-style prompt.

A warning notes that the pi shell is intended only for advanced troubleshooting. As such, continue at your own risk.

5) Navigate to /etc/vmware/appliance

This is where you can add custom firewall port changes, in the services.conf file.

[Screenshot: vm5]

6) WARNING: Initially, we tried to add a new group to the JSON in services.conf, and we ended up losing SSH access to the VM. It seems that VMware has a hardcoded limit of 4 rules. Adding a 5th seems to bump the first out.
7) To get around this, we just added our rule to the ssh rule.
Run “vi services.conf”.

8) We added a comma, and then the section in red.

[Screenshot: vm6]

9) Then, reload the vCenter 6.0 appliance firewall rules by executing:
/usr/lib/applmgmt/networking/bin/firewall-reload
or simply reboot the VM.

After we rebooted, we could access our performance monitoring tool on port 1645.
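To confirm from another machine that the port is actually reachable, a simple TCP connection test is enough. The sketch below is a generic check written for illustration, not part of the Power Agent or of vCenter; the function name `port_is_open` and the example hostname mentioned afterward are assumptions.

```python
import socket


def port_is_open(host, port=1645, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Calling `port_is_open("vcenter.example.com")` with your own appliance's address should return True once the firewall rule is in place. A False result only says the connection failed; it does not distinguish a blocked port from a service that is not listening.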

Do You Need an Alerts Team?


If you’re part of a team that’s involved with increasing or monitoring system, component or infrastructure performance, you already know about the “alert flood.” It’s that constant deluge of emails notifying you that the systems you or your team are responsible for have an issue. And thankfully, we’re working on decreasing that flood in our next release of EDM.

It’s called a flood because there are so many alerts, and it’s a problem that we often hear about from teams. To solve it, many choose to simply filter their reports, opt out of receiving them, delete them or change their monitoring thresholds to receive fewer alerts.

Yet, those alerts may be part of a company’s SLAs or other agreements. When the important alerts happen, IT staff might simply miss the message. These alerts are important and in cases where there’s an internal review, it might come down to who was alerted and who was required to react. The results of that review might not be good for individuals who were required to react.

One idea to minimize that problem might be to create an Alerts Team to not only manage alerts but also set the rules on what’s monitored, develop new thresholds based on accumulated historical data and use industry or company best practices to minimize the flood to something more akin to a kiddie pool. Plus, as components are added, the Alerts Team can set the rules based on their expectations and not simply rely on the monitoring solution’s default settings.
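One common way to turn accumulated history into a threshold is to alert only when a metric rises well above its usual range. The sketch below uses the mean plus a multiple of the standard deviation. This is a generic statistical rule chosen for illustration, not how EDM sets thresholds, and the function name and the default multiplier `k=3.0` are assumptions an Alerts Team would tune per metric.

```python
from statistics import mean, stdev


def suggest_threshold(history, k=3.0):
    """Suggest an alert threshold of mean + k sample standard deviations."""
    if len(history) < 2:
        # stdev needs at least two data points.
        raise ValueError("need at least two historical samples")
    return mean(history) + k * stdev(history)
```

A larger `k` produces fewer alerts at the cost of reacting later, which is the trade-off the team would be deciding for each system.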

We’ll agree that alerts can be a necessary evil. Yet when one user changes a threshold to minimize their own alerts, that threshold might be a mandated alert for another staff member. For companies with a wide variety of operating systems, storage solutions and other devices, the team should include experts with knowledge of each system. For instance, few would want a Windows expert to set operating thresholds for a Linux server.

The concept is simple: with expertly set thresholds, the Alerts Team can keep alerts to a minimum and deliver the right message to the right person. For larger organizations, establishing an Alerts Team can help experts understand their role in the overall infrastructure and limit the number of alerts going to the teams.

Virtual Appliances Made Easy With Oracle Linux

It’s no surprise that most of today’s IT world lives within a virtual computing environment. The ability to cut costs, save energy, and reduce hardware footprint are just a few of the many advantages of being virtual. However, the pains and headaches of installing operating systems and other various software applications still persist. Enter the beauty of the virtual appliance.

A virtual appliance is a pre-configured, self-contained virtual machine that typically includes a pre-installed minimal operating system along with other desired software applications.  Virtual appliances are usually exported as an OVF (Open Virtualization Format) file.  This file can be re-deployed to an existing virtual environment, then simply turned on for use.

During the build process of our Sightline EDM virtual appliance, Oracle Linux was selected for several benefits including:

  • Free and redistributable operating system
  • Reliable, tested operating system from a trusted company
  • Option for enterprise-class support at a significantly lower cost from Oracle if needed
  • Small footprint with a minimal OS install
  • Quick operating system boot, which leads to faster Sightline software startup.

We ran into many obstacles while building the Sightline appliance. However, the finished product made the journey well worth the time. Creating the appliance involved not only installing Oracle Linux and our software, but also a number of system configurations, such as:

  • Setting the software services to start up in a certain order on system boot.
  • Opening firewall ports.
  • Adding a few simple scripts to pre-populate files before being read on system boot.

Different applications will have different needs from the system, so not all journeys will share the same path when creating an appliance. Some paths may be harder than others, but in any case, the finalized appliance can be extremely beneficial to both vendors and their customers, as the appliance can:

  • Simplify deployment – users won’t have to worry about resolving any potential errors from installing all the required components needed for the software application to function. The appliance can simply be deployed and started without the hassle.
  • Become an excellent selling tool – the last thing a salesperson needs is to watch the customer run into a problem with installation. Even with a sales engineer by their side to help fix the problem, the situation can leave an embarrassing mark for a company. The ease of deployment through an appliance allows the salesperson to focus on what they do best: present and sell a good product from a team that knows what it is doing.
  • Reduce customer costs – since the appliance comes pre-installed with a redistributable operating system, customers won’t have to worry about using or obtaining any extra licenses to stand up a clean OS to install a vendor product.

Creating any virtual appliance takes a great deal of planning and strategy to ensure a great product presentation. At the conclusion of the project, Oracle Linux was clearly the right choice. Not only did it give us a free and redistributable operating system, it also answered our customers’ needs for support, reliability and dependability.

Using Root Cause Analysis to End Your Next Infrastructure Fire Drill

Imagine that you’re in a house and the room next to you is burning. You would never simply close the door to the room and go about your business, because you know that the fire will eventually consume the house. The obvious reaction is to put out the fire. Yet, when companies see a crucial part of their infrastructure on fire, many times they simply close the door and go about their business.

It’s an overly simplified analogy, but the sheer complexity of today’s enterprise networks not only makes it hard to put out a fire, it often makes it hard to tell where the fire is. Teams do their best by making educated guesses with a preference for operational uptime and performance, goals meant to ensure that the fewest users are affected by the fire. The solutions crafted by some teams often minimize the problem but don’t fix it.

Back to the fire analogy: teams might close the door to the room next door but still need to access the room on the other side of it. When a part of an organization’s infrastructure goes down, IT teams are asked to ensure that the fewest end-users are affected. So they create pathways to circumvent the room, or build additional rooms to get around the fire. In the real world, that could mean adding more capacity with new servers, paying for expensive emergency services and engineering untested solutions that add more complexity to an infrastructure that still has a burning room within it.

The better answer is root cause analysis (RCA), a methodology that looks at current and historical infrastructure data to set a benchmark for how everything should run in a stable environment and uses analysis to tell IT teams where the problem is within their infrastructure.
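The benchmark idea can be sketched in a few lines: build a baseline from historical values for each metric, then rank the metrics by how far their current values sit from it. This is a deliberately simple z-score comparison written for illustration and is not a description of how EDM or Clairvor performs RCA; the function name and the metric names in the usage note are hypothetical.

```python
from statistics import mean, pstdev


def rank_deviations(baseline, current):
    """Rank metrics by how far current values sit from their baseline.

    baseline maps metric name -> list of historical values.
    current maps metric name -> latest value.
    Returns (metric, z_score) pairs, largest absolute deviation first.
    """
    scores = []
    for metric, history in baseline.items():
        if metric not in current or not history:
            continue
        spread = pstdev(history)
        if spread == 0:
            continue  # no variation in history, so a z-score is undefined
        z = (current[metric] - mean(history)) / spread
        scores.append((metric, z))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Given a baseline such as `{"cpu": [10, 20, 30], "mem": [40, 60]}` and current values `{"cpu": 20, "mem": 90}`, the memory metric ranks first, pointing the team at the place to look before anyone starts building around the problem.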

With a powerful root cause analysis solution such as EDM, IT teams spend less time searching for the problem, less effort on guessing how to minimize the issue and less money on trying to circumvent the problem. We believe that teams should spend more time fixing an infrastructure problem the first time and less time searching for it. Root cause analysis enables teams to keep the infrastructure they have, open the door and simply put out the fire.

Interested in finding more about automating root cause analysis? Read our ebook that details the five steps proactive IT operations teams take to leverage anomaly detection and event correlation to reduce mean time to resolution.

Configuring Array Alerts in EDM

The ability to set alerts is a powerful feature that allows you to monitor your systems for abnormal events or behavior without actually seeing the event occur.

An alert can be configured for all subscripts of a single metric. This eliminates the need to create an alert for each subscript of the array. For example, a disk array may have several subscripts, and configuring individual alerts would be tedious and time-consuming, as well as error-prone. In addition, subscripts may be different on different systems. Thus, the ability to configure an array alert that can be applied to multiple systems provides many benefits.
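Conceptually, an array alert is one rule evaluated against every subscript that happens to exist on a system. The sketch below shows that idea in generic terms. It does not use EDM's configuration or API, and the function name, disk names and 90 percent threshold are hypothetical.

```python
def evaluate_array_alert(readings, threshold):
    """Apply one threshold to every subscript of an array metric.

    readings maps subscript (for example, a disk name) -> latest value.
    Returns the subscripts whose value exceeds the threshold.
    """
    return sorted(sub for sub, value in readings.items() if value > threshold)


# Hypothetical disk-utilization readings (percent) on two systems whose
# subscripts differ; the same rule works on both without per-disk setup.
linux_disks = {"sda": 91.0, "sdb": 40.5}
windows_disks = {"C:": 72.3, "D:": 95.1, "E:": 10.0}

if __name__ == "__main__":
    print(evaluate_array_alert(linux_disks, 90.0))
    print(evaluate_array_alert(windows_disks, 90.0))
```

Because the rule never names a specific subscript, it keeps working when disks are added or when the same alert is applied to a system with a different layout.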

Click here to read more about configuring array alerts.

What Can EDM Do for Me?

If you’re a long-time EA/V user, you might be asking yourself that question right now. As we know, EA/V is a phenomenal tool. We’re not suggesting that you get rid of it, but with the addition of EDM into your environment you can gain some incredible capabilities. Below is just a sample of what I am talking about.

  • Web-based Access
  • EDM is a Java-based application, so it’s available from any web browser. Whether you’re in the office, at home, or on the road, you can use your desktop, iPad, or mobile device to look into your production IT environment. This also answers the question of how to easily share reports with management and others. All you need is your EDM login and Internet access, and you can create, view or manipulate your reports from anywhere. Agent administration is easier, too. It’s no longer necessary to access each system to update the Power Agent’s configuration file; just access it directly from EDM.

  • Create a multi-platform world view
  • With EDM you can look at your environment in a single pane of glass. View your world as a whole or look at individual systems or groups. Perform deep dive analysis and look into systems, workloads or processes. And then send the URL to your team so they can see the same thing that you do.

  • Alert Tracking
  • When an alert is triggered in your environment, EDM doesn’t just send the notification and forget about it. EDM keeps track of your alerts, so that you can see how long they’ve been active and when they were resolved. In EDM you can assign alerts to a user, add your resolution notes or simply see which alerts are active most often. And assigning alerts is easy in EDM, too, using alert groups and connection templates!

  • Scalability
  • EDM is designed to scale to the largest environments. Because EDM is a Java-based application, it scales to whatever size your environment may be.

  • Clairvor for Automated Root Cause Analysis
  • Clairvor should be the determining factor when looking at EDM. Clairvor is an EDM-specific tool that lets you go from triggered alert to issue diagnosis in a matter of minutes. With Clairvor you can quickly get to the root of an IT issue, and reduce system outages, downtime, slow response time, etc. Clairvor gives you the power to correlate, drill down and visualize in a matter of minutes, all through our automated RCA process. In a test environment, Clairvor can give your team the ability to predict abnormal occurrences and have the solutions ready. Clairvor is proven to reduce your downtime when it matters most, and by doing this Sightline is able to save you money in lost labor, lost productivity and most importantly lost revenue.

Would you like to see EDM in your environment? Contact us today!

Sightline Clairvor Providing Clarity Into IT Like Never Before!

Root Cause Analysis (RCA) is a problem-solving methodology focused on identifying the root cause of a fault or problem, rather than its symptoms. Currently, many IT professionals use a combination of cause-and-effect determination and the test-and-retry method to identify and resolve problems in the IT environment.

Problem analysis and resolution can go something like this: an issue is detected, the IT team investigates, a suspect system or application is identified and the owner is contacted; the owner denies that there is a problem; the above steps are repeated. Finally, the issue is identified, resolved, and documented. Going through these steps is like finding a needle in a haystack. It is time-consuming, frustrating, and ultimately costs money in both resource hours and service disruption. A critical system or application issue or outage can cost tens of thousands of dollars.

Clairvor helps you analyze the situation and quickly diagnose the issue before the impact is felt and outages occur. It helps ensure the stability of your IT environment in three easy steps, taking your IT team from alert to diagnosis in minutes.