Thursday, April 9, 2020

Top 10 Log Sources You Should Monitor in Your Environment


There are literally hundreds of possible types of log sources around your environment, and choosing which ones bubble to the top of your IT consciousness can be difficult. In a job where everything seems to be a top priority, understanding all the log types and sources available for selection can be daunting. In your environment, some logs may be more valuable than others, but having general guidance about logging and what types of logs may be available to monitor can help make you a better technologist.

There’s no way we could ever think to cover every possible source of logs, but let’s start with some of the classics and go from there.

1 – Infrastructure Devices
These are the devices that make up the “information superhighway” of your infrastructure. Switches, routers, wireless controllers, and access points can be teased to provide logging information about the health and state of your environment. The logs can provide insights ranging from clients hopping between wireless APs to hardware failures. Probably most impactful to your environment are notifications of configuration changes. Knowing who changed what and when can help you diagnose and recover from misconfigurations.
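To make “who changed what and when” concrete, here is a minimal sketch that pulls the user, access method, terminal line, and source address out of a configuration-change message. It assumes Cisco IOS-style %SYS-5-CONFIG_I syslog messages; other vendors word these events differently, so treat the pattern as an assumption to adapt. The names parse_config_change and ConfigChange are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed Cisco IOS-style message, e.g.
# "%SYS-5-CONFIG_I: Configured from console by admin on vty0 (10.0.0.5)".
# Other vendors (and other IOS versions) may word this differently.
CONFIG_CHANGE = re.compile(
    r"%SYS-5-CONFIG_I: Configured from (?P<method>\S+) by (?P<user>\S+)"
    r"(?: on (?P<line>\S+))?(?: \((?P<addr>[^)]+)\))?"
)


class ConfigChange(NamedTuple):
    device: str
    user: str
    method: str
    line: Optional[str]  # terminal line such as vty0, when present
    addr: Optional[str]  # remote address, when present


def parse_config_change(device: str, message: str) -> Optional[ConfigChange]:
    """Return who changed the config and how, or None for any other message."""
    m = CONFIG_CHANGE.search(message)
    if m is None:
        return None
    return ConfigChange(device, m["user"], m["method"], m["line"], m["addr"])


change = parse_config_change(
    "core-sw1", "%SYS-5-CONFIG_I: Configured from console by admin on vty0 (10.0.0.5)"
)
print(change)
```

Feeding every device's parsed changes into one place gives you a change history you can line up against outages.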

2 – Security Devices
As organizations push toward a cloud-first methodology, the edge devices in your environment can become even more vital to your business. Your firewalls and other security devices are handling more and more traffic as loads shift to cloud infrastructure. The logs on these security devices can provide a plethora of interesting information, not least of which are blocked traffic, VPN health, intrusion detection and prevention alerts, and unusual user activity. Fed into a Security Information and Event Management (SIEM) system, these logs may be your first line of defense in understanding an attack or isolating an anomaly in your user experience.

3 – Server Logs
It may go without saying, but I’m going to say it anyway: server logs can offer abundant information about the state of your environment. Windows and Linux servers are constantly pumping out logs that give you an understanding of how and why systems are behaving the way they are. There are literally hundreds of thousands of events that can trigger within an operating system and its associated applications. Knowing which log events are frivolous and which require immediate action is a skill honed on the battlefield. Regardless, you shouldn’t overlook server logs as a viable source of information.
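On Linux, one common first pass at separating the frivolous from the urgent is the syslog priority value. Under the BSD syslog convention (RFC 3164, carried forward in RFC 5424), the number in angle brackets at the start of a raw message encodes PRI = facility × 8 + severity, where severity runs from 0 (emergency) to 7 (debug). The sketch below decodes it and treats err (3) or worse as needing action; that cutoff is an assumption you would tune, and many collectors strip the PRI before writing to disk, so this applies to messages as received on the wire.

```python
import re
from typing import Optional, Tuple

# Severity names in RFC 5424 order (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_PREFIX = re.compile(r"<(\d{1,3})>")


def decode_pri(message: str) -> Optional[Tuple[int, int]]:
    """Split a leading <PRI> into (facility, severity); None if absent or invalid."""
    m = PRI_PREFIX.match(message)
    if m is None:
        return None
    pri = int(m.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None
    return divmod(pri, 8)  # PRI = facility * 8 + severity


def needs_action(message: str, cutoff: int = 3) -> bool:
    """True when severity <= cutoff (default 3 = err); the cutoff is an assumption."""
    decoded = decode_pri(message)
    return decoded is not None and decoded[1] <= cutoff


for msg in ("<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick",
            "<165>Aug 24 05:34:00 host app: service started"):
    facility, severity = decode_pri(msg)
    print(facility, SEVERITIES[severity], needs_action(msg))
```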

4 – Web Servers
Yes, I’m aware that capturing web server logs can be construed as a tedious process, but it is one of the best ways, if not the best way, to understand how end users interact with your web properties. IIS, Apache, Tomcat, WebSphere, NGINX, and just about every other web engine out there provide some measure of web server logging. Depending on your needs, sometimes just understanding when people visit your site and from where can prove invaluable to understanding the needs of your customers. Unfortunately, web server logs are commonly overlooked when organizations develop their logging strategy.
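As an illustration of the “when and from where” question, this sketch counts requests per hour of day and per client address. It assumes the Apache/NGINX combined log format with English month abbreviations; IIS writes W3C extended logs by default, so the pattern would need to change there. The summarize function is a hypothetical helper.

```python
import re
from collections import Counter
from datetime import datetime

# Apache/NGINX "combined" format:
# client ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Count requests per hour of day and per client address."""
    by_hour, by_client = Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip malformed lines instead of failing the whole batch
        # %b expects English month abbreviations (C/English locale assumed).
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        by_hour[ts.hour] += 1  # hour in the UTC offset the server logged
        by_client[m["client"]] += 1
    return by_hour, by_client


sample = [
    '203.0.113.7 - - [09/Apr/2020:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [09/Apr/2020:14:01:02 -0700] "GET /about HTTP/1.1" 200 512 "-" "curl/7.68.0"',
]
hours, clients = summarize(sample)
print(hours.most_common(), clients.most_common())
```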

5 – Authentication Servers
Whether you use Active Directory, an implementation of OpenLDAP, or another alternative, knowing who and what is poking around your infrastructure can be key to maintaining a good security posture. Each of your authentication servers will provide some measure of logging, but the key is understanding what to look for. Most commonly, you should be looking for token requests, authorization revocations, and authentication failures. These types of logs can help you diagnose failed logins caused by account expiration, isolate the source of a potential attack, and pinpoint problem areas that need to be addressed.
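Authentication failures are most telling in aggregate. The sketch below flags any source that produces a burst of failures inside a sliding time window, which is one simple way to isolate the source of a potential attack. It assumes you have already extracted (timestamp, account, source) tuples from your directory’s logs; the five-failures-in-five-minutes default is purely illustrative, and flag_sources is a hypothetical name.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_sources(failures, threshold=5, window=timedelta(minutes=5)):
    """Return sources with at least `threshold` failures inside any `window`.

    failures: iterable of (timestamp, account, source) tuples sorted by time.
    The defaults are illustrative, not recommendations.
    """
    recent = defaultdict(deque)  # source -> failure times still inside the window
    flagged = set()
    for ts, _account, source in failures:
        q = recent[source]
        q.append(ts)
        while ts - q[0] > window:  # drop failures that have aged out
            q.popleft()
        if len(q) >= threshold:
            flagged.add(source)
    return flagged


start = datetime(2020, 4, 9, 9, 0)
burst = [(start + timedelta(seconds=10 * i), "jdoe", "10.0.0.9") for i in range(5)]
slow = [(start + timedelta(minutes=10 * i), "asmith", "10.0.0.7") for i in range(5)]
print(flag_sources(sorted(burst + slow)))  # {'10.0.0.9'}
```

Keying on source rather than account catches password spraying across many accounts; keying on account instead would surface a single account under attack from many places.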

6 – Hypervisors
Hypervisors can let us IT professionals do our jobs better by balancing workloads and using resources more efficiently. Clusters can now run hundreds, if not thousands, of simultaneous workloads. However, much of the work associated with hypervisors happens behind the curtain, and you never get to see the wizard. Your hypervisors are juggling all the time: allocating resources from this virtual machine to that one, moving storage from this cluster node to another, shifting an entire virtual machine to a different node. It’s a precarious balance. Capturing and monitoring hypervisor logs can be one of the best ways to understand what your hypervisors are doing when you aren’t watching.

7 – Containers
Although relatively new compared to most other log types on this list, containers are becoming more and more business critical. One level up are container orchestration services like Kubernetes, Docker Swarm, and Apache Mesos. These services are like hypervisors in many ways, but just different enough to warrant a separate category. Understanding why the orchestrator decided to scale your deployment back from eight replicas to only four would prove useful in diagnosing and tuning. Most of this information is located only in the container and orchestration logs, so make sure that you collect them.
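For Kubernetes specifically, the Horizontal Pod Autoscaler records its scaling decisions as events rather than in your application’s container output. Here is a minimal sketch, assuming event messages shaped like the autoscaler’s SuccessfulRescale events (“New size: 4; reason: All metrics below target”), that pulls out each scale-down and its stated reason. Verify the message format against kubectl get events in your own cluster before relying on it; scale_downs is a hypothetical helper.

```python
import re

# Assumed shape of the HPA's SuccessfulRescale event message.
RESCALE = re.compile(r"New size: (?P<size>\d+); reason: (?P<reason>.+)")


def scale_downs(events, current=None):
    """Return (old, new, reason) for each rescale that reduced the replica count.

    events: (timestamp, message) pairs in time order.
    current: replica count before the first event, if known.
    """
    results = []
    for _ts, message in events:
        m = RESCALE.search(message)
        if m is None:
            continue  # not a rescale event
        new = int(m["size"])
        if current is not None and new < current:
            results.append((current, new, m["reason"]))
        current = new
    return results


events = [
    ("09:00", "New size: 8; reason: cpu resource utilization (percentage of request) above target"),
    ("09:40", "New size: 4; reason: All metrics below target"),
]
print(scale_downs(events))  # [(8, 4, 'All metrics below target')]
```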

8 – SAN Infrastructure
This may seem an odd addition to this list, given the IT trend toward hyper-converged infrastructure and moving everything to the cloud, but it’s something that’s frequently overlooked. If your fibre switch loses connectivity to a server-side transceiver, then that data is no longer available to that server. In today’s world, there are normally redundant pathways so that connectivity is not truly lost, but the scenario still applies in a multipath environment. Say you have four connections from your server to your SAN infrastructure, but after a series of unfortunate events over several months, three of them have failed. Assuming the paths run at equal speed, you’ve cut your available bandwidth to storage by 75%. You haven’t encountered a failure in the traditional sense, because connectivity still exists and data is moving, but with performance hampered this badly, is it any wonder end users are complaining? In my opinion, this is one of the most overlooked log sources.
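The arithmetic behind that 75% figure is simple, and it argues for alerting on the ratio of healthy paths rather than only on total loss of connectivity. A sketch, assuming equal-speed paths and that your multipath or fibre switch logs let you count healthy versus failed paths; remaining_capacity and path_status are hypothetical helpers.

```python
def remaining_capacity(total_paths: int, failed_paths: int) -> float:
    """Fraction of aggregate path bandwidth left, assuming equal-speed paths."""
    if total_paths <= 0 or not 0 <= failed_paths <= total_paths:
        raise ValueError("need total_paths > 0 and 0 <= failed_paths <= total_paths")
    return (total_paths - failed_paths) / total_paths


def path_status(total_paths: int, failed_paths: int) -> str:
    """Classify multipath health instead of alerting only on total loss."""
    left = remaining_capacity(total_paths, failed_paths)
    if left == 0:
        return "down"
    if left < 1:
        return "degraded"  # still connected, but with reduced bandwidth
    return "ok"


# The scenario above: four paths, three failed.
print(remaining_capacity(4, 3), path_status(4, 3))  # 0.25 degraded
```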

9 – Applications
This applies to pretty much any application log. Although some software applications will leverage the operating system’s existing logging functionality for log management, these are becoming fewer and fewer. Most critical logs for applications are stored in flat files on your disks somewhere. Often, these logs are used by your application support people for troubleshooting, but what about multitier applications? If you have a front-end, middleware, and back-end deployment, each may collect logs slightly differently. Make sure you aren’t sleeping on collecting and monitoring these logs—from each tier—and getting them into a system so that you can compare transactions by lining up the timestamps.
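Lining up timestamps across tiers amounts to a k-way merge of streams that are each already in time order. This sketch uses Python’s heapq.merge to interleave per-tier lines into a single timeline. It assumes each tier writes one event per line prefixed by an ISO 8601 timestamp, that all tiers log in the same time zone, and that server clocks are synchronized (for example via NTP); without that last assumption, the merged order can mislead. The parse and timeline functions are hypothetical.

```python
import heapq
from datetime import datetime


def parse(tier, lines):
    """Turn 'ISO-8601-timestamp message' lines into (timestamp, tier, message)."""
    for line in lines:
        stamp, _, message = line.partition(" ")
        yield datetime.fromisoformat(stamp), tier, message


def timeline(**tiers):
    """Merge per-tier streams, each already in time order, into one timeline."""
    return list(heapq.merge(*(parse(name, lines) for name, lines in tiers.items())))


front = ["2020-04-09T10:00:00.100 GET /checkout", "2020-04-09T10:00:00.900 200 OK"]
middle = ["2020-04-09T10:00:00.200 order 42 received"]
back = ["2020-04-09T10:00:00.450 INSERT order 42"]
for ts, tier, message in timeline(front=front, middle=middle, back=back):
    print(ts.time(), tier, message)
```

Because heapq.merge consumes its inputs lazily, the same approach works on large files without loading every tier into memory first; the final list() is only there for convenience.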

10 – Client Machines
Yes, really. In IT, a common trope is to blame the end user, but sometimes it’s not their fault; sometimes it’s the fault of the endpoint itself. I’m not saying that every log on every machine needs to be collected all the time. In fact, you probably shouldn’t do that. But selective log collection from endpoints can be critical to grasping the full scope of a problem. This is probably the most overlooked log type for actively troubleshooting issues.

Everything Else
There are additional log sources that I’ve neglected, like proxy servers, load balancers, and cloud management systems, to name just a few, but this isn’t meant to be an exhaustive list. Hopefully, after reviewing these ten log types, you gain a little perspective into what would be relevant for your situation. It’s also something to keep in mind as new hardware and software enter your infrastructure.

Whether you choose one, all, or none of these as log sources to monitor depends on your exact needs. Simply thinking about what types of monitoring or log analysis tools you need moving forward could help you choose the sources relevant to your situation. Every bit of information can help you gain a deeper understanding of your infrastructure and how best to handle its care and feeding. Remember, it’s not if something will go sideways, it’s when. Having the best log types to back up your decision-making can be a welcome tool in your IT arsenal.

Why You Should Monitor Windows Event Logs for Security Breaches
The ability to create custom views is only useful if you know what events might indicate an attempt to compromise your systems or an unsanctioned configuration change. In this Ask the Admin, I’ll outline some of the most important events that might indicate a security breach.

Change Control and Privilege Management
Before data in the event logs can become truly useful, it’s essential to exercise some governance over your server estate and establish who is allowed to change what, where, and when through tested business processes. When change control is implemented alongside privilege management, not only can you be more confident in maintaining stable and reliable systems, but it will be easier to identify malicious activity in the event logs.

The information in this article assumes that auditing has been configured according to Microsoft’s recommended settings in the Windows Server 2012 R2 baseline security templates that are part of Security Compliance Manager (SCM). For more information on SCM, see Using the Microsoft Security Compliance Manager Tool on the Petri IT Knowledgebase.

Account Use and Management
Under normal operating circumstances, critical system settings can’t be modified unless users hold certain privileges, so monitoring for privilege use and changes to user accounts and groups can give an indication that an attack is underway. For example, the addition of users to privileged groups, such as Domain Admins, should correspond to a request for change (RFC). If you notice that a user has been added to a privileged group, you can check this against approved RFCs.
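Checking group additions against approved RFCs can be automated. The sketch below assumes you have already exported the relevant Security log events into simple records with hypothetical event_id, member, and group fields, and that approved changes are available as (member, group) pairs. Because events 4728, 4732, and 4756 fire for any security-enabled group, the sketch also filters on a watchlist of privileged group names, which is an assumption you would adjust for your domain.

```python
# Events 4728, 4732, 4756: member added to a security-enabled global, local,
# or universal group. They fire for any such group, hence the watchlist below.
GROUP_ADD_EVENTS = {4728, 4732, 4756}

# Hypothetical watchlist; adjust to the privileged groups in your domain.
PRIVILEGED_GROUPS = {"domain admins", "enterprise admins", "schema admins", "administrators"}


def unapproved_additions(events, approved):
    """Return privileged-group additions with no matching approved RFC.

    events: dicts with hypothetical 'event_id', 'member', and 'group' keys.
    approved: iterable of (member, group) pairs taken from approved RFCs.
    """
    allowed = {(m.lower(), g.lower()) for m, g in approved}  # case-insensitive match
    return [
        e for e in events
        if e["event_id"] in GROUP_ADD_EVENTS
        and e["group"].lower() in PRIVILEGED_GROUPS
        and (e["member"].lower(), e["group"].lower()) not in allowed
    ]


events = [
    {"event_id": 4728, "member": "eve", "group": "Domain Admins"},
    {"event_id": 4728, "member": "bob", "group": "Domain Admins"},
    {"event_id": 4732, "member": "carol", "group": "Helpdesk Printers"},
]
print(unapproved_additions(events, approved=[("bob", "Domain Admins")]))
```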

Figure: The User Account Management and Group Management task categories in Event Viewer.

When auditing is enabled on a member server, changes to local users and groups are logged; on a domain controller, changes to Active Directory users and groups are logged. To enable auditing for user and group management, enable the Audit Security Group Management and Audit User Account Management settings in Advanced Audit Policy. For more information on configuring audit policy, see Enable Advanced Auditing in Windows Server on Petri.



Additionally, you should check for the events listed in the table below:

Event Log  Level          ID                Description                              Source
Security   Informational  4740              Account Lockout                          Microsoft-Windows-Security-Auditing
Security   Informational  4728, 4732, 4756  User Added to Security-Enabled Group     Microsoft-Windows-Security-Auditing
Security   Informational  4735              Security-Enabled Group Modification      Microsoft-Windows-Security-Auditing
Security   Informational  4624              Successful User Account Login            Microsoft-Windows-Security-Auditing
Security   Informational  4625              Failed User Account Login                Microsoft-Windows-Security-Auditing
Security   Informational  4648              Account Login with Explicit Credentials  Microsoft-Windows-Security-Auditing

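
When these events are exported for analysis, keying them by both log name and event ID keeps a watchlist unambiguous, since the same numeric ID can mean different things in different logs. A minimal sketch that tallies the account events in the table from hypothetical (log name, event ID) pairs; it uses 4624, the Security log’s successful-logon event, and tally is a hypothetical helper.

```python
from collections import Counter

# Watched account events, keyed by (log name, event ID).
ACCOUNT_EVENTS = {
    ("Security", 4740): "Account lockout",
    ("Security", 4728): "Member added to security-enabled global group",
    ("Security", 4732): "Member added to security-enabled local group",
    ("Security", 4756): "Member added to security-enabled universal group",
    ("Security", 4735): "Security-enabled local group changed",
    ("Security", 4624): "Successful logon",
    ("Security", 4625): "Failed logon",
    ("Security", 4648): "Logon with explicit credentials",
}


def tally(records):
    """Count watched events; records are (log name, event ID) pairs, others ignored."""
    return Counter(ACCOUNT_EVENTS[r] for r in records if r in ACCOUNT_EVENTS)


records = [("Security", 4625), ("Security", 4625), ("Security", 4740), ("System", 4625)]
print(tally(records).most_common())  # [('Failed logon', 2), ('Account lockout', 1)]
```
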
Application Hangs and Crashes
Frequent application hangs or crashes can indicate an attempt to disrupt service or other kinds of attack. As such, it’s prudent to monitor line-of-business applications for disruptions. Check the Application and System logs for the following event IDs:

Event Log    Level          ID    Description  Source
Application  Error          1000  App Error    Application Error
Application  Error          1002  App Hang     Application Hang
Application  Informational  1001  WER          Windows Error Reporting
System       Error          1001  BSOD         Microsoft-Windows-WER-SystemErrorReporting

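
Event ID 1001 appears twice in the table with different meanings: a Windows Error Reporting entry in the Application log and a bugcheck (BSOD) in the System log. This sketch keys on both log name and ID, counts crashes and hangs per application, and flags applications that cross a threshold. The record shape, the threshold of three per review period, and the triage name are illustrative assumptions.

```python
from collections import Counter

CRASH_OR_HANG = {("Application", 1000), ("Application", 1002)}
# Same numeric ID as the Application log's WER event, different meaning.
BUGCHECK = ("System", 1001)


def triage(records, threshold=3):
    """Return (apps at or above `threshold` crashes/hangs, number of bugchecks).

    records: (log name, event ID, application name or None) tuples for one
    review period; the shape and the threshold are illustrative assumptions.
    """
    per_app = Counter(app for log, eid, app in records if (log, eid) in CRASH_OR_HANG)
    noisy = sorted(app for app, count in per_app.items() if count >= threshold)
    bugchecks = sum(1 for log, eid, _app in records if (log, eid) == BUGCHECK)
    return noisy, bugchecks


records = [("Application", 1000, "lob.exe")] * 3 + [
    ("Application", 1002, "viewer.exe"),
    ("Application", 1001, None),  # WER report, not counted as a crash here
    ("System", 1001, None),       # bugcheck
]
print(triage(records))  # (['lob.exe'], 1)
```
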
Event Logs and Audit Policy
If someone has cleared the event logs or changed audit policy, there’s a good chance that they’ve been trying to cover their tracks. As such, any such behaviour should ring alarm bells:

Event Log  Level          ID    Description                      Source
System     Informational  104   Event Log was Cleared            Microsoft-Windows-EventLog
Security   Informational  1102  Audit Log was Cleared            Microsoft-Windows-EventLog
Security   Informational  4719  System Audit Policy was Changed  Microsoft-Windows-Security-Auditing

Group Policy and Windows Firewall
Configuration settings on workstations and servers are usually managed using Active Directory Group Policy, so any failure to apply policy, or any unsanctioned change to policy objects in AD, could indicate a security issue. Additionally, Windows Firewall provides an important line of defense, and any changes to firewall rules could signal an attempt to gain additional access to systems.

Event Log                                         Level          ID          Description                                          Source
System                                            Error          1125        Internal Error                                       Microsoft-Windows-GroupPolicy
System                                            Error          1127        Generic Internal Error                               Microsoft-Windows-GroupPolicy
System                                            Error          1129        Group Policy Application Failed due to Connectivity  Microsoft-Windows-GroupPolicy
Windows Firewall With Advanced Security/Firewall  Informational  2004        Firewall Rule Added                                  Microsoft-Windows-Windows Firewall With Advanced Security
Windows Firewall With Advanced Security/Firewall  Informational  2005        Firewall Rule Changed                                Microsoft-Windows-Windows Firewall With Advanced Security
Windows Firewall With Advanced Security/Firewall  Informational  2006, 2033  Firewall Rules Deleted                               Microsoft-Windows-Windows Firewall With Advanced Security
Windows Firewall With Advanced Security/Firewall  Error          2009        Firewall Failed to Load Group Policy                 Microsoft-Windows-Windows Firewall With Advanced Security
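
Tying this back to change control, a firewall rule change is most suspicious when it happens outside an approved change window. The sketch assumes a hypothetical nightly window from 22:00 to 02:00 and events already extracted as (log name, event ID, timestamp) tuples; it handles windows that wrap past midnight. The channel-name constant and the function names are assumptions to confirm against your own systems.

```python
from datetime import datetime, time

# Channel name as shown in Event Viewer (assumed; confirm on your systems).
FIREWALL_LOG = "Microsoft-Windows-Windows Firewall With Advanced Security/Firewall"
RULE_CHANGES = {2004: "added", 2005: "changed", 2006: "deleted", 2033: "deleted"}


def in_window(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def out_of_window_changes(events, start=time(22, 0), end=time(2, 0)):
    """Return (timestamp, action) for rule changes outside the change window.

    events: (log name, event ID, datetime) tuples. The 22:00-02:00 default
    window is hypothetical.
    """
    return [
        (ts, RULE_CHANGES[eid])
        for log, eid, ts in events
        if log == FIREWALL_LOG and eid in RULE_CHANGES
        and not in_window(ts.time(), start, end)
    ]


events = [
    (FIREWALL_LOG, 2004, datetime(2020, 4, 9, 23, 30)),  # inside the window
    (FIREWALL_LOG, 2005, datetime(2020, 4, 9, 14, 5)),   # mid-afternoon
    ("System", 2004, datetime(2020, 4, 9, 14, 0)),       # different log, ignored
]
print(out_of_window_changes(events))
```

Any change this returns is a candidate to check against approved RFCs, the same way as the group-membership changes earlier.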
