There are literally hundreds of possible types of log sources
around your environment, and choosing which should bubble to the top of your IT
consciousness can be difficult. In a job where everything seems to be a top priority,
understanding all the log types and sources available for selection can be
daunting. In your environment, some logs may be more valuable than others, but
having general guidance about logging and what types of logs may be available
to monitor can help make you a better technologist.
There’s no way we could ever think to cover every possible
source of logs, but let’s start with some of the classics and go from there.
1 – Infrastructure Devices
These are those devices that are the “information superhighway”
of your infrastructure. Switches, routers, wireless controllers, and access
points can be configured to provide logging information about the health and state
of your environment. The logs can provide insights ranging from wireless AP
hopping to hardware failures. Probably most impactful to your environment are
notifications of configuration changes. Knowing who changed what and when can
help you diagnose and recover from any misconfigurations.
2 – Security Devices
As organizations push towards a cloud-first methodology, the
edge devices in your environment can become even more vital to your business.
Your firewalls and other security devices are handling more and more traffic as
loads are shifted to cloud infrastructures. The logs on these security devices
can provide a plethora of interesting information, including blocked traffic,
VPN health, intrusion detection and prevention alerts, and unusual user
activity. Fed into a Security Information and Event Management (SIEM) system,
these logs may be your first line of defense in understanding an attack or
isolating an anomaly in your user experience.
3 – Server Logs
It may go without saying, but I’m going to say it anyway:
server logs can offer abundant information about the state of your environment.
Windows and Linux servers are constantly pumping out logs that give you an
understanding of how and why systems are behaving the way they are. There are
literally hundreds of thousands of events that can trigger within an operating
system and its associated applications. Knowing which log events are frivolous
and which require immediate action is a skill honed on the battlefield.
Regardless, you shouldn’t overlook server logs as a viable source of
information.
4 – Web Servers
Yes, I’m aware that capturing web server logs can be
construed as a tedious process, but it is one of the best ways, if not the best
way, to understand how end users interact with your web properties. IIS,
Apache, Tomcat, WebSphere, NGINX, and every other web engine out there can
provide some measure of web server logging. Depending on your needs, sometimes
just understanding when people are going to your site and from where can prove
invaluable to understanding the needs of your customers. Unfortunately, a web
server log is a common log type that can sometimes be overlooked when
organizations are developing their logging strategy.
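As a rough illustration of the "when and from where" question, here is a minimal sketch that assumes your web server writes the widely used Common Log Format. The function name and the sample lines are hypothetical, and real deployments often use extended or custom formats that would need a different pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [timestamp] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize_access_log(lines):
    """Count requests per client address and per hour of day."""
    by_host = Counter()
    by_hour = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in Common Log Format
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        by_host[match.group("host")] += 1
        by_hour[when.hour] += 1
    return by_host, by_hour


# Hypothetical sample lines (documentation-range addresses), for illustration only.
sample = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2023:13:58:01 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '198.51.100.22 - - [10/Oct/2023:14:02:10 +0000] "GET /index.html HTTP/1.1" 404 153',
]
hosts, hours = summarize_access_log(sample)
print(hosts.most_common())    # busiest client addresses first
print(sorted(hours.items()))  # request counts by hour of day
```

Even a summary this small answers the two questions above; a log management tool does the same thing at scale and over time.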
5 – Authentication Servers
Whether you use Active Directory, an implementation of
OpenLDAP, or another alternative, knowing who and what is poking around your
infrastructure can be key to maintaining a good security posture. Each of
your authentication servers will provide some measure of logging, but what’s
key for you is understanding what to look for. Most commonly, you should be
looking for token requests, authorization revocation, and authentication
failures. These types of logs can aid in determining failing logins due to
account expiration, isolate the source of a potential attack, and pinpoint
problem areas that need to be addressed.
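To show how authentication failures can point to the source of a potential attack, here is a minimal sketch. It assumes the events have already been parsed into simple records; the field names, the threshold, and the sample data are all hypothetical, and a real directory service will use its own log schema.

```python
from collections import Counter


def failed_logins_by_source(events, threshold=3):
    """Return sources whose failed-login count meets or exceeds the threshold."""
    failures = Counter(
        event["source"] for event in events if event["outcome"] == "failure"
    )
    return {source: count for source, count in failures.items() if count >= threshold}


# Hypothetical, already-parsed authentication events, for illustration only.
events = [
    {"user": "alice", "source": "192.0.2.10", "outcome": "failure"},
    {"user": "bob", "source": "192.0.2.10", "outcome": "failure"},
    {"user": "carol", "source": "192.0.2.10", "outcome": "failure"},
    {"user": "dave", "source": "198.51.100.5", "outcome": "failure"},
    {"user": "dave", "source": "198.51.100.5", "outcome": "success"},
]
suspicious = failed_logins_by_source(events)
print(suspicious)
```

A single failure followed by a success, like the second source here, usually just means a mistyped password; repeated failures across several accounts from one address deserve a closer look.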
6 – Hypervisors
Hypervisors can let us IT professionals do our jobs better
by balancing workloads and utilizing resources more efficiently. Clusters can
now run hundreds, if not thousands, of simultaneous workloads. However, much of the work associated with
hypervisors is behind the curtain, and you never get to see the wizard. Your
hypervisors are juggling all the time—allocate resources from this virtual
machine to this one, move the storage from this cluster node to this other one,
shift this entire virtual machine to another node—and it’s a precarious
balance. Capturing and monitoring hypervisor logs can be one of the best ways
to understand what your hypervisors are doing when you aren’t watching.
7 – Containers
Although relatively new compared to most other log types on
this list, containers are becoming more and more business critical.
One level up are container management services like
Kubernetes, Docker Swarm, and Apache Mesos. These services are like hypervisors
in many ways, but just different enough to warrant a separate category.
Understanding why the host felt it was necessary to drop back your scaled-out
deployment from eight endpoints to only four would prove useful in diagnosing
and tuning. Most of this information is located only in the container logs, so
make sure that you get them.
8 – SAN Infrastructure
This may seem an odd addition to this list of the best log
types to monitor, given the IT trend toward hyper-converged
infrastructure or moving everything to the cloud, but SAN logging is
frequently overlooked. If your fibre switch loses connectivity to a server-side
transceiver, then that data is no longer available to that server. In today’s
world, there are normally redundant pathways so that connectivity is not truly
lost, but the scenario still applies in a multi-path environment. Say you have
four connections from your server to your SAN infrastructure, but after a
series of unfortunate events over several months, three of them have failed.
This means that you have restricted data movement by 75%. You’ve not
encountered a failure in the traditional sense, because the connectivity still
exists, and data is moving, but with performance hampered this badly, is it any
wonder end users are complaining? In my opinion, this is one of the top
overlooked log sources.
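The 75% figure follows from simple arithmetic. This minimal sketch assumes every path carries equal bandwidth and that load is spread evenly across the healthy paths, which is a simplification since real multipath policies vary. The function name is hypothetical.

```python
def remaining_capacity(total_paths, failed_paths):
    """Fraction of aggregate bandwidth left, assuming identical paths."""
    if total_paths <= 0 or not 0 <= failed_paths <= total_paths:
        raise ValueError("need total_paths > 0 and 0 <= failed_paths <= total_paths")
    return (total_paths - failed_paths) / total_paths


# The scenario from the text: four paths, three of them failed.
remaining = remaining_capacity(total_paths=4, failed_paths=3)
print(f"{remaining:.0%} of capacity remaining, {1 - remaining:.0%} lost")
```

The point is that each individual path failure is logged long before the last one goes, so alerting on the first failure keeps you from ever reaching the 75% case.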
9 – Applications
This applies to pretty much any application log. Although
some software applications will leverage the operating system’s existing
logging functionality for log management, these are becoming fewer and fewer.
Most critical logs for applications are stored in flat files on your disks
somewhere. Often, these logs are used by your application support people for
troubleshooting, but what about multitier applications? If you have a front-end,
middleware, and back-end deployment, each may collect logs slightly
differently. Make sure you aren’t sleeping on collecting and monitoring these
logs—from each tier—and getting them into a system so that you can compare
transactions by lining up the timestamps.
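Lining up timestamps across tiers can be sketched as a merge of already-sorted streams. This example assumes each tier writes one entry per line as an ISO 8601 timestamp followed by a message, that each file is already in time order, and that all tiers log in the same time zone with synchronized clocks. The tier names, line format, and sample entries are hypothetical.

```python
import heapq
from datetime import datetime


def parse(tier, line):
    """Split 'timestamp message' into a (datetime, tier, message) tuple."""
    stamp, message = line.split(" ", 1)
    return datetime.fromisoformat(stamp), tier, message


def merged_timeline(logs_by_tier):
    """Merge per-tier logs (each already in time order) into one timeline."""
    streams = [
        [parse(tier, line) for line in lines]
        for tier, lines in logs_by_tier.items()
    ]
    # heapq.merge combines sorted inputs without re-sorting everything.
    return list(heapq.merge(*streams))


# Hypothetical log lines for a single transaction, for illustration only.
logs = {
    "front-end": [
        "2024-01-15T09:30:00.120 request received order=1001",
        "2024-01-15T09:30:00.480 response sent order=1001",
    ],
    "middleware": ["2024-01-15T09:30:00.200 order=1001 validated"],
    "back-end": ["2024-01-15T09:30:00.350 order=1001 written"],
}
timeline = merged_timeline(logs)
for when, tier, message in timeline:
    print(when.time(), tier, message)
```

If the tiers' clocks drift apart, the merged order stops being trustworthy, which is one more reason to keep time synchronization healthy on every host that produces logs.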
10 – Client Machines
Yes, really. In IT, a common trope is to blame the end user,
but sometimes it’s not their fault.
Sometimes it’s the fault of the endpoint itself. I’m not saying that
every log on every machine needs to be collected all the time—in fact, I’m
saying that you should probably not do that, but selective log collection from
endpoints can be critical in gaining a larger grasp of the scope of the
problem. This is probably the most overlooked log type needed for actively
troubleshooting issues.
Everything Else
There are additional log sources that I’ve neglected, like
proxy servers, load balancers, and cloud management systems, to just name a
few, but this isn’t meant to be an exhaustive list. Hopefully, after reviewing
these ten log types, you gain a little perspective into what would be relevant
for your situation. It’s also something to keep in mind as new hardware and
software enter your infrastructure.
Whether you choose one, all, or none of these as potential
log sources to monitor is dependent on your exact needs. Simply thinking about
what types of monitoring or log analysis tool you need moving forward could
help you choose those relevant to your situation. Every bit of information can
help you gain a deeper understanding of your infrastructure and how to best
handle its care and feeding. Remember, it’s not if something will go sideways,
it’s when. Having the best log types to back up your decision-making can be a
welcome tool in your IT arsenal.
Why You Should Monitor Windows Event Logs for Security Breaches
The ability to create custom views is only useful if you
know what events might indicate an attempt to compromise your systems or an
unsanctioned configuration change. In this Ask the Admin, I’ll outline some of
the most important events that might indicate a security breach.
Change Control and Privilege Management
Before data in the event logs can become truly useful, it’s
essential to exercise some governance over your server estate and establish who
is allowed to change what, where, and when through tested business processes.
When change control is implemented alongside privilege management, not only can
you be more confident in maintaining stable and reliable systems, but it will
be easier to identify malicious activity in the event logs.
The information in this article assumes that auditing has
been configured according to Microsoft’s recommended settings in the Windows
Server 2012 R2 baseline security templates that are part of Security Compliance
Manager (SCM). For more information on SCM, see Using the Microsoft Security
Compliance Manager Tool on the Petri IT Knowledgebase.
Account Use and Management
Under normal operating circumstances, critical system
settings can’t be modified unless users hold certain privileges, so monitoring
for privilege use and changes to user accounts and groups can give an
indication that an attack is underway. For example, the addition of users to
privileged groups, such as Domain Admins, should correspond to a request for
change (RFC). If you notice that a user has been added to a privileged group,
you can check this against approved RFCs.
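Checking group additions against approved RFCs amounts to a set comparison. This is a minimal sketch that assumes both the logged additions and the approved changes are available as simple records; the field names, RFC identifier, and sample data are hypothetical.

```python
def unapproved_additions(additions, approved_rfcs):
    """Return logged group additions that have no matching approved RFC."""
    approved = {(rfc["user"], rfc["group"]) for rfc in approved_rfcs}
    return [
        added for added in additions
        if (added["user"], added["group"]) not in approved
    ]


# Hypothetical records, for illustration only.
additions = [
    {"user": "jsmith", "group": "Domain Admins"},
    {"user": "tmp-svc", "group": "Domain Admins"},
]
approved_rfcs = [
    {"rfc": "RFC-0001", "user": "jsmith", "group": "Domain Admins"},
]
flagged = unapproved_additions(additions, approved_rfcs)
print(flagged)
```

Anything left in the result has no paper trail and is worth investigating.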
In Event Viewer, these events appear under the User Account Management and Group
Management task categories. When auditing is enabled on a member server,
changes to local users and groups are logged; on a domain controller,
changes to Active Directory users and groups are logged. To enable auditing for
user and group management, enable the Audit Security Group Management and
Audit User Account Management
settings in Advanced Audit Policy. For more information on configuring audit
policy, see Enable Advanced Auditing in Windows Server on Petri.
Additionally, you should check for the events listed in the table below:
Event Log | Level | ID | Error Name | Source
Security | Informational | 4740 | Account Lockouts | Microsoft-Windows-Security-Auditing
Security | Informational | 4728, 4732, 4756 | User Added to Privileged Group | Microsoft-Windows-Security-Auditing
Security | Informational | 4735 | Security-Enabled Group Modification | Microsoft-Windows-Security-Auditing
Security | Informational | 4624 | Successful User Account Login | Microsoft-Windows-Security-Auditing
Security | Informational | 4625 | Failed User Account Login | Microsoft-Windows-Security-Auditing
Security | Informational | 4648 | Account Login with Explicit Credentials | Microsoft-Windows-Security-Auditing
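Event Viewer custom views accept an XPath-style filter, so a list of event IDs like the ones above can be turned into a query. This is a minimal sketch; the function name is hypothetical, and the IDs used are a few of those listed in the table.

```python
def event_id_xpath(event_ids):
    """Build an XPath filter that matches any of the given event IDs."""
    if not event_ids:
        raise ValueError("at least one event ID is required")
    clauses = " or ".join(f"EventID={int(event_id)}" for event_id in event_ids)
    return f"*[System[({clauses})]]"


# Lockouts, privileged-group additions, and failed logins.
query = event_id_xpath([4740, 4728, 4732, 4756, 4625])
print(query)
```

The resulting string should be usable in the XML tab of a custom view's filter, or as the query argument to a command-line tool such as wevtutil, though it is worth testing against your own Security log before relying on it.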
Application Hangs and Crashes
Frequent application hangs or crashes can indicate an attempt to disrupt service or
other kinds of attack. As such, it’s prudent to monitor line-of-business
applications for disruptions. Check the Application and System logs for the
following event IDs:
Event Log | Level | ID | Error Name | Source
Application | Error | 1000 | App Error | Application Error
Application | Error | 1002 | App Hang | Application Hang
Application | Informational | 1001 | WER | Windows Error Reporting
System | Error | 1001 | BSOD | Microsoft-Windows-WER-SystemErrorReporting
Event Logs and Audit Policy
If someone has cleared the event logs or changed audit policy, there’s a good
chance that they’ve been trying to cover their tracks. Any such
behaviour should ring alarm bells:
Event Log | Level | ID | Error Name | Source
System | Informational | 104 | Event Log was Cleared | Microsoft-Windows-EventLog
Security | Informational | 1102 | Audit Log was Cleared | Microsoft-Windows-EventLog
Security | Informational | 4719 | System audit policy was changed | Microsoft-Windows-Security-Auditing
Group Policy and Windows Firewall
Configuration settings are usually managed on workstations and servers using Active Directory
Group Policy, so any failure to apply policy, or any unsanctioned change to
policy objects in AD, could indicate a security issue. Additionally, Windows
Firewall provides an important line of defense, and any changes to firewall
rules could signal an attempt to gain additional access to systems.
Event Log | Level | ID | Error Name | Source
System | Error | 1125 | Internal Error | Microsoft-Windows-GroupPolicy
System | Error | 1127 | Generic Internal Error | Microsoft-Windows-GroupPolicy
System | Error | 1129 | Group Policy Application Failed due to Connectivity | Microsoft-Windows-GroupPolicy
Windows Firewall With Advanced Security/Firewall | Informational | 2004 | Firewall Rule Add | Microsoft-Windows-Windows Firewall With Advanced Security
Windows Firewall With Advanced Security/Firewall | Informational | 2005 | Firewall Rule Change | Microsoft-Windows-Windows Firewall With Advanced Security
Windows Firewall With Advanced Security/Firewall | Informational | 2006, 2033 | Firewall Rules Deleted | Microsoft-Windows-Windows Firewall With Advanced Security
Windows Firewall With Advanced Security/Firewall | Error | 2009 | Firewall Failed to load Group Policy | Microsoft-Windows-Windows Firewall With Advanced Security