Ensuring you have a healthy server infrastructure is a full-time job. There are tasks that should be performed on a daily, weekly, and monthly basis to keep everything up and running.
Use our server maintenance checklist to maintain a healthy and reliable Windows Server infrastructure. Download the Excel checklist at the end of this post.
Daily Server Checklist
Antivirus & EDR Software
Antivirus is a critical security component for your Windows Server, so make sure it is updated daily. Whether you check daily or weekly, verify on a regular schedule that your antivirus or EDR software is actually updating. Most antivirus software has a GUI management tool to review clients that are not updating.
If you perform nightly, weekly, or monthly backups, verify that they completed successfully. Backups are critical in ensuring that you can recover in case of a failure. Create alerts in your backup software to send you an email if a backup fails.
Consider implementing the 3-2-1 backup strategy. The 3-2-1 backup strategy means having at least 3 copies of your data. Two of these can be onsite and one offsite. The two onsite should be on different forms of media.
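To make the rule concrete, here is a minimal Python sketch that checks an inventory of backup copies against the 3-2-1 strategy as described above. The `BackupCopy` class and its fields are hypothetical names made up for this example; they do not come from any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (illustrative fields only)."""
    name: str
    media: str     # for example "disk", "tape", or "cloud"
    offsite: bool


def check_3_2_1(copies):
    """Return a list of problems; an empty list means the rule is met."""
    problems = []
    # Rule 1: at least three copies of the data.
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    # Rule 2: onsite copies should sit on at least two kinds of media.
    onsite_media = {c.media for c in copies if not c.offsite}
    if len(onsite_media) < 2:
        problems.append("onsite copies should use at least 2 media types")
    # Rule 3: at least one copy is stored offsite.
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems
```

Calling `check_3_2_1` with a disk copy, a tape copy, and an offsite cloud copy returns an empty list. Anything else returns a plain-English list of what is missing.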
Perform backup integrity checks whenever possible.
Monitor Server Services
Actively monitor your servers and network. Monitor critical server services. If your infrastructure is cloud-based, monitor resource consumption.
Investing in server and network monitoring software helps with being proactive in supporting your infrastructure. You can monitor in real-time to ensure high availability and to address issues quickly.
Check for hardware errors on critical server infrastructure including network equipment. Check RAID alarms.
Active Directory Replication
If you use Active Directory, check logs and replication often. You can check replication health with the repadmin command. AD replication issues can cause all kinds of weird authentication issues, so it is critical that replication is working without errors.
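If you want to script this check, one option is to have repadmin emit CSV and filter it. The sketch below is a rough starting point, not a finished tool. It assumes `repadmin /showrepl * /csv` is run on a Windows machine with the AD DS tools installed, and that the CSV has a `Number of Failures` column. Verify that header against the output in your own environment before relying on it.

```python
import csv
import io
import subprocess


def failing_links(showrepl_csv):
    """Return the CSV rows that report one or more replication failures."""
    reader = csv.DictReader(io.StringIO(showrepl_csv))
    failures = []
    for row in reader:
        # Column name is an assumption; adjust it if your output differs.
        count = (row.get("Number of Failures") or "0").strip()
        if count.isdigit() and int(count) > 0:
            failures.append(row)
    return failures


def run_showrepl():
    """Run repadmin and return its CSV output as text (Windows only)."""
    result = subprocess.run(
        ["repadmin", "/showrepl", "*", "/csv"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a domain-joined admin workstation, `failing_links(run_showrepl())` would give you only the replication links that need attention.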
Weekly Server Checklist
Check Server Resources
To ensure the best performance and to avoid outages, check resource usage weekly. This is especially critical for high-availability server infrastructures. You should review the following server resources:
- Disk space
Recommended tools to monitor server resources.
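If you do not have a monitoring tool yet, a short script can cover the disk space check. This is a minimal sketch using Python's standard library. The 15 percent threshold and the function names are my own assumptions, so tune them for your environment.

```python
import shutil


def percent_free(total_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def low_space_volumes(paths, min_free_percent=15.0):
    """Return (path, percent_free) for each volume below the threshold."""
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        pct = percent_free(usage.total, usage.free)
        if pct < min_free_percent:
            low.append((path, round(pct, 1)))
    return low
```

On a Windows server you might call `low_space_volumes(["C:\\", "D:\\"])` from a scheduled task and email the result if the list is not empty.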
Check logs in Event Viewer, including the application, security, and system logs, for any critical errors.
If you are using Active Directory, check DNS logs to monitor for any suspicious activity or latency. Check DHCP logs for failed leases or depletion of IP pools. Review security logs for logon failures, bad password attempts, and account lockouts. Also, review privileged and sensitive groups for any changes in access.
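For the security log review, it helps to summarize events instead of reading them one by one. The sketch below counts failed logons (event ID 4625) and account lockouts (event ID 4740) per account. It assumes you have already exported events into a list of dictionaries with `event_id` and `account` keys. That export format is hypothetical, and so is the threshold of 5.

```python
from collections import Counter

FAILED_LOGON = 4625  # Security log: an account failed to log on
LOCKOUT = 4740       # Security log: a user account was locked out


def summarize_security_events(events, failure_threshold=5):
    """Return (accounts with many failed logons, sorted locked-out accounts)."""
    failures = Counter()
    lockouts = set()
    for event in events:
        if event["event_id"] == FAILED_LOGON:
            failures[event["account"]] += 1
        elif event["event_id"] == LOCKOUT:
            lockouts.add(event["account"])
    # Keep only accounts at or above the failure threshold.
    noisy = {acct: n for acct, n in failures.items() if n >= failure_threshold}
    return noisy, sorted(lockouts)
```

An account that shows up in both results is a good place to start your review.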
If you have a large server infrastructure, consider implementing a centralized logging server and a log analysis tool.
Yes, I have backups listed multiple times in the checklist. Backups are that important. You should check them daily and also do a weekly review. It is also a good idea to test restoring a VM to a sandbox environment each week.
Monthly Server Checklist
Windows Security Updates
Scan servers regularly for missing Microsoft patches. Microsoft releases patches on the second Tuesday of every month, also known as Patch Tuesday. These patches are critical for your servers: they fix security vulnerabilities and deliver other software updates.
You can automate patching using Windows Server, or you can choose from the many third-party software solutions that automate installing patches.
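If you schedule maintenance windows around Patch Tuesday, you can compute the date instead of looking it up. Here is a small Python sketch; the function name is my own.

```python
import datetime


def patch_tuesday(year, month):
    """Return the date of the second Tuesday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    # The second Tuesday is one week after the first.
    return first + datetime.timedelta(days=days_to_first_tuesday + 7)
```

For example, `patch_tuesday(2024, 1)` returns `datetime.date(2024, 1, 9)`. From there you can add a few days to schedule your test group and production rollout.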
Patch 3rd Party Software
Windows updates are not the only patches that matter: upgrading and patching 3rd party software installed on a Windows server is just as crucial. Always verify with the software vendor that your Windows Server version, including service packs, is compatible with the newer version.
Scan Servers for Vulnerabilities
Scan servers and networks for vulnerabilities monthly. There are several software solutions on the market to help identify vulnerabilities in your infrastructure. The Nessus scanner is a great tool for vulnerability management and works with Windows servers and Linux operating systems.
Reset Admin Passwords
If you use local administrative accounts, consider resetting passwords on a monthly or quarterly basis. Also, consider using Microsoft LAPS, which sets a unique password for each local administrator account.
Check battery utilization and voltage on UPS equipment. Perform a self-test on the UPS if it has one. If possible, unplug the UPS to truly test its performance. Also, download the manufacturer’s software, if available, to monitor your UPS remotely.
Audit your user/admin accounts regularly, especially those with passwords set to never expire. Scan for inactive user accounts in Active Directory and disable them. Accounts that are not being used and are still active are a security concern.
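The inactive account scan comes down to comparing each account's last logon against a cutoff date. Here is a minimal sketch of that logic. It assumes you have already exported account names and last logon times from Active Directory. The 90-day default and the input format are assumptions for illustration.

```python
from datetime import datetime, timedelta


def inactive_accounts(accounts, now, max_idle_days=90):
    """accounts: iterable of (name, last_logon); last_logon is a datetime or None."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for name, last_logon in accounts:
        # None means the account has never logged on.
        if last_logon is None or last_logon < cutoff:
            stale.append(name)
    return sorted(stale)
```

Review the list before disabling anything, since service accounts and seasonal users can look inactive without being abandoned.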
Check Server Uptime
Check server uptime to confirm that servers have restarted when expected and are running reliably. Servers typically reboot after installing Windows updates, but a reboot is sometimes skipped. It’s helpful to run a script or tool that checks Windows Server uptime on all servers, so you and your team can verify the uptime and last boot time of each one.
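Once you have collected last boot times, flagging servers that probably missed a post-patch reboot is simple arithmetic. This sketch assumes a dictionary of server names and last boot times gathered by whatever tool you use. The 35-day limit is my own assumption, based on a monthly patch cycle.

```python
from datetime import datetime


def uptime_report(servers, now, max_uptime_days=35):
    """servers: dict of name -> last boot datetime.

    Returns a list of (name, uptime_days, overdue) sorted by server name.
    """
    report = []
    for name, last_boot in sorted(servers.items()):
        days = (now - last_boot).days
        # Overdue means the server has been up longer than the limit.
        report.append((name, days, days > max_uptime_days))
    return report
```

Servers marked overdue are worth checking for pending reboots or failed update installs.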
Log and Temp Files
Routinely delete temp files to clear up space in C:\windows\temp and any other temp locations you may have defined. Purge any log files that are no longer needed.
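Here is a cautious way to script the cleanup. This Python sketch lists files older than a given age and only deletes them when you turn off the dry run. The 30-day default and the function names are assumptions, so pick a retention period that matches your own policy.

```python
import time
from pathlib import Path


def old_files(folder, max_age_days=30, now=None):
    """Return files under folder last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    ]


def purge(folder, max_age_days=30, dry_run=True):
    """List old files; delete them only when dry_run is False."""
    handled = []
    for path in old_files(folder, max_age_days):
        if not dry_run:
            try:
                path.unlink()
            except OSError:
                continue  # file is in use or access is denied; skip it
        handled.append(path)
    return handled
```

Run `purge(r"C:\windows\temp")` first to see what would be removed, then repeat with `dry_run=False` once you are happy with the list.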
Domain Controller Health Check
If you have domain controllers, you should run an Active Directory health check on them once a month. Active Directory is a critical service and you want to ensure the domain controllers are healthy.
DNS is often the root cause of many network problems. It’s a good idea to check your DNS servers for any issues and remove any stale resource records.
Active Directory Cleanup
It is important to run an Active Directory cleanup each month to delete users and computers that have been disabled for 90 days (or according to your company policy). Disabled accounts can be a security concern and also cause a headache for asset management.
Audit Active Directory Permissions and Group Memberships
Audit privileged group membership and remove users who do not need access. Audit permissions to file shares with sensitive or privileged data and make changes if needed. Always apply the principle of least privilege (PoLP) when giving access to data and resources.
Check out the AD Group Membership Report Tool to easily audit Active Directory groups and group members.
Restore Integrity Tests
Test restoring data and servers monthly or quarterly. Having a restore plan that has been tested will ensure a successful restore in the event of a failure. Depending on your backup infrastructure, create a test environment or sandbox to restore your servers and data routinely. Document the restore process.
Folder Permissions Audit
You should review folder security permissions on a quarterly basis to ensure the security of your files and folders. You can use PowerShell scripts or tools like the NTFS Permissions Reporter to easily generate NTFS security permissions reports.
Review Group Policies and settings. Group policy can control critical policy settings such as passwords, audit logs, application settings, and client operating system settings. It is important that group policy changes are logged and reviewed on a quarterly basis.
SSL certificates are critical and if they expire, they can cause outages to critical applications and services. Create a list of your organization’s SSL certs and review them often.
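If your certificate list includes hostnames, a script can tell you how many days each certificate has left. This sketch uses Python's standard `ssl` module. The function names are my own, and the network call assumes the server is reachable and presents a certificate that passes normal validation.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """not_after uses the getpeercert() format, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    now = time.time() if now is None else now
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - now) // 86400)


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect over TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Keep in mind that the default context rejects expired or untrusted certificates, so `fetch_not_after` raises an error for those. Treat that error as a finding in its own right.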
Firmware and BIOS
Check firmware and BIOS versions on your equipment. Upgrade to newer versions when available, after confirming they are compatible with all equipment in your environment. Always follow instructions carefully when upgrading. Take a backup of the configuration before upgrading firmware and store that configuration in a place you can access in case you need to restore to the device. Plan for downtime if needed.
Clean Up Server Room and Equipment
It is always a good idea to look over your equipment, even when it’s in an environmentally controlled room. Dust will still collect on the components and can cause equipment to overheat. Clean your equipment when needed.
Have your air conditioner serviced at least once or twice a year. If you have temperature monitoring equipment, check to make sure it is working correctly.
If you use fire suppression systems, have them checked once or twice a year.
If you have a generator, test it at least once or twice a year to confirm it is working correctly.
Annual Server Checklist
Review Hardware and Software Maintenance Contracts
Review hardware and software maintenance contracts yearly. Make a note of when they are due for renewal or set them to auto-renew. Contracts should be kept up to date so you can update firmware and software and get service on your equipment.
Network Penetration Testing
Performing a network penetration test (Pen-Test) can help identify vulnerabilities in your infrastructure. Cyberattacks and data breaches are major concerns for many organizations, and a Pen-Test can identify areas where you need to improve your security. This is beneficial in many ways:
- It gives you an idea of how vulnerable your organization is to attacks.
- It can help an organization prioritize where they need to spend money to secure the infrastructure.
- Having a third party do the Pen-Test gives you an unbiased and fresh pair of eyes on your infrastructure.
Test Disaster Recovery Plan
Review and test your disaster recovery plan at least once a year. You need to be confident you can recover from disasters such as equipment failures, natural disasters, or cyberattacks. The only way you can be sure your recovery plan and backups work is by testing them.
Windows Server Maintenance Checklist Template
Here is a simple Excel template you can download and modify as needed. If you have multiple admins, you can assign owners to different tasks.
Did I miss anything? Let me know in the comments below.