Delivering a Customizable, Graphical Insight into Azure VM Security, Health and Connectivity Using Several Azure Services Together

In this blog I want to walk through a solution I recently architected and implemented along with two other MTC architects. We needed it for two reasons:

  1. To provide insight into the VMs hosted in Azure across the global Microsoft Technology Center environment
  2. To showcase the use of some key Microsoft cloud technologies

The Requirement

The global MTC organization is made up of around 30 offices which each have several Azure subscriptions to host the projects they are working on and environments used in customer activities. Additionally, there are several global, shared Azure subscriptions that host core infrastructure and experiences. These subscriptions are tied to various Azure AD tenants depending on requirements. The primary subscription for each MTC also hosts a virtual network that is part of a global IP space that is connected via one of four regional ExpressRoute circuits to the MTC worldwide VPN that provides connectivity between all MTC offices.

While there is a standard governance and process guide, each MTC has control of its own subscriptions and resources; however, from a central MTC organization perspective, insight into several key factors was required.

  • Are the VMs registered with the central Log Analytics instance to report inventory and patch state? Log Analytics is part of the Operations Management Suite and accepts log information of almost any sort, then provides powerful analytical capabilities to use that information for insight into the environment. A number of solutions are included that provide visibility into best practices, patch status, anti-malware status and much more. For OS instance visibility Log Analytics uses the Microsoft Monitoring Agent (MMA), which is the same agent used by System Center Operations Manager.
  • What is the current patch status of the VM? This is provided by information reported to Log Analytics and to Azure Security Center if registered. Azure Security Center (ASC) provides a central security posture location for Azure resources including VM health, network health, storage health and more.
  • Is the VM connected to ExpressRoute? This can be found by checking the virtual network a VM is attached to and whether that virtual network has an ExpressRoute gateway connected.
  • Does the VM have a public IP and is it healthy? Public IP existence can be found through the properties of the VM's IP configurations, and the health, which is based on the use of Network Security Groups to lock down communication, is surfaced through ASC.
  • Is the VM older than 30 days? Object creations are logged in Azure. By default, these logs are kept for 90 days, which enables a search of the logs for the VM creation. If no entry is found the VM is older than 90 days; if one is found the exact age can be determined. The age is useful as short-term VMs do not have the same levels of reporting requirements, i.e. they do not have to be registered with OMS.

The insight into the health needed to be in a form that provided an easy overall view while allowing detail to be exposed by drilling down into the data.

The Solution

I started off crafting a solution in PowerShell, through which I can access the full capabilities of Azure Resource Manager via the AzureRM module as well as other solutions such as Log Analytics, Azure Security Center and Azure Storage.

If you like to read the end of the book first, below is the final solution; what I will walk through is some of the detail you see in the picture.

The first challenge was the context to run the script under, since multiple Azure AD tenants were utilized and I didn’t want to have to manage multiple credentials. Therefore, Azure AD B2B (business to business) was utilized. A single identity in the main Azure AD tenant was created and then a communication sent to each MTC to add that identity via Azure AD B2B to any local Azure AD tenant instances and then to give that account Read permissions to all subscriptions. This enabled a single credential to be used across every subscription, regardless of the Azure AD tenant the subscription was tied to. This same credential was also given rights to the Log Analytics instance all VMs report to, which enables queries to be run.

Now that access was available, the next step was the actual PowerShell to gather the required information. A storage account was created to store the output of the execution, which would be a basic execution report and two JSON files containing custom objects representing the VM state and Azure subscription information.

The basic PowerShell flow is as follows (a condensed sketch of some of these steps follows the list):

  • Import the ASC and Log Analytics PowerShell modules
  • Access the credential that will be used
  • Connect to Azure using the credential
  • Store a list of every subscription associated to the credential in an array
  • Connect to the Azure Storage account to create a context for BLOB storage
  • Connect to the Log Analytics workspace and trigger two queries whose results were stored in two arrays
    • List of all machines hosted in Azure that report to the instance
    • List of all machines hosted in Azure that are missing patches
  • Create three files named with today's date: a log file, a VM JSON file and a subscription JSON file
  • Create two empty arrays that will store custom objects for VM state and subscription information
  • For every subscription perform the following:
    • List the administrators and write to the log
    • Retrieve the ASC status for the subscription and store in an array
    • For every Resource Group
      • Find the virtual networks connected to ExpressRoute gateway and store in an array
      • For every VM in the Resource Group
        • Find the creation time by scanning the Azure activity log. If an entry is found, save the creation time and whether the VM is older than 30 days; if no entry is found, report the VM as older than 30 days
        • For each NIC inspect the IP configurations
          • Is it connected to a virtual network that has ExpressRoute connectivity
          • Does it have a public IP address and if so what is the health of that public IP based on information previously saved from ASC
        • Is the VM registered in OMS
        • Is the VM missing patches based on information from OMS or ASC
        • Create a custom object using a hash table with all desired information about the VM and add to the VM object array
    • Add a subscription information custom object to the subscription array
  • Upload the three data files generated to the Azure storage account as BLOBs
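To make the flow above more concrete, here is a heavily condensed sketch of a few of those steps, assuming the script runs as an Azure Automation runbook. All account, workspace, resource group and container names are placeholders, and the Log Analytics queries are illustrative rather than the actual production queries.

```powershell
# Condensed sketch only; names and queries are placeholders, not the production script
Import-Module AzureRM.OperationalInsights

# Credential object stored in the Azure Automation account (the B2B identity)
$cred = Get-AutomationPSCredential -Name 'MTCScanAccount'
Add-AzureRmAccount -Credential $cred | Out-Null
$subscriptions = Get-AzureRmSubscription

# Storage context used later to upload the output files as blobs
$storageKey = (Get-AzureRmStorageAccountKey -ResourceGroupName 'RG-Scan' -Name 'mtcscandata')[0].Value
$ctx = New-AzureStorageContext -StorageAccountName 'mtcscandata' -StorageAccountKey $storageKey

# Two Log Analytics queries: machines reporting in and machines missing patches (illustrative queries)
$reporting = Get-AzureRmOperationalInsightsSearchResults -ResourceGroupName 'RG-OMS' `
    -WorkspaceName 'MTC-OMS' -Query 'Type=Heartbeat | measure count() by Computer'
$missing = Get-AzureRmOperationalInsightsSearchResults -ResourceGroupName 'RG-OMS' `
    -WorkspaceName 'MTC-OMS' -Query 'Type=Update UpdateState=Needed | measure count() by Computer'

# File names containing today's date
$date = Get-Date -Format 'yyyyMMdd'
$vmJsonFile = "$env:TEMP\VMState-$date.json"

$vmObjects = @()
# ... per-subscription and per-VM processing populates $vmObjects and the subscription array ...

# Upload the generated data files to the storage account as blobs
$vmObjects | ConvertTo-Json -Depth 5 | Out-File -FilePath $vmJsonFile
Set-AzureStorageBlobContent -File $vmJsonFile -Container 'azurescan' -Context $ctx -Force
```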

To actually run the PowerShell I used Azure Automation, which not only provided a resilient engine to run the code but also capabilities such as credential objects that could securely store the identity used, removing any need to hardcode it in the script itself. The schedule capability was used to trigger the runbook (the container for the PowerShell in Azure Automation) to run daily at 11pm.

At this point the Azure Storage account contained a report and two JSON files, with the VM state JSON file being the most useful since it enabled all the information to be queried easily. However, the goal was to make the data more easily digestible, which meant Power BI, and ideally to make it more easily available to everyone, e.g. via Teams, along with a notification that the night's execution was successful.

The solution was to use a Logic App (created by Ali Mazaheri, https://blogs.msdn.com/alimaz), which enables activities to be chained together using various connectors including Azure Storage, Teams and SharePoint. The Logic App was designed with a recurrence trigger (though it could also trigger based on object creations and other events) and then performs the following:

  • List the blobs in the azurescan container (a container is like a folder in Azure Storage)
  • For each object that is not empty
    • Get the BLOB content
    • Create a file containing that content in SharePoint
    • Copy the BLOB to an archive BLOB
    • Delete the original BLOB

 

  • Write a message to a Teams channel that the log migration was completed (or send an email, a notification to a phone, etc.)

A great feature of Logic Apps is that they are built by adding the built-in connectors, or your own API Apps and Azure Functions, and then graphically laying out the flow using conditions, branches and those connectors, passing the output of one connector as the input of the next along with, in this case, some custom expressions. Below is the key content of the Logic App (as an alternative we could also have used Azure Functions and Event Grid to achieve the same goal).

The final step was the Power BI portion to read in the file from SharePoint and provide a visualization of the data contained in the JSON. David Browne created this powerful dashboard that enabled various visualizations of the data and easy access to change the criteria of the data contained.

The Power BI Service can connect directly to SharePoint Online to read the files.  Power Query in Power BI is used to identify the latest data files, convert them from JSON to a tabular format and to clean the data.  The data is then loaded into an in-memory Tabular Model hosted by Power BI and configured for daily refresh.

Using the Azure PS Drive

If you leverage the Azure Cloud Shell in the Azure portal, it's a very convenient way to manage Azure resources using PowerShell and the CLI, but you may have also noticed an actual Azure drive, i.e. Set-Location azure:, which lets you navigate around your Azure resources (this is actually the default location when the Cloud Shell opens). At the top level are subscriptions and you can then navigate to resource groups, VMs, WebApps and more.

The Azure drive is provided via the Simple Hierarchy in PowerShell (SHiPS) provider which you can see via Get-PSProvider.

The actual functionality is evolving; it's a project on GitHub at https://github.com/PowerShell/SHiPS, but this also means you can run this same provider outside of the Azure Cloud Shell.

You need to ensure you are running the latest version of the AzureRM module, then download and install the provider, add an Azure account and add the drive:
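Here's a minimal sketch of what that looks like; the module names reflect the SHiPS-based AzurePSDrive project, so check the GitHub repo for the current installation guidance:

```powershell
# Minimal sketch; see https://github.com/PowerShell/SHiPS for current guidance
Install-Module -Name AzureRM -Force        # make sure the AzureRM module is current
Install-Module -Name AzurePSDrive -Force   # the SHiPS-based Azure: drive provider
Add-AzureRmAccount                          # add the Azure account
Import-Module AzurePSDrive
New-PSDrive -Name Azure -PSProvider SHiPS -Root 'AzurePSDrive#Azure' | Out-Null
Set-Location Azure:
```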

You can now navigate to Azure: and enjoy the same features as when in the Azure Cloud Shell.

Note this is completely different from the Azure Cloud Drive, which is the persistent file storage you have in the Azure Cloud Shell that is backed by Azure Files and enables data to be saved and used between sessions. Use Get-CloudDrive to see the current configuration; if you wish to change it, simply run Dismount-CloudDrive, then restart the shell and select Advanced options to customize the location.

Writing to files with Azure Automation

Azure Automation enables PowerShell (and more) to be executed as runbooks by runbook workers hosted in Azure. Additionally Azure Automation accounts bring capabilities such as credential objects to securely store credentials, variables, scheduling and more. When a runbook executes it runs in a temporary environment that does not have any persistent state and so if you want to work with files you need to save them somewhere, for example to an Azure storage account as a blob, before the runbook completes.

You can actually create and use files as normal using the default path within PowerShell during execution, just remember to save the files externally before the script completes.

For example create a file as usual:
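Something like this, with the file name and content purely as placeholders:

```powershell
# Create and append to a file in the runbook's temporary environment (placeholder names)
$reportFile = Join-Path $env:TEMP 'ScanReport.txt'
"Scan started $(Get-Date)" | Out-File -FilePath $reportFile
Add-Content -Path $reportFile -Value 'Additional log lines as the runbook progresses...'
```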

Then before ending the PowerShell, copy it to a blob (as an example storage place):
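A sketch of that upload, assuming a storage account and container that already exist (all names are placeholders):

```powershell
# Copy the file to blob storage before the runbook finishes (placeholder account/container names)
$storageKey = (Get-AzureRmStorageAccountKey -ResourceGroupName 'RG-Scan' -Name 'mtcscandata')[0].Value
$ctx = New-AzureStorageContext -StorageAccountName 'mtcscandata' -StorageAccountKey $storageKey
Set-AzureStorageBlobContent -File $reportFile -Container 'azurescan' `
    -Blob (Split-Path $reportFile -Leaf) -Context $ctx -Force
```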

 

 

Easily create multiple subnets in an Azure Virtual Network

I recently needed to create a whole set of subnets in a large number of virtual networks of various sizes. I thought some variables would be a great way to quickly create the set of subnets in each virtual network; each virtual network was a /20 within a shared Class B IP space, which enables 16 virtual networks per Class B. The goal was to show that each subnet didn’t need to be a full Class C (/24); instead we could use smaller subnets based on the number of hosts actually required. I’ve included comments which explain the subnets created and the number of hosts supported in each.
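The original snippet isn't reproduced here, but the sketch below shows the approach with variables; the virtual network name, resource group and address ranges are placeholders, and the host counts reflect Azure reserving five addresses per subnet.

```powershell
# Sketch: add right-sized subnets to an existing /20 virtual network (placeholder names/ranges)
$rgName   = 'RG-Network'
$vnetName = 'MTC-VNet01'
$base     = '10.244.16'    # first three octets of this virtual network's /20

$vnet = Get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rgName

# Infrastructure - /26 = 59 usable hosts (Azure reserves 5 per subnet)
Add-AzureRmVirtualNetworkSubnetConfig -Name 'Infrastructure' -AddressPrefix "$base.0/26" -VirtualNetwork $vnet | Out-Null
# Application    - /26 = 59 usable hosts
Add-AzureRmVirtualNetworkSubnetConfig -Name 'Application' -AddressPrefix "$base.64/26" -VirtualNetwork $vnet | Out-Null
# Data           - /27 = 27 usable hosts
Add-AzureRmVirtualNetworkSubnetConfig -Name 'Data' -AddressPrefix "$base.128/27" -VirtualNetwork $vnet | Out-Null
# GatewaySubnet  - /28 for the gateway
Add-AzureRmVirtualNetworkSubnetConfig -Name 'GatewaySubnet' -AddressPrefix "$base.240/28" -VirtualNetwork $vnet | Out-Null

Set-AzureRmVirtualNetwork -VirtualNetwork $vnet | Out-Null
```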

 

 

Use an Application Image from the Azure Marketplace using PowerShell

I recently needed to deploy a special type of VM from the Azure Marketplace using PowerShell, and the deployment was not the same as for a regular Windows or Linux VM.

First, I knew the app I wanted to use, e.g. https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.windows-data-science-vm, but wasn’t sure of the publisher or offer. With hindsight it is shown right in the URL: microsoft-ads is the publisher and windows-data-science-vm is the offer, but I initially just searched for what I wanted using the following and looked for it (first part of the code), then got the detail as usual (last two lines):
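The search looked something like this; the location, the Where-Object filter and the SKU value are illustrative rather than the exact values I used:

```powershell
# Search the marketplace image metadata (location, filter and SKU are illustrative)
$location = 'EastUS'

# First part of the code: find the publisher and its offers
Get-AzureRmVMImagePublisher -Location $location | Where-Object PublisherName -like '*ads*'
Get-AzureRmVMImageOffer -Location $location -PublisherName 'microsoft-ads'

# Last two lines: get the detail as usual
Get-AzureRmVMImageSku -Location $location -PublisherName 'microsoft-ads' -Offer 'windows-data-science-vm'
Get-AzureRmVMImage -Location $location -PublisherName 'microsoft-ads' -Offer 'windows-data-science-vm' -Skus 'windows2016'
```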

Now that I knew the publisher, offer and SKU I could add that configuration to my VM.
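Something like the following, assuming $vm is the VM configuration object created with New-AzureRmVMConfig (the SKU value is an assumption based on the image above):

```powershell
# Set the source image on the VM configuration (SKU value is an assumption)
$vm = Set-AzureRmVMSourceImage -VM $vm -PublisherName 'microsoft-ads' `
    -Offer 'windows-data-science-vm' -Skus 'windows2016' -Version 'latest'
```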

However, when using an application image there are a few other steps. You need to set a plan and also accept the terms of the app. Fortunately it’s easy to do.
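A sketch of those two steps, using the same placeholder values as above:

```powershell
# Set the marketplace plan on the VM configuration (PublisherName > Publisher, Offer > Product, Skus > Name)
$vm = Set-AzureRmVMPlan -VM $vm -Publisher 'microsoft-ads' -Product 'windows-data-science-vm' -Name 'windows2016'

# Accept the marketplace terms (only needs to be done once per subscription)
Get-AzureRmMarketplaceTerms -Publisher 'microsoft-ads' -Product 'windows-data-science-vm' -Name 'windows2016' |
    Set-AzureRmMarketplaceTerms -Accept
```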

That was it. Basically the only additional work is setting the plan and accepting the marketplace terms, and the data needed for those commands is the same set of values used for the source image, just PublisherName > Publisher, Offer > Product and Skus > Name. The exact same would apply if using JSON. Easy!

Deploying an Azure IaaS VM using PowerShell

I recently had to deploy some new VMs and wanted to use PowerShell, join them to the domain and apply the anti-malware extension. Below is the PowerShell I used. You would need to modify the variables to match your own environment and domain.
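The original script isn't reproduced here; the condensed sketch below shows the same ingredients (VM from an image, domain-join extension, anti-malware extension) with every name, size, credential and domain value as a placeholder.

```powershell
# Condensed sketch; all names, sizes and domain values are placeholders
$rgName   = 'RG-Servers'
$location = 'EastUS'
$vmName   = 'SRV01'

$vnet   = Get-AzureRmVirtualNetwork -Name 'MTC-VNet01' -ResourceGroupName 'RG-Network'
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name 'Infrastructure' -VirtualNetwork $vnet
$nic    = New-AzureRmNetworkInterface -Name "$vmName-nic" -ResourceGroupName $rgName -Location $location -SubnetId $subnet.Id

$localCred = Get-Credential -Message 'Local administrator for the new VM'

$vm = New-AzureRmVMConfig -VMName $vmName -VMSize 'Standard_DS2_v2'
$vm = Set-AzureRmVMOperatingSystem -VM $vm -Windows -ComputerName $vmName -Credential $localCred -ProvisionVMAgent -EnableAutoUpdate
$vm = Set-AzureRmVMSourceImage -VM $vm -PublisherName 'MicrosoftWindowsServer' -Offer 'WindowsServer' -Skus '2016-Datacenter' -Version 'latest'
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id

New-AzureRmVM -ResourceGroupName $rgName -Location $location -VM $vm

# Join the domain using the JsonADDomainExtension (domain and account are placeholders)
$domainPassword = (Get-Credential -Message 'Domain join account').GetNetworkCredential().Password
Set-AzureRmVMExtension -ResourceGroupName $rgName -VMName $vmName -Name 'joindomain' -Location $location `
    -Publisher 'Microsoft.Compute' -ExtensionType 'JsonADDomainExtension' -TypeHandlerVersion '1.3' `
    -Settings @{ Name = 'contoso.local'; User = 'contoso\joinaccount'; Restart = 'true'; Options = 3 } `
    -ProtectedSettings @{ Password = $domainPassword }

# Add the Microsoft Antimalware extension
Set-AzureRmVMExtension -ResourceGroupName $rgName -VMName $vmName -Name 'IaaSAntimalware' -Location $location `
    -Publisher 'Microsoft.Azure.Security' -ExtensionType 'IaaSAntimalware' -TypeHandlerVersion '1.3' `
    -SettingString '{"AntimalwareEnabled": true}'
```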

 

Using Azure Application Gateway to publish applications

I was recently part of a project to deploy SharePoint and Office Online Server (OOS) to Azure IaaS as part of a hybrid deployment. A requirement was to make SharePoint available to the Internet in addition to OOS (enabling editing of documents/previews online).

The deployment was very simple, 3 VMs were deployed to a subnet that has connectivity to an existing AD:

  • SQL Server – 10.244.3.68
  • SharePoint Server – 10.244.3.69, alias record sharepoint.onemtcqa.net
  • OOS Server – 10.244.3.70, alias oos.onemtcqa.net

The alias records were created on the internal DNS and external DNS, a split-brain DNS. We also had a wildcard certificate for onemtcqa.net which we could therefore use for https for both sites.

Azure has two built-in load balancer solutions (with more available through 3rd party solutions and virtual appliances).

  • The layer 4 Azure Load Balancer which could have been used by configuring the front-end as a public IP and supports any protocol
  • The layer 7 Azure Application Gateway that in addition to providing capabilities like SSL offload and cookie based affinity also has the optional Web Application Firewall to provide additional protection. More information on the Application Gateway can be found at https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-introduction. The front-end IP can be internal or public and the back end can load balance to multiple targets (like the layer 4 load balancer option).

Because the services being published were HTTP based, it made sense to utilize the Azure Application Gateway, and it would provide a great reason to get hands-on with the technology. Additionally, the added protection via the WAF was a huge benefit.

There are various SKU sizes available for the Azure Application Gateway along with the choice of Standard or WAF integration. Information on the sizes and pricing can be found at https://azure.microsoft.com/en-us/pricing/details/application-gateway/. I used the Medium size which is the smallest possible when using the WAF tier.

There are a number of settings related to the App Gateway which all relate to each other in a specific manner which provides the complete solution. A single App Gateway can publish multiple sites which meant I only needed a single App Gateway instance with a single public IP for both the sites I needed to publish.

Below is a basic picture of the key components related to an App Gateway that I put together to aid in my own understanding! The arrows show the direction of the links, so the Rule links to three other items, which really binds everything together.

When deploying the Application Gateway through the portal there are some initial configurations:

  • The SKU
  • The virtual network it will connect to and you must specify an empty subnet that can only be populated by App Gateway resources. This should be at least a /29
  • The front end IP and if a public IP is created it must be dynamic and cannot have a custom DNS name
  • Whether the listener is HTTP or HTTPS, and the port

Note, if using a public IP, because it is dynamic and cannot have a custom DNS name, you can check its actual DNS name using PowerShell and then create an alias on the Internet to that DNS name. Use Get-AzureRmPublicIpAddress and the DnsSettings.Fqdn attribute. For example:
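A sketch, with the public IP and resource group names as placeholders:

```powershell
# Retrieve the FQDN assigned to the App Gateway's public IP (placeholder names)
$pip = Get-AzureRmPublicIpAddress -Name 'AppGwPublicIP' -ResourceGroupName 'RG-AppGateway'
$pip.DnsSettings.Fqdn
```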

The name will be <GUID>.cloudapp.net. I created two alias records, sharepoint and oos, both pointing to this name on the public DNS servers.

Once created we need to tweak some things from those created by the portal wizard.

The subnet used for the App Gateway needs its NSG modified, as some additional ports must be opened from the Any source to the Virtual Network (this is in addition to the AzureLoadBalancer default inbound rule). Add an inbound rule to allow TCP 65503-65534 from Any to VirtualNetwork. Note this only needs to be enabled on the NSG applied to the Application Gateway's subnet and NOT the subnets containing the actual back-end resources. Also ensure the Application Gateway subnet can communicate with the subnets hosting the services.
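As a sketch, the extra inbound rule could be added like this (NSG and resource group names are placeholders; the priority just needs to be unique within the NSG):

```powershell
# Allow the App Gateway management traffic into its subnet (placeholder names)
$nsg = Get-AzureRmNetworkSecurityGroup -Name 'AppGwSubnet-NSG' -ResourceGroupName 'RG-AppGateway'
Add-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg -Name 'Allow-AppGw-Management' `
    -Direction Inbound -Access Allow -Protocol Tcp -Priority 200 `
    -SourceAddressPrefix '*' -SourcePortRange '*' `
    -DestinationAddressPrefix 'VirtualNetwork' -DestinationPortRange '65503-65534' | Out-Null
Set-AzureRmNetworkSecurityGroup -NetworkSecurityGroup $nsg | Out-Null
```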

By default the built-in probe that checks whether a backend target is healthy, and therefore a possible target for traffic, looks for a response between 200 and 399 as healthy (per https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-probe-overview). However, for the SharePoint site this won’t work as it prompts for authentication, so we need to create a custom probe on HTTPS that accepts 200-401. This can be done with PowerShell (I’m using the internal DNS name here, which is the same as the external one):
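A sketch of that probe creation; the gateway and probe names are placeholders, and the timing values are just typical defaults:

```powershell
# Add a custom HTTPS probe that treats 200-401 as healthy (placeholder gateway/probe names)
$appGw = Get-AzureRmApplicationGateway -Name 'MTC-AppGw' -ResourceGroupName 'RG-AppGateway'
$match = New-AzureRmApplicationGatewayProbeHealthResponseMatch -StatusCode '200-401'
Add-AzureRmApplicationGatewayProbeConfig -ApplicationGateway $appGw -Name 'SharePointProbe' `
    -Protocol Https -HostName 'sharepoint.onemtcqa.net' -Path '/' `
    -Interval 30 -Timeout 30 -UnhealthyThreshold 3 -Match $match | Out-Null
Set-AzureRmApplicationGateway -ApplicationGateway $appGw | Out-Null
```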

Open the HTTP Settings object, ensure it is HTTPS, upload the certificate and select to use a custom probe and select the probe that was just created.

A default listener was created but this can’t be used so instead create a new multi-site listener.

  • Use the existing frontend IP configuration and 443 port
  • Enter the hostname, e.g. sharepoint.onemtcqa.net
  • Protocol is HTTPS
  • Use an existing certificate or upload a new certificate to use

Open the backend pool and add the internal IP address of the target(s).

The initial default rule created should work which links to the listener created, the backend pool and the HTTP setting that was modified.

If you open the Backend health under Monitoring it should show a status of healthy and you should be able to connect via the external name (that points to the DNS name of the public IP address).

Now OOS has to be published, which does not require authentication, so a different probe, and therefore a different listener and different targets, must be used. Even though it will be a different listener, it's not like old-style listeners where only one can listen on a specific port. A listener here is rather just a set of configurations, so multiple 443 listeners can share the same frontend configuration (and therefore public IP).

  1. Create a new Backend pool with the OOS machines as the target
  2. Create a new multi-site listener that uses the existing Frontend IP configuration and port with the OOS public hostname, HTTPS and OOS certificate (same if a wildcard or subject alternative names)
  3. Create a new health probe. Use the OOS internal DNS name, HTTPS and for path use /hosting/discovery
  4. Create a new HTTP setting that is HTTPS, uses the certificate and uses the new health probe
  5. Create a new basic rule that uses the new listener, the new backend pool and the new HTTP setting


Now your OOS should also be available and working! You have now published two sites through a single Application Gateway.

Azure NSG Integration with Storage and Other Services

Network Security Groups (NSGs) are a critical component in Azure networking that enable the flow of traffic to be controlled both within the virtual network, i.e. between subnets (and even VMs), and external to the virtual network, i.e. the Internet, other parts of known IP space (such as an ExpressRoute-connected site) and Azure components such as load balancers. Rules are grouped into NSGs and applied to subnets (and sometimes vNICs, however it is easier from a management perspective to apply them at the subnet level). Rules are based on:

  • Source IP range
  • Destination IP range
  • Source port(s)
  • Destination port(s)
  • Protocol
  • Allow/Deny
  • Priority

In place of the IP ranges certain tags can be used, such as VirtualNetwork (known IP space, which includes IP spaces connected to the virtual network, e.g. an on-premises IP space connected via ExpressRoute), Internet (IP space that is not known) and AzureLoadBalancer. Additionally, through the use of service tags, other Azure services can be included in rules; these tags include the IP ranges of certain services, for example Storage, SQL and AzureTrafficManager. It is also possible to limit these to specific regions for the service, for example Storage.EastUS as the service tag to enable access only to Storage in East US. This could then be used in a rule instead of an IP range. This is very beneficial as you can now enable only specific machines in a specific subnet to communicate with specific services in specific regions. Without this functionality you would have to try to create rules based on the public IP addresses each service used. More information on service tags can be found at https://docs.microsoft.com/en-us/azure/virtual-network/security-overview#service-tags.

Another useful feature is application security groups. Using application security groups you can create a number of groups for the various types of application tiers you have (using New-AzureRmApplicationSecurityGroup), use them in NSG rules (e.g. -DestinationApplicationSecurityGroupId) and then assign a VM's network interface to a specific application security group (using the ApplicationSecurityGroup parameter at creation time). Now you don't have to worry about the actual IP address or subnet of the VM in the NSG rules; the NIC is part of the application security group and will automatically have the rules applied based on that membership. Imagine you created an application security group for all the VMs in a certain tier of the application: they would all automatically have the correct rules regardless of their IP address or subnet membership.
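A sketch of those three steps; the group, rule and NIC names are placeholders and $subnet is assumed to exist already:

```powershell
# Create an application security group for the web tier (placeholder names)
$asg = New-AzureRmApplicationSecurityGroup -ResourceGroupName 'RG-Network' -Name 'WebTier-ASG' -Location 'EastUS'

# Reference the group in an NSG rule instead of an IP range
$rule = New-AzureRmNetworkSecurityRuleConfig -Name 'Allow-HTTPS-To-WebTier' `
    -Direction Inbound -Access Allow -Protocol Tcp -Priority 110 `
    -SourceAddressPrefix 'Internet' -SourcePortRange '*' `
    -DestinationApplicationSecurityGroupId $asg.Id -DestinationPortRange 443

# Assign a VM's NIC to the group at creation time
$nic = New-AzureRmNetworkInterface -Name 'WebVM01-nic' -ResourceGroupName 'RG-Network' -Location 'EastUS' `
    -SubnetId $subnet.Id -ApplicationSecurityGroup $asg
```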

On the other side of the equation you have Azure services like Storage and SQL, which by default have public-facing endpoints. While there are some ACLs to limit access, it can be very difficult or impossible to restrict them to only specific Azure IaaS VMs in your environment. For example, you may have a storage account or Azure SQL Database instance you only want to be accessible from VMs in a specific subnet in a virtual network. This is now possible through a combination of service endpoints and the Azure service firewall capability.

Firstly, on the virtual network, service endpoints are enabled for specific services (e.g. Storage) for specific subnets. This makes that subnet available as part of the firewall configuration for the target service (note that if you skip this step it can be done automatically when performing the configuration on the actual service!).

Next, on the actual service (which must be in the same region as the virtual network), select the ‘Firewalls and virtual networks’ option, change ‘Allow access from’ to ‘Selected networks’, choose ‘Add existing virtual network’, select the virtual network and subnets, click Add and then Save. Now the service will only be available to the selected subnets.
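The same configuration can be sketched in PowerShell; the virtual network, subnet, address prefix and storage account names are placeholders:

```powershell
# Enable the Microsoft.Storage service endpoint on the subnet (placeholder names/ranges)
$vnet = Get-AzureRmVirtualNetwork -Name 'MTC-VNet01' -ResourceGroupName 'RG-Network'
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'Infrastructure' `
    -AddressPrefix '10.244.16.0/26' -ServiceEndpoint 'Microsoft.Storage' | Out-Null
$vnet = Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

# Restrict the storage account to that subnet and deny everything else
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'Infrastructure'
Add-AzureRmStorageAccountNetworkRule -ResourceGroupName 'RG-Storage' -Name 'mtcscandata' -VirtualNetworkResourceId $subnet.Id | Out-Null
Update-AzureRmStorageAccountNetworkRuleSet -ResourceGroupName 'RG-Storage' -Name 'mtcscandata' -DefaultAction Deny | Out-Null
```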

When you put all these various features together there are now great controls available between VMs in virtual networks and key Azure services to really help lock down access in a simple way.

More information on service endpoints can be found at https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview.

Checking the creation time of an Azure IaaS VM

I recently had a requirement to check the age of VMs deployed in Azure. As I looked it became clear there is no metadata on a VM that shows its creation time. When you think about it this may be logical since, if you deprovision a VM (and therefore stop paying for it) and then provision it again, what is its creation date? When it was first created, or when it was last provisioned?

As I dug in I found there is a log written at VM creation; however, by default these logs are only stored for 90 days (unless sent to Log Analytics). BUT, if the creation is within 90 days, I could find the creation date of any VM. For example:
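A sketch of that lookup; the resource group and VM names are placeholders, and the filter simply takes the earliest write operation for the VM within the retention window:

```powershell
# Look for the VM's creation entry in the activity log (placeholder names; log retention is 90 days)
$vm  = Get-AzureRmVM -ResourceGroupName 'RG-Servers' -Name 'SRV01'
$log = Get-AzureRmLog -ResourceId $vm.Id -StartTime (Get-Date).AddDays(-89) -DetailedOutput |
    Where-Object { $_.OperationName.Value -eq 'Microsoft.Compute/virtualMachines/write' -and $_.Status.Value -eq 'Succeeded' } |
    Sort-Object EventTimestamp | Select-Object -First 1
if ($log) { $log.EventTimestamp } else { 'No creation entry found in the last 90 days' }
```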

What if the VM was not created in the last 90 days? If the VM uses a managed disk you can check the creation date of the managed disk. If it is NOT using a managed disk then there is no creation time on a page blob; however, by default VHDs include the creation date and time as part of the file name. Therefore, to try to find the creation date of a VM I use a combination of all three: first look for logs for the VM, then look for a managed disk and finally try the date in the unmanaged OS disk's name.
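For the disk fallbacks, something like this works (names are placeholders); for an unmanaged disk the VHD URI itself usually carries the date:

```powershell
# Fallback: check the managed OS disk's creation time, or fall back to the VHD name (placeholder names)
$vm = Get-AzureRmVM -ResourceGroupName 'RG-Servers' -Name 'SRV01'
if ($vm.StorageProfile.OsDisk.ManagedDisk) {
    (Get-AzureRmDisk -ResourceGroupName $vm.ResourceGroupName -DiskName $vm.StorageProfile.OsDisk.Name).TimeCreated
}
else {
    # Unmanaged disk: the default VHD name typically embeds the creation date/time
    $vm.StorageProfile.OsDisk.Vhd.Uri
}
```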