Network architecture or untangling cloud networks

Enterprise network architecture in the cloud tends to vary in complexity. How not to get entangled in cloud networks?

Jakub Procházka

Network architecture in the cloud can be more or less complex, depending on the company's services and needs: operational, security, legislative and others. The important thing is not to get too entangled in these networks, and I hope this article will help you avoid that.

Virtual networks

The elementary network element in the cloud is the virtual network. In Azure it is abbreviated as VNet, in AWS as VPC (virtual private cloud). This basic building block of network infrastructure, as the name implies, is the virtual equivalent of a traditional network connecting the systems connected to it (similar to what we know from on-premise environments).

In some ways, a virtual network differs slightly from an on-premise one, but overall it is much easier to create and manage. There is no longer any need to run to the datacenter to connect cabling, or to go through complicated configuration of switches and routers. We can do everything directly from the portal or from the CLI (command line), almost instantly.

In the case of virtual networks, we are at layer 3 (L3) of the ISO/OSI model. This means that we cannot, for example, use VLANs that are at layer 2 (L2). In the cloud we work from L3 upwards (with the need to use IP protocol).

Throughout this article, I will be describing technologies working at different layers, so it is useful to recall the OSI model in the figure below.

Source: https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/

IP addresses in virtual networks

Virtual networks form an isolated, secure, highly available private network within a cloud environment. Like any internal network, this one must be assigned a private address range, specifically an IPv4 CIDR block.

It is good to keep in mind that we always lose five IP addresses from each subnet's range, as opposed to the two IP addresses (three including the gateway) we are used to from on-premise. In addition to the traditional network address, broadcast and gateway, AWS and Azure always reserve two additional IP addresses for DNS and “future use”. In Azure, the smallest supported network has the prefix /29 (which leaves three free IPs) and the largest is /8 (16,777,211 free IP addresses); AWS accepts IPv4 blocks from /28 to /16.
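The arithmetic behind these numbers can be checked with a few lines of Python. This is only a sketch using the standard `ipaddress` module; the constant `CLOUD_RESERVED` and the function name `usable_addresses` are my own illustrative names, and the 10.0.0.0 ranges are arbitrary examples.

```python
import ipaddress

# Five addresses per range are unavailable, as described above:
# network, gateway, two provider-reserved addresses, and broadcast.
CLOUD_RESERVED = 5


def usable_addresses(cidr: str) -> int:
    """Return how many addresses remain in a range after the cloud reservations."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - CLOUD_RESERVED, 0)


if __name__ == "__main__":
    for cidr in ("10.0.0.0/29", "10.0.0.0/24", "10.0.0.0/16", "10.0.0.0/8"):
        print(f"{cidr}: {usable_addresses(cidr)} usable addresses")
```

A /29 holds 8 addresses, so 3 remain; a /8 holds 16,777,216, so 16,777,211 remain.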

Both providers also support IPv6, or more precisely dual stack (IPv4/IPv6). Dual-stack virtual networking allows applications to connect over both IPv4 and IPv6 to resources within the network or to the Internet. For both providers, the IPv6 range must be exactly /64.

Source: https://docs.microsoft.com/en-us/azure/virtual-network/ipv6-overview

These are private address ranges, so we don’t have to be sparing with them. Even so, it is advisable to plan and think through the overall architecture in advance. This is typically done when preparing the so-called landing zone.

Although virtual networks are isolated elements, their address ranges should not overlap (this also applies to on-premise addresses). This matters mainly for future plans: the moment we interconnect networks with the same address ranges, erratic behavior or even total failure would follow.
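An address plan can be checked for overlaps before anything is deployed. The following is a minimal sketch using Python's standard `ipaddress` module; the network names and ranges in `planned` are made up for illustration, and `find_overlaps` is a hypothetical helper, not part of any cloud SDK.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named ranges whose address space overlaps."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical plan: one on-premise range and two virtual networks.
planned = {
    "onprem-dc": "10.0.0.0/16",
    "vnet-hub": "10.1.0.0/16",
    "vnet-app": "10.1.128.0/20",  # deliberately placed inside vnet-hub
}

if __name__ == "__main__":
    for a, b in find_overlaps(planned):
        print(f"Overlap: {a} and {b}")
```

Running such a check against the whole plan, on-premise ranges included, catches conflicts while they are still cheap to fix.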

Naming conventions and network segmentation

For virtual networks, it is always a good idea to define a naming convention and then follow it. The general recommendation is to include the resource type, the name of the application, the environment, the region and an instance number. A VNet in Azure could be called, for example, vnet-shared-eastus2-001.

Source: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/resource-naming
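A convention is easier to follow when names are generated rather than typed by hand. The helper below is a small sketch of that idea; the function `resource_name`, its parameter order and the three-digit instance number are assumptions modelled on the example name above, not an official format.

```python
from typing import Optional


def resource_name(
    resource_type: str,
    workload: str,
    region: str,
    instance: int,
    environment: Optional[str] = None,
) -> str:
    """Build a name such as vnet-shared-eastus2-001 from its components."""
    parts = [resource_type, workload]
    if environment:
        parts.append(environment)  # optional component, e.g. "prod" or "dev"
    parts += [region, f"{instance:03d}"]  # zero-padded instance number
    return "-".join(parts).lower()


if __name__ == "__main__":
    print(resource_name("vnet", "shared", "eastus2", 1))
    print(resource_name("vnet", "shop", "westeurope", 2, environment="prod"))
```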

The address range must always contain at least one subnet. A single subnet is, of course, usually not the optimal solution. We apply similar segmentation logic to what we know from on-premise, and it is usually defined by the network team together with the security team.

We just have to remember that we don’t use VLANs. Typically, we have at least a frontend and a backend subnet for the application. We can also have our own subnet for storage, AKS and more.

The golden rule is to leave a sufficient margin when addressing, as re-addressing later can be problematic. We would rather assign more addresses and not use them than assign too few and get into trouble in the future. If you’re not sure, you usually won’t go wrong with a /16 address range and /24 subnets.

Source: https://docs.microsoft.com/en-us/azure/virtual-machines/network-overview
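To see how much margin a /16 with /24 subnets gives, the split can be sketched with Python's standard `ipaddress` module. The range 10.20.0.0/16 and the subnet names are illustrative assumptions only.

```python
import ipaddress

# Hypothetical address range for one virtual network.
vnet = ipaddress.ip_network("10.20.0.0/16")

# Carve the /16 into /24 subnets; a /16 yields 256 of them.
subnets = list(vnet.subnets(new_prefix=24))

# Assign the first few subnets to application tiers, leaving the rest free.
plan = dict(zip(["frontend", "backend", "storage"], subnets))

if __name__ == "__main__":
    for name, subnet in plan.items():
        print(f"{name}: {subnet}")
    print(f"{len(subnets) - len(plan)} subnets still unassigned")
```

With only three of 256 subnets in use, there is plenty of room to add tiers later without re-addressing.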

Less is sometimes more

Virtual networks are free – it doesn’t matter how many of them there are, the address range, or the number of subnets. However, the recommendation is not to overdo it with them, because more networks are harder to manage and overall we would make the environment more cluttered or complicated.

In general, it can be recommended to divide VNets/VPCs by, for example, region, environment, department or application. If resources are to communicate with each other, they can sit in the same network and avoid unnecessary peering.

Isolation and security can be managed to some extent by using security groups, which I will describe in more detail in the next section of this article. If, on the other hand, the applications are to be managed by different teams and they need to configure the network, it may be better to split them up.

Limits on the number of virtual networks vary significantly between providers: Azure has a limit of 1000 virtual networks per subscription and 3000 subnets per VNet. AWS has a default limit of 5 VPCs per region (can be increased to 100) and 200 subnets per VPC.

Network design

For both providers, virtual networks are region dependent. This means that it is not possible to stretch a single network across multiple regions. Resources connected to a VNet/VPC must always be in the same region as the network. The exceptions are some global CDN-type services that have no defined region, or have one only for metadata.

If we want to communicate with resources outside the region/VNet, we can set up peering between the networks. Peering itself is free, but you pay for the data that flows through it.

Network and PaaS integration

With a few exceptions, most services need the virtual networks they connect to in order to operate. PaaS services overwhelmingly also support integration with VNet/VPC, but some are able to operate without it.

In this case, they connect to an internal virtual network that is not visible to the user in the portal and cannot be managed in any way. Depending on the specific PaaS service, additional functionality such as a firewall, DNS, etc. may be available. Communication then takes place over the cloud provider’s backbone network or over the Internet, which allows the service to, for example, access the Internet, attach a public IP address and be protected against DDoS. An example of such a service is App Service/Elastic Beanstalk, where users connect only from the Internet.

However, services not connected to the customer network are rather isolated cases and in many of them I would still consider communication within VNet for security reasons. We can achieve this by using service and private endpoints, which I will write about in more detail below.

Integration of cloud networks and on-premise environments

For on-premise connectivity, we consider either a site-to-site (S2S) VPN or, in the best case, ExpressRoute/Direct Connect (or a combination of both).

An S2S VPN, an encrypted tunnel through the Internet, needs no introduction. ExpressRoute (Azure) and Direct Connect (AWS) are services that create a private dedicated line between the on-premises datacenter and the provider’s datacenter. These lines are more reliable, faster and provide lower latency than communication over the Internet.

If the traffic goes via ExpressRoute/Direct Connect, you pay according to the price list of the service. Otherwise, you pay the usual charge for data leaving the cloud, so-called egress. There is no charge for ingress (uploading data to Azure), as mentioned in my previous article.

What about network and subnet isolation?

Thanks to this isolation, VNets/VPCs do not see each other by default: resources in different networks either do not communicate at all, or communicate less securely outside the VNet (via the Internet or the internal Azure network).

This can be “bypassed” if necessary by setting up the aforementioned peering between VNets. Peering traffic is not free: both egress from the source VNet and ingress to the destination VNet are charged. Prices also vary depending on whether the peering is within a region or between regions. Peering within a region is cheaper, but even global peering does not cost a staggering amount (it is still good to keep it in mind).

Subnets are used to divide the address range into multiple blocks for better visibility, segmentation, and security. Unlike VNet/VPC, communication between subnets within one virtual network is not restricted by default. Resources are not isolated by subnets and can communicate with each other. This communication can be restricted using a security group.

Basic network firewalling

Filtering and restricting traffic starts with security groups. These are called Security groups in AWS and Network security groups (NSG) in Azure. A security group is a basic firewall that can be attached to a whole subnet (in Azure) or to a specific VM/EC2 instance, or more precisely to its network interface card (NIC). In simple terms, it is a list of rules for outgoing and incoming traffic. In an NSG, these rules have different priorities, according to which they are evaluated (similar to iptables).

To give you an idea: in Azure, a security rule must always contain a source, port range, destination, protocol (any/tcp/udp/icmp), action (deny/allow), and priority. The source can be Any, an IP address, a service tag, or an Application security group. A service tag is a label for a predefined group of addresses, such as the Internet, virtual network, etc. The destination can be Any, an IP address, a VNet, or an Application security group.

Example: all (Any) inbound traffic is denied with priority 1000 and port 80 from the Internet is allowed with priority 500. In this case, if a packet from the Internet arrives at the NSG for port 80 of the destination IP, it is allowed, and anything else is dropped. If the inbound deny rule had a higher priority (a lower number), communication to port 80 would also be dropped.
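The priority logic from this example can be sketched in a few lines of Python. This is a deliberately simplified model, not the real NSG engine: the `Rule` class and `evaluate` function are hypothetical, and matching is reduced to a source tag and a single port (real rules also cover destination, protocol and port ranges).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int         # lower number = higher priority, evaluated first
    action: str           # "allow" or "deny"
    source: str           # "Any" or a tag such as "Internet"
    port: Optional[int]   # None means any port


def evaluate(rules: list[Rule], source: str, port: int, default: str = "deny") -> str:
    """Return the action of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        source_matches = rule.source in ("Any", source)
        port_matches = rule.port is None or rule.port == port
        if source_matches and port_matches:
            return rule.action  # first match wins, later rules are ignored
    return default


# The two rules from the example above.
rules = [
    Rule(priority=1000, action="deny", source="Any", port=None),
    Rule(priority=500, action="allow", source="Internet", port=80),
]

if __name__ == "__main__":
    print(evaluate(rules, "Internet", 80))   # matched by the priority-500 rule
    print(evaluate(rules, "Internet", 443))  # falls through to the deny rule
```

If the deny rule were given priority 400 instead of 1000, it would match first and port 80 would be dropped as well, which is exactly the behavior described in the example.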

Network topologies

Probably the most common topology that we at ORBIT encounter in the cloud is hub and spoke. In this topology, there is one central VNet/VPC that is used for VPN connections (either S2S or P2S) and in which shared services for other networks are connected.

The hub network is then peered with each of the others. Because peering is set up only between the hub network and each spoke network (never between individual spokes) and forwarding is not enabled on the hub peerings, the individual VNets remain isolated from each other. A slight disadvantage is that whenever a new network is added, peering with the hub network must be set up for it.

Source: https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/hybrid-networking/hub-spoke?tabs=cli
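The isolation of spokes follows from the fact that peering is not transitive. The toy model below illustrates this in Python; the network names, the `PEERINGS` set and the `transit_hubs` parameter are all assumptions for illustration, and real transit through a hub additionally requires routing (for example via a gateway or an appliance), which the sketch ignores.

```python
def peer(a: str, b: str) -> frozenset:
    """Represent a peering as an unordered pair of network names."""
    return frozenset((a, b))


# Hypothetical hub and spoke layout: spokes peer only with the hub.
PEERINGS = {peer("hub", "spoke-a"), peer("hub", "spoke-b")}


def can_communicate(src: str, dst: str, peerings=PEERINGS, transit_hubs=frozenset()) -> bool:
    """Two networks can talk if directly peered, or via a hub with forwarding enabled."""
    if src == dst or peer(src, dst) in peerings:
        return True
    # Peering is not transitive: a hub relays traffic only when forwarding is enabled on it.
    return any(
        peer(src, hub) in peerings and peer(hub, dst) in peerings
        for hub in transit_hubs
    )


if __name__ == "__main__":
    print(can_communicate("spoke-a", "hub"))                            # direct peering
    print(can_communicate("spoke-a", "spoke-b"))                        # isolated by default
    print(can_communicate("spoke-a", "spoke-b", transit_hubs={"hub"}))  # forwarding enabled
```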

In addition to the hub and spoke topology, we often see the concept of a single NVA (network virtual appliance), where traffic is routed through a security appliance that controls both outgoing and incoming traffic. In this case, the NVA is a single point of failure, so it must be deployed in HA mode. The NVA ensures that only traffic meeting clearly defined criteria passes.

Source: https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/dmz/images/nva-ha/single-nva.png

Service and private endpoints

In the previous paragraphs I mentioned the possible integration of PaaS services with client networks. This can be done in two ways, which differ in terms of security. The first way is a service endpoint, which allows a certain group of services to connect to the subnet. The second option is a private endpoint that maps a specific resource directly to a VNet – a new NIC is created for it (similar to a VM).

While the service endpoint is set on the subnet, the private endpoint is set directly on the PaaS service.

Both options reduce the risk of exposing resources directly to the Internet and allow security groups to be attached (for a private endpoint, even at the NIC level). In neither case does the communication leave the provider’s network.

But there are two major differences. One is in scope: a private endpoint connects a specific resource and creates a NIC for it, whereas a service endpoint enables connections for all resources of the same type (e.g. all storage accounts in a given tenant).

The second is in the connection method: while a service endpoint essentially “extends” the existing address range of a given network with a service (communication remains on the provider’s backbone), a private endpoint extends a specific service to a given VNet.

Protection against DDoS attacks

All Azure and AWS resources are protected by basic DDoS protection, which is free and without configuration. In Azure it’s DDoS Basic, in AWS Shield Standard. There are also paid options – in Azure DDoS Standard and AWS Shield Advanced. The paid options are not among the cheapest services, but they bring some added functionality over the basic plan – see the following image.

Source: https://docs.microsoft.com/en-us/azure/ddos-protection/ddos-protection-overview

The price of the paid DDoS protection option, which covers up to 100 public IP addresses, is approximately USD 3,000 per month for both providers. Each additional resource, that is, each protected public IP address over the limit, costs USD 30 per month. The good thing is that a single Standard plan can cover the whole tenant across subscriptions.
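For a rough budget estimate, the pricing described above can be turned into a one-line formula. The sketch below uses only the approximate figures quoted in this article (USD 3,000 base, 100 included addresses, USD 30 per extra address); the function name is my own and actual provider price lists should always be checked.

```python
def monthly_ddos_cost(
    protected_ips: int,
    base_fee: int = 3000,     # approximate monthly fee quoted in the article (USD)
    included_ips: int = 100,  # public IP addresses covered by the base fee
    extra_ip_fee: int = 30,   # fee per additional protected IP (USD)
) -> int:
    """Estimate the monthly cost of the paid DDoS plan in USD."""
    extra_ips = max(protected_ips - included_ips, 0)
    return base_fee + extra_ips * extra_ip_fee


if __name__ == "__main__":
    for count in (10, 100, 150):
        print(f"{count} protected IPs: USD {monthly_ddos_cost(count)} per month")
```

For 150 protected addresses the estimate is 3,000 + 50 × 30 = USD 4,500 per month.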

Application load balancing

Load balancers are also an important part of networks. In addition to the traditional ones that operate at layer 4 (L4), cloud providers also offer more advanced application load balancers (Application Gateway in Azure) that operate at the seventh layer (L7) of the OSI model and support additional features such as cookie-based session affinity.

Traditional load balancers at the fourth (transport) layer operate at the TCP/UDP level and route traffic according to the source IP address and port to the destination IP address and port. In contrast, application load balancers can make decisions based on the specific HTTP request, for example routing by the incoming URL.
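The difference between the two layers can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the backend addresses, the pool names and the path rules are invented, and hashing the connection tuple is just one common way an L4 balancer can spread connections, not a description of any specific product.

```python
import hashlib

# Hypothetical backend addresses behind an L4 load balancer.
BACKENDS = ["10.0.1.4", "10.0.1.5", "10.0.1.6"]


def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int, protocol: str = "tcp") -> str:
    """L4: choose a backend from addresses and ports only, without reading the payload."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(BACKENDS)
    return BACKENDS[index]


# Hypothetical URL path rules for an L7 (application) load balancer.
PATH_RULES = [("/images/", "image-pool"), ("/api/", "api-pool")]


def l7_pick(url_path: str, default_pool: str = "web-pool") -> str:
    """L7: choose a backend pool by inspecting the HTTP request path."""
    for prefix, pool in PATH_RULES:
        if url_path.startswith(prefix):
            return pool
    return default_pool


if __name__ == "__main__":
    print(l4_pick("203.0.113.7", 50123, "20.0.0.10", 443))
    print(l7_pick("/api/orders"))
    print(l7_pick("/index.html"))
```

The L4 function never looks at the request itself, so all requests of one connection land on the same backend, while the L7 function can send two requests from the same client to different pools.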

Application load balancers can also function as an application firewall or web firewall. Because they operate at the application layer, they provide significantly more advanced protection than, for example, the aforementioned security groups. A web application firewall (WAF) is a centralized protection against various exploits and vulnerabilities (examples of attacks are SQL injection or cross-site scripting).

Source: https://docs.microsoft.com/en-us/azure/web-application-firewall/ag/ag-overview

Accelerate and smooth the performance of web applications

Azure Front Door and AWS Global Accelerator are global services that aim to improve performance for users around the world. Front Door runs at Layer 7 (L7), whereas Global Accelerator works at the transport layer with TCP/UDP traffic. It is claimed that these services can improve performance by up to 60%.

Some of the benefits of Azure Front Door:

application performance acceleration using split TCP-based anycast protocol
intelligent monitoring of backend resource health
URL-path based routing
hosting multiple websites for an efficient application infrastructure
cookie-based session affinity
SSL offloading and certificate management
defining your own domain
application security with integrated Web Application Firewall (WAF)
redirecting HTTP traffic to HTTPS with URL redirect
URL rewrite
native support for end-to-end IPv6 connections and HTTP/2 protocol

Source: https://docs.microsoft.com/en-us/azure/frontdoor/front-door-overview

Traffic manager & Route 53

Now we are getting outside the OSI model. Traffic manager (Azure) or Route 53 (AWS) services do not work on any of its layers, but are based on DNS.

Traffic manager is a DNS-based load balancer that allows you to distribute traffic for applications accessible from the Internet across regions. It also provides high availability and fast response for these public endpoints.

What does it all mean? In simple terms, it is a way to direct clients to the appropriate endpoints. Traffic manager has several options to achieve this:

Priority routing
Weighted routing
Performance routing
Geographic routing
Multivalue routing
Subnet routing

These methods can be combined to build scenarios that are as flexible and sophisticated as needed.

At the same time, the health of each region is automatically monitored and traffic is redirected if an outage occurs.

Source: https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods
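Two of the routing methods listed above, priority and weighted routing, can be sketched in Python together with the health check. This is a simplified model of the idea, not the actual DNS service: the `Endpoint` class, the endpoint names and both functions are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    priority: int   # used by priority routing, lower number is preferred
    weight: int     # used by weighted routing, higher number gets more traffic
    healthy: bool = True


def priority_route(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority) if healthy else None


def weighted_route(endpoints: list[Endpoint], rng=random) -> Optional[Endpoint]:
    """Pick a healthy endpoint at random, in proportion to its weight."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[e.weight for e in healthy], k=1)[0]


if __name__ == "__main__":
    endpoints = [
        Endpoint("westeurope", priority=1, weight=80),
        Endpoint("northeurope", priority=2, weight=20),
    ]
    print(priority_route(endpoints).name)  # primary region while it is healthy
    endpoints[0].healthy = False           # simulate an outage of the primary
    print(priority_route(endpoints).name)  # traffic fails over to the secondary
    print(weighted_route(endpoints).name)  # only healthy endpoints are candidates
```

Because unhealthy endpoints are filtered out before a choice is made, a regional outage automatically redirects clients to the remaining regions, whichever routing method is used.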

Conclusion

Today’s article was a bit more technical and comprehensive than the previous one. Together, however, we went through all the key components of cloud networks and the corresponding services that operate at different layers of the OSI model (specifically L3-L7).

The target concept of cloud networking can vary greatly depending on the application, corporate and security policies, and also the available budget. For networks, too, it pays to use only what makes sense for us and delivers sufficient benefit for its price. We may like the paid DDoS plan, but if we are a startup, we will probably do without it and rather save USD 3,000 (about 65 thousand CZK) per month.

Whether you are more or less involved in networking and cloud, I believe today’s article has revealed the benefits you can achieve with the right network architecture.

If you’re interested in other topics related to the cloud, check out our Cloud Encyclopedia series – a quick guide to the cloud.

This is a machine translation. Please excuse any possible errors.