Insight: Multicloud Architectures and Data Duplication Problems
Multi-cloud is a cloud computing model in which an organization uses a combination of clouds to distribute applications and services. That combination can be two or more public clouds, two or more private clouds, or a mix of public, private, and edge clouds.
Multi-cloud refers to more than one cloud deployment of the same type, public or private, sourced from different cloud providers. Businesses use multi-cloud to mix and match public and private clouds so they can take advantage of best-of-breed applications and services.
One should not confuse Multi-cloud architecture with multi-tenant architecture. They are not the same. Multi-tenant architecture refers to software architecture in which a single instance of software runs on a server and serves multiple tenants.
Why implement Multi-cloud?
Various multi-cloud use cases give IT teams increased flexibility and control over workloads and data. Because multi-cloud provides a flexible cloud environment, organizations can use it to meet specific workload or application requirements, both technically and commercially.
Organizations see geographic benefits to utilizing multiple cloud providers to address app latency issues. Furthermore, some businesses may use specific cloud providers for a short period to achieve short-term goals and then discontinue usage. Concerns about vendor lock-in and potential cloud provider outages are also frequently raised when IT leaders advocate for a multi-cloud strategy.
Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Google Compute Engine, Google Cloud Storage, and Azure Files are examples of such cloud computing services.
Reasons to Adopt Multi-Cloud Strategy:
One reason you would use a multi-cloud strategy is to comply with data localization or data sovereignty laws. These are laws that specify where data should be physically stored (usually in the country where the data was collected originally). You may have difficulty complying if you only use one CSP because even the largest cloud providers do not have data centers in every country.
So, if your company operates globally and requires cloud services in countries with data localization laws, you may be forced to use a CSP with data centers in those areas. That CSP might not be the same as the one you have in another country. In such cases, a multi-cloud strategy is the only plausible option.
Another reason is that your CSP may not offer specialized cloud services (e.g., artificial intelligence and machine learning services), or if they do, they may not be as good as those offered by another CSP. Adopting a multi-cloud strategy increases your chances of obtaining best-in-class cloud services.
There are several other reasons to employ a multi-cloud strategy; we'll go over them in greater detail in the Benefits and Drawbacks sections further down. Meanwhile, let's look at the six most popular multi-cloud architecture designs.
Different Strategies of Multi-Cloud Architecture
- Cloudification
In this multi-cloud architecture design, an on-premises application uses cloud services from various cloud platforms. The application in the following example stores data in AWS S3 and Azure Files. By utilizing a multi-cloud architecture, this application not only gains access to the cloud’s scalability, but also avoids vendor lock-in and improves availability and reliability. Even if the Azure service fails, the AWS service can keep the application running normally.
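As a rough illustration of this pattern, the sketch below shows an on-premises application writing the same payload to both AWS S3 (via boto3) and Azure Files (via the azure-storage-file-share SDK). The bucket, share, and file names are hypothetical placeholders, and credentials are assumed to come from the environment or a secret store rather than being hard-coded.

```python
# Minimal sketch of the cloudification pattern: an on-premises app writing the
# same payload to AWS S3 and Azure Files. All names here are placeholders.
import boto3
from azure.storage.fileshare import ShareFileClient

payload = b'{"order_id": 1001, "status": "shipped"}'

# AWS S3: boto3 resolves credentials from the environment / IAM role as usual.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-app-bucket", Key="orders/1001.json", Body=payload)

# Azure Files: the connection string would normally come from a secret store.
azure_file = ShareFileClient.from_connection_string(
    conn_str="<azure-storage-connection-string>",
    share_name="example-share",
    file_path="orders/1001.json",
)
azure_file.upload_file(payload)
```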
- Multi-Cloud Relocation
When an on-premises application is re-hosted on a cloud platform (commonly referred to as the "lift and shift" method) and then configured to use a service from another cloud platform, the process is known as multi-cloud relocation. In the following example, the application is moved to an AWS EC2 instance, but its data storage is obtained from Azure.
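A minimal sketch of what this might look like in code, assuming the lift-and-shifted application keeps pulling its files from Azure Files: the app now runs on an EC2 instance, and its Azure connection string is injected through an environment variable. The variable, share, and file names below are illustrative.

```python
# Sketch only: the relocated app runs on EC2 but still reads its data from Azure Files.
import os
from azure.storage.fileshare import ShareFileClient

client = ShareFileClient.from_connection_string(
    conn_str=os.environ["AZURE_FILES_CONNECTION_STRING"],  # injected at deploy time
    share_name="app-data",
    file_path="reports/latest.csv",
)
report = client.download_file().readall()  # bytes fetched from Azure while running on EC2
print(f"Fetched {len(report)} bytes from Azure Files")
```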
- Multi-Cloud Refactoring
Multi-cloud refactoring involves re-architecting an application so that it can be deployed in a multi-cloud environment. We have an on-premises application in this example. It has been re-architected into two components, each of which is deployed on a different cloud platform, one on Azure and the other on Google Cloud Platform (GCP).
It should be noted that, unlike relocation, which only moves the application, refactoring requires redesigning the original application and changing its code.
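To make the idea concrete, here is a hedged sketch of what the refactored boundary might look like: Component A (deployed on Azure) calls Component B (deployed on GCP) over a plain HTTPS API instead of an in-process function call. The endpoint URL and response shape are assumptions made purely for illustration.

```python
# Sketch of a cross-cloud call after refactoring. The URL and JSON shape are hypothetical.
import requests

COMPONENT_B_URL = "https://component-b.example-gcp-service.run.app/api/price"

def get_price(sku: str) -> float:
    """Component A (Azure) delegates pricing to Component B (GCP) over HTTPS."""
    response = requests.get(COMPONENT_B_URL, params={"sku": sku}, timeout=5)
    response.raise_for_status()
    return response.json()["price"]
```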
- Multi-Cloud Rebinding
Rebinding, similar to refactoring, involves re-architecting the original application so that it can be deployed across multiple environments from different CSPs. The distinction is that one or more components remain on-premises, while the others are migrated to separate cloud platforms; the deployment on the second platform then serves as a failover target if the first deployment fails.
In the following example, client traffic passes through a load balancer (LB): the AWS deployment of Component B is the active component, serving clients during normal operations, while the GCP deployment of Component B is the passive component, serving clients only if the AWS deployment fails. Component A remains on-premises.
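One simple way to picture the active/passive behavior is client-side failover, sketched below under the assumption that Component B exposes the same API on both clouds. In practice this switching is usually handled at the load balancer or DNS layer rather than in application code, and the endpoint URLs here are made up.

```python
# Sketch of active/passive failover across clouds. Endpoints are illustrative.
import requests

ENDPOINTS = [
    "https://component-b.aws.example.com",  # active deployment (AWS)
    "https://component-b.gcp.example.com",  # passive deployment (GCP), used on failure
]

def call_component_b(path: str) -> dict:
    last_error = None
    for base_url in ENDPOINTS:
        try:
            response = requests.get(base_url + path, timeout=3)
            response.raise_for_status()
            return response.json()          # first healthy deployment wins
        except requests.RequestException as err:
            last_error = err                # fall through to the next deployment
    raise RuntimeError("Component B is unreachable on all clouds") from last_error
```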
- Multi-Cloud Rebinding with Cloud Brokerage
This is still multi-cloud rebinding at its core, but with the addition of a cloud brokerage service that is responsible for integrating the various components of the multi-cloud infrastructure and ensuring that they all operate optimally and securely.
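As a very rough sketch of the kind of work a brokerage layer performs, the snippet below polls hypothetical health endpoints of components spread across on-premises, AWS, and GCP and reports which ones are up. A real brokerage product would of course do far more, such as provisioning, cost management, and policy enforcement.

```python
# Toy health-check loop standing in for a cloud brokerage service. URLs are hypothetical.
import requests

COMPONENTS = {
    "component-a (on-premises)": "https://a.internal.example.com/health",
    "component-b (AWS)": "https://b.aws.example.com/health",
    "component-b (GCP)": "https://b.gcp.example.com/health",
}

def health_report() -> dict:
    """Return {component name: True/False} based on a simple HTTP health probe."""
    report = {}
    for name, url in COMPONENTS.items():
        try:
            report[name] = requests.get(url, timeout=3).status_code == 200
        except requests.RequestException:
            report[name] = False
    return report
```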
- Multi-Application Modernization
Multi-application modernization entails re-architecting multiple applications as a portfolio and then deploying them in a multi-cloud infrastructure, as opposed to just re-architecting a single application for multi-cloud deployment.
Individual applications, even if re-architected for the cloud, will have some shortcomings such as data inconsistencies, duplicate functionality, higher maintenance costs, and so on. Analyzing individual applications prior to re-architecting them may reveal opportunities for consolidation and integration, allowing them to function more cohesively once deployed in a multi-cloud environment.
Benefits of Using a Multi-Cloud Architecture
The following are the benefits of using a multi-cloud architecture:
- Reliability and redundancy. Setting up redundancy across your IT infrastructure for reliability and availability can be more effective when done in a multi-cloud environment. In the worst-case scenario, if one of your CSPs’ entire infrastructure becomes unusable, perhaps as a result of a massive distributed denial-of-service (DDoS) or ransomware attack, you can still operate on the other CSP’s infrastructure(s).
- Less reliance on a single vendor. You can avoid vendor lock-in, a situation in which compatibility issues prevent you from migrating to another cloud even when you want to.
- Cost savings. When you're not completely reliant on a single vendor, it's easier to negotiate or obtain better deals on cloud services.
Drawbacks of Using a Multi-Cloud Architecture
- The complexity involved in managing a multi-cloud architecture is one of the major considerations to weigh before diving in. You must understand the intricacies of each cloud platform and service, and integrating services from various platforms can also be difficult.
- An increase in latency: Although choosing different cloud providers for data localization may improve cloud-to-user latency, you may experience increased latency within your underlying cloud infrastructure, because a multi-cloud infrastructure will inevitably consist of services running from various, and most likely geographically dispersed, data centers.
- Increased vulnerability: The more cloud providers you include in your multi-cloud strategy, the larger your attack surface will be. Furthermore, because each cloud platform has its own set of complexities, you will need to tailor security controls for each platform.
What Exactly Is Data Deduplication?
Data deduplication is a process that removes duplicate copies of data from a system, a specialized form of data reduction. It reduces the amount of storage space required by eliminating redundant data and retaining only unique instances, and it is used in data backup and network data mechanisms to store a single unique instance of data within a database or information system (IS).
Intelligent compression, single instance storage, commonality factoring, and data reduction are all terms for data deduplication.
Deduplication significantly reduces storage capacity requirements by eliminating redundant copies of data. It can be employed as an inline process that removes duplicates as data is being written to disk.
Deduplication is a zero-data-loss technology that can be used both inline and in the background to maximize savings. In many storage systems it is enabled by default and runs on all volumes and aggregates without user intervention.
The performance overhead for deduplication operations is minimal because it runs in a separate efficiency domain from the client read/write domain. It runs in the background, regardless of which application is running or how the data is accessed (NAS or SAN).
Deduplication savings are preserved as data moves — when it is replicated to a disaster recovery site, backed up to a vault, or moved between on-premises, hybrid cloud, and/or public cloud.
How Does the Deduplication Process Work?
Deduplication operates at the 4KB block level, removing duplicate data blocks and storing only unique data blocks.
Fingerprints — unique digital signatures for all 4KB data blocks — are the core enabling technology of deduplication.
When data is written to the system, the inline deduplication engine computes a fingerprint for each incoming 4KB block; fingerprints are kept in a hash store (an in-memory data structure).
After the fingerprint is computed, the hash store is scanned. When a matching fingerprint is found, the data block corresponding to the duplicate fingerprint (the donor block) is looked up in cache memory:
- If the donor block is found in cache memory, a byte-by-byte comparison is performed between the current data block (the recipient block) and the donor block to confirm that they are truly identical. Once verified, the recipient block is shared with the matching donor block without ever being written to disk; only metadata is updated to track the sharing details.
- If the donor block is not in cache memory, it is retrieved from disk and loaded into the cache for the same byte-by-byte comparison. After verification, the recipient block is marked as a duplicate without being written to disk, and metadata is updated to track the sharing details.
The background deduplication engine operates similarly: it scans all data blocks in the aggregate and eliminates duplicates by comparing block fingerprints and then performing a byte-by-byte comparison to rule out false positives. This procedure also ensures that no data is lost during the deduplication operation.
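To make the mechanics tangible, here is a toy Python model of the inline flow described above: 4KB blocks are fingerprinted with SHA-256, candidate matches are verified byte-by-byte, and duplicates are recorded only in metadata. It is a simplification for illustration, not how any particular storage system implements deduplication.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks, as described above

class DedupStore:
    """Toy inline deduplication: fingerprint each block, verify matches byte-by-byte,
    and share duplicates by updating metadata instead of writing the block again."""

    def __init__(self):
        self.hash_store = {}  # fingerprint -> id of the donor block
        self.blocks = {}      # block id -> unique block bytes (stand-in for "on disk")
        self.block_map = []   # logical block order -> physical block id (metadata)

    def write(self, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).digest()
            donor_id = self.hash_store.get(fingerprint)
            if donor_id is not None and self.blocks[donor_id] == block:
                # Duplicate confirmed byte-by-byte: update metadata only,
                # the recipient block is never written out.
                self.block_map.append(donor_id)
            else:
                # Unique block (or fingerprint false positive): store it.
                block_id = len(self.blocks)
                self.blocks[block_id] = block
                self.hash_store[fingerprint] = block_id
                self.block_map.append(block_id)

    def read(self) -> bytes:
        return b"".join(self.blocks[bid] for bid in self.block_map)

store = DedupStore()
store.write(b"A" * 8192 + b"B" * 4096 + b"A" * 4096)  # 4 logical blocks, 2 unique
print(len(store.block_map), "logical blocks stored as", len(store.blocks), "unique blocks")
```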
For more insightful blogs and information on how to digitally transform your business, visit www.payoda.com.
Author: Sritharan Chellachamy