Organizations that utilize Veeam Backup & Replication to store backups on Wasabi can now achieve a true disaster recovery capability without having to modify their existing backup strategy, infrastructure, or processes. Opti9 provides a fully managed Disaster-Recovery-as-a-Service (DRaaS) which utilizes Veeam replication jobs to replicate production servers in their native format to dedicated recovery infrastructure. The replication jobs run concurrently within Veeam, ensuring no interference with backup tasks.
As part of Opti9’s managed service, Opti9 will design, implement, test, monitor, and maintain a virtual recovery site that meets your business’s unique requirements. This service is specifically designed for organizations looking to ensure their applications will be available within the contracted SLAs for RTO & RPO. See DR-Ready for use cases that do not require specific SLAs for RTO.
2. Reference Architecture
In the architecture referenced above, there are five main components:
- The Veeam Backup and Replication Server that protects the on-prem infrastructure. This is owned and managed by the customer.
- A Wasabi account that stores the backups from Veeam. This account is owned and managed by the customer.
- Opti9 Disaster Recovery Cloud – The cloud infrastructure where production server images are replicated to. This is managed and owned by Opti9.
- The OptiXdashboard – Opti9’s portal, which provides DR failover and test initiation, RPO monitoring, alarm notifications, access to 24x7 support, and other capabilities.
- Observr – An optional Opti9 service that constantly monitors customers’ Veeam backup and replication to detect ransomware, anomalies, and other suspicious activity. Read more on Observr here.
3. DRaaS Solution Features
Opti9’s Disaster-Recovery-as-a-Service (DRaaS) is a fully managed solution which includes:
- Strategy & Design: Opti9 will ensure applications can be securely consumed from the DR site the same way as production by building a network & security integration strategy based on your existing architecture, minimizing changes to production environments.
- Platform Support: Opti9’s DRaaS supports VMware, Hyper-V, physical servers, IBM iSeries, IBM AS/400, AWS, and unstructured file data (NFS/CIFS).
- Fully Managed: Including initial configuration, 24x7 monitoring, testing, ownership of failover & failback, and troubleshooting.
- As low as 5-minute RPO SLA, and 1-hour RTO SLA
- Performance SLA for compute, storage, and network
- Multiple Failover Scenarios: Support for full failover, application-specific failover, and server-specific failover
- Runbook Management: For the above failover scenarios and testing
- OptiXdashboard: Portal with complete self-service control, real-time & historical monitoring, DR compliance reports, and many other features
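To make the RPO and RTO figures above concrete, the following is a minimal Python sketch of the arithmetic behind them. It is not part of Opti9's or Veeam's tooling. The function names and the constants are assumptions for illustration only, using the "as low as" values from the list; your contracted SLAs may differ.

```python
from datetime import datetime, timedelta

# Illustrative targets only, taken from the "as low as" figures above.
RPO_TARGET = timedelta(minutes=5)
RTO_TARGET = timedelta(hours=1)


def rpo_exposure(last_restore_point: datetime, now: datetime) -> timedelta:
    """Return how much recent data would be lost if a disaster struck at `now`."""
    return now - last_restore_point


def rpo_met(last_restore_point: datetime, now: datetime,
            target: timedelta = RPO_TARGET) -> bool:
    """True if the newest replica is no older than the RPO target."""
    return rpo_exposure(last_restore_point, now) <= target


def rto_met(failover_started: datetime, service_restored: datetime,
            target: timedelta = RTO_TARGET) -> bool:
    """True if the service came back within the RTO target."""
    return (service_restored - failover_started) <= target
```

In other words, RPO bounds the age of the most recent replica at the moment of failure, while RTO bounds the elapsed time between starting the failover and the application being usable again.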
4. How to start using Opti9’s DRaaS solution with Veeam and Wasabi
To utilize Opti9's DRaaS offering, reach out to Opti9 (click here). The technical experts at Opti9 will work with you throughout the lifecycle of the DRaaS solution.
During the onboarding process, Opti9 will complete the following steps:
- Design a DR strategy based on your organization’s specific application, network, security, and platform requirements.
- Deploy a virtual private cloud environment to act as the target for DR at any of Opti9’s 12+ global cloud platforms.
- Configure encrypted connectivity between the production site and DR for replication.
- Deploy and configure network resources such as virtual firewalls to match application consumption strategy, or work with customers to deploy their own virtual or physical appliances.
- Group servers together per application to form Failover plans.
- Configure replication jobs within Veeam.
- Configure customized alarm notifications to Opti9 24/7 NOC and customer contacts for RPOs and other critical components.
- Author custom runbooks based on the organization’s unique application requirements.
- Conduct initial DR test with the customer and ensure RTOs are met.
- Configure schedule for automated DR testing reminders to be sent to customers from OptiXdashboard.
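The idea of grouping servers per application into Failover plans, combined with the pre-defined boot order mentioned in the failover steps, can be pictured with a small data structure. This is a hypothetical Python sketch, not how Veeam or the OptiXdashboard represent plans internally. The class names, server names, and the "lower number boots first" convention are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    boot_priority: int  # assumed convention: lower values boot first


@dataclass
class FailoverPlan:
    application: str
    servers: list[Server] = field(default_factory=list)

    def boot_sequence(self) -> list[str]:
        """Return server names in the order they would be powered on."""
        ordered = sorted(self.servers, key=lambda s: s.boot_priority)
        return [s.name for s in ordered]


# Hypothetical three-tier application grouped into a single plan.
erp_plan = FailoverPlan(
    application="erp",
    servers=[
        Server("erp-web-01", boot_priority=3),
        Server("erp-db-01", boot_priority=1),
        Server("erp-app-01", boot_priority=2),
    ],
)
print(erp_plan.boot_sequence())  # ['erp-db-01', 'erp-app-01', 'erp-web-01']
```

Grouping by application is what makes application-specific failover possible: a single plan can be started without touching the servers that belong to other applications.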
5. Requirements for successful DR strategy
To achieve a successful disaster recovery capability, the following components should be considered:
Shared services required for multiple applications to function should be run in an always-on capacity. This typically includes centralized authentication (e.g., Active Directory, LDAP), networking (e.g., DHCP), and security (e.g., MFA proxies). Alternatively, these services can be replicated; however, running them in an always-on capacity and syncing their configurations within the applications themselves will greatly reduce RTO.
Users should be able to consume applications at the DR site the same way they do in production, and without manual changes. The best way to achieve this is to match the network components and architecture in use at production within the DR site. Organizations should seek to utilize network devices such as routers and firewalls which support device pairing and syncing to minimize change control.
Software-defined platforms, such as SD-WAN, network-as-a-service platforms, and cloud-based web proxy services, provide the ability to fail over and fail back network resources in a modern, policy-based manner, without the need for manual intervention.
Additionally, the network strategy should support failing over network resources for specific applications, and even individual servers, so that the DR site can be utilized for more than just a complete production site failure. Similarly, DR testing must be planned carefully to ensure that production data stored within 3rd party SaaS & cloud platforms is not altered as part of testing.
Compliance & Security
Security & Compliance of the DR site must match those of production. The best way to achieve this is to ensure that the same tools and service providers deployed at production are also present at the DR site. Ensure those tools and services are compatible with the DR infrastructure and that the DR site meets all applicable regulatory and compliance requirements, such as HIPAA.
Additionally, since no immutability exists for replication, ensure proper monitoring of the production Veeam Backup & Replication server is in place so that the DR site cannot be destroyed or altered by an attacker. See how Observr + Wasabi addresses this concern by reading here.
6. How to perform a failover
Customers can initiate a failover, either in case of a disaster or for testing purposes, via multiple methods. If the Veeam Backup & Replication server deployed at production is still accessible, follow the steps detailed here.
If the Veeam server at the production site is no longer accessible, you can initiate the failover process, or declare a DR event for Opti9 to perform it via the OptiXdashboard by following these steps:
1. Navigate to https://opti9tech.com/ and click on the at the top right.
2. If you wish to initiate a failover directly, navigate to “Services --> DRaaS” and click the “Start Failover” button for all applications you wish to fail over.
This will initiate the boot up process based upon the pre-defined boot order and other configurations. You can now skip to step 6 to access the replicated servers.
3. Alternatively, to request Opti9 to initiate a managed failover on your behalf, navigate to "Services --> DRaaS" and click on "Declare Disaster".
4. When declaring a disaster, specify whether this is a test or a live DR event, whether a full-site or partial failover is required, the specific Failover plans in scope if it is a partial failover, and any deviations to be made from the standard runbook. Please see the image below and confirm all form items are appropriately filled out.
This will create an escalated ticket with Opti9’s technical services team. A representative will contact you to initiate the DR runbook and ensure the required RTOs are met. Opti9 will continue to work with you, other members of your organization, and required 3rd party vendors to ensure applications are properly accessible.
5. You will now see all replicated servers online in the recovery environment and should be able to access the servers and applications via the network consumption strategy agreed upon during onboarding. Click "Login to Recovery Environment" to access it.
6. The recovery environment is a dedicated VMware virtual private cloud instance. From the interface below, you can perform activities typically accessible from vCenter, including:
- Modifying VM resources (CPU, memory, disk) and other attributes.
- Provisioning additional VMs from templates or by cloning replicated servers.
- Viewing real-time & historical VM performance.
- Managing storage policies.
- Accessing VM consoles.
- Managing networking via the built-in NSX virtual appliance, including local networks, VLANs, VXLANs, DHCP, ACLs, IPsec, and many other supported network configurations.
7. If a live failover was performed, it can optionally be made permanent by Opti9’s technical service team, or manually within Veeam by following the steps here.
Once the production site is ready to be utilized again, a Failback can be initiated by Opti9, or manually by following these steps.
8. Once the DR event or test has ended, you can view and export detailed reporting on the failover event from the OptiXdashboard by clicking on any Failover plan that was in scope, and then clicking the ‘Failover Job History’ Tab.
9. From here, you will see a list of all DR events this application participated in. Clicking ‘details’ for any event will provide a more specific report including start/end times for the entire failover event, as well as per-server statistics, success state of all events, and other detailed information sufficient for compliance and auditing purposes. Click the ‘Excel’, ‘PDF’, or ‘CSV’ buttons to export the report.
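As a preparation aid for step 4, the items requested when declaring a disaster can be written down as a simple checklist before the form is opened. The sketch below is a hypothetical Python model of those items. It does not reflect the actual form fields or any API of the OptiXdashboard, and every class and field name is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
    TEST = "test"
    LIVE = "live"


class Scope(Enum):
    FULL_SITE = "full"
    PARTIAL = "partial"


@dataclass
class DisasterDeclaration:
    event_type: EventType
    scope: Scope
    failover_plans: list[str] = field(default_factory=list)
    runbook_deviations: str = ""  # free-text notes, empty if none

    def problems(self) -> list[str]:
        """Return a list of missing items (empty if the declaration is complete)."""
        issues = []
        # A partial failover only makes sense if the plans in scope are named.
        if self.scope is Scope.PARTIAL and not self.failover_plans:
            issues.append("partial failover requires at least one Failover plan")
        return issues


# Hypothetical usage: a partial test with no plans selected is flagged.
draft = DisasterDeclaration(EventType.TEST, Scope.PARTIAL)
print(draft.problems())
```

Having these answers ready before declaring the event shortens the exchange with the technical services team, which matters when the RTO clock is already running.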