NSX-T 2.5 introduced the concept of edge failure domains. A failure domain is a logical grouping of edge nodes within an edge cluster. Failure domains make it possible to logically separate the active and standby Tier-0/1 service router (SR) instances across edge nodes in different locations (racks, rooms, sites, etc.). Because the active and standby instances never run in the same failure domain, the loss of a single location does not take down both instances, which keeps the Tier-0/1 SR services available.
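To make the placement rule concrete, here is a simplified Python model of it. The function name `pick_active_standby` and the sample edge and failure domain names are hypothetical. This is not how NSX-T itself selects edge nodes. It only illustrates the constraint that the active and standby SR must land in different failure domains.

```python
from __future__ import annotations

from itertools import combinations


def pick_active_standby(edge_fd: dict[str, str]) -> tuple[str, str]:
    """Return an (active, standby) pair of edge nodes in different failure domains.

    edge_fd maps an edge node name to the failure domain it belongs to.
    Simplified illustration only: NSX-T's real placement logic is not reproduced here.
    """
    # Walk the candidate pairs in a stable (sorted) order
    for active, standby in combinations(sorted(edge_fd), 2):
        if edge_fd[active] != edge_fd[standby]:
            return active, standby
    # Every node shares one failure domain: the constraint cannot be met
    raise ValueError("all edge nodes are in the same failure domain")


# Hypothetical example: two edges in rack A, one in rack B
placement = pick_active_standby(
    {"edge-01": "FD-rack-A", "edge-02": "FD-rack-A", "edge-03": "FD-rack-B"}
)
```

With this input, `edge-01` and `edge-02` are skipped as a pair because they share rack A, so the model returns `("edge-01", "edge-03")`.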
Note: Configuring an edge cluster that is stretched across multiple sites is out of scope for this post.
API
At the time of writing (tested on NSX-T 3.1.1), failure domains can only be configured via the API.
The official documentation explains the procedure well: https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/administration/GUID-5D7E3D43-6497-4273-99C1-77613C36AD75.html#automatic-recovery-of-the-data-plane-1
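To give an idea of what such an API call looks like, the sketch below builds the request that creates a failure domain, using only the Python standard library. It assumes the Manager API endpoint `POST /api/v1/failure-domains` and the `display_name` / `preferred_active_edge_services` fields described in the documentation linked above, so check them against your NSX-T version. The helper names and the example host name are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def nsx_request(manager, path, method="GET", body=None, user="admin", password=""):
    """Build (without sending) an HTTP request for the NSX-T Manager API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(f"https://{manager}{path}", data=data, method=method)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")  # HTTP basic authentication
    req.add_header("Content-Type", "application/json")
    return req


def create_failure_domain_request(manager, name, preferred_active, user, password):
    """Request that creates one failure domain (assumed endpoint and fields)."""
    body = {
        "display_name": name,
        # True: this failure domain is preferred for active edge services
        "preferred_active_edge_services": preferred_active,
    }
    return nsx_request(manager, "/api/v1/failure-domains", "POST", body, user, password)


def send(req, verify=True):
    """Send a request and decode the JSON answer (an empty answer becomes {})."""
    ctx = ssl.create_default_context()
    if not verify:  # lab only: accept the Manager's self-signed certificate
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read() or b"{}")


req = create_failure_domain_request("nsx.lab.local", "FD-rack-A", True, "admin", "secret")
```

Building the request separately from sending it keeps the payload easy to inspect before anything changes on the NSX Manager. `send(req, verify=False)` is only meant for a lab with a self-signed certificate.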
Python script
But wouldn't it be handier if these API calls were bundled together, so that you only have to fill in the required names and IDs?
Well, I thought the same! I’ve created a simple Python script that bundles all the necessary API calls. Even handier: whenever you need to provide the ID of a specific component, the script shows a complete list of the existing components of that type in your environment.
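The "show the existing objects, then pick one" behaviour can be sketched as follows. It assumes the usual shape of an NSX-T Manager list response (a `results` array whose items carry `id` and `display_name`). The function names and sample IDs are hypothetical and are not taken from the script on GitHub.

```python
from __future__ import annotations


def choices_from_list(response: dict) -> list[tuple[str, str]]:
    """Extract (id, display_name) pairs from an NSX-T list response."""
    return [(item["id"], item.get("display_name", "")) for item in response.get("results", [])]


def format_choices(choices: list[tuple[str, str]]) -> str:
    """Render a numbered menu so the user never has to copy a UUID by hand."""
    return "\n".join(
        f"{number:>3}) {name}  ({obj_id})"
        for number, (obj_id, name) in enumerate(choices, start=1)
    )


def id_for_choice(choices: list[tuple[str, str]], answer: str) -> str:
    """Map the number the user typed back to the object ID."""
    index = int(answer) - 1
    if not 0 <= index < len(choices):
        raise ValueError(f"choose a number between 1 and {len(choices)}")
    return choices[index][0]


# Hypothetical response from GET /api/v1/failure-domains
sample = {
    "result_count": 2,
    "results": [
        {"id": "a1b2", "display_name": "FD-rack-A"},
        {"id": "c3d4", "display_name": "FD-rack-B"},
    ],
}
choices = choices_from_list(sample)
menu = format_choices(choices)
```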
Check out the script on GitHub: https://github.com/arnaudgandibleux/NSX-T-CreateFailureDomains
The script offers the following options:
- A full run: configure the failure domains and immediately attach them to edge nodes
- Run each function separately:
  - 1: Get failure domains
  - 2: Create failure domains
  - 3: Delete failure domains
  - 4: Get edge nodes
  - 5: Get edge clusters
  - 6: Assign edge nodes to new failure domain
  - 7: Configure edge cluster placement method
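Options 6 and 7 boil down to a read-modify-write on two objects: the edge transport node gets a `failure_domain_id`, and the edge cluster gets an allocation rule of type `AllocationBasedOnFailureDomain`. The sketch below only performs the modify step on bodies assumed to come from `GET /api/v1/transport-nodes/<id>` and `GET /api/v1/edge-clusters/<id>`. The modified bodies (including their `_revision`) would then be sent back with `PUT` to the same URLs. Field names follow the documentation linked above as far as I know; the function names and sample bodies are hypothetical.

```python
import copy

FD_ACTION = "AllocationBasedOnFailureDomain"


def with_failure_domain(transport_node: dict, failure_domain_id: str) -> dict:
    """Return a copy of an edge transport node body assigned to a failure domain."""
    node = copy.deepcopy(transport_node)  # never mutate the body read from the API
    node["failure_domain_id"] = failure_domain_id
    return node


def with_failure_domain_allocation(edge_cluster: dict) -> dict:
    """Return a copy of an edge cluster body that places SRs based on failure domains."""
    cluster = copy.deepcopy(edge_cluster)
    # Drop any existing rule of this type so exactly one enabled rule remains
    rules = [
        rule for rule in cluster.get("allocation_rules", [])
        if rule.get("action", {}).get("action_type") != FD_ACTION
    ]
    rules.append({"action": {"enabled": True, "action_type": FD_ACTION}})
    cluster["allocation_rules"] = rules
    return cluster


# Hypothetical bodies as returned by the GET calls
edge_node = {"id": "tn-1", "display_name": "edge-01", "_revision": 3}
edge_cluster = {"id": "ec-1", "display_name": "edge-cluster-01", "_revision": 1}

updated_node = with_failure_domain(edge_node, "a1b2")
updated_cluster = with_failure_domain_allocation(edge_cluster)
```

Replacing any existing rule of the same type, instead of blindly appending, makes the step safe to run twice.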