VMware Cloud Foundation Edge Cluster Deployment Removal Tool

20 November 2020 By arnaud

Since VCF 4.0, an automated deployment of a new NSX-T edge cluster is possible. No manual actions are required during the deployment of the edge nodes; you only need to provide the necessary parameters (FQDN, MGMT IP, TEP IP, BGP Peer,…).

How to deploy an NSX-T edge cluster within VCF 4.0 is well documented in the official VMware documentation.

But what if something goes wrong during the deployment because of a misconfiguration?

During my edge cluster deployment, I encountered the following issue:

The SDDC Manager validates the entered values before actually starting the deployment. It stated that the BGP Peer Password should have at least one special character, but it did NOT check the maximum length of the password. By accident, I exceeded the maximum length of 20 characters imposed by NSX-T.

(source: https://code.vmware.com/apis/976/nsx-t)
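Because SDDC Manager did not catch the length problem for me, a small local pre-check before filling in the deployment wizard would have saved a failed deployment. The sketch below is my own helper, not something shipped by VMware. The function name check_bgp_password is hypothetical, and treating "special character" as "any non-alphanumeric character" is an assumption; the exact rule SDDC Manager applies may differ.

```shell
#!/bin/sh
# Hypothetical pre-check for a BGP peer password (not part of any VMware tool).
# Rules taken from the text above:
#   - NSX-T limits the password to 20 characters
#   - SDDC Manager asks for at least one special character
# Assumption: "special" means any character that is not a letter or digit.
check_bgp_password() {
    pw="$1"
    max_len=20

    # Reject passwords longer than the NSX-T limit.
    if [ "${#pw}" -gt "$max_len" ]; then
        echo "too long: ${#pw} characters (max $max_len)"
        return 1
    fi

    # Require at least one non-alphanumeric character.
    case "$pw" in
        *[!A-Za-z0-9]*) ;;
        *) echo "no special character"; return 1 ;;
    esac

    echo "ok"
}
```

Calling check_bgp_password with a candidate password prints "ok" when both rules pass, and otherwise prints which rule failed and returns a non-zero exit code.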

The deployment task started fine, until it reached the BGP Peer configuration. It failed because the BGP Peer Password was too long.

How can an issue like this be solved? How can I revert the automated deployment of an NSX Edge cluster in VCF 4.0? Can I edit my NSX-T Edge Cluster configuration during or after deployment?

Well, VMware KB-78635 (https://kb.vmware.com/s/article/78635) can get it done for you. VMware provides a Python script that safely removes all configuration done by the automated deployment. It removes:

  • Tier-1 gateway
  • Tier-0 gateway
  • Edge cluster
  • Edge node VMs
  • Edge uplink segments
  • VLAN-backed transport zones created for the edge deployment
  • vCenter portgroups created for the edge deployment
  • Resource pool, if one was created as part of edge deployment
  • VMware SDDC Manager inventory records of the edge deployment

The script is attached as a .tar.gz-file to the KB itself.

Using the attached script provided by VMware has some prerequisites and limitations:

  • The script can only be used for NSX Edge clusters that were deployed using the SDDC Manager.
  • No additional configuration may have been done to the NSX edge cluster. It has to be a clean/untouched deployment.
  • SSH root access to the SDDC Manager is required.
  • Administrator credentials of the domain are required.
  • The current version of the script (0.17) can NOT be used when unconnected Tier-1 routers exist in the environment (even in other, unaffected clusters). This limitation will probably be fixed in the next release. An example would be a Tier-1 router used as a one-arm load balancer.
    • As a workaround, you can temporarily connect any unconnected Tier-1s to the Tier-0 router of another edge cluster. Don’t connect them to the Tier-0 of the edge cluster you want to remove. If you do, the cleaner will delete them as well.

Edge Cluster Deployment Removal Tool installation and usage

  • Download the attachment of the above KB-78635
  • Use an SCP client to copy the .tar.gz file to the SDDC Manager
  • Log in to the SDDC Manager via SSH as the user vcf
  • Switch to the root user via the following command:

su -

  • Uncompress the .tar.gz-file by executing the following command:
tar -xzf <file.tar.gz>
  • Run the script remove_edge_cluster.sh with at least the following parameters:
    • -c (Name of edge cluster to remove)
    • -u (Name of SSO admin user of domain)

Note the -d option: use this dry-run option to simulate the removal process. During the dry run, the complete behavior can be observed without the risk of breaking anything.

Example:

./remove_edge_cluster.sh -c edge-cluster01 -u administrator@vsphere.local
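To make the dry run a habit, the two invocations can be chained in a small wrapper: run with -d first, look at the output, and only then start the real removal. This is a minimal sketch of my own, not part of the KB package. The function name remove_edge_cluster_safely and the REMOVE_SCRIPT variable are hypothetical, and I am assuming that -d is a plain flag without an argument and that the script returns a non-zero exit code when the dry run fails. Check the script's own help output before relying on either.

```shell
#!/bin/sh
# Hypothetical wrapper around remove_edge_cluster.sh (not part of KB-78635).
# Assumptions: -d is a flag without an argument, and the script exits
# non-zero when the dry run detects a problem.
remove_edge_cluster_safely() {
    cluster="$1"     # name of the edge cluster to remove (-c)
    sso_user="$2"    # SSO admin user of the domain (-u)
    script="${REMOVE_SCRIPT:-./remove_edge_cluster.sh}"

    # Step 1: simulate the removal.
    if ! "$script" -c "$cluster" -u "$sso_user" -d; then
        echo "Dry run failed; not proceeding." >&2
        return 1
    fi

    # Step 2: ask for explicit confirmation before changing anything.
    printf 'Dry run finished. Proceed with the real removal? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) "$script" -c "$cluster" -u "$sso_user" ;;
        *)   echo "Aborted."; return 2 ;;
    esac
}
```

Usage, after sourcing the file as root on the SDDC Manager from the directory containing the extracted script: remove_edge_cluster_safely edge-cluster01 administrator@vsphere.local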

Thanks for reading this post and feel free to reach out to me in case of questions and/or remarks.

Have a good day.