Pivotal Cloud Foundry Conference and Multi-Site PaaS

So I recently got back from Pivotal’s first Cloud Foundry conference;

pivotalcfconf

 

as I’m not a developer, I guess by the power of deduction I’ll settle for cloud leader.

While there, this newly appointed cloud leader, erm, lead a pretty popular discussion on multi-site PaaS (with a focus on Cloud Foundry, but a lot of the ideas translate) with the intention of stirring up enough technical and business intrigue to move the conversation into something of substance and action.

I’m about to kick off further discussions on the VCAP-DEV mailing list (https://groups.google.com/a/cloudfoundry.org/forum/#!forum/vcap-dev), but draft run here can’t hurt

  • Having multiple separate PaaS instances is cheating (and a pain for everyone involved)
  • My definition of multi-site PaaS is currently to support application deployment and horizontal scalability in two or more datacentres concurrently, providing resilience scalability beyond what is currently available.
  • The multi-site PaaS should have a single point of end user/customer interaction.
  • Key advantages should be simplified management, visibility and control for end customers.
  • Multi-Site PaaS should be architected in such a way as to prevent a SPOF or inter-DC dependencies

All well and good saying WHAT i want, how about how?

  • A good starting point would be something that sits above the current cloud foundry components (ie, something new that all api calls hit first) that could perform extensions to existing functions;
    • Extend API functionality for multi-site operations
      • cf scale –-sites 3 <app>
      • cf apps –-site <name>
      • etc
    • Build up a map of PaaS-wide routes/DNS A records to aid in steering requests to the correct sites ‘Gorouters’
    • This may also be a good place to implement SSL with support for per-application SSL certificates as these will need to be available in any site the application is spanned to also.
  • Further thoughts that i need to pin down
    • I’d like to see this getting health state from each DC and redirecting new requests if necessary to best utilize capacity (taking into account latency to the client of the request)

Where this layer will sit physically, and how it will become this unification layer without becoming the SPOF point itself is still playing around my mind.
Current thoughts include;

  • Calling the layer Grouter (global-router) for convenience.
  • This layer will be made up of multiple instances of GRouters
  • Passive listening to NATS messages WITHIN the G-Router instances site for all DNS names/apps used at that site.
  • Distributed protocol for then sharing these names to all other Grouters (NOT A SHARED DB/SPOF).
    • No power to overwrite another G-Routers DB
    • Maybe something can be taken from the world of IP routing where local routes outrank that of remote updates, but can be used if need be.
    • Loss of DB/updates causes Grouter instance to fail open and forward all requests to local GoRouters (failure mode still allows local site independent operation).
    • DNS should only be steering real PaaS app requests or API calls to the GRouter layer although may be for different DC, Depending on use of Anycast DNS *needs idea development*
    • Grouters In normal mode can send redirects for requests for Apps that are In other DC’s to allow a global any-cast DNS infrastructure
  • Multiple instances per DC just like GoRouters so they scale.

Other points which will need discussion are;

  • How does a global ‘cf apps’ work
  • Droplets will need saving to multi-DC blob stores for re-use in scale/outage scenarios
  • We will need a way to cope with applications that have to stay within a specific geographic region.
  • Blue/Green zero downtime deployments?
    • One site at a time
    • Two app instances within each DC and re-mapping routes
    • Second option would prevent GRouter layer needing to be involved in orchestration, reducing complexity.

My next steps are to map out use cases and matching data flows embodying my ideas above.

Just FYI, all this was written on a SJL>LHR flight were awake-ness was clinging to me like the plague, so expect revisions!

Matt