We have the following blue-green deployment design. The idea is:
- Deploy the latest code to the inactive cluster
- Run a smoke test
- Switch the VIP to make the currently active cluster inactive
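One way to keep "deploy" and "switch the VIP" from being coupled is to treat them as separate, explicitly gated steps: a deploy always targets whichever cluster is inactive, and a switch is refused unless that cluster has passed a smoke test since its last deploy. Below is a minimal, hypothetical Python sketch of that bookkeeping; `BlueGreen`, `deploy`, `record_smoke_test` and `switch` are illustrative names, not a Go.cd or keepalived API. Deploying the latest code to the newly inactive cluster then never moves the VIP on its own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    name: str
    version: Optional[str] = None   # code version currently deployed
    smoke_passed: bool = False      # smoke test passed for this version?


@dataclass
class BlueGreen:
    """Hypothetical controller tracking which cluster the VIP points at."""
    blue: Cluster = field(default_factory=lambda: Cluster("blue"))
    green: Cluster = field(default_factory=lambda: Cluster("green"))
    active_name: str = "green"

    @property
    def active(self) -> Cluster:
        return self.green if self.active_name == "green" else self.blue

    @property
    def inactive(self) -> Cluster:
        return self.blue if self.active_name == "green" else self.green

    def deploy(self, version: str) -> Cluster:
        # Step 1: deployments only ever go to the inactive cluster,
        # and never touch the VIP.
        target = self.inactive
        target.version = version
        target.smoke_passed = False  # a fresh deploy must be re-tested
        return target

    def record_smoke_test(self, passed: bool) -> None:
        # Step 2: store the smoke test result for the inactive cluster.
        self.inactive.smoke_passed = passed

    def switch(self) -> str:
        # Step 3: move the VIP, but only to a cluster that passed its smoke test.
        candidate = self.inactive
        if not candidate.smoke_passed:
            raise RuntimeError(f"{candidate.name} has not passed a smoke test")
        old = self.active
        self.active_name = candidate.name
        # The previously live cluster is now inactive and cannot be switched
        # back to until it goes through another deploy + smoke test cycle.
        old.smoke_passed = False
        return self.active_name
```

In a pipeline, the switch would typically be its own stage behind a manual approval, so re-deploying to the inactive cluster (for example to keep it current) does not promote it.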
We created pipelines accordingly in Go.cd. However, the issue we have is that we want to deploy the latest code to the cluster that has just transitioned to the inactive state. How do we make sure that cluster doesn't become active again? Or, how are others doing blue-green deployments? Google search results mostly point to solutions geared towards AWS, and we don't use AWS or any public cloud.
Edit 1
Infrastructural constraints: we only have hardware available for 2 clusters.
What stops you from running batch jobs in the live cluster? The live cluster is serving production queries, and the batch load would take up machine resources and might make the online system non-responsive.
I'm not sure if this helps you, but in our setup we have a load balancer that the clients talk to. The LB knows which instances are live and which are dark, and forwards traffic accordingly. If a request has a 'special' header, the LB sends that traffic to the dark pool. We have this setup per application (just making this clear because, from the diagram I have posted, people might think the whole platform is blue-green).
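The routing rule described above can be sketched in a few lines. This is a hypothetical Python model, assuming a header named `X-Dark-Traffic` as the 'special' header and plain round-robin within each pool; in practice this logic would live in the load balancer itself (a Zuul filter, an HAProxy rule, etc.), not in application code.

```python
import itertools
from typing import Dict, List, Mapping

DARK_HEADER = "x-dark-traffic"  # hypothetical 'special' header (compared case-insensitively)


class AppLoadBalancer:
    """Hypothetical per-application LB that knows which nodes are live and which are dark."""

    def __init__(self, live: List[str], dark: List[str]) -> None:
        self.pools: Dict[str, List[str]] = {"live": list(live), "dark": list(dark)}
        self._rebuild()

    def _rebuild(self) -> None:
        # One round-robin cursor per pool.
        self._cursors = {name: itertools.cycle(nodes) for name, nodes in self.pools.items()}

    def route(self, headers: Mapping[str, str]) -> str:
        # Requests carrying the special header go to the dark pool; everything else is live.
        is_dark = any(key.lower() == DARK_HEADER for key in headers)
        pool = "dark" if is_dark else "live"
        if not self.pools[pool]:
            # Deliberately no fallback to the other pool.
            raise RuntimeError(f"no nodes in the {pool} pool")
        return next(self._cursors[pool])

    def swap(self) -> None:
        # Blue-green switch: the dark pool becomes live and vice versa.
        self.pools["live"], self.pools["dark"] = self.pools["dark"], self.pools["live"]
        self._rebuild()
```

With `AppLoadBalancer(["node1", "node2"], ["node3", "node4"])`, ordinary requests alternate between `node1` and `node2`, a request with the header goes to `node3`, and after `swap()` the roles are reversed. The sketch never sends ordinary traffic to the dark pool when the live pool is empty; promotion only happens through an explicit `swap()`.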
So the diagram would be something like this, with the green cluster live and the blue one dark (<3 ASCII art):
[client]  <- assume internal, otherwise add a FW :)
     |
    \|/
[application load balancer]  <- internal, per app
     |
     |\-------------\--------------\--------------\
    \|/            \|/            \|/            \|/
[node 1 G/L]   [node 2 G/L]   [node 3 B/D]   [node 4 B/D]

G = green, B = blue, L = live, D = dark
The application load balancer can be any number of technologies: a gateway app (like Netflix Zuul) or a load-balancing web server (like Airbnb's SmartStack, which uses HAProxy).
It's worth mentioning that if the live cluster goes up in flames, we don't automatically promote the dark cluster to live... What I'm trying to say is: don't use blue/green as an alternative to high availability. Is that a concern for you? (As you're using VIPs here, and keepalived.)
Edit
Thanks for the answers to the questions. Unfortunately, I don't think you'll be able to do blue-green with that constraint.
Have you considered having one big environment and doing a sort of hybrid between a canary release and blue-green? In this approach, you would have 5 servers serving live traffic and 1 serving dark traffic (I assume you have 6 boxes in total). The live nodes would be configured so that 3 of them take live traffic and 2 do batch processing.
When you're happy with the code in the dark pool, you start upgrading the servers one by one until all the servers serving live traffic in the live pool are upgraded. At that point, you might need to move the 2 batch processing servers to the live pool in one go, unless you have a way of moving them more gradually (probably one job at a time?).
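The rollout order described above can be modelled as a simple upgrade plan. This is a hypothetical Python sketch assuming the 6-box layout guessed at above (1 dark, 3 live, 2 batch); `Node`, `rolling_upgrade_plan` and the node names are illustrative only. It refuses to plan anything until the dark node already runs the new version, then upgrades the live nodes one at a time, and the batch nodes last.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    role: str      # "dark", "live" or "batch"
    version: str   # code version currently deployed


def rolling_upgrade_plan(nodes: List[Node], new_version: str) -> List[str]:
    """Hypothetical canary/blue-green hybrid: dark node first, then live, then batch."""
    dark = [n for n in nodes if n.role == "dark"]
    # The dark pool acts as the canary: it must already run the new code.
    if not dark or any(n.version != new_version for n in dark):
        raise ValueError("deploy and verify new_version on the dark node first")
    steps = []
    # Upgrade one node at a time so most live capacity stays in service;
    # batch nodes go last (ideally as each finishes its current job).
    for role in ("live", "batch"):
        for node in nodes:
            if node.role == role and node.version != new_version:
                steps.append(f"drain {node.name}; upgrade to {new_version}; return to service")
    return steps
```

For nodes `n1` (dark, already on `v2`), `n2`–`n4` (live, `v1`) and `n5`–`n6` (batch, `v1`), the plan has five steps, starting with `n2` and ending with `n6`.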
Just in case, I want to make clear that this might come back to bite you (and I don't like seeing fellow developers in pain). If batch processing is a fundamental part of your platform, you don't have a true HA environment, for the reason outlined in my original answer: if the live cluster fails for any reason (DB corruption?), you won't be able to run everything on the remaining hardware.