It took a lot of planning, prototyping, testing and trial runs, but we’ve finally completed our migration to Amazon AWS.
What does this mean for you? It means significantly lower chances of downtime (our systems are waaaaay more fault-tolerant now), plus tighter security (we were always secure, but now even more so).
On our end we get to breathe easier knowing that our systems are fully redundant and scalable to meet demand, and we can start deploying new updates with confidence.
The rest of this post is a bit inside baseball, but I thought I’d share some details as we get asked about our setup quite often in support-land and at tech meetups.
About a year ago we began bumping up against walls that impeded our ability to move forward. These fell into three broad categories: scalability, agility, and (as a result) marketing. Arguably these are good problems for a company to have (well, maybe not the marketing part), and while we had theoretical solutions for everything, the practical matter of implementation took quite a bit of time, a lot of learning and a few leaps of faith.
On the infrastructure front, our production architecture at Linode, while hyper-optimized (running Arch Linux + nginx), did not lend itself to easy scalability. With a team of sysadmins we could have overcome this, but we’re app developers at heart. Enter AWS: Elastic Beanstalk, EC2, RDS, SES, S3 and Route 53 to save the day. While we still have a few things to tweak, we couldn’t be happier with the move. Major props to the boys at TriNimbus for giving us a hand.
On the agility front, our inability to work efficiently on isolated feature development was killing us. Branching and merging is not an area where SVN excels. We’re now on Git across the board and cannot believe we waited so long to switch. We still use Beanstalk for our repo origins because, well, they’re awesome. On another note, in hindsight we made the mistake of outsourcing some of our mobile app development. We were running low on internal manpower (our time was getting sucked into putting out fires), but in the end having a portion of our development outsourced really killed our iterative release cycles. Because we couldn’t release feature updates on all platforms at the same time, we got stuck in the mud. So we’re bringing development back in-house. All of it.
On the marketing front, being in a scalability bind, we weren’t in a position to bring hordes of new users into the mix (though we did welcome thousands of new users through organic, word-of-mouth referrals). Now that we have the infrastructure in place to support rapid growth, we’ll be kicking things up a notch to get the word out more aggressively.
All of which is to say that it took the better part of a year to dig ourselves out of the hole we made for ourselves. Building good software and running a solid business both involve continual improvements and course-corrections, and it’s nice to be back on a path with daylight again. We’re all pretty stoked for what comes next.
A great article by CamMi Pham that really resonates.
“There’s a notable distinction between being busy and being productive. Being busy doesn’t necessarily mean you’re being productive. Being productive is less about time management and more on managing your energy. It is the business of life. We need to learn how to spend the least amount of energy to get the most benefits.”
Read the full article here. It’s worth it.
The data center that houses Nirvana’s front-end web servers suffered a prolonged power outage last night. It took a lot longer than anyone expected (or wanted) before the thousands of VMs at their facility, including ours, could be brought back online. The high-level sequence of events is available here.
We’re back online now and everything is running normally.
Please accept our sincere apologies if you were unable to log in this morning.
We don’t usually talk much about our infrastructure (as we figure it’s not that interesting to most GTD’ers), but given recent events I suppose this is one of those times when people might like to know more.
Over the past few months we’ve been incrementally migrating our infrastructure to a geographically distributed and fault-tolerant cloud architecture, hosted at Amazon AWS. They are truly amazing, and we are in good company.
Our databases are already running as multi-AZ replicated RDS instances, so they were unaffected by the power outage.
In a frustrating twist of fate, we had planned on moving the remainder of our web servers to AWS yesterday, but decided to push the migration back a week (we’ve been working a lot of Saturday nights lately and kinda wanted a break). Then Linode, where we’ve been happily hosted for years, wound up having a major outage: the exact type of event we have been working hard to mitigate by moving to our new architecture. Arrrrgh.
Having our servers auto-scaling and load-balanced across multiple data centers will significantly reduce the chances of outages in the future. In light of last night’s events, it can’t come soon enough. Thanks for sticking with us, and sorry again for the unexpected downtime.
You may have read in the news this week that a major bug, nicknamed Heartbleed, was reported in OpenSSL, the cryptographic library used by websites to encrypt and protect information transmitted over the internet.
Nirvana servers are not affected by this vulnerability. Heartbleed affects OpenSSL versions 1.0.1 through 1.0.1f, and we run an earlier version (1.0.0a, to be precise) that does not contain the bug.
That said, if you log in to Nirvana with the same email/password combination that you use on sites that were affected, it might be wise to change your Nirvana password, as those sites may have been exploited to expose your passwords in transit. Changing your password will reset/rotate your authentication tokens throughout our systems, and you should be good to go.
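For the curious, here is a minimal sketch of the kind of version check involved. It assumes the commonly cited affected range of OpenSSL 1.0.1 through 1.0.1f (fixed in 1.0.1g) and ignores beta builds; the function name and the parsing are purely illustrative and are not taken from our actual tooling.

```python
import re

# Illustrative only: release versions 1.0.1 through 1.0.1f are treated as
# affected by Heartbleed (CVE-2014-0160); 1.0.1g carries the fix.
# Beta builds (e.g. 1.0.2-beta1) are deliberately not handled here.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]*)$")


def is_heartbleed_vulnerable(version: str) -> bool:
    """Return True if an OpenSSL release string falls in the affected range."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised OpenSSL version: {version!r}")
    numbers = tuple(int(match.group(i)) for i in (1, 2, 3))
    letter = match.group(4)  # "" for the base release, "a", "b", ... after it
    if numbers != (1, 0, 1):
        return False  # the 0.9.8 and 1.0.0 branches never had the bug
    return letter <= "f"  # "" sorts before "a", so plain 1.0.1 is included


print(is_heartbleed_vulnerable("1.0.0a"))
print(is_heartbleed_vulnerable("1.0.1f"))
```

On a machine with Python installed, the standard library's `ssl.OPENSSL_VERSION` reports which OpenSSL build the interpreter itself is linked against, which is a quick way to find a version string to feed in.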
Some food for thought:
“The Principle of Priority states (a) you must know the difference between what is urgent and what is important, and (b) you must do what’s important first.”
― Steven Pressfield, The War of Art: Break Through the Blocks & Win Your Inner Creative Battles
Tagging actions in Nirvana with contexts, and the time / energy required, helps you quickly whittle down your available actions based on where you are, how much time, and how much energy you have throughout the day.
Having a short list of actions to choose from helps ensure that you make progress on the important things that matter “right now.”
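For the programmers in the audience, the whittling-down idea can be sketched in a few lines. This is only an illustration of the concept, not how Nirvana is implemented: the `Action` fields, the 1 to 3 energy scale and the sample data are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    title: str
    context: str  # hypothetical context label, e.g. "phone" or "computer"
    minutes: int  # estimated time required
    energy: int   # assumed scale: 1 = low, 2 = medium, 3 = high


def available_actions(actions, context, minutes_free, energy_level):
    """Keep only the actions that fit where you are, your time and your energy."""
    return [
        a for a in actions
        if a.context == context
        and a.minutes <= minutes_free
        and a.energy <= energy_level
    ]


# Made-up sample data for the illustration.
SAMPLE_ACTIONS = [
    Action("Call the dentist", "phone", 5, 1),
    Action("Draft quarterly report", "computer", 90, 3),
    Action("Reply to support emails", "computer", 20, 2),
    Action("Back up laptop", "computer", 15, 1),
]

# At the computer with half an hour and medium energy.
for action in available_actions(SAMPLE_ACTIONS, "computer", 30, 2):
    print(action.title)
```

With half an hour at the computer and medium energy, the 90-minute report drops off the list and only the two smaller tasks remain.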