A little over one year ago, I was chosen to take a new role within Sonian to lead our Development Operations team. Previously at Sonian I had a role that changed constantly, from being the technical lead with our sales and business development teams to taking 3rd level support issues and working with our OEM partners on custom API integration options. The goal for this new role was to bring some structure to the team, attract new talent and retain existing talent, and ensure the overall health and efficiency of our application.
How to Setup AWS S3 Access From Specific IPs
Recently we were testing with AWS VPC, and a requirement for our project was that nodes within a VPC be allowed access to S3 buckets while access from any other IP address was denied. Specifically, this was for access to data that was going to be secured using AWS IAM keys. We needed to make sure that even with the AWS access key and secret key, data could only be retrieved while inside the VPC, adding yet another layer of security to our existing model.
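One way to express this kind of restriction is an S3 bucket policy with an explicit Deny keyed on the `aws:SourceIp` condition. The following is a minimal sketch and not necessarily the setup described in the full post: the bucket name and the CIDR are hypothetical, and it assumes the VPC nodes reach S3 through a known public egress address, since `aws:SourceIp` matches the address S3 sees on the request.

```python
import json


def ip_restricted_bucket_policy(bucket, allowed_cidrs):
    """Build a bucket policy that denies all S3 actions on the bucket
    unless the request originates from one of the allowed CIDR ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedIps",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # The Deny applies whenever the source IP is NOT in the list.
                "Condition": {
                    "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}
                },
            }
        ],
    }


# Hypothetical bucket name and egress address (documentation IP range).
policy = ip_restricted_bucket_policy("example-archive-bucket", ["203.0.113.10/32"])
print(json.dumps(policy, indent=2))
```

Because an explicit Deny overrides any Allow, a request signed with valid keys is still rejected when it arrives from an address outside the list. The same Deny also applies to administrators, so it is worth testing on a scratch bucket first.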
Why Your Company Should Have Internal HackDays
Recently the Sonian DevOps team (Yes, we call them our DevOps team - they write/deploy code and manage systems) took part in an internal hackday. Internally we call it a hackday, but if you are going to float the idea to your engineering management, calling it a “codefest” might be an easier sell.
About a month ago my Jira board grew with more and more stories asking for new monitors, bug fixes and additional metrics for Sensu. We’ve had a few large projects start recently that came with tight deadlines and large resource needs, so I didn’t expect to complete these stories for at least a few months. Based on the schedule, I thought that if we could have everyone spend one day working on some of these stories/tickets, the larger projects wouldn’t be delayed.
We have engineering-wide codefests during our company meetups three times a year. These are fantastic opportunities for team members across all parts of our engineering teams (devs and non-devs alike) to work on and present new solutions and ideas to the entire company. We needed a day to hack on a specific project (our Sensu monitors and metrics), and I didn’t want to waste codefest on that. I needed a separate day, just to hack on a specific project.
How to Enable Cross-Account AWS Access With IAM Users
We manage our AWS assets across many different accounts. This helps us keep data and access controls separate depending on the type of data we are controlling.
One of our AWS accounts is a non-production account where we spin up and down test systems to support new feature testing and other development activities. Our build cluster (which lives in a separate AWS account) needs access to some S3 buckets that live in our non-production account. The problem is that this account also contains some of our more sensitive stacks, like QA and UA systems, which are locked down from a moderate to a production level of access. So, in this case, I needed to create an IAM user on our build cluster account that could access specific buckets in our Non-Prod account. This ensures we’re not giving out keys that could potentially be used to cause data destruction.
Here is how you can do it if you are looking for the same level of Amazon cross-account access for S3 buckets (with granular per-bucket IAM level permissions).
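As a rough illustration of the idea, cross-account access for an IAM user generally needs two policies that agree: a bucket policy in the bucket-owning account that names the user as principal, and a user policy in the user's own account that allows the same actions. The sketch below only builds the two JSON documents. The account ID, user name, bucket name, and the chosen S3 actions are all hypothetical, and this is not necessarily the exact configuration from the full post.

```python
import json

# Hypothetical identifiers, for illustration only.
BUILD_ACCOUNT_ID = "111111111111"   # account that owns the IAM user
BUILD_USER = "build-cluster"        # IAM user in the build account
BUCKET = "nonprod-build-artifacts"  # bucket in the non-production account


def s3_statements(bucket, principal_arn=None):
    """Return Allow statements scoped to one bucket.

    With principal_arn set, the statements suit a bucket policy;
    without it, they suit an IAM user policy."""
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ]
    if principal_arn is not None:
        for statement in statements:
            statement["Principal"] = {"AWS": principal_arn}
    return statements


user_arn = f"arn:aws:iam::{BUILD_ACCOUNT_ID}:user/{BUILD_USER}"

# Attached to the bucket in the non-production account.
bucket_policy = {"Version": "2012-10-17", "Statement": s3_statements(BUCKET, user_arn)}

# Attached to the IAM user in the build cluster account.
user_policy = {"Version": "2012-10-17", "Statement": s3_statements(BUCKET)}

print(json.dumps(bucket_policy, indent=2))
print(json.dumps(user_policy, indent=2))
```

Scoping both documents to a single bucket ARN is what keeps the build user's keys from reaching the more sensitive stacks in the same account.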
What Am I Doing, and How Did I Get Here?
After quite a long absence from blogging I have decided to return to discussing some of the new and (hopefully) interesting technologies that my team and I work with. I work for Sonian as the Director of DevOps, and I manage 5 very skilled individuals who assist me with the operation of our systems in the cloud. We work with lots of cutting edge technology (such as Chef from Opscode), and where there is no software to do what we need, we create it ourselves (see Sensu for an example).
In addition, I’m working to increase my knowledge of Ruby, as we use Ruby for all our tools and applications. Coming from a non-computer science background, that will be a challenge in itself. I’ve started by reading the fantastic book written by Chris Pine, Learn to Program.
I’ll be writing more about some of the challenges of managing big data and large systems at scale in the cloud, where you have no control over network and disk I/O (among other things we can’t control), and discussing how we deploy, monitor, scale and secure our systems.
But Wait, There’s Less (Durability)!
Amazon recently announced a new tier of storage available within their web services cloud infrastructure. Amazon’s current storage solution, S3, is truly the gold standard for durable cloud-based storage, providing 99.999999999% durability (which, if my math is right, means that for every 100 billion objects stored in S3, Amazon “may” lose a single object every year). Amazon is listening to their customers, and now provides a lower cost (33% cheaper) S3 storage option called Reduced Redundancy Storage (RRS).
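The arithmetic behind that parenthetical can be checked with a short sketch. It assumes the durability figure can be read as an average annual per-object survival probability, which is the same simplification the paragraph makes; the function name is made up for this example.

```python
from fractions import Fraction


def expected_annual_losses(object_count, durability_percent):
    """Expected number of objects lost per year, treating the durability
    percentage as an average annual per-object survival probability."""
    # Fraction parses the decimal string exactly, avoiding float rounding.
    loss_rate = 1 - Fraction(durability_percent) / 100
    return object_count * loss_rate


# 100 billion objects at eleven nines of durability.
losses = expected_annual_losses(100_000_000_000, "99.999999999")
print(losses)
```

Exact fractions are used because subtracting 0.99999999999 from 1 in floating point introduces rounding error of the same order as the result.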
Moved Into the Clouds
I had started this blog initially as a way to discuss storage and virtualization solutions while working as a technology consultant. But recently a new opportunity presented itself, and I’ve now made the transition out of consulting and back to the start-up world. This most recent adventure is with a company called Sonian, which provides a cloud-based data archiving and eDiscovery solution. What is so wonderful about this new venture is that we leverage the Amazon Web Services cloud, which gives us the ability to consume storage and computing by the granule. We don’t need to make huge capital outlays in data centers, storage, servers, etc… And since we don’t need to buy and maintain all this hardware (which will eventually be refreshed in 3-5 years), we can keep the costs low and pass on those savings to our customers.
How to Add VMware Paravirtual SCSI (PVSCSI) Adapters
A few months before the vSphere release, VMware showed some amazing stats regarding the increased level of I/O that can be attained in a virtual infrastructure. They posted this info on their blog, and the outcome of the testing was impressive. They were able to achieve 350,000 I/O operations per second on a single vSphere host (ESX 4.0) with just 3 virtual machines. Their testing utilized EMC Enterprise Flash Drives, which have an incredibly high throughput. They talked about how the VMware Paravirtual SCSI (PVSCSI) adapter was able to achieve 12% more throughput with 18% less CPU cost compared to the LSI virtual adapter.
VMware Fault Tolerance
vSphere was just released to general availability today, and one of the best features of this upgrade is the addition of VMware Fault Tolerance. From the VMware site:
VMware Fault Tolerance is leading edge technology that provides continuous availability for applications in the event of server failures, by creating a live shadow instance of a virtual machine that is in virtual lockstep with the primary instance. By allowing instantaneous failover between the two instances in the event of hardware failure, VMware Fault Tolerance eliminates even the smallest of data loss or disruption.
Invalid Arguments: Virtual Machine Has No Snapshots
I ran into a very interesting issue today with a client who is using Veeam Backup and Replication to keep their virtual machines replicated to a remote ESX server for disaster recovery. When Veeam starts a replication job, it takes a snapshot of the virtual machine and then replicates the main VMDK disk file to the remote site. When the job finishes, Veeam tells VMware to remove the snapshot until the next scheduled replication runs. Since we are replicating our VMs across a slow WAN connection (600Kbps optimized with Citrix WANScalers), the replication can often time out or hang. Today I noticed that the replication had not updated since last night, so I needed to stop the replication and restart it. Since the Citrix WANScalers can cache as well as compress, restarting a failed replication job is usually pretty quick, as most of the data was previously cached on the Citrix boxes. Here are the details of what I found, and how I fixed it…