It was to talk about “team restructuring”
Got to agree with @Zushii@feddit.de here, although it depends on the scope of your service or project.
Cloud services are good at getting you up and running quickly, but they are very, very expensive to scale up.
I work for a financial services company, and we are paying 7 digit monthly AWS bills for an amount of work that could realistically be done with one really big dedicated server. And now we’re required to support multiple cloud providers by some of our customers, we’ve spent a TON of effort trying to untangle from SQS/SNS and other AWS specific technologies.
Clouds like to tell you:
- Using the cloud is cheaper than running your own server
- Using cloud services requires less manpower / labour to maintain and manage
- It’s easier to get up and running and scale up later using cloud services
The last item is true, but the first two are only true if you are running a small service. Scaling up on a cloud is not cost effective, and maintaining a complicated cloud architecture can be FAR more complicated than managing a similar centralized architecture.
You are paying aws to not have one big server, so you get high availability and dynamic load balancing as instances come and go.
I agree its not cheaper than being on prem. But it’s much higher quality solutions.
Today at work, they decided to upgrade from ancient Ubuntu version to a more recent version. Since they don’t use aws properly, they treat servers as pets. So to upgrade Ubuntu, they actually upgraded Ubuntu on the instance instead of creating a new one. This led to grub failing and now they are troubleshooting how to mount disks etc.
All of this could easily be avoided by using the cloud properly.
That could be avoided by using on prem properly, too. People are very capable of making bad infrastructure whether on prem or cloud.
I used to work on an on premise object storage system before, where we required double digits of “nines” availability. High availability is not rocket science. Most scenarios are covered by having 2 or 3 machines.
I’d also wager that using the cloud properly is a different skillset than properly managing or upgrading a Linux system, not necessarily a cheaper or better one from a company point of view.
where we required double digits of “nines” availability
Do you mean 99% or 99.99999999%? Because 99.99999999% is absurd. Even Google doesn’t go near that for internal targets. That’s 1/3 of a second per year of downtime. If a network hiccup causes 30s of downtime, you’ve blown through a century of error budget. If you’re talking durability, that’s another matter, but availability?
For ten-nines availability to make any sense, any dependent system would also have to have ten nines availability, and any calling system would have to have close to ten nines availability or it’s not worth ten nines on the called system.
If the traffic ever goes over TCP/IP, not even if it ever goes over the public internet, if it ever goes over Ethernet wires, ten nines sounds like overkill. Maybe if it stays within a mainframe computer, but you’d have to carefully audit that mainframe to ensure that every component involved also has approx ten nines.
If you mean 2 nines availability, that’s not high availability at all. That’s nearly 4 days of downtime a year. That’s enough that you don’t necessarily need a standby system, you just need to be able to repair the main one within a few hours if it goes down.
We have on prem and do all our upgrades by burn the OS and move the data, with the exception of the hypervisor OS (which has a pretty resilient bulk self upgrade built in, and we have a burn-the-OS plan documented for if they do crash). Even system file corruption of a random pet server? New VM and reattach the data disk. Need high availability? Throw F5 or HAProxy at the problem (assuming L7 protocol support).
Both cloud and on prem can work equally when done right. The most important part is to understand that both have different types of cost (human, machine, developer) and to make the right choice based your/your customer’s needs and any applicable laws or regulations about data locality. And yeah, sometimes one will be better for someone and not someone else.
Seven figures of cloud engineering can’t solve stupid, but neither can seven figures of datacenter. This isn’t some Sith/Jedi concept where you have hard definitions of dark and light or good and evil - though sometimes both will see each other as the enemy, and they are in a way competitors.
Being cloud-agnostic also means additional cost/complexity.
Sometimes the only way to win the game is by not playing it.
I would argue that most cloud native services existed in their standalone forms way before public clouds made their own versions. For example there are loads of message queue systems that are just as easy to incorporate and are cloud agnostic, some of them are FOSS. Sure you can reinvent the wheel but in most cases something like RabbitMQ will work OK depending on the use case. Having cloud vendor lock in is where cost catches up with you. Complexity is arbitrary since there are ways to make anything overcomplicated.
I worked in operations for a large company that had their own 50,000 sq ft data center with 2000 physical servers, uncountable virtual servers, backup tape robots, etc… Their cooling bill would like to disagree with your assessment about scaling. I was unpacking new servers regularly because, when you own you own servers, not only do you have to buy them, but you have to house them (so much rented space), run them, fix them, cool them, and replace them.
Don’t get me wrong, I’ve also seen the AWS bill for another large company I worked for and that was staggering. But, we were a smaller tech team and didn’t require a separate ops group specifically to maintain the physical servers.