lemm.ee

Local All Communities Log in Sign up

Local All Communities

783

Today I'm grateful I'm using Linux - Global IT issues caused by Crowdstrike update causes BSOD on Windows(www.timesnownews.com)

posted 4 months ago

by

Thorned_Rose@sh.itjust.works

in

This isn’t a gloat post. In fact, I was completely oblivious to this massive outage until I tried to check my bank balance and it wouldn’t log in.

Apparently Visa Paywave, banks, some TV networks, EFTPOS, etc. have gone down. Flights have had to be cancelled as some airlines systems have also gone down. Gas stations and public transport systems inoperable. As well as numerous Windows systems and Microsoft services affected. (At least according to one of my local MSMs.)

Seems insane to me that one company’s messed up update could cause so much global disruption and so many systems gone down :/ This is exactly why centralisation of services and large corporations gobbling up smaller companies and becoming behemoth services is so dangerous.

Sort:

Hot Top Controversial New Old

[ +- ]

184 points

4 months ago

The annoying aspect from somebody with decades of IT experience is - what should happen is that crowdstrike gets sued into oblivion, and people responsible for buying that shit should have an epihpany and properly look at how they are doing their infra.

But will happen is that they’ll just buy a new crwodstrike product that promises to mitigate the fallout of them fucking up again.

report

reply

[ +- ]

0x0@programming.dev

93 points

4 months ago

decades of IT experience

Do any changes - especially upgrades - on local test environments before applying them in production?

The scary bit is what most in the industry already know: critical systems are held on with duct tape and maintained by juniors 'cos they’re the cheapest Big Money can find. And even if not, There’s no time. or It’s too expensive. are probably the most common answers a PowerPoint manager will give to a serious technical issue being raised.

The Earth will keep turning.

report

reply

[ +- ]

goodgame@feddit.uk

34 points

4 months ago

some years back I was the ‘Head’ of systems stuff at a national telco that provided the national telco infra. Part of my job was to manage the national systems upgrades. I had the stop/go decision to deploy, and indeed pushed the ‘enter’ button to do it. I was a complete PowerPoint Manager and had no clue what I was doing, it was total Accidental Empires, and I should not have been there. Luckily I got away with it for a few years. It was horrifically stressful and not the way to mitigate national risk. I feel for the CrowdStrike engineers. I wonder if the latest embargo on Russian oil sales is in anyway connected?

report

reply

[ +- ]

0x0@programming.dev

18 points

4 months ago

I wonder if the latest embargo on Russian oil sales is in anyway connected?

Doubt it, but it’s ironic that this happens shortly after Kaspersky gets banned.

report

reply

[ +- ]

ik5pvx@lemmy.world

30 points

4 months ago

Unfortunately falcon self updates. And it will not work properly if you don’t let it do it.

Also add “customer has rejected the maintenance window” to your list.

report

reply

[ +- ]

MyNameIsRichard@lemmy.ml

35 points

4 months ago

Turns out it doesn’t work properly if you do let it

report

reply

[ +- ]

marcos@lemmy.world

7 points

4 months ago

Well, “don’t have self-upgrading shit on your production environment” also applies.

As in “if you brought something like this, there’s a problem with you”.

report

reply

[ +- ]

HumanPenguin@feddit.uk

25 points

4 months ago

Not OP. But that is how it used to be done. Issue is the attacks we have seen over the years. IE ransom attacks etc. Have made corps feel they needf to fixed and update instantly to avoid attacks. So they depend on the corp they pay for the software to test roll out.

Autoupdate is a 2 edged sword. Without it, attackers etc will take advantage of delays. With it. Well today.

report

reply

[ +- ]

0x0@programming.dev

15 points

4 months ago

*

I’d wager most ransomware relies on old vulnerabilities. Yes, keep your software updated but you don’t need the latest and greatest delivered right to production without any kind of test first.

report

reply

Show more comments

Show more comments

[ +- ]

Avatar_of_Self@lemmy.world

2 points

4 months ago

*

I get the sentiment but defense in depth is a methodology to live by in IT and auto updating via the Internet is not a good risk to take in general. For example, should Crowdstrike just disappear one day, your entire infrastructure shouldn’t be at enormous risk nor should critical services. Even if it’s your anti-virus, a virus or ransomware shouldn’t be able to easily propagate through the enterprise. If it did, then it is doubtful something like Crowdstrike is going to be able to update and suddenly reverse course. If it can then you’re just lucky that the ransomware that made it through didn’t do anything in defense of itself (disconnecting from the network, blocking CIDRs like Crowdsource’s update servers, blocking processes, whatever) and frankly you can still update those clients anyway from your own AV update server which is a product you’d be using if you aren’t allowing updates from the Internet in order to roll them out in dev first, phasing and/or schedules from your own infrastructure.

Crowdstrike is just another lesson in that.

report

reply

[ +- ]

shirro@aussie.zone

131 points

4 months ago

*

I isn’t even a Linux vs Windows thing but a competent at your job vs don’t know what the fuck you are doing thing. Critical systems are immutable and isolated or as close as reasonably possible. They don’t do live updates of third party software and certainly not software that is running privileged and can crash the operating system.

I couldn’t face working in corporate IT with this sort of bullshit going on.

report

reply

[ +- ]

rozodru@lemmy.world

61 points

4 months ago

This is just like “what not to do in IT/dev/tech 101” right here. Every since I’ve been in the industry for literally decades at this point I was always told, even when in school, “Never test in production, never roll anything out to production on a Friday, if you’re unsure have someone senior code review” of which, Crowdstrike, failed to do all of the above. Even the most junior of junior devs should know better. So the fact that this update was allowed go through…I mean blame the juniors, the seniors, the PM’s, the CTO’s, everyone. If your shit is so critical that a couple bad lines of poorly written code (which apparently is what it was) can cripple the majority of the world…yeah crowdstrike is done.

report

reply

[ +- ]

magic_lobster_party@kbin.run

35 points

4 months ago

It’s incredible how an issue of this magnitude didn’t get discovered before they shipped it. It’s not exactly an issue that happens in some niche cases. It’s happening on all Windows computers!

This can only happen if they didn’t test their product at all before releasing to production. Or worse: maybe they did test, got the error, and they just “eh, it’s probably just something wrong with test systems”, and then shipped anyway.

This is just stupid.

report

reply

[ +- ]

pelotron@midwest.social

5 points

4 months ago

Can you imagine being the person that hit that button today? Jesus.

report

reply

[ +- ]

CalcProgrammer1@lemmy.ml

28 points

4 months ago

*

It’s also a “don’t allow third party proprietary shit into your kernel” issue. If the driver was open source it would actually go through a public code review and the issue would be more likely to get caught. Even if it did slip through people would publically have a fix by now with all the eyes on the code. It also wouldn’t get pushed to everyone simultaneously under the control of a single company, it would get tested and packaged by distributions before making it to end users.

report

reply

[ +- ]

Aphelion@lemm.ee

5 points

4 months ago

It’s actually a “test things first and have a proper change control process” thing. Doesn’t matter if it’s open source, closed source scummy bullshit or even coded by God: you always test it first before hitting deploy.

report

reply

[ +- ]

cybersandwich@lemmy.world

12 points

4 months ago

And roll it out in a controlled fashion: 1% of machines, 10%, 25%…no issues? Do the rest.

How this didn’t get caught by testing seems impossible to me.

The implementation/rollout strategy just seems bonkers. I feel bad for all of the field support guys who have had there next few weeks ruined, the sys admins who won’t sleep for 3 days, and all of the innocent businesses that got roped into it.

A couple local shops are fucked this morning. Kinda shocked they’d be running crowd strike but also these aren’t big businesses. They are probably using managed service providers who are now swamped and who know when they’ll get back online.

One was a bakery. They couldn’t sell all the bread they made this morning.

report

reply

Show more comments

Show more comments

[ +- ]

Morphit @feddit.uk

2 points

4 months ago

It’s not that clear cut a problem. There seems to be two elements; the kernel driver had a memory safety bug; and a definitions file was deployed incorrectly, triggering the bug. The kernel driver definitely deserves a lot of scrutiny and static analysis should have told them this bug existed. The live updates are a bit different since this is a real-time response system. If malware starts actively exploiting a software vulnerability, they can’t wait for distribution maintainers to package their mitigation - they have to be deployed ASAP. They certainly should roll-out definitions progressively and monitor for anything anomalous but it has to be quick or the malware could beat them to it.

This is more a code safety issue than CI/CD strategy. The bug was in the driver all along, but it had never been triggered before so it passed the tests and got rolled out to everyone. Critical code like this ought to be written in memory safe languages like Rust.

report

reply

[ +- ]

umbrella@lemmy.ml

15 points

4 months ago

I couldn’t face working in corporate IT with this sort of bullshit going on.

im taking you don’t work in IT anymore then?

report

reply

[ +- ]

KillerTofu@lemmy.world

6 points

4 months ago

There are state and government IT departments.

report

reply

[ +- ]

Aceticon@lemmy.world

14 points

4 months ago

*

More generally: delegate anything critical to a 3rd party and you’ve just put your business at the mercy of the quality (or lack thereof) of their own business processes which you do not control, which is especially dangerous in the current era of “cheapest as possible” hiring practices.

Having been in IT for almost 3 decades, a lesson I have learned long ago and which I’ve also been applying to my own things (such as having my own domain for my own e-mail address rather than using something like Google) was that you should avoid as much as possible to have your mission critical or hard to replace stuff dependent on a 3rd Party, especially if the dependency is Live (i.e. activelly connected rather than just buying and installing their software).

I’ve managed to avoid quite a lot of the recent enshittification exactly because I’ve been playing it safe in this domain for 2 decades.

report

reply

[ +- ]

Phoenix3875@lemmy.world

3 points

4 months ago

*

Deleted by creator

report

reply

[ +- ]

reverendz@lemmygrad.ml

2 points

4 months ago

Our group got hit with this today. We don’t have a choice. If you want to run Windows, you have to install this software.

It’s why stuff like this is so crippling. Individual organizations within companies have to follow corporate mandates, even if they don’t agree.

report

reply

[ +- ]

msage@programming.dev

-6 points

4 months ago

So it’s Linux vs Windows

report

reply

[ +- ]

SayCyberOnceMore@feddit.uk

25 points

4 months ago

No it’s Crowdstrike… we’re just seeing an issue with their Windows software, not their Linux software.

report

reply

[ +- ]

Sethayy@sh.itjust.works

-7 points

4 months ago

That being said Microsoft still did hire crowd strike and give them the keys to release an update like this.

End result still is windows having more issues than linux

report

reply

Show more comments

Show more comments

[ +- ]

monoboy@lemmy.zip

83 points

4 months ago

*

Didn’t Crowdstrike have a bad update to Debian systems back in April this year that caused a lot of problems? I don’t think it was a big thing since not as many companies are using Crowdstrike on Debian.

Sounds like the issue here is Crowdstrike and not Windows.

report

reply

[ +- ]

Baldur Nil@programming.dev

42 points

4 months ago

*

They didn’t even bother to do a gradual rollout, like even small apps do.

The level of company-wide incompetence is astounding, but considering how organizations work and disregard technical people’s concerns, I’m never surprised when these things happen. It’s a social problem more than a technical one.

report

reply

[ +- ]

PlexSheep@infosec.pub

18 points

4 months ago

*

They didn’t even bother to test their stuff, must have pushed to prod

(Technically, test in prod)

report

reply

[ +- ]

laurelraven@lemmy.blahaj.zone

13 points

4 months ago

Everyone has a test environment

Some are lucky enough to also have a separate production environment

report

reply

[ +- ]

DaneGerous@lemmy.world

17 points

4 months ago

A crowdstrike update killed a bunch of our Linux VMs that had a newer kernel a month or so ago.

report

reply

[ +- ]

Possibly linux@lemmy.zip

3 points

4 months ago

*crowdstrike

report

reply

[ +- ]

TCB13@lemmy.world

79 points

4 months ago

While I don’t totally disagree with you, this has mostly nothing to do with Windows and everything to do with a piece of corporate spyware garbage that some IT Manager decided to install. If tools like that existed for Linux, doing what they do to to the OS, trust me, we would be seeing kernel panics as well.

report

reply

[ +- ]

tenchiken@lemmy.dbzer0.com

63 points

4 months ago

Hate to break it to you, but CrowdStrike falcon is used on Linux too…

report

reply

[ +- ]

kautau@lemmy.world

55 points

4 months ago

*

And if it was a kernel-level driver that failed, Linux machines would fail to boot too. The amount of people seeing this and saying “MS Bad,” (which is true, but has nothing to do with this) instead of “how does an 83 billion dollar IT security firm push an update this fucked” is hilarious

report

reply

[ +- ]

Badabinski@kbin.earth

10 points

4 months ago

*

Falcon uses eBPF on Linux nowadays. It’s still an irritating piece of software, but it no make your boxen fail to boot.

edit: well, this is a bad take. I should avoid commenting on shit when I’m sleep deprived and filled with meeting dread.

report

reply

Show more comments

Show more comments

[ +- ]

Aniki 🌱🌿@lemmings.world

-1 points

4 months ago

You’re asking the wrong question: why does a security nightmare need a 90 billion dollar company to unfuck it?

report

reply

Show more comments

Show more comments

[ +- ]

DigitalDilemma@lemmy.ml

12 points

4 months ago

And Macs, we have it on all three OSs. But only Windows was affected by this.

report

reply

[ +- ]

biscuitswalrus@aussie.zone

32 points

4 months ago

Hate to break it to you, but most IT Managers don’t care about crowdstrike: they’re forced to choose some kind of EDR to complete audits. But yes things like crowdstrike, huntress, sentinelone, even Microsoft Defender all run on Linux too.

report

reply

[ +- ]

TCB13@lemmy.world

4 points

4 months ago

Yeah, you’re right.

report

reply

[ +- ]

Mikina@programming.dev

24 points

4 months ago

I wouldn’t call Crowdstrike a corporate spyware garbage. I work as a Red Teamer in cybersecurity, and EDRs are bane of my existence - they are useful, and pretty good at what they do. In the last few years, I’m struggling more and more to with engagements we do, because EDRs just get in the way and catch a lot of what would pass undetected a month ago. Staying on top of them with our tooling is getting more and more difficult, and I would call that a good thing.

I’ve recently tested a company without EDR, and boy was it a treat. Not defending Crowdstrike, to call that a major fuckup is great understatement, but calling it “corporate spyware garbage” feels a little bit unfair - EDRs do make a difference, and this wasn’t an issue with their product in itself, but with irresponsibility of their patch management.

report

reply

[ +- ]

TCB13@lemmy.world

2 points

4 months ago

Fair enough.

Still this fiasco proved once again that the biggest thread to IT sometimes is on the inside. At the end of the day a bunch of people decided to buy Crowdstrike and got screwed over. Some of them actually had good reason to use a product like that, others it was just paranoia and FOMO.

report

reply

[ +- ]

abbadon420@lemm.ee

2 points

4 months ago

This is the problem that managers view security as a product they can simply buy as wholesale, instead of a service that they need to hire a security guy (or a whole department) for.

report

reply

[ +- ]

TCB13@lemmy.world

2 points

4 months ago

Hmmm… but that goes up to the CEO level, people like to see everything as a product they can buy because that has less liabilities than hiring people… Also makes a lot more sense from an accounting perspective.

report

reply

[ +- ]

Aniki 🌱🌿@lemmings.world

-2 points

4 months ago

How is it not a window problem?

report

reply

[ +- ]

Jako301@feddit.de

19 points

4 months ago

Why should it be? A faulty software update from a 3rd party crashes the operating system. The exact same thing could happen to Linux hosts as well with how much access those IPSec programms usually get.

report

reply

[ +- ]

Aniki 🌱🌿@lemmings.world

-16 points

4 months ago

*

But that patch is for windows, not Linux. Not a hypothetical, this is happening.

report

reply

Show more comments

Show more comments

[ +- ]

DigitalDilemma@lemmy.ml

15 points

4 months ago

*

The fault seems to be 90/10 CS, MS.

MS allegedly pushed a bad update. Ok, it happens. Crowdstrike’s initial statement seems to be blaming that.

CS software csagent.sys took exception to this and royally shit the bed, disabling the entire computer. I don’t think it should EVER do that, so the weight of blame must lie with them.

The really problematic part is, of course, the need to manually remediate these machines. I’ve just spent the morning of my day off doing just that. Thanks, Crowdstrike.

EDIT: Turns out it was 100% Crowdstrike, and the update was theirs. The initial press release from CS seemed to be blaming Microsoft for an update, but that now looks to be misleading.

report

reply

[ +- ]

marcos@lemmy.world

3 points

4 months ago

It is on the sense that Windows admins are the ones that like to buy this kind of shit and use it. It’s not on the sense that Windows was broken somehow.

report

reply

[ +- ]

Swarfega@lemm.ee

65 points

4 months ago

I’ve just spent the past 6 hours booting into safe mode and deleting crowd strike files on servers.

report

reply

[ +- ]

allywilson@lemmy.ml

19 points

4 months ago

Feel you there. 4 hours here. All of them cloud instances whereby getting acces to the actual console isn’t as easy as it should be, and trying to hit F8 to get the menu to get into safe mode can take a very long time.

report

reply

[ +- ]

Swarfega@lemm.ee

7 points

4 months ago

Ha! Yes. Same issue. Clicking Reset in vSphere and then quickly switching tabs to hold down F8 has been a ball ache to say the least!

report

reply

[ +- ]

Blank@lemmy.world

2 points

4 months ago

Just go into settings and add a boot delay, then set it back when you’re done.

report

reply

[ +- ]

Avatar_of_Self@lemmy.world

2 points

4 months ago

*

What I usually do is set next boot to BIOS so I have time to get into the console and do whatever.

Also instead of using a browser, I prefer to connect vmware Workstation to vCenter so all the consoles insta open in their own tabs in the workspace.

report

reply

[ +- ]

Possibly linux@lemmy.zip

1 point

4 months ago

Can’t you automate it?

report

reply

[ +- ]

ArrogantAnalyst@infosec.pub

10 points

4 months ago

Since it has to happen in windows safe mode it seems to be very hard to automate the process. I haven’t seen a solution yet.

report

reply

[ +- ]

Swarfega@lemm.ee

5 points

4 months ago

Sadly not. Windows doesn’t boot. You can boot it into safe mode with networking, at which point maybe with anaible we could login to delete the file but since it’s still manual work to get windows into safe mode there’s not much point

report

reply

[ +- ]

lengau@midwest.social

7 points

4 months ago

It is theoretically automatable, but on bare metal it requires having hardware that’s not normally just sitting in every data centre, so it would still require someone to go and plug something into each machine.

On VMs it’s more feasible, but on those VMs most people are probably just mounting the disk images and deleting the bad file to begin with.

report

reply

Show more comments

Show more comments

Linux

!linux@lemmy.ml

From Wikipedia, the free encyclopedia

Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).

Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word “Linux” in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.

Rules

Posts must be relevant to operating systems running the Linux kernel. GNU/Linux or otherwise.
No misinformation
No NSFW content
No hate speech, bigotry, etc

Related Communities

!opensource@lemmy.ml
!libre_culture@lemmy.ml
!technology@lemmy.ml
!libre_hardware@lemmy.ml

Community icon by Alpár-Etele Méder, licensed under CC BY 3.0

Community stats

7.7K
Monthly active users
6.5K
Posts
179K
Comments

Community moderators

AgreeableLandscape@lemmy.ml
nooter692@lemmy.ml
MarcellusDrum@lemmy.ml
Arthur Besse@lemmy.ml
Cyclohexane@lemmy.ml
d3Xt3r@lemmy.nz

modlog legal instances join-lemmy.org

lemmy-ui-next v0.11.0 (github)lemmy v0.19.5 (github)