Lately my PC has started crashing while playing videos. It freezes completely: the screen is frozen and it does not respond to any input (keyboard, mouse). I cannot switch TTY (Ctrl + Alt + F1, F2, …), and it does not even respond to Alt + PrtSc + REISUB. I have to force a power-off by holding down the power button.

After rebooting I checked all the available logs and could not find anything logged right before the incident. The last entries are different every time and do not point to anything.

I suspect it has to do with the graphics card, but I’m looking for ways to dig deeper and either confirm that or rule it out.

What else should I check? How can I find more info?

OS: Lubuntu 22.04.3 LTS (latest updates). I’m using the proprietary NVIDIA driver (nvidia-driver-390).

UPDATE:

First of all thank you all for your input and fresh ideas. Now I’ve already tried some of them and I will continue with the other ones until I get some results.

So far I have tried:

  • memtest, which didn’t show any errors.
  • booting from a live distro to see if the problem also occurs there. It didn’t, but on the live distro you cannot change the graphics driver, so it was using the open-source nouveau driver. Also, I only let it play for 1 hour, and the freeze was never predictable even before: it could happen during the first hour, the third, or sometime later.

Next steps are to:

  • open the case and clean it, to rule out high temperatures caused by dust,
  • switch to the nouveau driver and try again,
  • try with only the onboard GPU enabled,
  • remove the extra disks to reduce the load on the PSU.
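
For the temperature check, one rough option is to log GPU readings to a file while a video plays, so the last values before a freeze can be read after the reboot. This is only a sketch: it assumes the proprietary driver's nvidia-smi tool is installed and that the card reports these fields (older cards may print N/A for some of them). The helper name log_gpu_temps and the file name are made up for illustration.

```shell
# Hypothetical helper: append one GPU reading to a CSV file every few seconds.
# sync runs after each reading so the last line has a chance to reach the disk
# before a hard freeze.
log_gpu_temps() {
    local out="${1:-gpu-temps.csv}"   # output file (assumed name)
    local interval="${2:-5}"          # seconds between readings
    while true; do
        nvidia-smi --query-gpu=timestamp,temperature.gpu,utilization.gpu \
                   --format=csv,noheader >> "$out"
        sync
        sleep "$interval"
    done
}
```

Start log_gpu_temps in a terminal before playing the video and, after the forced reboot, look at the end of the file with tail gpu-temps.csv. A last reading far above the idle temperature would point towards overheating; a normal one makes that explanation less likely.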

Thank you all again.

23 points

Boot with a live CD from a different distro. This will help separate a hardware issue from a software issue.

8 points

This! Try this! Don’t go taking your computer apart until you try this. It’s great advice.

3 points

Thanks! Nice idea, I’ll try it.

2 points

Not really. Distros usually build the same software slightly differently. If the bug is in a piece of software used by all distros such as the Linux kernel, it won’t make a difference.

16 points

Check your storage connection! If storage disconnects, your OS will freeze and stop responding to the keyboard. The OS also won’t be able to write any logs, because the storage isn’t attached. Even a normal power-off won’t work, because the OS can’t read any files. This feels very similar to your problem.

For me, my motherboard had a faulty drive controller which would randomly stop working and drives would no longer appear connected.

I’m not sure whether you have the same issue as me but it has the same characteristics as mine. Hope this helps!

12 points

Thanks for your input, but it looks different. When I power it off with the button, it boots again without issues. Also, it doesn’t freeze randomly; it freezes only while playing videos. It has now been running uninterrupted for 10 days since I stopped playing videos.

3 points

I had an issue with my storage recently, and the symptoms were similar to what OP described. Syslog didn’t reveal anything, as the root filesystem was read-only, so troubleshooting it was hard. Coincidentally I needed a newer kernel, and after the upgrade the problem disappeared.

-2 points

What does storage have to do with the keyboard?

1 point

When storage disconnects while the OS is still running, it causes the OS to freeze and stop responding to all keyboard input. I thought this was similar to OP’s issue, which is why I suggested it.

3 points

Not always. I’ve seen Linux systems keep running, and open programs work, until they need something from disk, and then either they throw an error or crash.

1 point

I was asking why, lol.

10 points

Test your RAM. I had a machine doing this a few years ago - turns out I had a stick of RAM with a 128k block somewhere in the middle that was dead.

That machine worked fine as long as I didn’t get it doing anything too intensive, then it would crash. A new stick of RAM solved the issue.

4 points

This is the most likely issue. To add: run 3-4 passes of Memtest86+. The first pass is shorter and meant for finding egregious RAM problems, so RAM that passes it can still fail on the subsequent full passes. Mine passed the 1st pass and failed on the 3rd or 4th. It could even be caused by the platform not coping with the amount of RAM installed. For example, in my case the AMD platform handled 2x 8GB sticks of this RAM with no issues, but with 4x 8GB it started producing errors, even though each individual stick passed with flying colors.

3 points

Seconded. I’d been having issues (random freezes, crashes) for a while, but I had attributed them to a lack of RAM. So I bought some more RAM at some point, ran memtest on all the RAM together, and saw errors. Those bastards, they sold me dodgy RAM, right? I tested the new sticks individually and they were clean. Turns out I had a bad 64 KB area on one of my old sticks.

You can tell the kernel to not use the bad area btw if it’s all in one place, so don’t necessarily rush out to replace the bad stick.
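
For reference, the usual mechanism for this is the kernel's memmap= boot parameter, which marks an address range as reserved so the kernel leaves it alone. Below is a small sketch that builds the parameter. The size and start address are invented for illustration; take the real ones from your memtest report and pad the range a little.

```shell
# Build a memmap= parameter that tells the kernel to treat SIZE bytes
# starting at physical address START as reserved (i.e. not to use them).
# The example values are hypothetical, not taken from this thread.
memmap_param() {
    local size="$1" start="$2"
    printf 'memmap=%s$%s\n' "$size" "$start"
}

memmap_param 64K 0x3f000000   # prints: memmap=64K$0x3f000000
```

On Ubuntu-family systems the parameter would go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot. The $ usually has to be escaped in that file (commonly written as \\\$), so check cat /proc/cmdline afterwards to confirm the parameter arrived intact.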

1 point

Thanks. I ran memtest for about an hour with no errors. I’ll let it run longer if nothing else shows any progress.

1 point

Check the video RAM too. It could be a bad memory chip on your video card.

10 points

I was in a similar situation not too long ago and couldn’t find anything to fix it at first either. One thing high on my list was changing my PSU, since a defective or weak one often seems to be the problem in such cases, besides a general hardware failure of course. If it’s the hardware, that could be anything really: motherboard, RAM, GPU, PSU. The PSU is the easiest to swap though, so if you go that route I would try that first.

Anyway, I never had to do this because in my case, believe it or not, a BIOS update fixed my problem. I am still not 100% sure what happened, but I think the update fixed the GPU voltage distribution or something similar.

Hope that helps at least a little bit.

4 points

Good idea about the PSU, I hadn’t thought of that. The PSU is not a high-performance or high-quality unit and is already 5 years old. Being unable to provide the required voltage may be a possibility if we accept that its performance degrades over time. (It had been working without issues for 5 years in the same PC configuration.)

I think I’ll first try removing the extra HDDs to reduce the load, and check again. Thanks for your input.

1 point

If your processor/MB has onboard video, it would probably be easier to pull the GPU and test. If you still suspect the power supply, putting the GPU back and then pulling other components, like the additional HDDs, would confirm it.

3 points

The PSU is the last thing we check, but it is usually the first thing to fail under load if it is old or cheap. Try reducing the load on it by not using your GPU, extra HDDs, or any other unnecessary peripherals.

Next check your RAM; that too can give random errors under load.

7 points

If you have another PC, ssh from it to the problem machine and run sudo dmesg -w. That shows kernel messages as they are generated and doesn’t rely on them being written to disk.
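
A minimal sketch of that setup, to be run on the second PC. The login user@problem-pc, the helper name and the log file name are placeholders. It also keeps a local copy of the output, so the last kernel messages are still there after the other machine has frozen.

```shell
# Hypothetical helper, run on the SECOND machine (not the one that freezes).
watch_remote_kernel_log() {
    local target="$1"                        # e.g. user@problem-pc (placeholder)
    local logfile="${2:-remote-dmesg.log}"   # local copy of the kernel log
    # ssh -t allocates a terminal so sudo can ask for a password.
    # dmesg -w waits for new messages; -T prints human-readable timestamps.
    ssh -t "$target" 'sudo dmesg -wT' | tee -a "$logfile"
}
```

Start it before playing the video, for example with watch_remote_kernel_log user@problem-pc. If the session simply stalls without any final message, that in itself hints at a hard hang rather than a kernel panic that had time to report something.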

6 points

I will try it, but I’m quite confident that the machine will be unresponsive/unreachable: if the kernel were still listening, it would respond to Alt + PrtSc + REISUB by unmounting the drives, and I would see that when I examine the logs afterwards.
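
One caveat on that reasoning: whether the REISUB keys do anything depends on the kernel.sysrq setting, a bitmask readable from /proc/sys/kernel/sysrq. A value of 1 enables everything; otherwise each group of functions has its own bit (per the kernel's sysrq documentation: 4 keyboard control, 16 sync, 32 remount read-only, 64 signalling processes, 128 reboot/poweroff). Many distros ship a restricted default (reportedly 176 on Ubuntu, but check your own value), so some letters may be ignored even by a healthy kernel. A small sketch to decode the value; the function name is made up for illustration.

```shell
# Report which REISUB steps a given kernel.sysrq bitmask allows.
# Bit values follow the kernel's sysrq documentation.
sysrq_reisub() {
    local mask="${1:-0}" out=""
    # 1 is a special value meaning "all functions enabled".
    if [ "$mask" -eq 1 ]; then
        echo "R E I S U B"
        return
    fi
    [ $(( mask & 4 ))   -ne 0 ] && out="$out R"     # unraw keyboard
    [ $(( mask & 64 ))  -ne 0 ] && out="$out E I"   # SIGTERM / SIGKILL
    [ $(( mask & 16 ))  -ne 0 ] && out="$out S"     # sync
    [ $(( mask & 32 ))  -ne 0 ] && out="$out U"     # remount read-only
    [ $(( mask & 128 )) -ne 0 ] && out="$out B"     # reboot
    echo "${out# }"
}

# Decode the value of the running system.
sysrq_reisub "$(cat /proc/sys/kernel/sysrq)"
```

For the value 176 this prints S U B: sync, remount and reboot are honoured, while R, E and I are not. A machine that ignores the whole sequence, including the final reboot, is therefore still a sign of a hard hang.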

11 points

To be clear, dmesg -w should be started before you do anything to cause the crash. It will continuously print kernel output until you press Ctrl+C or the kernel crashes.

In my experience, a crashing kernel will usually print something before going unresponsive, even when it does not get the chance to flush the log to disk.

2 points

From now on I’m keeping it open and visible on the screen, so if it happens again I’ll see it. Thanks.


Linux

!linux@lemmy.ml
