Each time I try AMD graphics, something is fucked for me. Back in the fglrx days, the driver just sucked, so I used Nvidia. Then I had an AMD card right around when they finally had open-source drivers, but it was still buggy as hell. So I went with Nvidia again (first a GTX 790, then a GTX 1060). In the meantime I got a new work notebook where I also went with an AMD APU, and for a long time I had driver crashes in video calls whenever it had to decode multiple streams. That thankfully stabilized with Linux 6.4.
Since sooo many people in the community swear by AMD, I thought “dammit, let’s try it again for my new desktop” and got a 7800rx … and I have to reboot ~5 times until I finally make it to a running X server or Wayland session. Apparently I am hit by this problem (at least I hope so). But that doesn’t even read nice … the fix seems to be to revert another fix for power management. So I either have a mostly non-booting card or suboptimal power management.
I start to regret having chosen AMD … again :-/ I seem to be cursed.
And here I am with a 3090, having more issues than I have time for, wishing I went with an AMD card. Sadly we both can see the grass ain’t necessarily greener.
Thanks for that perspective. At least that makes me regret my decision less.
I’ve tried the open source drivers, the proprietary dkms variant, and standard proprietary drivers and all give me issues.
Linux is cursed for me
Sorted that for you
I have a similar story with an RX 580. I replaced my GTX 1060 3GB with an 8GB RX 580, mostly because the 3GB of VRAM was an issue for BeamNG.
Now I can’t record my 3 displays with the RX 580; it just fails when trying to do so, and recording 2 displays results in constant encoder overloads, something the 1060 had no issues with at all. Also, my colors are off when recording and I have no idea why; it even happens when recording with the CPU:
https://bbs.archlinux.org/viewtopic.php?id=292196
Also, kernel 6.6 broke the power reporting on all Polaris GPUs. Thankfully that was fixed recently in kernel 6.7.2, but holy shit it took like 6 months to fix that.
I probably shouldn’t have read tests and forums, but simply searched for crashes and open bugs to get a feeling for what I am getting into. Then again I also read from people with very ugly problems with nvidia, so it’s not a really good measure.
I really want AMD to be good; they offer more VRAM where nvidia always seems to cheap out in pretty suspicious ways. Then again nvidia seems to be more power efficient.
My time with nvidia on linux was 0 issues in performance or usability.
The only sort of issue that I had was that the GTX 1060 drew 20W at idle when using the 3 displays, this was a bug that nvidia fixed for the RTX 20 series and newer cards but never fixed for pascal lol.
But even with BeamNG, there was a period where the native Linux version didn’t work on Mesa while it worked on Nvidia. To be fair to AMD, this was because BeamNG’s Vulkan implementation is horrible, and right now it does not work on either lol.
Polaris GPUs had very weak video encoders, I also had an RX 580 and had issues on Linux as well as Windows. To my knowledge the AMF encoders worked better for those, but I could never get them working with OBS
Oh I did try to use the AMF driver, my first attempt ended with i3 crashing upon startup. What was worse is that even after removing those drivers and putting mesa back it still crashed on startup, good thing I had a btrfs snapshot before messing with that.
My second attempt I was able to use the AMF on OBS, but it still failed to record the 3 displays.
My biggest issue right now is the issue with the colors, I don’t care if I have to use the cpu to record at this point.
I’ve had similar issues. I don’t understand the love for AMD. My whole rig is AMD, but it’s constantly having GPU crashes. All games run at high FPS and my CPU temps seem nominal. But the games will crash. Everything from RimWorld to Baldur’s Gate 3. They all run pinned at 60fps but randomly crash. I’ve tried a thousand different configurations and drivers. I’ve tried Ubuntu and Linux Mint. I’m now just accepting that I can’t rely on it as a gaming rig. I like that AMD is trying to be progressive with open-source drivers, but the quality doesn’t seem to be there. My next rig might be Nvidia and Intel. But we will see.
My issue was the GPU fan and the PSU fan would blow into each other. I opened the PSU and reversed the fan
Hah, I would not expect that to kill it. Maybe a small build. The other day I was switching the cards and realized my CPU fan and case fan were both disconnected, idk how the hell it was running without overheating… except I always have the side of the case off because the 3080 will shut me down otherwise.
What does that mean? Genuinely don’t know what it means that it runs Wayland.
Did you check the system logs to see what caused it?
Many things can result in seemingly random crashes. Any overclock (including XMP and EXPO), an undervolt, or even a BIOS version can be problematic.
I would check first if it’s stable on windows.
It’s not stable on Windows either. But I haven’t looked at logs because I didn’t really know what - or how - to check.
Most distros use systemd and its logging solution, journald. You can use journalctl to read the logs around the time of the crash, e.g.:
journalctl -S -5m
This shows the last 5 minutes. Use this when a game crashes but the system continues working and did not reboot.
journalctl -b -1 -S -10m
This shows entries from the previous boot that were logged in the last 10 minutes. Use this if the crash froze the whole system and it rebooted. The time is relative to now, so run it right after the reboot or widen the window.
Look for red lines (errors) and what wrote them. AMD GPU faults usually mention ‘amdgpu’; memory errors can appear as ‘protection fault’.
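If scrolling through the journal by eye gets tedious, here is a minimal Python sketch that filters journal lines for the two markers mentioned above. The helper names (classify, suspicious_lines, read_journal) and the labels are made up for illustration and are not part of any tool. The substring list is only the two hints from this thread, so treat it as a starting point, not a complete fault detector.

```python
import subprocess

# Substrings suggested in this thread (assumption: matching them
# case-insensitively is good enough for a first pass).
PATTERNS = {
    "amdgpu": "GPU driver",
    "protection fault": "possible memory error",
}


def classify(line):
    """Return the labels of every pattern found in one log line."""
    lowered = line.lower()
    return [label for needle, label in PATTERNS.items() if needle in lowered]


def suspicious_lines(lines):
    """Yield (labels, line) pairs for lines matching at least one pattern."""
    for line in lines:
        labels = classify(line)
        if labels:
            yield labels, line


def read_journal(since="-5m", previous_boot=False):
    """Run journalctl and return its output as a list of lines.

    'since' is relative to the current time, also when previous_boot is set.
    """
    cmd = ["journalctl", "--no-pager", "-S", since]
    if previous_boot:
        cmd.append("--boot=-1")  # same as "-b -1": the boot before this one
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.splitlines()
```

After a game crash, something like list(suspicious_lines(read_journal())) gives the matching lines; after a freeze and reboot, pass previous_boot=True and a wider window such as since="-30m".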
Sorry to hear that. For what it’s worth, I’ve had no problems with integrated AMD graphics, so maybe it’s a PCIe issue?
Hmm, interesting idea. I need to investigate that. The dmesg output is full of amdgpu irq errors, but of course that could also happen with an issue on the board.
I would rule out a generic hardware issue, since 1) I get graphics during boot up until it needs to do a modeswitch (I guess) and 2) it works fine so far on Windows.
I did have a similar issue after the first boot on Windows as well and assumed so far that the modeswitch after the initial driver install caused the problem. But Windows likely also installed chipset drivers at that time, so PCIe could be a possibility. Then again… I know that Windows reloads graphics drivers on-the-fly… but chipset drivers? Probably not. Which would speak against that theory.