I’ve got a weird problem and am looking for possible pointers.
On at least one of our servers, kernel 5.10.0-0.deb10.16-amd64
boots without a problem. However, as we don’t want to rely on an “ancient” kernel build for Debian Buster, we also tried various later ones, and they all fail to start in the same way. Taking for example 6.1.0-11-amd64
from Debian Bookworm, this one boots fine from local disk, but the very same one loaded via DHCP/PXE/TFTP would load the kernel and initrd seemingly fine but then only print
early console in setup code
Probing EDD (edd=off to disable)... ok
and then hang, i.e. the newly loaded kernel does not even start. The kernel command line options already include
debug loglevel=7 ro console=ttyS1,115200n8 earlyprintk=serial,ttyS1,115200n8 console=tty0
and I don’t get any more info from the system, either via the serial port or at the console.
Anyone with pointers?
Edit: edd=off
results in the very same behaviour, except that the corresponding line is missing from the output.
In the TFTPD logs, do you see the initrd file being pulled? Could it be a mismatch between the kernel and initrd you’re serving?
Thanks for the hint, but no, there is no mismatch, and yes, the files are being pulled (I even checked with tshark
whether everything comes over properly).
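For anyone wanting to rule out a damaged or stale file on the boot server without a packet capture, a minimal sketch like the following compares checksums of two copies of a file. The function names and the example paths in the usage note are my own assumptions, not anything from the setup described here.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large initrd images need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical content (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Usage would be something like `same_content("/boot/vmlinuz-6.1.0-11-amd64", "/srv/tftp/vmlinuz")`, where the second path is a hypothetical TFTP root. Note that this only confirms the served file is the one you intended to serve; it says nothing about whether the initrd contents are complete.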
The solution turned out to be much more benign. The stock kernel in the NFSroot where the initrd was produced was much smaller than the one from the bootable system. This led me to look into why that was the case: it was missing about 200 non-free drivers, which somehow made the kernel stop right before really starting off.
Adding those to the NFSroot and then into the initrd solved the problem. Sigh.
I wasted way too much time there, and I still have no idea why 5.10 booted OK the whole time.
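In hindsight, the size difference between the two environments could have been spotted by comparing file lists directly. Below is a rough sketch that reports files present under one directory tree but absent from another; the function names and the example paths are hypothetical, chosen only for illustration.

```python
import os


def relative_files(root):
    """Collect the paths of all files under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_from(reference_root, candidate_root):
    """Sorted list of files that exist under reference_root but not under candidate_root."""
    return sorted(relative_files(reference_root) - relative_files(candidate_root))
```

For example, `missing_from("/lib/modules/6.1.0-11-amd64", "/srv/nfsroot/lib/modules/6.1.0-11-amd64")` would list what the working system has and the NFSroot lacks, assuming the NFSroot lives under `/srv/nfsroot` (an assumption on my part). The same comparison can be pointed at `/lib/firmware` or at two unpacked initrds.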