[gpfsug-discuss] node lockups in gpfs > 4.1.1.14
Aaron Knister
aaron.s.knister at nasa.gov
Fri Aug 4 09:00:35 BST 2017
Hey All,
Anyone seen any strange behavior running either 4.1.1.15 or 4.1.1.16?
We are mid-upgrade to 4.1.1.16 from 4.1.1.14 and have seen some rather
disconcerting behavior. Specifically, on some of the upgraded nodes GPFS
will seemingly deadlock the entire node, rendering it unusable. I
can't even get a session on the node (but I can trigger a crash dump via
a sysrq trigger).
Most blocked tasks have cxiWaitEventWait at the top of their call
trace. That's probably not very helpful in and of itself, but I'm
curious if anyone else out there has run into this issue or if this is a
known bug.
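For anyone who wants to check their own dumps for the same pattern, a rough
sketch along these lines could tally the topmost frame of each call trace in
a saved kernel log. It is not taken from any GPFS tooling. The function name
top_frames and the assumed line layout ("Call Trace:" followed by frames of
the form symbol+0xOFF/0xLEN, optionally prefixed by a timestamp or an
[<address>]) are my assumptions and may need adjusting for your kernel.

```python
import re
from collections import Counter

# Matches one stack frame line such as
#   " [<ffffffffa0000000>] symbol+0x10/0x20 [module]"
# The optional pieces cover a leading "[  12.345]" timestamp, the "[<addr>]"
# prefix printed by older kernels, and the "? " marker on unreliable frames
# (those are counted like any other frame here).
FRAME_RE = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s*)?(?:\[<[0-9a-fA-F]+>\]\s*)?(?:\?\s*)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def top_frames(log_text):
    """Count the first frame that follows each 'Call Trace:' line.

    Hypothetical helper: assumes the log layout described above.
    """
    counts = Counter()
    expecting_first = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            # A new trace starts; the next frame line is its topmost frame.
            expecting_first = True
            continue
        if expecting_first:
            match = FRAME_RE.match(line)
            if match:
                counts[match.group(1)] += 1
                expecting_first = False
    return counts
```

Feeding it the text of a saved log and calling .most_common() on the result
gives a quick view of where blocked tasks are piling up.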
(I'll open a PMR later today once I've gathered more diagnostic
information).
-Aaron
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776