[gpfsug-discuss] node lockups in gpfs > 4.1.1.14
Aaron Knister
aaron.s.knister at nasa.gov
Fri Aug 4 16:02:04 BST 2017
I've narrowed the problem down to 4.1.1.16. We'll most likely be
downgrading to 4.1.1.15.
-Aaron
On 8/4/17 4:00 AM, Aaron Knister wrote:
> Hey All,
>
> Anyone seen any strange behavior running either 4.1.1.15 or 4.1.1.16?
>
> We are mid-upgrade to 4.1.1.16 from 4.1.1.14 and have seen some rather
> disconcerting behavior. Specifically, on some of the upgraded nodes, GPFS
> will seemingly deadlock the entire node, rendering it unusable. I
> can't even get a session on the node (but I can trigger a crash dump via
> a sysrq trigger).
>
> Most blocked tasks have cxiWaitEventWait at the top of their call
> trace. That's probably not very helpful in and of itself, but I'm
> curious if anyone else out there has run into this issue or if this is a
> known bug.
>
> (I'll open a PMR later today once I've gathered more diagnostic
> information).
>
> -Aaron
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776