[gpfsug-discuss] Again! Using IBM Spectrum Scale could lead to data loss

Aaron Knister aaron.knister at gmail.com
Tue Aug 22 15:37:06 BST 2017


Hi Jochen,

I share your concern about data loss bugs and I too have found it troubling,
especially since the 4.2 stream is in my immediate future (although I would
rather have stayed on 4.1 due to my perception of stability/integrity
issues in 4.2). By and large 4.1 has been *extremely* stable for me.

While not directly related to the stability concerns, I'm curious as to why
your customer sites require downtime to do the upgrades. While, of course,
individual servers need to be taken offline to update GPFS, the cluster as a
whole should be able to stay up. Perhaps your customer environments just
don't lend themselves to that.
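For what it's worth, that rolling, one-node-at-a-time approach can be
scripted. The sketch below is only illustrative: the node names and package
command are hypothetical, and it assumes passwordless ssh, that quorum
survives losing one node at a time, and that mmshutdown/mmstartup/mmgetstate
behave as documented for your release.

#!/usr/bin/env python
# Rough sketch of a one-node-at-a-time ("rolling") GPFS update loop.
# Node names and the package command are placeholders for illustration only.

import subprocess
import time

NODES = ["gpfs01", "gpfs02", "gpfs03"]         # hypothetical node names
UPDATE_CMD = ["yum", "-y", "update", "gpfs*"]  # or rpm/apt, per your packaging

def run(cmd, node=None):
    """Run a command locally, or over ssh on a remote node."""
    full = (["ssh", node] + cmd) if node else cmd
    subprocess.check_call(full)

def wait_active(node, timeout=600):
    """Poll mmgetstate until the node reports 'active' again."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(["mmgetstate", "-N", node]).decode()
        if "active" in out:
            return
        time.sleep(10)
    raise RuntimeError("%s did not return to active state" % node)

for node in NODES:
    run(["mmshutdown", "-N", node])   # stop GPFS on just this node
    run(UPDATE_CMD, node=node)        # install the new packages on that node
    run(["mmstartup", "-N", node])    # bring GPFS back up on the node
    wait_active(node)                 # confirm it rejoined before moving on

Depending on the release jump you would, of course, also rebuild the
portability layer on each node (typically mmbuildgpl on recent releases)
before starting GPFS again, but the basic loop stays the same.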

It occurs to me that some of these bugs sound serious (and indeed I believe
this one is). I recently found myself jumping prematurely into an update for
the metanode filesize corruption bug which, as it turns out, while very
scary sounding, is not necessarily a particularly common bug (if I
understand correctly). Perhaps it would be helpful if IBM could clarify the
believed risk behind these updates, or give us some indication whether a bug
falls in the category of "drop everything and patch *now*" or "this is a
theoretically nasty bug but we've yet to see it in the wild". I could
imagine IBM legal wanting to avoid a situation where IBM indicates
something is low risk but someone hits it and it eats data. Although many
companies do this with security patches, so perhaps it's a non-issue.

From my perspective, I don't think existing customers are being "forgotten".
I think IBM is pushing hard to help Spectrum Scale adapt to an
ever-changing world, and I think these features are necessary and useful.
Perhaps Scale would benefit from more resources being dedicated to
QA/testing, which isn't a particularly sexy thing; it doesn't result in any
new shiny features for customers (although "not eating your data" is a
feature I find really attractive).

Anyway, I hope IBM can find a way to minimize the frequency of these bugs.
Personally speaking, I'm pretty convinced it's not for lack of capability
or dedication on the part of the great folks actually writing the code.

-Aaron

On Tue, Aug 22, 2017 at 7:09 AM, Zeller, Jochen <Jochen.Zeller at sva.de>
wrote:

> Dear community,
>
> this morning I started in a good mood, until I checked my mailbox. Again a
> reported bug in Spectrum Scale that could lead to data loss. During the
> last year I have been looking for a stable Scale version, and each time I
> thought, "Yes, this one is stable and without serious data loss bugs", a
> few days later IBM announced a new APAR with possible data loss for this
> version.
>
> I support many clients in central Europe. They store databases, backup
> data, life science data, video data, and results of technical computing on
> their file systems, run HPC on them, etc. Some of them had to change their
> Scale version nearly monthly during the last year to avoid running into one
> of the serious data loss bugs in Scale. From my perspective, it was and is
> embarrassing to have to inform clients about newly reported bugs right
> after the last update. From the clients' perspective, it is a lot of work
> and planning to arrange yet another downtime window for updates. And their
> internal customers are not satisfied with so many downtimes of the clusters
> and applications.
>
> To me, it seems that Scale development is working on features for specific
> projects or clients, to meet special requirements, but forgetting the
> existing clients who use Scale to store important data and run important
> workloads.
>
> To make us more visible, I have used the IBM-recommended way to request
> mandatory enhancements, the less-favored RFE:
>
>
> http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=109334
>
> If you like, vote for more reliability in Scale.
>
> I hope this is a good way to show development and the responsible people
> that we are having trouble and are not satisfied with the quality of the
> releases.
>
>
> Regards,
>
> Jochen
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

