[gpfsug-discuss] AFM Recovery of SW cache does a full scan of home - is this to be expected?
Billich Heinrich Rainer (ID SD)
heinrich.billich at id.ethz.ch
Wed Jan 8 17:02:18 GMT 2020
Hello,
I am still new to AFM, so here are some basic questions on how recovery works for a SW cache:
We have an AFM SW cache in recovery mode. Recovery first ran policies on the cache cluster, but now I see a ‘tcpcachescan’ process on cache slowly scanning home via NFS: single host, single process, and no parallelism as far as I can see, though I may be wrong. This scan of home on a cache AFM gateway takes very long while further updates on cache queue up. Home has about 100M files. After 8 hours I see about 70M entries in the file /var/mmfs/afm/…/recovery/homelist, i.e. we get about 2500 lines/s. (We may have very many changes on cache due to some recursive ACL operations, but I’m not sure.)
So I expect that about 12 hours will pass building up the file lists before recovery starts to update home. I see some risks: during this time new changes pile up on cache. Could memory become an issue? Could the cache fill up while we are unable to evict?
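As a rough cross-check of these numbers, the arithmetic can be sketched as below. It assumes the scan rate stays constant and that homelist ends up with roughly one line per file at home; the function name is only illustrative.

```python
def scan_estimate(entries_so_far, hours_elapsed, total_files):
    """Extrapolate the home scan, assuming a constant scan rate."""
    # Observed rate in homelist lines per second
    rate = entries_so_far / (hours_elapsed * 3600)
    # Total scan time in hours if the rate holds for all files
    total_hours = total_files / rate / 3600
    # Hours still to go from now
    remaining_hours = total_hours - hours_elapsed
    return rate, total_hours, remaining_hours


# Figures from this report: ~70M entries after 8 hours, ~100M files at home
rate, total_hours, remaining_hours = scan_estimate(70_000_000, 8, 100_000_000)
print(f"{rate:.0f} lines/s, {total_hours:.1f} h total, {remaining_hours:.1f} h remaining")
```

With the figures above this gives a bit under 2500 lines/s and a total of between 11 and 12 hours, in line with the estimate of about 12 hours.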
I wonder:
* Is this expected, normal behavior? What can we do about it?
* Will every reboot of a gateway node trigger a recovery of all AFM filesets and a full scan of home? This would make normal rolling updates very impractical. Or is there some better way?
Home is a GPFS cluster, hence we could easily produce the needed file list on home with a policy scan in a few minutes.
Thank you, I welcome any clarification, advice or comments.
Kind regards,
Heiner
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================