Hi
Here is something I spent a few days on.
We had two Domino R11.0.1 FP3 servers in a cluster. Then we added a third Domino server of the same version to the cluster, and after that we started to see many errors like this:
08-08-2021 18:09:53 Database open error: <filepath.nsf>: Database is currently in use by you or another user
08-08-2021 18:09:53 Database open error: <filepath.nsf>: Database is currently in use by you or another user
08-08-2021 18:09:53 Database open error: <filepath.nsf>: Database is currently in use by you or another user
08-08-2021 18:09:53 Database open error: <filepath.nsf>: Database is currently in use by you or another user
Despite the errors in the logs, all our applications worked fine. The only consequence was that log.nsf became very hard to use, because we could get ~5K such messages per day. The errors referred to a few different databases, but not to all the databases we had. They were logged on all three Domino servers.
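To see which databases are affected and how noisy each day is, the messages can be counted from a text export of the log. The following is only a minimal sketch: it assumes the log has been exported to plain text with one message per line in exactly the shape quoted above, and the function name `count_in_use_errors` is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, copied from the messages quoted above:
# "DD-MM-YYYY HH:MM:SS Database open error: <path>: Database is currently in use by you or another user"
PATTERN = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"Database open error: (?P<path>.+?): "
    r"Database is currently in use by you or another user\s*$"
)


def count_in_use_errors(lines):
    """Return (per-database counts, per-day counts) for the 'in use' error."""
    per_db = Counter()
    per_day = Counter()
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            per_db[match.group("path")] += 1
            per_day[match.group("date")] += 1
    return per_db, per_day
```

Passing an open file object (or any iterable of lines) returns two `Counter` objects, so `per_db.most_common()` lists the databases with the most errors first.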
I googled and found these two articles without a solution:
- https://www.ibm.com/support/pages/node/4026393
- https://ds_infolib.hcltechsw.com/ldd/nd85forum.nsf/4d33daaa03bb930385256a0700727b3b/d34b8d985abac1a28525788e006ac257?OpenDocument
Unfortunately, they couldn't help me either. They offered only basic advice, such as deleting the database and replicating it again, restarting the Cluster Replicator, and so on.
So I had no choice but to try to resolve it myself.
After spending two days on it, I discovered that the cause of the issue was DDM probes. We had always used DDM probes for server monitoring and had not changed anything there for a long time, which is why it was so hard to catch. But after I disabled them, the error messages stopped appearing.
Now, let me give you my explanation, which is pure theory but looks reasonable to me.
We rolled out the third Domino server because we planned to launch a new Web application that was expected to attract many new users from the Internet, and we wanted to be sure that our Domino environment would handle the increased workload. We were using an external load balancer to route users to our Domino cluster, and the third Domino server was supposed to help a lot.
After we added the third Domino server, we launched the application as well and got more Web users, as we expected. So, basically, this is what happened:
- More users than before worked with just a few databases (kind of an entry point for the new Web-application);
- Three Cluster Replicators (instead of two) pushed data to replicas on other servers, and they had to do it more often than before;
- DDM worked without changes, but it seems that the server tasks executing DDM probes needed exclusive access to a database for a short period;
- Cluster Replicators worked intensively with a few databases and often failed to open them while DDM probes were running. In that case the Cluster Replicator logged that message and kept the changes in memory until it could open the database later;
- Disabling the probes stopped the messages;
- Enabling the same probes again brought the messages back.
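The theory above can be illustrated with a toy model. This is not how Domino is implemented. It is only a sketch of the assumed behaviour: one task holds a database exclusively for a moment, the replicator fails to open it, logs the message, and keeps its changes in memory until a later attempt succeeds. All class and attribute names here (`ToyDatabase`, `ToyReplicator`, `exclusive_holder`) are hypothetical.

```python
class ToyDatabase:
    """Stand-in for a database that one task may hold exclusively."""

    def __init__(self, path):
        self.path = path
        self.exclusive_holder = None  # e.g. "probe" while a probe runs
        self.applied = []             # changes written to this replica

    def open_for(self, task):
        # Refuse to open while a different task holds exclusive access.
        if self.exclusive_holder is not None and self.exclusive_holder != task:
            raise RuntimeError(
                f"Database open error: {self.path}: "
                "Database is currently in use by you or another user"
            )
        return self


class ToyReplicator:
    """Keeps changes in memory when the target cannot be opened."""

    def __init__(self):
        self.pending = []  # changes waiting for a successful open
        self.logged = []   # messages that would end up in the log

    def push(self, db, change):
        self.pending.append(change)
        try:
            db.open_for("replicator")
        except RuntimeError as err:
            self.logged.append(str(err))
            return False  # keep the changes and retry on the next push
        db.applied.extend(self.pending)
        self.pending.clear()
        return True


db = ToyDatabase("app.nsf")
replicator = ToyReplicator()

db.exclusive_holder = "probe"      # a probe is running
replicator.push(db, "change-1")    # fails, message is logged

db.exclusive_holder = None         # the probe has finished
replicator.push(db, "change-2")    # succeeds, both changes are applied
```

In this model no data is lost, which matches what we saw: the applications kept working and only the log got noisy.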
We have seen this error occasionally (every couple of months) for years now, but we never found the root cause. I will look into this for our servers as well and try to find out whether the DDM probes might be the root cause for us too.
If it is not a big deal for you, can you please let me know if it helps?
Hello Yuriy,
I am from HCL Technical Support. I was able to fix the error by disabling the DDM Probes for one of my customers. Thank you for your article that helped. I will further check on this to understand why cluster replicator task is causing the error. I will post here if I find anything fruitful. Thank you.
Great, happy I could help.