May 18, 2018
ProofMe engineers successfully completed initial testing on ProofMe.com last night. ProofMe behaved as expected during our testing efforts. We are applying one additional scaling optimization to our data cluster, which will run over the weekend. Assuming this optimization succeeds, we expect to bring customers onto a live ProofMe.com sometime Monday. Additional updates will be provided Monday morning. Once the system is back online, information regarding subscription credits or refunds for the disruption in service will be provided.
May 17, 2018
Our data cluster build process has resulted in a functional cluster. We are now taking steps to bring ProofMe.com online internally and will be conducting tests against the enhanced data cluster. We anticipate bringing customers back onto ProofMe between tomorrow and Monday. More specific timing will be shared via tomorrow's update.
May 16, 2018
Data cluster build operations are still in progress and nearing completion. We have successfully applied many of the scaling optimizations as planned. There are several steps remaining in the process to bring ProofMe back online. At present, we are still targeting a return to service this week. We thank our customers for their continued patience.
May 15, 2018
We continue working diligently towards the restoration of ProofMe.com service. Data cluster build operations are currently in progress. ProofMe engineers are monitoring this multi-step process and fine-tuning performance. Scaling optimizations will be applied once the initial cluster build is complete. We remain on track to restore service this week. Progress and timeline updates will continue to be published until this issue has been resolved.
May 14, 2018
Over the weekend, ProofMe.com engineers completed the initial phase of data cluster optimizations utilizing new parameters. This is a multi-step process that will take several days to complete. We will continue to post updates on our progress and timelines. The productivity of our customers is of utmost importance to us, and the restoration of ProofMe remains our top priority.
May 11, 2018
ProofMe engineers continue to make progress to bring ProofMe.com back online. By working to create an upgraded clean instance of the data cluster, we have made a number of improvements in the underlying data cluster configuration that will improve the stability of the service after it is back online. Once the clean instance of the data cluster is online, it will be replicated to a second scaled-up cluster. This process could take several days due to the amount of data involved. We remain on target to restore service for our customers next week. Information regarding subscription credits or refunds for the disruption in service will be provided once ProofMe.com is back online.
May 10, 2018
Our engineers are currently building a clean instance of the data cluster on an upgraded version of the database software. This effort involves configuration optimization and scaling of the underlying data cluster. We're targeting a return to service within the next week. We will continue to advise our customers on any impacts to this timeline. We appreciate your continued patience during this process as we work hard to get ProofMe back up and running for you.
May 8, 2018
On April 30, some ProofMe.com users began reporting intermittent service disruptions and inconsistent search results. These issues were caused by out-of-sync data indexes in ProofMe's distributed database cluster.
After an attempted repair of the out-of-sync indexes, the application entered a failure state and became unavailable for all users. No customer data has been lost. No sensitive data has been leaked or compromised in any way.
Service disruption is never completely avoidable, but this level of disruption is unacceptable. To ensure that this does not happen again, we're working on multiple improvements to application architecture and operational procedures. This page is designed to provide customers with up-to-date information on the current status of this outage and any future issues that may affect ProofMe users.
To all customers affected by the outage: we are sorry. We apologize for the hardship and disruption this has caused you and we take your feedback very seriously. We are working around the clock to rectify this issue as quickly as possible and will put a long-term solution in place to ensure reliability in the future.
- April 28 - April 30: Initial reports of intermittent service disruptions and inconsistent data retrieval results. Investigated and resolved issues.
- May 1: Attempted repair of search indexes, resulting in ProofMe outage
- May 2 - May 3: Continued failed attempts to repair search indexes
- May 4 - May 7: Contingency plans architected; additional engineers assigned to the issue
- May 8: Multiple contingency plans initiated
We anticipate a full restoration of ProofMe service. We have implemented an enhanced cluster architecture and are in the process of configuring and migrating data to the new cluster. Once operational, we will conduct testing and migrate live traffic onto the new cluster.
Future Architecture Improvements
The ProofMe team had previously put in place a plan to migrate ProofMe onto a modern and supported data store. Unfortunately, a migration of this scale takes time to complete and was not yet ready to use at the time of this incident. This plan remains in place, and we are accelerating its implementation timeline in response to this incident.
Root Cause Analysis
We will be conducting a root cause analysis and posting our findings on status.proofme.com.