Quality and stability. How the FONDY platform maintains and ensures a high level of service
What is Fondy and what is this article about?
Fondy is a cloud provider of payment technologies. It offers Visa and Mastercard payment processing services and white label solutions in Europe. Let’s break this down:
- Cloud: You don’t need your own servers to take card payments through your site. All the necessary software runs in our data centres.
- Provider of payment technologies: This includes both classic internet acquiring via a bank and more complex financial and technological solutions such as an accounting centre, e-wallets, P2P transfers and payments for services.
- White label: A solution for clients who want to offer the same range of services as Fondy but under their own brand and domain name.
Our fundamental position is to be open and honest with our clients. We want them to know both the benefits and the risks of using our service as an instrument for developing their own businesses. That’s why we want to shed some light on the internal processes we have in place to ensure the accessibility, stability and quality of our service.
In our public Service Level Agreement, we commit to an uptime of 99.95% for our payment gateway. This means:
Total downtime (planned and unplanned) should not exceed:
- 43.2s per day
- 5 minutes 2.4s per week
- 21 minutes 54.9s per month
- 4 hours 22m 58.5s per year
Payments rejected due to a technical fault should not exceed:
- 5 per 10,000
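The downtime figures above follow directly from the 99.95% target. As a minimal sketch of the arithmetic, the snippet below derives them; the function names are illustrative, and the year length of 365.2425 days (with a month taken as one twelfth of it) is an assumption that happens to reproduce the published numbers.

```python
def downtime_budget_seconds(uptime_percent: float, period_seconds: float) -> float:
    """Maximum downtime allowed in a period for a given uptime percentage."""
    return period_seconds * (100.0 - uptime_percent) / 100.0


def format_duration(seconds: float) -> str:
    """Render seconds as 'Hh Mm S.Ss', omitting leading units that are zero."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{int(hours)}h")
    if hours or minutes:
        parts.append(f"{int(minutes)}m")
    parts.append(f"{secs:.1f}s")
    return " ".join(parts)


DAY = 24 * 60 * 60
YEAR = 365.2425 * DAY  # assumption: average Gregorian year
PERIODS = {"day": DAY, "week": 7 * DAY, "month": YEAR / 12, "year": YEAR}

if __name__ == "__main__":
    for name, length in PERIODS.items():
        print(name, format_duration(downtime_budget_seconds(99.95, length)))
```

The same 0.05% allowance explains the rejection limit: 0.05% of 10,000 payments is 5 payments.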
For 99% of all payments, the additional delay introduced by our system when passing a payment from the client to the processing centre or acquiring bank will not exceed 0.5s, and for 99.95% of payments it will not exceed 3s.
We should say straight away that these requirements extend only to the part of the infrastructure whose availability we are in a position to ensure. They do not extend to bank payment gateways, payment systems, communications channels and other services and facilities which fall outside our data centres and over which we have no direct control.
In order to achieve and consistently maintain such a high level of uptime, we have done a great deal of work to improve how updates to the system are developed, tested, installed and monitored once live. In this article, we describe in more detail how this was achieved and by what means.
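A latency target of this kind can be checked against measured delays. The following is a minimal sketch under stated assumptions, not a description of Fondy’s tooling: the function names are hypothetical, and the delays are assumed to be per-payment measurements in seconds.

```python
from typing import Sequence


def share_within(delays: Sequence[float], limit_seconds: float) -> float:
    """Fraction of payments whose added delay does not exceed the limit."""
    if not delays:
        raise ValueError("no delay samples")
    return sum(1 for d in delays if d <= limit_seconds) / len(delays)


def meets_latency_targets(delays: Sequence[float]) -> bool:
    """True if at least 99% of delays are <= 0.5 s and 99.95% are <= 3 s."""
    return (share_within(delays, 0.5) >= 0.99
            and share_within(delays, 3.0) >= 0.9995)
```

Both conditions have to hold at once: a sample can satisfy the 0.5s target and still fail if too many of the remaining payments take longer than 3s.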
When it comes to development, we adhere to the practice of continuous integration. This allows us to efficiently produce improvements to system nodes in response to clients’ needs and put updates into the production system on a daily basis, if not several times per day. New changes, introduced by a developer on the basis of technical specifications, undergo code review and a process of automatic testing and assembly. This means the time from development completion to changes being put into the production system can be just a matter of minutes. Such effective development processes place us in a position where we are able to respond to any innovation or experiment. This is extremely important for our clients as many of them are businesses experiencing dynamic growth and which need both the financial and technological flexibility which can give them the edge in highly competitive markets.
Notwithstanding our constant changes to the code, our applications are highly stable. This stability, despite frequent updates, is achieved through a large number of automated tests, which are run before the release of every update. Not a single update is put into the production system before it has been green-lit in testing.
Any update entering the production system is first deployed on one of our production servers which handles a small share of payments traffic, typically not more than 1% of the overall volume. Following installation, our monitoring department and the developers responsible for the change actively watch for errors. Any errors are automatically logged by the application in the error-tracking system Sentry. If errors are found, the update is immediately rolled back and sent for investigation and fixing.
If an error which makes it into production concerns functionality not covered by existing tests, our quality assurance (QA) department develops a new automated test.
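This pattern is often called a canary release. The sketch below illustrates one way a small, stable share of traffic could be selected and a rollback decision made. It is an assumption for illustration only: the article does not say how payments are routed to the updated server, and `routes_to_canary` and `should_roll_back` are hypothetical names.

```python
import hashlib


def routes_to_canary(payment_id: str, canary_percent: float = 1.0) -> bool:
    """Deterministically send roughly canary_percent % of payments to the canary."""
    digest = hashlib.sha256(payment_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < canary_percent * 100


def should_roll_back(errors_on_canary: int, max_errors: int = 0) -> bool:
    """Roll back as soon as the canary reports more errors than tolerated."""
    return errors_on_canary > max_errors
```

Hashing the payment identifier, rather than picking at random, keeps the decision repeatable for the same payment, which makes errors seen on the updated server easier to trace.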
Automated tests are a fundamental stage in checking whether updates are fit for use. Apart from unit tests, integration tests also play an important role. It is worth pausing here for a moment to look at them in more detail.
Integration testing is a method for verifying that a system interacts correctly with other internal and external systems and services.
Our QA department has developed a huge number of integration test scenarios which are checked ahead of every update of the production system. For example, the most critical part of the system, the payment gateway, is covered by several hundred test files, which together contain more than 3,000 automated tests. As a result, the number of checks during one regression run reaches 80,000. Regression testing of this kind checks both the internal workings of the entire API gateway and its integration with various external systems such as:
- Processing centres
- Acquiring banks
- International and local payments systems
- Other payment services (of which more than 50 have been implemented into the Fondy system)
If an external system does not have a stable testing environment, then in order to test for integration with Fondy, our developers use a so-called mock object or ‘dummy’ which imitates the response of the external system in a positive and negative scenario.
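As a minimal sketch of the idea, the example below uses Python’s standard `unittest.mock` to imitate an external system in a positive and a negative scenario. The `pay` function, the `authorize` method and the response codes are hypothetical and stand in for whatever interface the real external system exposes.

```python
from unittest.mock import Mock


def pay(acquirer, amount_cents: int) -> str:
    """Code under test: map the external system's answer to a payment status."""
    try:
        response = acquirer.authorize(amount_cents)
    except ConnectionError:
        return "error"
    return "approved" if response.get("code") == "00" else "declined"


# Positive scenario: the dummy answers as if the bank approved the payment.
approving_bank = Mock()
approving_bank.authorize.return_value = {"code": "00"}

# Negative scenarios: a decline, and an external system that cannot be reached.
declining_bank = Mock()
declining_bank.authorize.return_value = {"code": "05"}
unreachable_bank = Mock()
unreachable_bank.authorize.side_effect = ConnectionError("timeout")
```

Because the dummy records every call, a test can also assert that the gateway sent the expected request, for example with `approving_bank.authorize.assert_called_once_with(1000)` after a single payment of 1000.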
Regression includes a check of all payment functions such as:
- Purchases made via a browser
- Purchases made via host-to-host request for PCI DSS merchants
- Buttons and widgets for receiving routine payments and donations
- Regular payments
- Payment reversals
- Checking payment statuses
- P2P transfers
- Card verification
- iOS, Android and PHP SDKs
- Advance authorization and payment completion
- Various other functions, as described in our public API specifications
For example, tests which imitate the interaction between a customer using a browser and our payment page can be run in all of the most common browser versions. For Internet Explorer, this means IE 8 to IE 11 inclusive.
More than 80,000 checks are carried out in approximately 5 minutes on a server with 32GB of RAM and an 8-core processor. Running the tests at such a speed on a server of average specification is made possible by a configuration devised by our company, which launches tests in parallel across all processor cores. If the tests were launched one after another in a single thread, one batch of regression tests would take more than an hour. To run tests in several threads and keep every CPU core loaded, we use Robot Framework, which supports the launch of tests in several threads.
To speed up testing further, we also test several development branches in parallel using Docker, a platform for automated application virtualisation. With the help of Docker, we roll out several versions of the payment gateway at once, each being worked on by a different developer.
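The general idea of spreading independent tests over a pool of workers can be sketched as follows. This is not the Robot Framework configuration described above, which is not shown in this article; it is a simplified stand-in using Python’s standard library, with hypothetical names. It uses threads to stay self-contained, whereas CPU-bound suites would normally be spread over separate processes.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_one(test: Callable[[], None]) -> bool:
    """Run a single test; an AssertionError marks it as failed."""
    try:
        test()
        return True
    except AssertionError:
        return False


def run_suite(tests: Dict[str, Callable[[], None]], workers: int = 0) -> Dict[str, bool]:
    """Run independent tests concurrently, one worker per CPU core by default."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_one, tests.values())
        return dict(zip(tests.keys(), results))
```

This only works when tests do not depend on each other’s state, which is also the precondition for running several gateway versions side by side in separate containers.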
The results of each batch of tests are made available to all interested collaborators via Telegram messenger. This allows for the efficient correction of failed tests. Messages contain detailed information on all tests including screenshots, an error log and error descriptions for developers.
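One way such a message could be delivered is through the Telegram Bot API’s `sendMessage` method. The sketch below only prepares the HTTP request and does not send it. Whether Fondy uses the Bot API is an assumption, and the token, chat ID and message text are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_notification(bot_token: str, chat_id: str, text: str) -> Request:
    """Prepare (but do not send) a Telegram Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    # Supplying a body makes urllib issue this as a POST request.
    return Request(url, data=body)
```

Passing the returned object to `urllib.request.urlopen` would send the message; screenshots and logs would need the API’s separate file-upload methods.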
Telegram is also our main channel for notifications of system events and incidents, about which more follows below.
To monitor system performance we use both popular platforms, such as Zabbix and Sentry, and a Business Intelligence (BI) system of our own design. Our BI system allows operators to track the efficiency of the payment gateway and produces notifications depending on the type of incident recorded. Who is notified, and when, is determined by our incident escalation matrix.
The ‘merchant monitoring’ section of the BI system displays the number of rejections, successful payments, errors, payment processing speed and conversion rate at the biggest merchants and sends a notification in the event of any irregularities.
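A check of this kind can be sketched as a comparison of a merchant’s current conversion rate with its usual level. The 10-point threshold and the function names below are assumptions for illustration; the article does not state how irregularities are detected.

```python
def conversion_rate(successful: int, rejected: int, errors: int) -> float:
    """Share of payment attempts that ended in a successful payment."""
    total = successful + rejected + errors
    return successful / total if total else 0.0


def is_irregular(current: float, baseline: float, max_drop: float = 0.10) -> bool:
    """Flag a merchant whose conversion fell more than max_drop below baseline."""
    return baseline - current > max_drop
```

For example, a merchant that normally converts 80% of attempts and drops to 60% would be flagged, while a dip to 78% would not.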
In the same way, we have a ‘protocol monitoring’ section where the performance of active integrations with external systems is monitored. In the event one of them fails, an operator contacts the support team of the system in question in order to either notify them of the problem or activate the cascade processing function.
Cascade processing is one of the most important functions of our monitoring system. It allows a merchant to switch to the protocol of a reserve acquiring bank in the event of disruption or failure of their primary acquiring bank. The switch can be made automatically or manually by a designated member of staff.
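The automatic variant of this switch can be sketched as an ordered list of acquirers that is tried until one responds. This is a simplified illustration under stated assumptions: `AcquirerUnavailable` and `process_with_cascade` are hypothetical names, and a real gateway would also have to avoid charging the customer twice.

```python
from typing import Callable, Sequence, Tuple


class AcquirerUnavailable(Exception):
    """Raised by an acquirer client when the bank cannot be reached."""


def process_with_cascade(
    amount_cents: int,
    acquirers: Sequence[Tuple[str, Callable[[int], str]]],
) -> Tuple[str, str]:
    """Try each acquirer in order; return (name, result) of the first that answers."""
    last_error = None
    for name, charge in acquirers:
        try:
            return name, charge(amount_cents)
        except AcquirerUnavailable as exc:
            last_error = exc  # remember the failure and try the reserve
    raise AcquirerUnavailable("all acquirers failed") from last_error
```

Only an unavailable acquirer triggers the switch here; an ordinary decline from the primary bank is returned as a result and is not retried elsewhere.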
We hope this article has been useful to you and helped you to see Fondy as an open, technologically advanced company which is concerned above all with the quality of its service and with fulfilling its responsibilities to its clients.