In this project, I built a bot that automates a complex booking task made up of many sub-tasks. It ran in a very unstable environment and scaled to the thousands of threads that the business logic required.
Background and requirements
The customer contacted me on a friend's recommendation, after several IT companies had told him that the task was either impossible or too difficult.
The customer's company provided a range of services, one of which was registering their clients on a special site. Clients provide their personal information, select several options, and finally choose a date and time from the range available on that site. The problem was that dates and times were available for only a few minutes per day, so the main task was to develop software that automates the registration process.
Captcha
One of the biggest blockers was reCAPTCHA, which the user had to solve after each action to prove that they were not a robot. reCAPTCHA is a service that protects sites from bots.
reCAPTCHA is a free service that protects your site from spam and abuse. It uses advanced risk analysis techniques to tell humans and bots apart.
Google reCAPTCHA home page
Solution
The work was completed successfully. A user of the software (an administrator) enters the client's data, the options to select, and the acceptable dates, and this data is stored in the database. The bot then carries out the whole registration process.
The program constantly monitors the site for free dates. Availability is checked every 5 seconds (the administrator can change the interval). When free dates appear, the bot starts registering the clients whose data is stored in the database.
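The monitoring loop described above can be sketched roughly as follows. The original bot was written in PHP, so this Python version only illustrates the idea. The names `wait_for_free_dates`, `fetch_free_dates`, and `max_checks` are hypothetical, and the function that actually queries the site is left to the caller.

```python
import time
from typing import Callable, List, Optional


def wait_for_free_dates(
    fetch_free_dates: Callable[[], List[str]],  # hypothetical: asks the site for free dates
    interval_seconds: float = 5.0,              # 5 seconds by default, configurable
    max_checks: Optional[int] = None,           # assumption: optional limit, None means no limit
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until free dates appear, then return them.

    Returns an empty list only if max_checks is set and was reached.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        dates = fetch_free_dates()
        if dates:
            return dates  # free dates appeared: registration can start
        checks += 1
        sleep(interval_seconds)  # wait before the next check
    return []
```

Passing `sleep` as a parameter is a design choice made for this sketch: it lets the loop be exercised without actually waiting between checks.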
Registration runs in many parallel threads, sometimes hundreds at the same time. Each of them has to pass five Google reCAPTCHA v2 challenges. When the load is too high, the site fails and the bot has to start all over again.
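A minimal sketch of this "many workers, restart on failure" pattern is shown below. It uses Python's `ThreadPoolExecutor` as a stand-in for the PHP pthreads setup of the real project. The `SiteUnavailable` exception, the `do_step` callback, and the `max_attempts` limit are assumptions made for illustration; captcha solving itself is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

CAPTCHA_STEPS = 5  # each registration passes five reCAPTCHA v2 challenges


class SiteUnavailable(Exception):
    """Hypothetical error raised when the site fails under load."""


def register_with_restart(
    client_id: int,
    do_step: Callable[[int, int], None],  # hypothetical: performs one step incl. its captcha
    max_attempts: int = 10,               # assumption: give up after this many restarts
) -> bool:
    """Run all steps for one client, starting over whenever the site fails."""
    for _ in range(max_attempts):
        try:
            for step in range(CAPTCHA_STEPS):
                do_step(client_id, step)
            return True
        except SiteUnavailable:
            continue  # the site failed: begin again from the first step
    return False


def register_all(
    client_ids: List[int],
    do_step: Callable[[int, int], None],
    workers: int = 100,
) -> Dict[int, bool]:
    """Register many clients in parallel and report success per client."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda cid: register_with_restart(cid, do_step), client_ids)
        return dict(zip(client_ids, results))
```

Since `do_step` is called from many threads at once, any state it shares between clients needs its own locking.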
When the free registration slots run out, the first thread to detect this reports it to the parent thread that created it. The parent thread then terminates its child threads (the registration processes), and the system returns to regular monitoring of free dates.
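The shutdown signal can be sketched with a shared event, again in Python and not in the PHP/pthreads of the original. One difference is worth stating: Python threads cannot be killed from outside, so in this sketch the children stop cooperatively once the event is set, instead of being deleted by the parent. The function name `run_registration_round` and the in-memory slot list are hypothetical.

```python
import threading
from typing import List


def run_registration_round(free_slots: List[str], worker_count: int) -> List[str]:
    """Book slots in parallel until one worker finds that none are left."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")

    remaining = list(free_slots)          # private copy of the available slots
    booked: List[str] = []
    lock = threading.Lock()               # protects `remaining` and `booked`
    slots_exhausted = threading.Event()   # set by the first worker to see no slots

    def worker() -> None:
        while not slots_exhausted.is_set():
            with lock:
                if not remaining:
                    slots_exhausted.set()  # report "no slots left" to the parent
                    return
                booked.append(remaining.pop())

    children = [threading.Thread(target=worker) for _ in range(worker_count)]
    for child in children:
        child.start()

    slots_exhausted.wait()   # parent blocks until a child reports exhaustion
    for child in children:
        child.join()         # children exit on their own once the event is set
    return booked            # the caller can now go back to monitoring
```

For example, `run_registration_round(["09:00", "09:30"], worker_count=8)` returns both slots in some order, and the remaining workers exit without booking anything.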
To summarize
This project was full of non-standard tasks. It is one of the most complex software solutions I have ever developed.
Environment
Multithreading, reCAPTCHA solving, PHP, pthreads, CasperJS, PhantomJS, JavaScript, jQuery, MySQL.
Historical screenshot: the first captcha that my bot passed.
It is the result of hard work and very few hours of sleep.