MirrorCache (mc) concept vs MirrorBrain (mb) concept

Scanning

mb runs a daily full scan of all files on all mirrors, regardless of whether the files were ever downloaded or changed recently. This may waste resources. mc scans only the locations that users actually request and prioritizes the scans by popularity.
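The popularity-driven approach can be illustrated with a minimal Python sketch. This is not mc's actual code: the `ScanScheduler` class and its method names are hypothetical, and it only assumes that each client request for a folder is counted and that the most requested folder is scanned first.

```python
from collections import Counter


class ScanScheduler:
    """Toy scheduler (hypothetical): folders requested more often are scanned first."""

    def __init__(self):
        # folder path -> number of client requests seen since its last scan
        self.hits = Counter()

    def record_request(self, folder):
        # Called whenever a client asks for something inside `folder`.
        self.hits[folder] += 1

    def pop_next(self):
        # Return the most requested folder and reset its counter,
        # or None when no folder is waiting for a scan.
        if not self.hits:
            return None
        folder, _ = self.hits.most_common(1)[0]
        del self.hits[folder]
        return folder
```

Folders that nobody requests never enter the counter, so they are never scanned, which is the contrast with a full daily scan.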
It may take up to 24 hours before mb notices that a file has disappeared from a mirror. Until then, clients are incorrectly redirected and receive a 404 error. mc makes sure that the file exists on the selected mirror and never redirects to locations where the file has been missing for a while.
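As a rough sketch of the "redirect only where the file is known to exist" idea, the Python function below picks a mirror from scan results. The function name `pick_mirror` and the `known_files` mapping are assumptions made for illustration and do not reflect mc's real data model.

```python
def pick_mirror(path, mirrors, known_files):
    """Return the first mirror known to hold `path`, or None.

    `mirrors` is an ordered list of candidate mirror names, and
    `known_files` maps a mirror name to the set of paths that the
    latest scan found on it (both are hypothetical structures).
    """
    for mirror in mirrors:
        if path in known_files.get(mirror, set()):
            return mirror
    # No mirror is known to have the file, so no redirect is issued.
    return None
```

What happens when the function returns `None` is left open here; the sketch only shows that a mirror without a confirmed copy is never chosen.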
New files are added by scanners in mb, which requires careful handling of deadlocks when two scanners attempt to add the same files. mc has separate jobs for keeping information about files on the Main Server in sync and for scanning mirrors. In addition, jobs have a natural locking mechanism (https://docs.mojolicious.org/Minion#lock), which prevents conflicting operations.
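The idea behind a named lock with an expiry time can be sketched in Python as follows. This is an in-memory illustration only, not the Minion API: the `NamedLocks` class is hypothetical, it is not safe across threads or processes, and a real job queue would keep such locks in its shared backend.

```python
import time


class NamedLocks:
    """In-memory illustration of named locks that expire on their own."""

    def __init__(self, clock=time.monotonic):
        self._expires = {}   # lock name -> time at which the lock expires
        self._clock = clock  # injectable clock, handy for testing

    def lock(self, name, duration):
        # Acquire `name` for `duration` seconds; return False if it is still held.
        now = self._clock()
        expiry = self._expires.get(name)
        if expiry is not None and expiry > now:
            return False
        self._expires[name] = now + duration
        return True

    def unlock(self, name):
        # Release `name`; return True only if a lock entry was present.
        return self._expires.pop(name, None) is not None
```

A job that scans a folder would first try something like `locks.lock("scan:/some/folder", 600)` and skip the work when the call returns `False`, so two jobs never process the same folder at once.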
New files are added by scanners in mb, which currently leads to many incorrect files in the database. (An incorrectly configured mirror can potentially overload the mb DB.) (This may be fixed at the cost of added coupling in the mb architecture.)

mb relies on a command-line tool and cron to run jobs. The user then manually reviews the logs to find possible problems (which is a challenge with the current amount of logging).
mc uses Minion (https://docs.mojolicious.org/Minion), a popular, mature, fast and asynchronous job queue, which includes prioritization, rescheduling, custom job queues, scalability (adding workers), manageability (assigning workers to particular queues), a dashboard, search, etc.

mb relies on rendering static files generated by a cron job.
mc uses the Mojolicious framework (mojolicious.org), which supports delayed rendering, JavaScript, etc.

Architecture
- By design, mb uses local files to render directories in the WebUI and to let scanner jobs detect which files are present on a mirror. This complicates running several geographically distributed instances of mb. As a workaround, Apache plugins can be configured together with periodic sync jobs to store empty files instead of keeping the full content of the files.
- mc is fileless by design, so it should be easy to spawn a new mc instance, which will become useful immediately.
Since a filename is currently not enough to identify the content of a file (see https://github.com/openSUSE/open-build-service/issues/6690), additional identification is required (e.g. file size + mtime). This will complicate the mb approach even more, because it will need to sync additional information besides empty files.
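To make the "file size + mtime" identification concrete, here is a minimal Python sketch. The `FileIdentity` tuple and the `stale_files` helper are hypothetical names; the sketch assumes only that a mirror's copy counts as current when its name, size and mtime all match the main server's record.

```python
from typing import Iterable, List, NamedTuple


class FileIdentity(NamedTuple):
    name: str
    size: int   # file size in bytes
    mtime: int  # modification time as a Unix timestamp


def stale_files(main_listing: Iterable[FileIdentity],
                mirror_listing: Iterable[FileIdentity]) -> List[str]:
    """Names of main-server files that a mirror lacks or holds in a different version."""
    on_mirror = {f.name: f for f in mirror_listing}
    # A file is stale when the mirror has no entry with that name,
    # or when the entry differs in size or mtime.
    return sorted(f.name for f in main_listing if on_mirror.get(f.name) != f)
```

Comparing the whole tuple rather than the name alone is what catches a file that was rebuilt under the same filename.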

While mc may increase traffic to the main server, both mc and mb must rely on the robustness of the main server.

Last update: January 14, 2022