Scheduling Events

There are several situations in web application programming where it is necessary to schedule events to happen in the future, outside of the request driven model. Some of the most common are these:

  • Expiring static files from the webserver. Some data can be cleaned up whenever a page is requested. On occasion, though, the application establishes the contract that a file will stay around for a fixed period of time. When access to these files is provided by the webserver (not through the application itself) then the files need to be deleted at a given future moment.
  • Time-based notifications. For example, if you deal with dates and times in your web application it's sometimes necessary to actually notify users (most often, via email) at a given time. It's clearly not acceptable to wait until someone hits a page (possibly hours or days later) to issue these notifications.
  • Syndication. Polling data on remote servers has to be done ready for when a user hits a page, because otherwise it can introduce an unacceptable delay while variously contactable remote hosts are queried.

In several of my web applications now I've come to a sticking point when it comes to scheduling events. As far as I know this is always left up to the developer to arrange. Scheduling events is considered outside their remit.

There are a few solutions I know of.

  • The application can provide a script which the administrator must schedule to be run periodically at install time. Drupal, for example, recommends adding a crontab entry which periodically wgets a script on the web site. In redistributable apps, many users will obliviously skip this step and wonder why the application won't work.
  • Run scheduled tasks after serving each page. This approach doesn't solve the above problems. In mod_php/perl/python applications this hogs a webserver thread too, which could degrade performance.
  • There are websites like webcron.org that will fetch a script on your server at intervals. It would be madness to rely on this in your own applications or suggest this as a solution for a redistributable applications, so it's only suitable as a fallback if all else fails.
  • The application may be able to use to the system scheduler (cron/at on Posix, Windows Scheduler Service on Windows). While it should be possible for a PHP application to enqueue things into the webserver's user's crontab (as long as PHP isn't restricted to "safe mode"), I'm not sure that this is advisable. Most offline applications I know that need to schedule something spawn their own daemon to handle scheduled events, even if it sits idle most of the time.

I can't see why the frameworks shouldn't provide an API for scheduling tasks. This would have the advantage of being simple, integrated and portable, and it could negotiate to use the platform scheduler or fall back to spawning a daemon to dispatch events.

Comments

Comments powered by Disqus