26 Oct 2015 02:00:14

High-load PHP scripts with real-time web client control

Let's imagine we have a huge pool of data to process, e.g. a report of 100,000 lines, and the user needs to be able to fetch the result of this job through the browser's download interface.

Yes, this could be solved with something like a real-time WebSocket application, but if you have a PHP project and all the data and APIs are tied to it, there is a way to implement it in PHP itself. There are lots of solutions across the internet, but generally they come with many limitations.

So here we go. What we need to reach the goal is the Linux API, a process manager and IPC (inter-process communication). First, let's say we have a web form that is submitted via ajax. On the server side (PHP) we receive the request and do the following trick:

Open a process (e.g. PHP proc_open/fork/exec) and predefine the needed descriptors (to pass data from the ajax request, like SQL query filters and other form values). The Linux fork API has one interesting property: when the child process is spawned, control returns to the parent process (our ajax submit handler), and this script can now respond to the web server to avoid a gateway timeout. When the parent process finishes, the Linux kernel changes the child task's parent to another process, usually /sbin/init (process 1). This means our process runs on its own and has a CLI context with its own configuration, such as memory limits, PHP max execution time, etc.
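The spawning step can be sketched roughly as follows. This is a minimal sketch, not the Drupal code shown later: the helper name `spawn_background()` is hypothetical, it uses `exec()` rather than `proc_open()`, and it assumes a POSIX shell on Linux with `exec()` enabled. Redirecting the child's output and appending `&` lets the call return immediately, so the web request can respond while the child keeps running.

```php
<?php
/**
 * Hypothetical helper: start a command in the background and return without
 * waiting for it. Assumes a POSIX shell (Linux) and that exec() is not
 * disabled in php.ini.
 */
function spawn_background(array $argv, $log_file = '/dev/null') {
  // Escape every argument so request data is never interpreted by the shell.
  $cmd = implode(' ', array_map('escapeshellarg', $argv));

  // Redirect stdout/stderr away from PHP and append "&": without the
  // redirection exec() would block until the child closes its output.
  exec($cmd . ' > ' . escapeshellarg($log_file) . ' 2>&1 &');
}

// Usage: the child sleeps for a second, then writes a marker file.
$marker = tempnam(sys_get_temp_dir(), 'job');
unlink($marker);

$started = microtime(TRUE);
spawn_background(array('sh', '-c', 'sleep 1; echo done > "$0"', $marker));
$elapsed = microtime(TRUE) - $started;
// The parent gets control back right away, before the child has finished.
```

In a real request handler the marker file would be replaced by the report file and the ready flag described below.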

So at this point we have a response to the ajax submit with the needed service information (generated by us on the PHP side) and a standalone running process that does the useful work. Now we can implement polling on the JS side to check whether the server has finished the job; on success we fetch the data and hand it to the client, and if an error has occurred we handle it.

Here we could take advantage of IPC: sockets, FIFOs (named pipes), third-party software such as queue daemons that use Linux IPC too, or just regular filesystem files. To avoid the complications of explaining all these possibilities, we will use a file and an SQL DB (the most common way to do this work).

When the client polls the server, the server checks on the SQL side whether the job has finished and handles timeout and error cases, if any.
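The decision the polling endpoint makes can be sketched as a small pure function. The name `poll_status()` and its parameters are hypothetical; the 400/408 codes and the 30-minute limit mirror the tracker callback shown later, and reading the ready flag from the database is left to the caller.

```php
<?php
/**
 * Hypothetical helper: decide what the polling endpoint should answer.
 *
 * @param int|null $created  Unix time the job was started (sent by the client).
 * @param bool     $is_ready Value of the "ready" flag read from the database.
 * @param int      $now      Current Unix time.
 * @param int      $limit    Maximum number of seconds a client may keep polling.
 */
function poll_status($created, $is_ready, $now, $limit = 1800) {
  if (empty($created)) {
    // The client did not say which job it is waiting for.
    return array('done' => FALSE, 'error' => array('code' => 400));
  }
  if ($now - $created > $limit) {
    // Give up so the browser does not poll forever.
    return array('done' => FALSE, 'error' => array('code' => 408));
  }
  return array('done' => (bool) $is_ready);
}

echo json_encode(poll_status(1000, FALSE, 1010)), "\n"; // {"done":false}
echo json_encode(poll_status(1000, TRUE, 1010)), "\n";  // {"done":true}
echo json_encode(poll_status(1000, FALSE, 5000)), "\n"; // {"done":false,"error":{"code":408}}
```

Keeping the decision separate from the database lookup makes the timeout and error branches easy to check without a running site.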

Once the server has finished the work, it puts the plain data in a file, marks the job as ready in SQL and terminates. On the other side, the client sends an ajax request, catches the ready flag and requests the file data; the server prints it to the CGI output and adds headers if needed (Content-Type, Content-Disposition, etc.). The browser receives the data and provides the needed interface through the ajax success callback and JS/HTML facilities.
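The file handoff can be sketched like this. Both helper names are hypothetical, and writing to a temporary name before renaming is an extra precaution that is not part of the original code; the header-line convention (an optional Content-Disposition line in front of the body) follows the Drupal example further down.

```php
<?php
/**
 * Hypothetical worker-side helper: store an optional header line followed by
 * the report body. The write-then-rename step is an added precaution (not in
 * the original code) so a reader never sees a half-written file.
 */
function store_report($path, $body, $disposition = '') {
  $tmp = $path . '.part';
  file_put_contents($tmp, ($disposition !== '' ? $disposition . "\n" : '') . $body);
  rename($tmp, $path);
}

/**
 * Hypothetical fetch-side helper: read the file once, delete it and split the
 * leading Content-Disposition line (if any) from the body.
 */
function fetch_report($path) {
  $data = file_get_contents($path);
  unlink($path);

  $header = NULL;
  if (preg_match('/^(Content-disposition:[^\n]*)\n/i', $data, $m)) {
    $header = $m[1];
    $data = substr($data, strlen($m[0]));
  }
  return array('header' => $header, 'body' => $data);
}

// Usage with a throwaway file in the system temp directory.
$path = sys_get_temp_dir() . '/REPORT_demo_' . getmypid();
store_report($path, "id,name\n1,foo\n", 'Content-Disposition: attachment; filename="report.csv"');
$result = fetch_report($path);
```

The fetch side would pass `$result['header']` to `header()` when it is set and then print `$result['body']`.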

The work is done. To summarize: we can process long jobs on behalf of a web client and get the processing status and the generated data.

 

Below I provide an example of this in the context of Drupal, PHP and report generation with a download dialog at the end of the process. Here is the code:

 

/**
 * Ajax callback to generate report without form for feed views icons.
 *
 * For comments about how it works @see pruban_reports_generate_background_report_submit.
 */
function pruban_reports_generate() {
  if (pruban_is_ajax()) {
    $get = $_POST;

    $form_build_id = $_POST['form_build_id'];

    _pruban_unset(
      array(
        'ajax_html_ids',
        'ajax_page_state',
        'form_token',
        'form_build_id',
        'form_id',
      ),
      $get
    );

    if (!empty($get)) {
      global $user;

      // Build a file path that is unique per user, report type and form build ID.
      $file_path = file_directory_temp() . "/REPORT_{$user->uid}_" . preg_replace('/\//', '_', $get['q']) . "_$form_build_id";
      $file_ready_flag = basename($file_path);
      $file_init_time = time();

      // Don't use descriptors.
      $descriptors = array();

      $url = url(preg_replace('/^\//', '', $get['q']) . '/csv', array('query' => $get));
      $url = preg_replace('/^\//', '', $url);
      $url = urlencode($url);

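      // Escape every argument: the file path and flag are built from request data.
      $cmd = "/usr/bin/drush -r " . escapeshellarg(DRUPAL_ROOT) . " pruban-report " . escapeshellarg($url) . " "
        . escapeshellarg($file_path) . " " . (int) $user->uid . " " . escapeshellarg($file_ready_flag)
        . " $file_init_time &";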

      // On Linux, after the child is forked the parent (this script) continues, finishes and responds to the browser.
      // When this script (process) terminates, its child (started by proc_open here) becomes a child of init
      // (process 1) and keeps doing the long, heavy work in CLI mode under drush. When the spawned child finishes
      // the job, it puts the result into a predefined file in the temporary directory and then sets a flag
      // (a Drupal variable stored in MySQL) to notify the client (the ajax polling JS) that the target is ready.
      // The client then sends an ajax request to fetch and release the file, and the proper headers are set to let
      // the user download the result.
      $process = proc_open($cmd, $descriptors, $pipes, NULL, NULL);

      if (!is_resource($process)) {
        drupal_set_message("Reports: Unable to create forked process to generate report", 'error');
      }
    }

    $commands[] = array('command' => 'prubanGenerateReport', 'data' => array(
      'file_path' => $file_path,
      'file_ready_flag' => $file_ready_flag,
      'file_init_time' => $file_init_time,
    ));

    return array('#type' => 'ajax', '#commands' => $commands);
  }
}
This is the ajax submit handler that spawns a child process as a drush command. The command executes the whole menu item and puts the menu result into a file. Here is the callback:
    // ...
    'description' => "Generates report in background mode to avoid nginx timeout.",
    'drupal dependencies' => array('pruban_reports'),
    'aliases' => array('pruban-report'),
  );

  return $items;
}

/**
 * Drush command callback: generates a report in background (CLI) mode.
 */
function drush_pruban_reports_generate_report($url, $file_path, $uid, $file_ready_flag, $init_time = NULL) {
  // Be very careful when changing this: the super admin is logged in here. So we check for CLI mode; a user with
  // SSH access would be able to generate these reports. Otherwise menu_execute_active_handler() would restrict
  // the view query through hook_query_alter() implementations.
  if (drupal_is_cli()) {
    // Override the memory limit for this CLI process (2048 MB).
    ini_set("memory_limit","2048M");

    global $user;
    $user = user_load($uid);
    user_login_finalize();

    $components = parse_url(urldecode($url));

    parse_str($components['query'], $_GET);
    $_GET['q'] = $components['path'];

    $op = NULL;
    if (!empty($_GET['_triggering_element_name']) && $_GET['_triggering_element_name'] == 'op') {
      if (!empty($_GET['_triggering_element_value'])) {
        $op = drupal_strtolower($_GET['_triggering_element_value']);
      }
    }

    // Strip the form service values: the exposed form uses GET and is always submitted, so these values would
    // break the views result by affecting the way the output is generated.
    _pruban_unset(
      array(
        'ajax_html_ids',
        'ajax_page_state',
        'form_token',
        'form_build_id',
        'form_id',
      ),
      $_GET
    );

    $_GET['op'] = $op;

    // Disable delivery (to avoid wrappers & other useless markup). We are interested only in plain CSV here.
    ob_clean();
    ob_start();

    // Execute menu item & get generated result in buffer.
    menu_execute_active_handler($components['path'], FALSE);

    $report = ob_get_clean();

    // Check whether a disposition header was generated by the menu item.
    $disposition = !empty($GLOBALS['report_disposition_header'])
      ? $GLOBALS['report_disposition_header'] . "\n"
      : '';

    // Store stuff in file & signal client job is done.
    file_put_contents($file_path, $disposition . $report);

    // Let the client know the job is done.
    pruban_reports_release(
      array(
        'uid' => $uid,
        'report' => arg(1),
        'file' => $file_ready_flag,
      )
    );
  }
  else {
    // Prevent access & exit with error code.
    echo "only CLI mode is allowed\n";
    exit(1);
  }
}
 

Basically, it passes the POST data from the ajax views filters through to the menu handler and executes it in the original (web client) environment. Once the job is done, it uses the release callback to notify the web client about the result:


/**
 * Service callback: checks the pseudo-semaphore of a long-running process polled from the web.
 */
function pruban_reports_is_released($key = NULL) {
  if (!empty($key)) {
    $semaphores = variable_get('pruban_reports_semaphores', array());

    if (isset($semaphores[$key['uid']][$key['report']][$key['file']])) {
      return $semaphores[$key['uid']][$key['report']][$key['file']];
    }

    return FALSE;
  }

  return NULL;
}

/**
 * Service callback: releases (marks as ready) the pseudo-semaphore of a long-running process polled from the web.
 */
function pruban_reports_release($key = NULL) {
  if (!empty($key)) {
    $semaphores = (array) (variable_get('pruban_reports_semaphores', array()));

    $semaphores[$key['uid']][$key['report']][$key['file']] = TRUE;

    variable_set('pruban_reports_semaphores', $semaphores);
  }
}
Here is the server-side menu callback for ajax polling, which checks the ready state:
 
<?php
/**
 * @file
 *
 * File used for dynamic linking pages.
 */

define('pruban_REPORTS_CLIENT_WAIT_LIMIT', 60 * 30);

/**
 * Tracks & fetches report generation.
 */
function pruban_reports_tracker() {
  global $user;

  if (!empty($_POST['fetch'])) {
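    // Read the report file produced by the background process. The path comes from the client, so
    // restrict it to this user's own report files in the temporary directory.
    $path = file_directory_temp() . '/' . basename(isset($_POST['path']) ? $_POST['path'] : '');
    $fd = (strpos(basename($path), "REPORT_{$user->uid}_") === 0) ? @fopen($path, 'r') : FALSE;

    if (is_resource($fd)) {
      // Retrieve whole buffer.
      $report = stream_get_contents($fd);

      // Close & delete the file.
      fclose($fd);

      @unlink($path);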

      // Clean report's semaphores data.
      $semaphores = (array) (variable_get('pruban_reports_semaphores', array()));
      if (isset($semaphores[$user->uid][$_POST['report']])) {
        unset($semaphores[$user->uid][$_POST['report']]);
        variable_set('pruban_reports_semaphores', $semaphores);
      }

      // Set header to force downloading generated CSV file.
      header('Content-type: text/csv');

      // Fetch header from plain text & cut it.
      if (preg_match('/^(Content-disposition.*?)\n/', $report, $matches)) {
        $report = preg_replace('/^Content-disposition.*?\n/', '', $report);
        header($matches[1]);
      }

      // The jQuery fileDownload library is used; it requires a cookie in order to detect success.
      $params = session_get_cookie_params();
      setcookie('fileDownload', 'true', REQUEST_TIME + $params['lifetime'], '/', $params['domain'], TRUE, FALSE);

      $response = $report;
    }
    else {
      header('Content-type: script/javascript');

      http_response_code(500);
      // The jQuery fileDownload library compares the old and new form to detect a failure,
      // so change the form to trigger the fail callback.
      $response = "<script>jQuery('body form').attr('fail', true)</script>";
    }

    // Flush to ob buffer & exit.
    echo $response;
  }
  else {
    // Reject incomplete requests and prevent infinite polling. Use the limit constant to change the timeout.
    if (empty($_POST['created']) || empty($_POST['flag']) || empty($_POST['report'])) {
      $response = drupal_json_encode(
        array(
          'error' => array('code' => 400),
          'done' => FALSE,
        )
      );
    }
    else if (time() - $_POST['created'] > pruban_REPORTS_CLIENT_WAIT_LIMIT) {
      $response = drupal_json_encode(
        array(
          'error' => array('code' => 408),
          'done' => FALSE,
        )
      );
    }
    else {
      $response = drupal_json_encode(
        array(
          'done' => pruban_reports_is_released(
            array(
              'uid' => $user->uid,
              'report' => $_POST['report'],
              'file' => $_POST['flag'],
            )
          ),
        )
      );
    }

    echo $response;
  }

  drupal_exit();
}

Here is the client-side JS polling code (the fileDownload jQuery extension is used, plus jQuery UI and the Drupal ajax API):

var pruban_reports_tracker_id = undefined;

(function ($) {
  $(document).ready(function(){
    Drupal.ajax.prototype.commands.prubanGenerateReport = function (ajax, response, status) {
      if (typeof response.data == 'undefined'
        || typeof response.data.file_path == 'undefined'
        || typeof response.data.file_init_time == 'undefined'
        || typeof response.data.file_ready_flag == 'undefined') {

        alert('Generating report error! Missing JS context');
        return;
      }

      var file_path = response.data.file_path;
      var file_init_time = response.data.file_init_time;
      var file_ready_flag = response.data.file_ready_flag;

      if (typeof Drupal == 'undefined' || typeof Drupal.settings == 'undefined'
        || typeof Drupal.settings.pruban_reports == 'undefined'
        || typeof Drupal.settings.pruban_reports.report == 'undefined') {

        $(".report-wait-dialog").dialog('option', 'title', 'ERROR 456');
        $(".report-wait-dialog").html(Drupal.t('Unrecoverable Error. Missing report context'));
        return;
      }

      var fetch = false;

      // Prevent setting up multiple intervals per window.
      if (typeof pruban_reports_tracker_id != 'undefined')
        return;

      $('<div class="report-wait-dialog">' + Drupal.t('Please wait, the report is in progress...') + '</div>')
        .dialog({
          close: function(event, ui) {
            clearInterval(pruban_reports_tracker_id);
            pruban_reports_tracker_id = undefined;
          }
        });

      pruban_reports_tracker_id = setInterval(function() {

        $.ajax(
          {
            url: "/pruban/reports-tracker?poll=true",
            type: "POST",
            data: {
              path : file_path,
              // Get uri without get params from js environment.
              report : Drupal.settings.pruban_reports.report,
              created : file_init_time,
              flag : file_ready_flag
            },
            success: function(data) {
              if (typeof data.done != 'undefined' && data.done == true) {
                // Stop polling.
                clearInterval(pruban_reports_tracker_id);
                pruban_reports_tracker_id = undefined;

                // Fetch done job as we know here it's already done.
                if (!fetch) {
                  fetch = true;

                  $.fileDownload('/pruban/reports-tracker?fetch=true', {
                    cookieDomain:
                      Drupal.settings.prubanSite.cookieDomain,
                    prepareCallback: function(url) {
                      $( ".report-wait-dialog" )
                        .html(Drupal.t(
                          'Report is ready, fetching...'));
                    },
                    successCallback: function(url) {
                      $( ".report-wait-dialog" ).dialog('close');
                    },
                    failCallback : function(response, url) {
                      $( ".report-wait-dialog" ).dialog('option', 'title', 'ERROR 500');
                      $( ".report-wait-dialog" ).html(Drupal.t('Internal Server Error'));
                    },
                    httpMethod: "POST",
                    data: {
                      fetch : true,
                      path : file_path,
                      report : Drupal.settings.pruban_reports.report,
                      created : file_init_time,
                      flag : file_ready_flag
                    }
                  });
                }

              }
              // Treat errors.
              else if (typeof data.error != 'undefined' && typeof data.error.code != 'undefined') {
                // Stop polling.
                clearInterval(pruban_reports_tracker_id);
                pruban_reports_tracker_id = undefined;

                switch (data.error.code) {
                  case 400:
                    $( ".report-wait-dialog" ).dialog('option', 'title', 'ERROR 400');
                    $( ".report-wait-dialog" ).html(Drupal.t('Bad Request'));
                    break;

                  case 408:
                    $( ".report-wait-dialog" ).dialog('option', 'title', 'ERROR 408');
                    $( ".report-wait-dialog" ).html(Drupal.t('Request Timeout'));
                    break;

                  default:
                    $( ".report-wait-dialog" ).dialog('option', 'title', 'ERROR 456');
                    $( ".report-wait-dialog" ).html(Drupal.t('Unrecoverable Error'));

                }

              }
            },
            dataType: "json"
          }
        );
      }, 5000);
    };
  });
})(jQuery);

One note about the fileDownload library: at first I did not understand why my success/fail callbacks did not work. The reason is the way the library does its checks. To signal success you have to set a cookie on the server side (once the SQL flag is ready too); to signal failure you have to change the form somehow, because the library compares the old and new state of the generated iframe's form to determine whether the download failed. That's all. I hope it will be helpful in solving daily developer problems.
