Cron crawler in 7.x

I’m new to Drupal and have just built a small site on Drupal 7. Response times from my shared hosting provider were very sluggish, so I started experimenting with caching options. Boost is by far the best option for this kind of site, which sees very few hits at the moment: response times went from 6+ seconds to 200 ms, which is wonderful! The catch is that this only holds when cache files exist, and at the moment they only seem to be generated when a visitor requests a page. I know the Drupal 6 version of Boost included a crawler to generate cache files, but it seems this component hasn’t made it into the 7.x dev release yet. Or have I configured something incorrectly?

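Solution: a small custom module that, on each cron run, requests the front page, every published node and the taxonomy term pages through HTTPRL, so Boost writes the cache files before a visitor asks for them. The code depends on the Boost and HTTPRL modules; replace `hook` in the function names below with your module's machine name.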

hook_cron_queue_info()

/**
 * Implements hook_cron_queue_info().
 */
function hook_cron_queue_info() {
	$queues['runner'] = array(
		'worker callback' => 'runner_run', // Called once for each queued URL.
		'time' => variable_get('boost_crawl_queue_seconds', 30), // Maximum seconds spent on this queue per cron run.
	);
	return $queues;
}
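The `'time'` key is a budget, not a per-item timeout: on each cron run Drupal 7 keeps claiming and processing items from the queue until the budget is spent (15 seconds if `'time'` is omitted), and whatever is left waits for the next run. A minimal plain-PHP sketch of that loop, assuming `SplQueue` as a stand-in for `DrupalQueue` and a hypothetical `drain_queue()` helper (neither is Drupal API), looks like this:

```php
<?php
/**
 * Hypothetical helper: passes each item of $queue to $worker until the
 * queue is empty or $seconds have elapsed, mirroring the "time" budget
 * declared in hook_cron_queue_info().
 *
 * @return int Number of items processed.
 */
function drain_queue(SplQueue $queue, callable $worker, int $seconds): int {
    $end = time() + $seconds;
    $processed = 0;
    // The deadline is only checked before taking the next item, so an item
    // that starts just before the deadline still runs to completion.
    while (time() < $end && !$queue->isEmpty()) {
        // Unlike DrupalQueue, the item is removed before the worker runs,
        // so a worker that throws does not leave it behind for a retry.
        $worker($queue->dequeue());
        $processed++;
    }
    return $processed;
}

// Usage: three URLs, a 30-second budget, and a worker that records them.
$queue = new SplQueue();
foreach (['http://example.com/', 'http://example.com/node/1', 'http://example.com/node/2'] as $url) {
    $queue->enqueue($url);
}
$fetched = [];
$count = drain_queue($queue, function (string $url) use (&$fetched) {
    $fetched[] = $url;
}, 30);
echo $count, PHP_EOL;
```

With a small site the whole queue usually fits in one budget; a larger site simply spreads the crawl over several cron runs.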


hook_cron()

/**
 * Implements hook_cron().
 */
function hook_cron() {
	$queue = DrupalQueue::get('runner');

	// Load the IDs of all published nodes.
	$nids = db_select('node', 'n')
		->fields('n', array('nid'))
		->condition('status', 1)
		->execute();

	// Find the front page node ID (site_frontpage is stored as e.g. "node/5").
	$front_page = variable_get('site_frontpage', 'node');
	$front_nid = str_replace('node/', '', $front_page);

	$count = 0;
	foreach ($nids as $record) {
		if ($record->nid == $front_nid) {
			// Queue the front page node under the site root rather than
			// node/NID, since that is the URL visitors request.
			$url = url('<front>', array('absolute' => TRUE));
			watchdog('boost', 'Queued front page (node %nid) for caching.', array('%nid' => $front_nid), WATCHDOG_NOTICE);
		}
		else {
			// HTTPRL needs absolute URLs.
			$url = url('node/' . $record->nid, array('absolute' => TRUE));
		}
		$queue->createItem($url);
		$count++;
	}

	// Queue the term pages of vocabulary 1 (the tags vocabulary on this site),
	// so they are fetched within the same time budget as the nodes.
	$terms = taxonomy_get_tree(1);
	foreach ($terms as $term) {
		$queue->createItem(url('taxonomy/term/' . $term->tid, array('absolute' => TRUE)));
	}

	watchdog('boost', 'Queued %count node pages for caching.', array('%count' => $count), WATCHDOG_NOTICE);
}
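Stripped of the database and taxonomy calls, this cron hook just builds a list of URLs: every published node under `node/NID`, except the front-page node, which is requested once as the site root so Boost caches the page visitors actually hit, plus one URL per term. The sketch below reproduces that logic in plain PHP so it can be checked without a Drupal install. `build_crawl_urls()`, its parameters and the `http://example.com` base URL are hypothetical, and the front-page URL is assumed to be the base URL with a trailing slash, which is what `url('<front>', array('absolute' => TRUE))` typically returns.

```php
<?php
/**
 * Hypothetical helper: builds the absolute URLs the crawler should request.
 *
 * @param array  $nids      IDs of published nodes (the database returns them as strings).
 * @param string $frontPage Value of the site_frontpage variable, e.g. "node/5".
 * @param array  $tids      Term IDs whose listing pages should be warmed.
 * @param string $baseUrl   Site base URL without a trailing slash.
 * @return string[]
 */
function build_crawl_urls(array $nids, string $frontPage, array $tids, string $baseUrl): array {
    // "node/5" -> 5; anything else (e.g. the default "node" listing) -> null.
    $frontNid = preg_match('#^node/(\d+)$#', $frontPage, $m) ? (int) $m[1] : null;

    $urls = [];
    foreach ($nids as $nid) {
        // Request the front-page node as the site root instead of node/NID.
        $urls[] = ((int) $nid === $frontNid) ? $baseUrl . '/' : $baseUrl . '/node/' . $nid;
    }
    foreach ($tids as $tid) {
        $urls[] = $baseUrl . '/taxonomy/term/' . $tid;
    }
    return $urls;
}

print_r(build_crawl_urls([1, 5, 7], 'node/5', [3], 'http://example.com'));
```

Parsing `site_frontpage` with an anchored pattern, rather than stripping `node/`, means a front page that is not a single node (such as the default `node` listing) never matches a node ID by accident.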


runner_run()

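/**
 * Worker callback for the runner cron queue: requests one URL so Boost caches it.
 */
function runner_run($url) {
	// The crawler's request carries no session cookie, so it is served as an
	// anonymous page view, which is what Boost caches.
	httprl_request($url);
	httprl_send_request();

	watchdog('boost', 'Crawler fetched @url', array('@url' => $url), WATCHDOG_DEBUG);
}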
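HTTPRL has to know which host to connect to, so the worker only makes sense for absolute URLs; a relative path such as `node/1` names no host. A small guard like the hypothetical `is_crawlable_url()` below (plain PHP built on `parse_url()`, not part of Boost or HTTPRL) could be called at the top of the worker to skip and log anything that is not an absolute http(s) URL:

```php
<?php
/**
 * Hypothetical guard: TRUE only for absolute http:// or https:// URLs that
 * name a host, e.g. "http://example.com/node/1".
 */
function is_crawlable_url(string $url): bool {
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        // Relative paths such as "node/1" or "/node/1" have no host.
        return false;
    }
    $scheme = strtolower($parts['scheme'] ?? '');
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(is_crawlable_url('http://example.com/node/1'));
var_dump(is_crawlable_url('node/1'));
```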

2 comments

  1. rdaniel

    Hi, where do you put this code? Is this code for a module?
    I’m a complete newbie in Drupal, so please let me know.

    1. Yes, rdaniel, this code goes in a module. Once Drupal cron runs, these functions run too.
