PHP Classes

File: documentation.txt

Role: Documentation
Content type: text/plain
Description: documentation
Class: Web scraper
Extract information from Web site pages
Author: Jacek Lukasiewicz
Last change:
Date: 12 years ago
Size: 1,178 bytes
 

Contents

This class allows you to get data from any site. The data are taken from defined locations in the DOM structure. Data points are defined using phpQuery notation, which is similar to the selectors used in the jQuery library.

This class can fetch data in three different modes:
* scanning a single page
* scanning a "from->to" range of pages matching a defined URL schema
* scanning a list of URLs retrieved from a PHP array

EXAMPLE

$scrap = new Scraper();

//set the base URL with a token named ##TOKEN##
$scrap->setBaseUrl('http://your.site.ccm/path/to/details.html?id=##TOKEN##');

//set the scan range for the token
//##TOKEN## will be replaced by each id from the range
$scrap->addRangeScanRule(151598039, 151598042, '##TOKEN##');

//define the points where the data are located
$scrap->addDataTarget('name', '.headline .margin h1');
$scrap->addDataTarget('price', '#buyerpricegross');
$scrap->addDataTarget('image', '#imageWrapper #thumbnailoverlay a');

$data = $scrap->process();

//$data has the following array structure:
// array(
//     array(
//         'name'  => ....,
//         'price' => ....,
//         'image' => ....,
//     ),
//     ....
// )
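
To make the "from->to" range mode more concrete, here is a minimal sketch of how a token could be expanded into one URL per id. It is not taken from the Scraper class: the function name expandRangeUrls is hypothetical, and treating both ends of the range as inclusive is an assumption, since the documentation does not say how the class handles the bounds.

```php
<?php
// Hypothetical helper, NOT part of the Scraper class. It only illustrates
// the idea of replacing a token in a base URL with each id of a range.
function expandRangeUrls(string $baseUrl, int $from, int $to, string $token): array
{
    if ($from > $to) {
        throw new InvalidArgumentException('$from must not exceed $to');
    }

    $urls = [];
    // Assumption: the range is inclusive on both ends.
    for ($id = $from; $id <= $to; $id++) {
        // Replace every occurrence of the token with the current id.
        $urls[] = str_replace($token, (string) $id, $baseUrl);
    }

    return $urls;
}

$urls = expandRangeUrls(
    'http://your.site.ccm/path/to/details.html?id=##TOKEN##',
    151598039,
    151598042,
    '##TOKEN##'
);

foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

Under these assumptions the example range covers four ids, so four URLs are produced, each of which would then be fetched and matched against the data targets.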