Amazon daily deals page not loading completely #21

Open
ShayanArifButt opened this issue Sep 5, 2017 · 9 comments

ShayanArifButt commented Sep 5, 2017

Here is the link to the first page of Amazon's daily deals ( https://www.amazon.com/gp/goldbox ). I am trying to load it with the PHP script below, but it loads only the first eight products; the remaining 24 are not loaded. After analyzing the daily deals page, I realized the rest of the products are loaded through AJAX.

Is there any way I can make the complete page load? Here is the script I am using:

`ini_set('max_execution_time', 120);
require 'MTS/MTS/EnableMTS.php';

$windowObj = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow();

$agentName = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 OPR/46.0.2597.57";
$windowObj->setUserAgent($agentName);

$myUrl = "https://www.amazon.com/gp/goldbox/";
$windowObj->setUrl($myUrl);

// tried saving the page to check whether the whole page loaded; it did not
//$domData = $windowObj->getDom();
//file_put_contents("daily_deals.html", $domData);

//perform a screenshot:
$screenshotData = $windowObj->screenshot();

//the image shows only part of the page; how can I capture the complete page?
echo '<img src="data:image/png;base64,' . base64_encode($screenshotData) . '" />';`


ShayanArifButt commented Sep 5, 2017

I looked at an old issue where you suggested a timeout script and implemented it, but now, after loading for about 30 seconds, the script throws an error. Here is my updated code (basically, I put the id of the last product in $selector to wait for it to load):

`
//Some websites are either far away or just slow, so it is a good idea to up the allowed execution time.
ini_set('max_execution_time', 120);
require 'MTS/MTS/EnableMTS.php';

$windowObj = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow();

$agentName = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 OPR/46.0.2597.57";
$windowObj->setUserAgent($agentName);

$myUrl = "https://www.amazon.com/gp/goldbox/";
$windowObj->setUrl($myUrl);

$timeout = 30; //in seconds
$selector = "[id=100_dealView_31]";

$tTime = time() + $timeout;
$pageReady = null;
while ($pageReady === null) {

    $sExist = $windowObj->getSelectorExists($selector);
    if ($sExist === true) {
        $pageReady = true;
    } elseif (time() > $tTime) {
        //failed to get ready before timeout
        $pageReady = false;
    }

}

if ($pageReady === false) {
    //the selector did not load in time. Exception or other logic to handle this condition...
    throw new \Exception("Selector: " . $selector . ", did not load in: " . $timeout . " seconds");
} else {

    // tried to save page to see if all page is loaded or not , seems like did not load
    $domData = $windowObj->getDom();
    file_put_contents("daily_deals.html", $domData);

    //perform a screenshot:
    $screenshotData = $windowObj->screenshot();

    //the image is just showing some portion of the screen, how can i capture complete page ?
    echo '<img src="data:image/png;base64,' . base64_encode($screenshotData) . '" />';

}`

This is the error message that appeared when the script ended:

Fatal error: Uncaught exception 'Exception' with message 'MTS\Common\Devices\Browsers\PhantomJS::getSelectorExists>> Got result code: 0, EMsg: Failed to get selector exists. Error: Invalid Return: null, ECode: 0' in E:\xampp\htdocs\new_job\mts_browser_1\MTS\MTS\Common\Devices\Browsers\PhantomJS.php:227 Stack trace: #0 E:\xampp\htdocs\new_job\mts_browser_1\MTS\MTS\Common\Devices\Browsers\Window.php(150): MTS\Common\Devices\Browsers\PhantomJS->getSelectorExists(Object(MTS\Common\Devices\Browsers\Window), '[id=100_dealVie...') #1 E:\xampp\htdocs\new_job\mts_browser_1\mts_daily_deals.php(24): MTS\Common\Devices\Browsers\Window->getSelectorExists('[id=100_dealVie...') #2 {main} thrown in E:\xampp\htdocs\new_job\mts_browser_1\MTS\MTS\Common\Devices\Browsers\PhantomJS.php on line 227


merlinthemagic commented Sep 6, 2017

Since the DOM is extended via AJAX, you will need to trigger the call that extends the page. The easiest way is likely to scroll down the page, like so:

//scroll down the page 500px
$top		= 500;
$left		= 0;
$windowObj->setScrollPosition($top, $left);

To ensure you get the entire page, find an element that only shows up once there is no more dynamic content to load. Then loop over the scroll function and test for the presence of the element you seek.
You can test for the presence of the element with:

$selector		= "[id=someElementId]";
$exists		= $windowObj->getSelectorExists($selector);
//true if exists, else false

Furthermore, to screenshot the entire page, you will need to scroll all the way down. Then resize the browser to the size of the document and take the screenshot.

//get size of the document after your scroll loop is complete:

$docDetails	= $windowObj->getDocument();

$width	= $docDetails["document"]["width"];
$height	= $docDetails["document"]["height"];
$windowObj->setSize($width, $height);

//perform a screenshot:
$screenshotData	= $windowObj->screenshot();

//very large image...
echo '<img src="data:image/png;base64,' . base64_encode($screenshotData) . '" />';	

@merlinthemagic merlinthemagic self-assigned this Sep 6, 2017

ShayanArifButt commented Sep 27, 2017

@merlinthemagic
I am trying to screenshot this complete page ( https://www.amazon.com/gp/offer-listing/B01BAFWRFO ), but the screenshot does not capture the whole page. Sometimes it just keeps loading for a long time and then times out.

Here is my code:

`
$myUrl = "https://www.amazon.com/gp/offer-listing/B01BAFWRFO";
$browserObj = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs');

$milliSecs = 60000;
//on the BROWSER OBJ!
$browserObj->setDefaultExecutionTime($milliSecs);

//$browserObj->setKeepalive(true);
$windowObj = $browserObj->getNewWindow();

$agentName = $_SERVER['HTTP_USER_AGENT'];
$windowObj->setUserAgent($agentName);
$windowObj->setUrl($myUrl);

$selector = "[id=navFooter]";
$exists = $windowObj->getSelectorExists($selector);

while( !$exists ){

//scroll down the page 500px
$top		= 500;
$left		= 0;
$windowObj->setScrollPosition($top, $left);

    $selector		= "[id=navFooter]";
    $exists		= $windowObj->getSelectorExists($selector);

}

//get size of the document after your scroll loop is complete:

$docDetails = $windowObj->getDocument();

$width = $docDetails["document"]["width"];
$height = $docDetails["document"]["height"];
$windowObj->setSize($width, $height);

//perform a screenshot:
$screenshotData = $windowObj->screenshot();

//very large image...
echo '<img src="data:image/png;base64,' . base64_encode($screenshotData) . '" />';`

merlinthemagic commented

Hi,

You have two problems.

First, you are setting the scroll position to 500px again and again; you need to increment it.

Second, consider how fast the while loop executes compared to how fast the AJAX content is served. You will need to wait briefly after each scroll to make sure the content has loaded before scrolling again.
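
The two fixes described above could be combined roughly as follows. This is a minimal sketch, not code from the MTS project: the helper name `scrollUntilSelector` and its parameters and defaults are hypothetical, and it relies only on the `setScrollPosition()` and `getSelectorExists()` calls that already appear in this thread.

```php
<?php
/**
 * Scroll down in fixed steps until $selector exists or the timeout expires.
 * Hypothetical helper: the name, parameters and defaults are assumptions.
 * $windowObj is any object exposing setScrollPosition($top, $left) and
 * getSelectorExists($selector), as used earlier in this thread.
 */
function scrollUntilSelector(
    $windowObj,
    string $selector,
    int $stepPx = 500,
    int $timeoutSecs = 30,
    int $pauseMicros = 250000
): bool {
    $deadline = microtime(true) + $timeoutSecs;
    $top      = 0;

    while (true) {
        // Stop as soon as the marker element is present.
        if ($windowObj->getSelectorExists($selector) === true) {
            return true;
        }
        // Give up once the time budget is spent.
        if (microtime(true) >= $deadline) {
            return false;
        }
        // Increment the offset so every pass scrolls further down the page.
        $top += $stepPx;
        $windowObj->setScrollPosition($top, 0);
        // Pause so the AJAX request can finish before testing again.
        usleep($pauseMicros);
    }
}
```

It would be called as `scrollUntilSelector($windowObj, "[id=navFooter]")`, with a `false` return value treated as a timeout. The quarter-second pause is a guess; a slow connection may need a longer one.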

ShayanArifButt commented

For the link in the above comment, it is not even an AJAX issue: I saved the DOM as an HTML file to check, and the whole page was saved, but the screenshot does not show the whole page.

`//get the HTML of the current page:
$domData = $windowObj->getDom();

//save the HTML to disk so the full page can be inspected
file_put_contents('page.html', $domData);`


ShayanArifButt commented Sep 27, 2017

@merlinthemagic

I have now tried incrementing $top like this, but the screenshot is still incomplete:

`$selector = "[id=navFooter]";
$exists = $windowObj->getSelectorExists($selector);

$top = 500;
$left = 0;

while(!$exists){

//scroll down the page 500px

$windowObj->setScrollPosition($top, $left);

$selector		= "[id=navFooter]";
$exists		= $windowObj->getSelectorExists($selector);

$top = $top + 500;

}`

merlinthemagic commented

What is the resolution of the image you receive at the end?

Also please post a var_dump of $docDetails.


ShayanArifButt commented Sep 27, 2017

@merlinthemagic

For the above link ( https://www.amazon.com/gp/offer-listing/B01BAFWRFO/ref=dp_olp_new_mbc?ie=UTF8&condition=new ), I fixed it by setting a fixed width and height, so the code is simply the one from the documentation:

`$top = 0;
$left = 0;
$width = 2300;
$height = 2400;
$windowObj->setRasterSize($top, $left, $width, $height);

//perform a screenshot:
$screenshotData = $windowObj->screenshot();

//very large image...
echo '<img src="data:image/png;base64,' . base64_encode($screenshotData) . '" />';`


ShayanArifButt commented Sep 27, 2017

@merlinthemagic

These are the values I got from dumping $docDetails in the previous code:

array(3) {
  ["body"]=> array(6) {
    ["clientHeight"]=> int(2606)
    ["offsetHeight"]=> int(2606)
    ["scrollHeight"]=> int(2606)
    ["clientWidth"]=> int(1920)
    ["offsetWidth"]=> int(1920)
    ["scrollWidth"]=> int(1920)
  }
  ["documentElement"]=> array(4) {
    ["clientHeight"]=> int(1080)
    ["scrollHeight"]=> int(2606)
    ["clientWidth"]=> int(1920)
    ["scrollWidth"]=> int(1920)
  }
  ["document"]=> array(2) {
    ["width"]=> int(1920)
    ["height"]=> int(2606)
  }
}
width: 1920
height: 2606

The code that produced this output was:

`$selector = "[id=navFooter]";
$exists = $windowObj->getSelectorExists($selector);

$top = 500;
$left = 0;

while(!$exists){

//scroll down the page 500px

$windowObj->setScrollPosition($top, $left);

$selector		= "[id=navFooter]";
$exists		= $windowObj->getSelectorExists($selector);

$top = $top + 500;

}

//get size of the document after your scroll loop is complete:

$docDetails = $windowObj->getDocument();

var_dump($docDetails);

$width = $docDetails["document"]["width"];
$height = $docDetails["document"]["height"];

echo "
width: {$width}";
echo "
height: {$height}";

$windowObj->setSize($width, $height);

//perform a screenshot:
$screenshotData = $windowObj->screenshot();

//very large image...
echo '<img src="data:image/png;base64,' . base64_encode($screenshotData) . '" />';`
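
Rather than hard-coding 2300 by 2400, the rectangle passed to `setRasterSize()` could perhaps be derived from the `getDocument()` result dumped above. The sketch below is only an illustration: `fullPageRect` is a hypothetical helper, taking the largest reported dimension is an assumption, and whether this produces a complete screenshot for this page has not been confirmed in this thread.

```php
<?php
/**
 * Build a [top, left, width, height] rectangle for a full-page capture
 * from an array shaped like the getDocument() dump above.
 * Hypothetical helper; using the largest reported value is an assumption.
 */
function fullPageRect(array $docDetails): array
{
    $widths = [
        $docDetails["document"]["width"] ?? 0,
        $docDetails["body"]["scrollWidth"] ?? 0,
        $docDetails["documentElement"]["scrollWidth"] ?? 0,
    ];
    $heights = [
        $docDetails["document"]["height"] ?? 0,
        $docDetails["body"]["scrollHeight"] ?? 0,
        $docDetails["documentElement"]["scrollHeight"] ?? 0,
    ];

    return [0, 0, max($widths), max($heights)];
}

// Values copied from the dump above (only the keys the helper reads).
$docDetails = [
    "body"            => ["scrollHeight" => 2606, "scrollWidth" => 1920],
    "documentElement" => ["scrollHeight" => 2606, "scrollWidth" => 1920],
    "document"        => ["width" => 1920, "height" => 2606],
];

list($top, $left, $width, $height) = fullPageRect($docDetails);
echo "{$width}x{$height}\n"; // prints 1920x2606

// With a live window object, the result would then be applied as:
// $windowObj->setRasterSize($top, $left, $width, $height);
```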
