Sunday, September 20, 2009

Build a Spider/Crawler with PHP




PHPCrawl is a set of classes written in PHP for crawling/spidering websites; in short, it is a web-crawler library for PHP.



SOLUTION




The crawler "spiders" websites and delivers information about all found pages, links, files and so on to users of the library. By overriding a special method of the main class, users decide what should happen to the pages and their content, files and other information the crawler finds.

PHPCrawl provides many options to specify the behaviour of the crawler, such as URL and Content-Type filters, cookie handling, limiter options and much more.


Download: http://sourceforge.net/projects/phpcrawl/


PHP Example:

The following code is a complete example of using the class.
The listed script "crawls" a site and just prints out some information about the pages it finds.

Please note that this example script also comes with the phpcrawl package, in a file called "example.php".

<?php

// It may take a while to crawl a site ...
set_time_limit(10000);

// Include the phpcrawl main class
include("classes/phpcrawler.class.php");

// Extend the class and override the handlePageData() method
class MyCrawler extends PHPCrawler
{
    function handlePageData(&$page_data)
    {
        // Here comes your code.
        // Do whatever you want with the information given in the
        // array $page_data about a page or file that the crawler found.
        // See the class reference for a complete list of the elements
        // the array contains.
        // This is just a simple example.

        // Print the URL of the currently requested page or file
        echo "Page requested: ".$page_data["url"]."<br>";

        // Print the first line of the header the server sent (HTTP status)
        echo "Status: ".strtok($page_data["header"], "\n")."<br>";

        // Print the referer
        echo "Referer-page: ".$page_data["referer_url"]."<br>";

        // Print whether the content was received or not
        if ($page_data["received"]==true)
            echo "Content received: ".$page_data["bytes_received"]." bytes";
        else
            echo "Content not received";

        // ...

        // Now you should do something with the content of the
        // received page or file ($page_data["source"]); we skip it in this example

        echo "<br><br>";
        flush();
    }
}

// Now, create an instance of the class, set the behaviour
// of the crawler (see class-reference for more methods)
// and start the crawling-process.

$crawler = new MyCrawler();

// URL to crawl
$crawler->setURL("www.php.net");

// Only receive content of files with content-type "text/html"
// (regular expression, preg)
$crawler->addReceiveContentType("/text\/html/");

// Ignore links to pictures, don't even request pictures
// (preg_match)
$crawler->addNonFollowMatch("/\.(jpg|gif|png)$/i");

// Store and send cookie-data like a browser does
$crawler->setCookieHandling(true);

// Set the traffic limit to 1 MB (in bytes;
// for testing we don't want to "suck" the whole site)
$crawler->setTrafficLimit(1000 * 1024);

// That's enough, now here we go
$crawler->go();


// At the end, after the process is finished, we print a short
// report (see method getReport() for more information)

$report = $crawler->getReport();

echo "Summary:<br>";
if ($report["traffic_limit_reached"]==true)
echo "Traffic-limit reached <br>";

echo "Links followed: ".$report["links_followed"]."<br>";
echo "Files received: ".$report["files_received"]."<br>";
echo "Bytes received: ".$report["bytes_received"]."<br>";

?>

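The example above skips the page content itself. As a minimal sketch of what handlePageData() could do with $page_data["source"], the helper below pulls the page title out of an HTML string. The function name extractTitle() is hypothetical and not part of PHPCrawl, and the sketch assumes the source is HTML text.

```php
<?php

// Hypothetical helper (not part of PHPCrawl): returns the text of the first
// <title> element in an HTML string, or null if none is found.
function extractTitle($html)
{
    // "i" = case-insensitive, "s" = let "." match newlines inside the title
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        // Collapse runs of whitespace and decode entities such as &amp;
        $title = preg_replace('/\s+/', ' ', $matches[1]);
        return trim(html_entity_decode($title, ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Inside handlePageData() this could be called as:
//   $title = extractTitle($page_data["source"]);
echo extractTitle("<html><head><title>PHP: Hypertext\n Preprocessor</title></head></html>"), "\n";
// prints: PHP: Hypertext Preprocessor
```

A regular expression is enough for a quick report like this one; for anything beyond the title, a real HTML parser is likely the safer choice.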

Counting the Characters of a PDF Document with PHP




In this tutorial, we will convert a PDF document to a text file
through PHP and then read its content to count the number of characters in the file.



SOLUTION




Counting the characters of a PDF document is quite simple. To accomplish this task, I will
use the pdftohtml Linux library.
Please download and install the pdftohtml library from http://sourceforge.net/projects/pdftohtml/


Code to run the PDF conversion and count the characters.

Linux command to convert the PDF to text format (the file path must be the absolute server path):

/usr/bin/pdftotext $file_path

PHP:
shell_exec('/usr/bin/pdftotext ' . escapeshellarg($file_path));


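Complete code to upload a file to the processed folder in your root directory, convert it and count its characters.

if (move_uploaded_file($_FILES[$filen]['tmp_name'], 'processed/' . basename($_FILES[$filen]['name']))) {

    // Absolute server path of the uploaded PDF
    $file_name = basename($_FILES[$filen]['name']);
    $file_path = $_SERVER['DOCUMENT_ROOT'] . '/processed/' . $file_name;

    // pdftotext writes the text file next to the PDF, with a .txt extension
    $text_path = preg_replace('/\.pdf$/i', '.txt', $file_path);

    // escapeshellarg() protects against spaces and shell characters in the name;
    // shell_exec() returns only after the conversion has finished
    $output = shell_exec('/usr/bin/pdftotext ' . escapeshellarg($file_path));

    // Read the generated text and count its characters, ignoring spaces
    $contents = file_get_contents($text_path);
    $file_count = strlen(str_replace(' ', '', $contents));

}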
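
The count above is based on strlen(), so it counts bytes and still includes newlines. As a hedged alternative, the sketch below counts characters while ignoring all whitespace and treating multibyte UTF-8 characters as one character each. The function name countNonWhitespaceChars() is hypothetical, and the sketch assumes the converted text is UTF-8.

```php
<?php

// Hypothetical helper: counts the characters in $text, ignoring all
// whitespace (spaces, tabs, newlines), not only the space character.
function countNonWhitespaceChars($text)
{
    // With the "u" modifier, \S matches one whole UTF-8 character,
    // so a multibyte character such as "é" is counted once.
    $count = preg_match_all('/\S/u', $text);

    // preg_match_all() returns false on invalid UTF-8; in that case
    // fall back to counting bytes.
    if ($count === false) {
        $count = strlen(preg_replace('/\s+/', '', $text));
    }
    return $count;
}

echo countNonWhitespaceChars("Hello world\nfoo"), "\n"; // prints 13
```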

Troubleshooting

1. The shell_exec function will not execute if you don't have permission to run shell commands
or if your PHP is running in safe mode.

2. This script generates a text file with the same name and in the same directory as
the PDF file. If the file isn't created in that directory but your program still runs, look
for the file in the root directory. If you find it there, you have to correct your
file path.

3. The file cannot be uploaded or the characters cannot be counted. The processed folder must be
writable by the web server; changing its rights to 777 achieves this.
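
The three points above can be checked from PHP before attempting a conversion. This is only a sketch under assumptions: checkConversionSetup() is a hypothetical helper, and the binary path and folder name are the ones used in this post.

```php
<?php

// Hypothetical helper: returns a list of problems that would stop the
// conversion from working. An empty array means no problem was detected.
function checkConversionSetup($binary, $directory)
{
    $problems = array();

    // shell_exec() can be switched off through the disable_functions setting
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    if (!function_exists('shell_exec') || in_array('shell_exec', $disabled)) {
        $problems[] = 'shell_exec() is disabled';
    }

    // The converter must exist and be executable by the web server user
    if (!is_executable($binary)) {
        $problems[] = "$binary is missing or not executable";
    }

    // The upload folder must be writable for the PDF and the text file
    if (!is_dir($directory) || !is_writable($directory)) {
        $problems[] = "$directory is missing or not writable";
    }

    return $problems;
}

// Print every detected problem, one per line
foreach (checkConversionSetup('/usr/bin/pdftotext', 'processed') as $problem) {
    echo $problem, "\n";
}
```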

If you have further questions about this post, kindly post your comments.


Define Constants in CodeIgniter




Defining constants in CodeIgniter is easy, although I personally don't like the procedure. Still, here is a solution for those who want to use constants.



SOLUTION




You can define constants in the index.php file in the root directory of CodeIgniter. You can also include constant files in the index file.



CodeIgniter itself defines some of its core constants in the index file:

define('EXT', '.'.pathinfo(__FILE__, PATHINFO_EXTENSION));

define('FCPATH', __FILE__);

define('SELF', pathinfo(__FILE__, PATHINFO_BASENAME));

define('BASEPATH', $system_folder.'/');
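
Your own constants can follow the same pattern. Below is a minimal sketch of a constants file that could be included from index.php; the file name my_constants.php and the constant names and values are hypothetical and only serve as an illustration.

```php
<?php

// Hypothetical file: my_constants.php, to be included from index.php with
//   include('my_constants.php');

// defined() guards against "constant already defined" notices if the
// file is included more than once.
if (!defined('SITE_NAME')) {
    define('SITE_NAME', 'My Shop');
}
if (!defined('ITEMS_PER_PAGE')) {
    define('ITEMS_PER_PAGE', 20);
}

// Constants are global, so controllers, models and views can use them directly.
echo SITE_NAME, ' shows ', ITEMS_PER_PAGE, " items per page\n";
```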

Tuesday, September 1, 2009

Magento: 5 Common Errors and Solutions


Five Common Errors in Magento. Tweak your application for better performance

INTRODUCTION



Magento is an open-source e-commerce platform built on the Zend Framework. While using Magento you may run into some very common errors, for example installation errors or being unable to log in to the admin panel, which can cause problems in your application. Here are five common errors and their solutions.

Error 404 after installing the sample data (while the home page is OK)

Solution:

You will need to do one of two things:

  1. Refresh catalog rewrites: Admin > System > Cache > Refresh catalog rewrites. If this doesn’t work, then:

  2. Turn off URL rewrites: Admin > System > Web > SEO > NO


When you go to the Magento Connect Manager in your Admin Panel, you may get the following error message:

Error: Please check for sufficient write file permissions

Solution:

Magento Connect requires write permissions to the Magento files in order to install new extensions or upgrade the software to a new version. You can change your file/folder permissions either through your FTP client (like FileZilla) or through an SSH client (like PuTTY).

Go inside your Magento folder and change the permissions of all files and folders recursively to "777".

Here's how to do this through your SSH client. Log in to your SSH account and then execute the following commands:

cd
find . -type d -exec chmod 777 {} \;
chmod 666 downloader/config.ini

You should now be able to access the Magento Connect Manager. When you have finished the installation/upgrade, you should reset the permissions by executing the following SSH commands:

find . -type d -exec chmod 755 {} \;
find . -type f -exec chmod 644 {} \;
chmod o+w var media app/etc/use_cache.ser

Magento error: Notice: Undefined index: 0 in app/code/core/Mage/Core/Model/Mysql4/Config.php on line 92

Solution:


This will most likely occur when migrating Magento from one host to another. The fix, while not obvious and not easy to find, turned out to be quite simple. During the installation, Magento sets two IDs to 0 in its database. However, when you import the database dump on your new host, MySQL doesn’t like this and changes these IDs to 2, which of course is not what Magento needs to load, thus the error. The fix is to set these IDs back to 0.

The IDs we need are those for ‘admin’ in the tables:

* core_store
* core_website

Just open phpMyAdmin and change 2 to 0, then clear Magento’s cache by deleting the contents of the /var/cache folder, and voilà!

Also, it is a good idea to check the box for “Disable foreign key checks” when exporting the database in the first place, as you will otherwise most likely receive an error about this when importing.

Magento Commerce: Object Not Found

Solution:

If you have just installed Magento and receive an “Object Not Found” 404 error with every link you click, it’s because mod_rewrite is not enabled on your server. If you’re on shared hosting, ask your hosting provider to turn it on. If you’re using XAMPP, note that the mod_rewrite module is not enabled in Apache by default.

To enable mod_rewrite in XAMPP, go to the \apache\conf directory of your installation and edit httpd.conf. Find the line that contains
#LoadModule rewrite_module modules/mod_rewrite.so
and uncomment it, so that it reads:

LoadModule rewrite_module modules/mod_rewrite.so

Also find the line

AllowOverride None

and change it to:

AllowOverride All

AllowOverride appears 2 or 3 times in the configuration file. Change all of them.
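
After restarting Apache, you may want to confirm that the module is really loaded. The sketch below is one possible check, not a Magento feature: isModRewriteEnabled() is a hypothetical helper, and it relies on apache_get_modules(), which is only available when PHP runs as an Apache module.

```php
<?php

// Returns true or false when the loaded Apache modules can be listed,
// or null when PHP cannot tell (for example under CGI/FPM or the CLI).
function isModRewriteEnabled()
{
    // apache_get_modules() only exists when PHP runs as an Apache module
    if (!function_exists('apache_get_modules')) {
        return null;
    }
    return in_array('mod_rewrite', apache_get_modules());
}

$status = isModRewriteEnabled();
if ($status === null) {
    echo "Cannot detect Apache modules from this PHP setup\n";
} else {
    echo $status ? "mod_rewrite is enabled\n" : "mod_rewrite is not enabled\n";
}
```

Save the script inside your web root and open it in the browser; running it from the command line will usually report that the modules cannot be detected.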

Magento: Cannot Log In to the Admin

Solution:

This problem typically appears when Magento runs on plain localhost, because browsers may not store cookies for a host name that contains no dot. The solution is to add the following line to your hosts file so you can access your localhost as www.localhost.com.

C:\Windows\System32\drivers\etc\hosts (edit this file in Notepad)

127.0.0.1 magento.localhost.com www.localhost.com

Also worth mentioning: if you need to reset a password manually in the database, use the following:

UPDATE admin_user SET password=MD5('mypassword') WHERE username='admin';

If none of the above works, install the Opera browser and use it. Opera seems to work out of the box, whereas IE, Firefox, and Google Chrome do not.

All rights are reserved. Nobleatom.com
Software Development Services.
Contact me: khubabmazhar596@hotmail.com
