PHP: Checking Whether a URL Is Valid or a Web Page Exists

In PHP we sometimes need to check whether a web page or URL exists, for example to cut down on dead links.

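The simplest approach is fopen(), file_get_contents() and similar functions. There are many ways to do it, but all of these download the entire HTML page, which is needlessly slow when all you want to know is whether the URL is dead. It is enough to look at the HTTP headers, so the page content never has to be fetched.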
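A header-only check boils down to an HTTP `HEAD` request: the server answers with the status line and headers but no body. The following is a minimal sketch of what goes over the wire, using plain `fsockopen()` on port 80 (no HTTPS, no redirect handling). The helper names `buildHeadRequest` and `fetchStatusLine` are hypothetical.

```php
<?php
// Hypothetical helper: the raw HTTP/1.1 HEAD request for $path on $host.
function buildHeadRequest(string $host, string $path = '/'): string
{
    return "HEAD {$path} HTTP/1.1\r\n"
        . "Host: {$host}\r\n"
        . "Connection: close\r\n"
        . "\r\n"; // an empty line ends the request headers
}

// Hypothetical helper: send the request over plain HTTP (port 80) and return
// the response's status line, e.g. "HTTP/1.1 200 OK", or null on failure.
function fetchStatusLine(string $host, string $path = '/', int $timeout = 15): ?string
{
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout); // connect timeout
    if ($fp === false) {
        return null;
    }
    stream_set_timeout($fp, $timeout); // read timeout
    fwrite($fp, buildHeadRequest($host, $path));
    $line = fgets($fp); // the first response line is the status line
    fclose($fp);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

For example, `fetchStatusLine('example.com', '/')` would return something like `HTTP/1.1 200 OK`. Real link checkers also need HTTPS and redirects, which is why the approaches in this post rely on `get_headers()` or cURL instead.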

get_headers() returns this information as an array, one header line per element, for example:

HTTP/1.1 200 OK
Date: Mon, 06 Oct 2008 15:45:27 GMT
Server: Apache/2.2.9
X-Powered-By: PHP/5.2.6-4
Set-Cookie: PHPSESSID=4e037868a4619d6b4d8c52d0d5c59035; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
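By default `get_headers()` sends a GET request and reads only the headers. The following is a minimal sketch that switches it to HEAD through a stream context (the context argument requires PHP 7.1 or later) and reads the status code from the returned lines. When redirects are followed, the array holds one status line per response, so the sketch uses the last one. `statusFromHeaders` and `urlExists` are hypothetical names. Counting 2xx and 3xx as success mirrors the cURL function in this post.

```php
<?php
// Return the status code of the final response in a get_headers() array,
// or null if no status line is present. With redirects there is one status
// line per response, so the last match wins.
function statusFromHeaders(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // matches "HTTP/1.1 200 OK", "HTTP/1.0 404 Not Found", "HTTP/2 301"
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Check a URL with a HEAD request via get_headers().
function urlExists(string $url): bool
{
    $context = stream_context_create(['http' => [
        'method'        => 'HEAD', // headers only, no body
        'timeout'       => 20,     // read timeout in seconds
        'ignore_errors' => true,   // still return headers for 4xx/5xx responses
    ]]);
    // 0: numerically indexed array; @ hides the warning for unreachable hosts
    $headers = @get_headers($url, 0, $context);
    if ($headers === false) {
        return false;
    }
    $code = statusFromHeaders($headers);
    return $code !== null && $code >= 200 && $code < 400;
}
```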

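Checking with PHP + cURL + Content-Type: the function sends a HEAD request and accepts the URL only if the Content-Type is on an allow list and the status code is 2xx or 3xx: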

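function existsWebpage($url)
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false; /* the URL is seriously malformed */
    }
    if (!isset($parts['scheme'], $parts['host'])
        || !in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return false; /* only absolute http:// and https:// URLs can be checked */
    }
    if (isset($parts['user'])) {
        return false; /* reject URLs with credentials, e.g. http://user@example.com/ */
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);

    /* set the user agent - might help, doesn't hurt */
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; wowTreebot/1.0; +http://wowtree.com)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    /* follow redirects; the status and Content-Type read below then belong to the final response */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    /* time out after the given number of seconds; when this script runs on a server,
       15 s to connect and 20 s in total should be plenty to verify a valid URL */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    /* send a HEAD request: fetch only the headers, not the page body (much faster) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    /* HTTPS links: 1 is not a valid CURLOPT_SSL_VERIFYHOST value in current cURL
       versions, 2 is the standard setting; peer verification stays off so that a
       certificate problem alone does not mark the link as dead */
    if (strtolower($parts['scheme']) === 'https') {
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }

    $response = curl_exec($ch);
    if ($response === false) {
        curl_close($ch);
        return false; /* DNS failure, connection refused, timeout, ... */
    }

    /* ask cURL for the status code and Content-Type of the final response instead of
       regex-matching $response, which holds one header block per redirect and may
       start with an HTTP/2 status line such as "HTTP/2 200" */
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE); /* null if no Content-Type was sent */
    curl_close($ch);

    /* allowed content types; drop parameters such as "; charset=UTF-8" first */
    if (!is_string($type)) {
        return false;
    }
    $mime = strtolower(trim(explode(';', $type, 2)[0]));
    switch ($mime) {
        case 'application/atom+xml':
        case 'application/rdf+xml':
        case 'application/xhtml+xml':
        case 'application/xml':
        case 'application/xml-dtd':
        case 'application/xml-external-parsed-entity':
            $content_type = true;
            break;
        default:
            /* any type starting with text/ or image/ is allowed as well */
            $content_type = (strpos($mime, 'text/') === 0 || strpos($mime, 'image/') === 0);
    }
    if (!$content_type) {
        return false;
    }

    /* 2xx and 3xx status codes count as success */
    return ($code >= 200) && ($code < 400);
}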
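Servers often append parameters to the Content-Type header (for example `text/html; charset=UTF-8`), so an allow-list test should compare only the media type, case-insensitively. The following is a minimal sketch of the same allow list as in the function above, pulled out into a standalone helper so it can be tested without network access. The name `isAllowedContentType` is hypothetical.

```php
<?php
// Hypothetical helper: is this Content-Type header value on the allow list?
function isAllowedContentType(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        return false; // header missing
    }
    // "text/html; charset=UTF-8" -> "text/html"
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));

    $xmlTypes = [
        'application/atom+xml',
        'application/rdf+xml',
        'application/xhtml+xml',
        'application/xml',
        'application/xml-dtd',
        'application/xml-external-parsed-entity',
    ];
    if (in_array($mime, $xmlTypes, true)) {
        return true;
    }
    // any type starting with text/ or image/ is accepted as well
    return strpos($mime, 'text/') === 0 || strpos($mime, 'image/') === 0;
}
```

It can be fed the value of `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)`, which is `null` when the server sent no valid Content-Type header.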

2010-04-30T11:45:00+0000