
PHP: Checking whether a URL is valid or a web page exists


    In PHP we sometimes need to check whether a web page or URL exists, for example to cut down on dead links.

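    The simplest approach is fopen() or file_get_contents(), among many other options, but these all fetch the entire HTML page, which is slow when the only goal is to detect a dead URL. The HTTP response headers are enough to decide, so there is no need to download the whole page body.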

    get_headers() returns this information:

    HTTP/1.1 200 OK
    Date: Mon, 06 Oct 2008 15:45:27 GMT
    Server: Apache/2.2.9
    X-Powered-By: PHP/5.2.6-4
    Set-Cookie: PHPSESSID=4e037868a4619d6b4d8c52d0d5c59035; path=/
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    Vary: Accept-Encoding
    Connection: close
    Content-Type: text/html
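As a rough illustration of reading only the headers, the sketch below asks get_headers() for the header lines and pulls the status code out of them. The function names lastStatusCode() and urlRespondsOk() are made up for this example, and the stream context argument assumes PHP 7.1 or later. When redirects are followed, get_headers() returns the header lines of every hop, so the sketch keeps the last status line it sees.

```php
<?php
/**
 * Return the status code of the last "HTTP/x.y NNN" line in a list of
 * header lines, or null when no status line is present.
 * (Hypothetical helper, not part of the original post.)
 */
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Matches "HTTP/1.1 200 OK" as well as "HTTP/2 200".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

/**
 * Check a URL with a HEAD request so that no page body is transferred.
 * (Hypothetical helper; the 10 second timeout is an arbitrary assumption.)
 */
function urlRespondsOk(string $url): bool
{
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'timeout' => 10],
    ]);
    // get_headers() returns false when the request fails outright.
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return false;
    }
    $code = lastStatusCode($headers);
    return $code !== null && $code >= 200 && $code < 400;
}
```

Some servers answer HEAD requests differently from GET requests, so a URL that fails this check may still be worth retrying with a normal request.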

    A check based on PHP + cURL + Content-Type:

    function existsWebpage($url) {
        $parts = parse_url($url);
        if (!$parts) { return false; } /* the URL was seriously wrong */
        if (isset($parts['user'])) { return false; } /* reject URLs carrying credentials, e.g. http://user@gmail.com */
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);

        /* set the user agent - might help, doesn't hurt */
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; wowTreebot/1.0; +http://wowtree.com)');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        /* try to follow redirects */
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

        /* give up connecting after 15 seconds and abort the whole request after 20. assuming that this script runs on a server, 20 seconds should be plenty of time to verify a valid URL. */
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
        curl_setopt($ch, CURLOPT_TIMEOUT, 20);

        /* don't download the page, just the headers (much faster in this case) */
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_HEADER, true);

        /* handle HTTPS links */
        if (isset($parts['scheme']) && strtolower($parts['scheme']) === 'https') {
            /* the value 1 is deprecated in libcurl; 2 checks that the certificate matches the host name */
            curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
            /* the certificate chain is NOT verified, so self-signed sites still count as existing */
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        }

        $response = curl_exec($ch);
        curl_close($ch);
        if (!is_string($response) || $response === '') { return false; } /* the request itself failed */

        /* with redirects followed, every hop adds a header block; keep only the last one */
        $blocks = preg_split('/\r?\n\r?\n/', trim($response));
        $headers = end($blocks);

        /* allow content-type list */
        $content_type = false;
        /* capture the media type only, without parameters such as "; charset=UTF-8" */
        if (preg_match('/^Content-Type:\s*([^;\s]+)/im', $headers, $matches)) {
            $type = strtolower($matches[1]);
            switch ($type) {
                case 'application/atom+xml':
                case 'application/rdf+xml':
                case 'application/xhtml+xml':
                case 'application/xml':
                case 'application/xml-dtd':
                case 'application/xml-external-parsed-entity':
                    $content_type = true;
                    break;
            }

            /* any text/* or image/* type is accepted as well */
            if (!$content_type && preg_match('#^(text|image)/#', $type)) {
                $content_type = true;
            }
        }

        if (!$content_type) { return false; }

        /* get the status code from HTTP headers (HTTP/1.x and HTTP/2 status lines) */
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $headers, $matches)) { $code = intval($matches[1]); }
        else { return false; }

        /* see if code indicates success */
        return (($code >= 200) && ($code < 400));
    }
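As an alternative to parsing the raw header text, cURL can report the final status code and Content-Type itself through curl_getinfo(). The sketch below is only one possible variant: the names isAllowedContentType() and existsWebpageViaInfo() are invented for this example, the allow list is copied from existsWebpage(), and the nullable type hint assumes PHP 7.1 or later.

```php
<?php
/**
 * Decide whether a Content-Type value is acceptable, mirroring the allow
 * list of existsWebpage(): a few XML types plus anything under text/ or image/.
 * (Hypothetical helper, not part of the original post.)
 */
function isAllowedContentType(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        return false;
    }
    // Drop parameters such as "; charset=UTF-8" and normalise the case.
    $type = strtolower(trim(explode(';', $contentType)[0]));
    $xmlTypes = [
        'application/atom+xml',
        'application/rdf+xml',
        'application/xhtml+xml',
        'application/xml',
        'application/xml-dtd',
        'application/xml-external-parsed-entity',
    ];
    return in_array($type, $xmlTypes, true)
        || preg_match('#^(text|image)/#', $type) === 1;
}

/**
 * Hypothetical variant of existsWebpage() that lets cURL report the final
 * status code and Content-Type instead of matching the headers by regex.
 */
function existsWebpageViaInfo(string $url): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // headers only, no page body
        CURLOPT_RETURNTRANSFER => true,  // do not echo anything
        CURLOPT_FOLLOWLOCATION => true,  // values below describe the last hop
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT        => 20,
    ]);
    $ok   = curl_exec($ch) !== false;
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    return $ok
        && $code >= 200 && $code < 400
        && isAllowedContentType(is_string($type) ? $type : null);
}
```

Unlike existsWebpage(), this variant leaves certificate verification at cURL's defaults, so a site with a self-signed certificate would be reported as missing.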

    2010-04-30T11:45:00+0000

