Commit a4bd3e0d authored by bhuvan krishna

Merge tag 'upstream/0.3.2'

parents 9f590765 dc0d90d4
language: php
php:
- 5.4
- 5.5
- 5.6
- 7.0
- nightly
before_script: composer install
# Creative Commons Legal Code
## CC0 1.0 Universal
http://creativecommons.org/publicdomain/zero/1.0
Official translations of this legal tool are available
> CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER.
### _Statement of Purpose_
The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").
Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.
For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.
**1. Copyright and Related Rights.** A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following:
1. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work;
2. moral rights retained by the original author(s) and/or performer(s);
3. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work;
4. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below;
5. rights protecting the extraction, dissemination, use and reuse of data in a Work;
6. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and
7. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof.
**2. Waiver.** To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose.
**3. Public License Fallback.** Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose.
**4. Limitations and Disclaimers.**
1. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document.
2. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law.
3. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.
4. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.
......@@ -69,6 +69,9 @@ function fetch($url, $convertClassic = true, &$curlInfo=null) {
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html'
));
$html = curl_exec($ch);
$info = $curlInfo = curl_getinfo($ch);
curl_close($ch);
......@@ -78,6 +81,9 @@ function fetch($url, $convertClassic = true, &$curlInfo=null) {
return null;
}
# ensure the final URL is used to resolve relative URLs
$url = $info['url'];
return parse($html, $url, $convertClassic);
}
......@@ -124,6 +130,7 @@ function unicodeTrim($str) {
function mfNamesFromClass($class, $prefix='h-') {
$class = str_replace(array(' ', ' ', "\n"), ' ', $class);
$classes = explode(' ', $class);
$classes = preg_grep('#^[a-z\-]+$#', $classes);
$matches = array();
foreach ($classes as $classname) {
......@@ -231,6 +238,19 @@ function convertTimeFormat($time) {
}
}
function applySrcsetUrlTransformation($srcset, $transformation) {
return implode(', ', array_filter(array_map(function ($srcsetPart) use ($transformation) {
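// Split on the first run of whitespace; explode() would treat the whole string as one literal delimiter
$parts = preg_split('/\s+/', trim($srcsetPart), 2);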
$parts[0] = rtrim($parts[0]);
if (empty($parts[0])) { return false; }
$parts[0] = call_user_func($transformation, $parts[0]);
return $parts[0] . (empty($parts[1]) ? '' : ' ' . $parts[1]);
}, explode(',', trim($srcset)))));
}
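A minimal standalone sketch of the same idea, for readers who want to try the srcset rewriting outside the parser. The function name `rewriteSrcsetUrls` and the example URLs are hypothetical and not part of the library. The sketch assumes candidates are separated by commas, so it would mis-split URLs that themselves contain commas (for example `data:` URIs).

```php
<?php
// Hypothetical helper (not part of the library): apply $transform to every URL
// in a srcset value while keeping each candidate's descriptor (e.g. "2x", "480w").
function rewriteSrcsetUrls($srcset, $transform) {
    $out = array();
    foreach (explode(',', $srcset) as $candidate) {
        $candidate = trim($candidate);
        if ($candidate === '') {
            continue; // skip empty candidates such as a trailing comma
        }
        // Separate the URL from its optional descriptor at the first whitespace run.
        $parts = preg_split('/\s+/', $candidate, 2);
        $url = call_user_func($transform, $parts[0]);
        $out[] = isset($parts[1]) ? $url . ' ' . $parts[1] : $url;
    }
    return implode(', ', $out);
}

// Usage: prefix every candidate URL with an assumed base URL.
$resolved = rewriteSrcsetUrls("small.jpg 1x,\n  large.jpg 2x", function ($url) {
    return 'http://example.com/' . $url;
});
```

Passing the transformation as a callable keeps the splitting logic independent of how URLs are resolved, which is the same design the diff uses with `array($this, 'resolveUrl')`.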
/**
* Microformats2 Parser
*
......@@ -257,6 +277,16 @@ class Parser {
public $jsonMode;
/** @var boolean Whether to include experimental language parsing in the result */
public $lang = false;
/**
* Elements upgraded to mf2 during backcompat
* @var SplObjectStorage
*/
protected $upgraded;
/**
* Constructor
*
......@@ -304,6 +334,7 @@ class Parser {
$this->baseurl = $baseurl;
$this->doc = $doc;
$this->parsed = new SplObjectStorage();
$this->upgraded = new SplObjectStorage();
$this->jsonMode = $jsonMode;
}
......@@ -316,18 +347,42 @@ class Parser {
$this->parsed[$e] = $prefixes;
}
/**
* Determine if the element has already been parsed
* @param DOMElement $e
* @param string $prefix
* @return bool
*/
private function isElementParsed(\DOMElement $e, $prefix) {
if (!$this->parsed->contains($e))
if (!$this->parsed->contains($e)) {
return false;
}
$prefixes = $this->parsed[$e];
if (!in_array($prefix, $prefixes))
if (!in_array($prefix, $prefixes)) {
return false;
}
return true;
}
/**
* Determine if the element's specified property has already been upgraded during backcompat
* @param DOMElement $el
* @param string $property
* @return bool
*/
private function isElementUpgraded(\DOMElement $el, $property) {
if ( $this->upgraded->contains($el) ) {
if ( in_array($property, $this->upgraded[$el]) ) {
return true;
}
}
return false;
}
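The lookup above relies on `SplObjectStorage` mapping an object to an array of property names. A small sketch of that pattern in isolation, assuming plain `stdClass` objects as stand-ins for `DOMElement` instances and a hypothetical free function `isUpgraded`:

```php
<?php
// Hypothetical standalone version of the check: is $property recorded for $el?
function isUpgraded(SplObjectStorage $storage, $el, $property) {
    // isset() on the storage tests whether the object is attached at all.
    return isset($storage[$el]) && in_array($property, $storage[$el]);
}

$upgraded = new SplObjectStorage();
$el = new stdClass();    // stands in for an element that was upgraded
$other = new stdClass(); // stands in for an element that was never touched

// Record which properties were upgraded for $el.
$upgraded[$el] = array('name', 'url');
```

Keying by object identity means two elements with identical markup are still tracked separately.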
private function resolveChildUrls(DOMElement $el) {
$hyperlinkChildren = $this->xpath->query('.//*[@src or @href or @data]', $el);
......@@ -336,12 +391,20 @@ class Parser {
$child->setAttribute('href', $this->resolveUrl($child->getAttribute('href')));
if ($child->hasAttribute('src'))
$child->setAttribute('src', $this->resolveUrl($child->getAttribute('src')));
if ($child->hasAttribute('srcset'))
$child->setAttribute('srcset', applySrcsetUrlTransformation($child->getAttribute('srcset'), array($this, 'resolveUrl')));
if ($child->hasAttribute('data'))
$child->setAttribute('data', $this->resolveUrl($child->getAttribute('data')));
}
}
public function textContent(DOMElement $el) {
$excludeTags = array('noframe', 'noscript', 'script', 'style', 'frames', 'frameset');
if (isset($el->tagName) and in_array(strtolower($el->tagName), $excludeTags)) {
return '';
}
$this->resolveChildUrls($el);
$clonedEl = $el->cloneNode(true);
......@@ -350,16 +413,122 @@ class Parser {
$newNode = $this->doc->createTextNode($imgEl->getAttribute($imgEl->hasAttribute('alt') ? 'alt' : 'src'));
$imgEl->parentNode->replaceChild($newNode, $imgEl);
}
foreach ($excludeTags as $tagName) {
foreach ($this->xpath->query(".//{$tagName}", $clonedEl) as $elToRemove) {
$elToRemove->parentNode->removeChild($elToRemove);
}
}
return $clonedEl->textContent;
return $this->innerText($clonedEl);
}
/**
* This method attempts to return a better 'innerText' representation than DOMNode::textContent
*
* @param DOMElement|DOMText $el
* @param bool $implied when parsing for implied name for h-*, rules may be slightly different
* @see: https://github.com/glennjones/microformat-shiv/blob/dev/lib/text.js
*/
public function innerText($el, $implied=false) {
$out = '';
$blockLevelTags = array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'hr', 'pre', 'table',
'address', 'article', 'aside', 'blockquote', 'caption', 'col', 'colgroup', 'dd', 'div',
'dt', 'dir', 'fieldset', 'figcaption', 'figure', 'footer', 'form', 'header', 'hgroup', 'hr',
'li', 'map', 'menu', 'nav', 'optgroup', 'option', 'section', 'tbody', 'textarea',
'tfoot', 'th', 'thead', 'tr', 'td', 'ul', 'ol', 'dl', 'details');
$excludeTags = array('noframe', 'noscript', 'script', 'style', 'frames', 'frameset');
// PHP DOMDocument doesn’t correctly handle whitespace around elements it doesn’t recognise.
$unsupportedTags = array('data');
if (isset($el->tagName)) {
if (in_array(strtolower($el->tagName), $excludeTags)) {
return $out;
} else if ($el->tagName == 'img') {
if ($el->hasAttribute('alt')) {
return $el->getAttribute('alt');
} else if (!$implied && $el->hasAttribute('src')) {
return $this->resolveUrl($el->getAttribute('src'));
}
} else if ($el->tagName == 'area' and $el->hasAttribute('alt')) {
return $el->getAttribute('alt');
} else if ($el->tagName == 'abbr' and $el->hasAttribute('title')) {
return $el->getAttribute('title');
}
}
// if node is a text node get its text
if (isset($el->nodeType) && $el->nodeType === 3) {
$out .= $el->textContent;
}
// get the text of the child nodes
if ($el->childNodes && $el->childNodes->length > 0) {
for ($j = 0; $j < $el->childNodes->length; $j++) {
$text = $this->innerText($el->childNodes->item($j), $implied);
if (!is_null($text)) {
$out .= $text;
}
}
}
if (isset($el->tagName)) {
// if it's a block-level tag, add an additional space at the end
if (in_array(strtolower($el->tagName), $blockLevelTags)) {
$out .= ' ';
} elseif ($implied and in_array(strtolower($el->tagName), $unsupportedTags)) {
$out .= ' ';
} else if (strtolower($el->tagName) == 'br') {
// else if it's a br, replace with a newline
$out .= "\n";
}
}
return ($out === '') ? NULL : $out;
}
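To illustrate why a recursive traversal can give a better plain-text result than `DOMNode::textContent`, here is a deliberately simplified sketch. The function `sketchInnerText` is hypothetical: it handles only `img` alt text, `br`, `script`/`style` exclusion, and three block-level tags, and it is not the method shown in the diff.

```php
<?php
// Simplified, hypothetical variant of the traversal described above.
function sketchInnerText(DOMNode $node) {
    if ($node instanceof DOMText) {
        return $node->textContent; // text nodes contribute their text directly
    }
    if (!($node instanceof DOMElement)) {
        return ''; // ignore comments and other node types
    }
    $tag = strtolower($node->tagName);
    if (in_array($tag, array('script', 'style'))) {
        return ''; // excluded elements contribute nothing
    }
    if ($tag === 'img') {
        return $node->getAttribute('alt'); // images are replaced by their alt text
    }
    if ($tag === 'br') {
        return "\n";
    }
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= sketchInnerText($child);
    }
    // Block-level elements get a trailing space so adjacent blocks do not run together.
    if (in_array($tag, array('p', 'div', 'li'))) {
        $out .= ' ';
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p>Hello</p><p>big <img src="w.png" alt="world"></p><script>ignored()</script></div>');
$text = sketchInnerText($doc->getElementsByTagName('div')->item(0));
```

With plain `textContent`, the two paragraphs would typically be joined without a separator and the image's alt text would be lost, which is the gap the trailing-space rule and the `img` branch address.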
/**
* This method parses the language of an element
* @param DOMElement $el
* @access public
* @return string
*/
public function language(DOMElement $el)
{
// element has a lang attribute; use it
if ($el->hasAttribute('lang')) {
return unicodeTrim($el->getAttribute('lang'));
}
if ($el->tagName == 'html') {
// we're at the <html> element and no lang; check <meta> http-equiv Content-Language
foreach ( $this->xpath->query('.//meta[@http-equiv]') as $node )
{
if ($node->hasAttribute('http-equiv') && $node->hasAttribute('content') && strtolower($node->getAttribute('http-equiv')) == 'content-language') {
return unicodeTrim($node->getAttribute('content'));
}
}
} elseif ($el->parentNode instanceof DOMElement) {
// check the parent node
return $this->language($el->parentNode);
}
return '';
} # end method language()
// TODO: figure out if this has problems with sms: and geo: URLs
public function resolveUrl($url) {
// If the URL is seriously malformed it’s probably beyond the scope of this
// parser to try to do anything with it.
if (parse_url($url) === false)
if (parse_url($url) === false) {
return $url;
}
// per issue #40 valid URLs could have a space on either side
$url = trim($url);
$scheme = parse_url($url, PHP_URL_SCHEME);
......@@ -410,7 +579,7 @@ class Parser {
}
/**
* Given an element with class="p-*", get its value
* Given an element with class="p-*", get its value
*
* @param DOMElement $p The element to parse
* @return string The plaintext value of $p, dependent on type
......@@ -419,19 +588,22 @@ class Parser {
public function parseP(\DOMElement $p) {
$classTitle = $this->parseValueClassTitle($p, ' ');
if ($classTitle !== null)
if ($classTitle !== null) {
return $classTitle;
}
if ($p->tagName == 'img' and $p->getAttribute('alt') !== '') {
$this->resolveChildUrls($p);
if ($p->tagName == 'img' and $p->hasAttribute('alt')) {
$pValue = $p->getAttribute('alt');
} elseif ($p->tagName == 'area' and $p->getAttribute('alt') !== '') {
} elseif ($p->tagName == 'area' and $p->hasAttribute('alt')) {
$pValue = $p->getAttribute('alt');
} elseif ($p->tagName == 'abbr' and $p->getAttribute('title') !== '') {
} elseif ($p->tagName == 'abbr' and $p->hasAttribute('title')) {
$pValue = $p->getAttribute('title');
} elseif (in_array($p->tagName, array('data', 'input')) and $p->getAttribute('value') !== '') {
} elseif (in_array($p->tagName, array('data', 'input')) and $p->hasAttribute('value')) {
$pValue = $p->getAttribute('value');
} else {
$pValue = unicodeTrim($this->textContent($p));
$pValue = unicodeTrim($this->innerText($p));
}
return $pValue;
......@@ -445,11 +617,13 @@ class Parser {
* @todo make this adhere to value-class
*/
public function parseU(\DOMElement $u) {
if (($u->tagName == 'a' or $u->tagName == 'area') and $u->getAttribute('href') !== null) {
if (($u->tagName == 'a' or $u->tagName == 'area') and $u->hasAttribute('href')) {
$uValue = $u->getAttribute('href');
} elseif (in_array($u->tagName, array('img', 'audio', 'video', 'source')) and $u->getAttribute('src') !== null) {
} elseif (in_array($u->tagName, array('img', 'audio', 'video', 'source')) and $u->hasAttribute('src')) {
$uValue = $u->getAttribute('src');
} elseif ($u->tagName == 'object' and $u->getAttribute('data') !== null) {
} elseif ($u->tagName == 'video' and !$u->hasAttribute('src') and $u->hasAttribute('poster')) {
$uValue = $u->getAttribute('poster');
} elseif ($u->tagName == 'object' and $u->hasAttribute('data')) {
$uValue = $u->getAttribute('data');
}
......@@ -461,9 +635,9 @@ class Parser {
if ($classTitle !== null) {
return $classTitle;
} elseif ($u->tagName == 'abbr' and $u->getAttribute('title') !== null) {
} elseif ($u->tagName == 'abbr' and $u->hasAttribute('title')) {
return $u->getAttribute('title');
} elseif (in_array($u->tagName, array('data', 'input')) and $u->getAttribute('value') !== null) {
} elseif (in_array($u->tagName, array('data', 'input')) and $u->hasAttribute('value')) {
return $u->getAttribute('value');
} else {
return unicodeTrim($this->textContent($u));
......@@ -574,7 +748,7 @@ class Parser {
if (!empty($value))
$dtValue = $value;
else
$dtValue = $dt->nodeValue;
$dtValue = $this->textContent($dt);
} elseif ($dt->tagName == 'abbr') {
// Use @title, otherwise innertext
// Is it an entire dt?
......@@ -582,7 +756,7 @@ class Parser {
if (!empty($title))
$dtValue = $title;
else
$dtValue = $dt->nodeValue;
$dtValue = $this->textContent($dt);
} elseif ($dt->tagName == 'del' or $dt->tagName == 'ins' or $dt->tagName == 'time') {
// Use @datetime if available, otherwise innertext
// Is it an entire dt?
......@@ -590,9 +764,9 @@ class Parser {
if (!empty($dtAttr))
$dtValue = $dtAttr;
else
$dtValue = $dt->nodeValue;
$dtValue = $this->textContent($dt);
} else {
$dtValue = $dt->nodeValue;
$dtValue = $this->textContent($dt);
}
if (preg_match('/(\d{4}-\d{2}-\d{2})/', $dtValue, $matches)) {
......@@ -632,25 +806,42 @@ class Parser {
$html = '';
foreach ($e->childNodes as $node) {
$html .= $node->C14N();
$html .= $node->ownerDocument->saveHTML($node);
}
return array(
$return = array(
'html' => $html,
'value' => unicodeTrim($this->textContent($e))
'value' => unicodeTrim($this->innerText($e)),
);
if($this->lang) {
// Language
if ( $html_lang = $this->language($e) ) {
$return['html-lang'] = $html_lang;
}
}
return $return;
}
private function removeTags(\DOMElement &$e, $tagName) {
while(($r = $e->getElementsByTagName($tagName)) && $r->length) {
$r->item(0)->parentNode->removeChild($r->item(0));
}
}
/**
* Recursively parse microformats
*
* @param DOMElement $e The element to parse
* @param bool $is_backcompat Whether using backcompat parsing or not
* @return array A representation of the values contained within microformat $e
*/
public function parseH(\DOMElement $e) {
public function parseH(\DOMElement $e, $is_backcompat = false) {
// If it’s already been parsed (e.g. is a child mf), skip
if ($this->parsed->contains($e))
if ($this->parsed->contains($e)) {
return null;
}
// Get current µf name
$mfTypes = mfNamesFromElement($e, 'h-');
......@@ -660,18 +851,29 @@ class Parser {
$children = array();
$dates = array();
// each rel-bookmark with an href attribute
foreach ( $this->xpath->query('.//a[contains(concat(" ",normalize-space(@rel)," ")," bookmark ") and @href]', $e) as $el )
{
$class = 'u-url';
// rel-bookmark already has class attribute; append current value
if ($el->hasAttribute('class')) {
$class .= ' ' . $el->getAttribute('class');
}
$el->setAttribute('class', $class);
}
$subMFs = $this->getRootMF($e);
// Handle nested microformats (h-*)
foreach ($this->xpath->query('.//*[contains(concat(" ", @class)," h-")]', $e) as $subMF) {
foreach ( $subMFs as $subMF ) {
// Parse
$result = $this->parseH($subMF);
// If result was already parsed, skip it
if (null === $result)
if (null === $result) {
continue;
// In most cases, the value attribute of the nested microformat should be the p- parsed value of the element.
// The only times this is different is when the microformat is nested under certain prefixes, which are handled below.
$result['value'] = $this->parseP($subMF);
}
// Does this µf have any property names other than h-*?
$properties = nestedMfPropertyNamesFromElement($subMF);
......@@ -688,7 +890,7 @@ class Parser {
$prefixSpecificResult['html'] = $eParsedResult['html'];
$prefixSpecificResult['value'] = $eParsedResult['value'];
} elseif (in_array('u-', $prefixes)) {
$prefixSpecificResult['value'] = $this->parseU($subMF);
$prefixSpecificResult['value'] = (empty($result['properties']['url'])) ? $this->parseU($subMF) : reset($result['properties']['url']);
}
$return[$property][] = $prefixSpecificResult;
}
......@@ -713,15 +915,17 @@ class Parser {
// Handle p-*
foreach ($this->xpath->query('.//*[contains(concat(" ", @class) ," p-")]', $e) as $p) {
if ($this->isElementParsed($p, 'p'))
if ($this->isElementParsed($p, 'p')) {
continue;
}
$pValue = $this->parseP($p);
// Add the value to the array for its p- properties
foreach (mfNamesFromElement($p, 'p-') as $propName) {
if (!empty($propName))
if (!empty($propName)) {
$return[$propName][] = $pValue;
}
}
// Make sure this sub-mf won’t get parsed as a top level mf
......@@ -730,8 +934,9 @@ class Parser {
// Handle u-*
foreach ($this->xpath->query('.//*[contains(concat(" ", @class)," u-")]', $e) as $u) {
if ($this->isElementParsed($u, 'u'))
if ($this->isElementParsed($u, 'u')) {
continue;
}
$uValue = $this->parseU($u);
......@@ -746,8 +951,9 @@ class Parser {
// Handle dt-*
foreach ($this->xpath->query('.//*[contains(concat(" ", @class), " dt-")]', $e) as $dt) {
if ($this->isElementParsed($dt, 'dt'))
if ($this->isElementParsed($dt, 'dt')) {
continue;
}
$dtValue = $this->parseDT($dt, $dates);
......@@ -764,8 +970,9 @@ class Parser {
// Handle e-*
foreach ($this->xpath->query('.//*[contains(concat(" ", @class)," e-")]', $e) as $em) {
if ($this->isElementParsed($em, 'e'))
if ($this->isElementParsed($em, 'e')) {
continue;
}
$eValue = $this->parseE($em);
......@@ -781,14 +988,16 @@ class Parser {
// Implied Properties
// Check for p-name
if (!array_key_exists('name', $return)) {
if (!array_key_exists('name', $return) && !$is_backcompat) {
try {
// Look for img @alt
if (($e->tagName == 'img' or $e->tagName == 'area') and $e->getAttribute('alt') != '')
if (($e->tagName == 'img' or $e->tagName == 'area') and $e->getAttribute('alt') != '') {
throw new Exception($e->getAttribute('alt'));
}
if ($e->tagName == 'abbr' and $e->hasAttribute('title'))
if ($e->tagName == 'abbr' and $e->hasAttribute('title')) {
throw new Exception($e->getAttribute('title'));
}
// Look for nested img @alt
foreach ($this->xpath->query('./img[count(preceding-sibling::*)+count(following-sibling::*)=0]', $e) as $em) {
......@@ -806,7 +1015,6 @@ class Parser {
}
}
// Look for double nested img @alt
foreach ($this->xpath->query('./*[count(preceding-sibling::*)+count(following-sibling::*)=0]/img[count(preceding-sibling::*)+count(following-sibling::*)=0]', $e) as $em) {
$emNames = mfNamesFromElement($em, 'h-');
......@@ -823,40 +1031,29 @@ class Parser {
}
}
throw new Exception($e->nodeValue);
throw new Exception($this->innerText($e, true));
} catch (Exception $exc) {
$return['name'][] = unicodeTrim($exc->getMessage());
}
}
// Check for u-photo
if (!array_key_exists('photo', $return)) {
// Look for img @src
try {
if ($e->tagName == 'img')
throw new Exception($e->getAttribute('src'));
if (!array_key_exists('photo', $return) && !$is_backcompat) {
// Look for nested img @src
foreach ($this->xpath->query('./img[count(preceding-sibling::*)+count(following-sibling::*)=0]', $e) as $em) {
if ($em->getAttribute('src') != '')
throw new Exception($em->getAttribute('src'));
}
$photo = $this->parseImpliedPhoto($e);
// Look for double nested img @src
foreach ($this->xpath->query('./*[count(preceding-sibling::*)+count(following-sibling::*)=0]/img[count(preceding-sibling::*)+count(following-sibling::*)=0]', $e) as $em) {
if ($em->getAttribute('src') != '')
throw new Exception($em->getAttribute('src'));
}
} catch (Exception $exc) {
$return['photo'][] = $this->resolveUrl($exc->getMessage());
if ($photo !== false) {
$return['photo'][] = $this->resolveUrl($photo);
}
}
// Check for u-url
if (!array_key_exists('url', $return)) {
if (!array_key_exists('url', $return) && !$is_backcompat) {
// Look for a or area @href
if ($e->tagName == 'a' or $e->tagName == 'area')
if ($e->tagName == 'a' or $e->tagName == 'area') {
$url = $e->getAttribute('href');
}
// Look for nested a @href
foreach ($this->xpath->query('./a[count(preceding-sibling::a)+count(following-sibling::a)=0]', $e) as $em) {
......@@ -876,8 +1073,16 @@ class Parser {
}
}
if (!empty($url))
if (!empty($url)) {
$return['url'][] = $this->resolveUrl($url);
}
}
if($this->lang) {
// Language
if ( $html_lang = $this->language($e) ) {
$return['html-lang'] = $html_lang;
}
}
// Make sure things are in alphabetical order
......@@ -903,6 +1108,50 @@ class Parser {
return $parsed;
}
/**
* @see http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
*/
public function parseImpliedPhoto(\DOMElement $e) {
if ($e->tagName == 'img') {
return $e->getAttribute('src');
}
if ($e->tagName == 'object' && $e->hasAttribute('data')) {
return $e->getAttribute('data');
}
$xpaths = array(
'./img',
'./object',
'./*[count(preceding-sibling::*)+count(following-sibling::*)=0]/img',
'./*[count(preceding-sibling::*)+count(following-sibling::*)=0]/object',
);
foreach ($xpaths as $path) {
$els = $this->xpath->query($path, $e);
if ($els->length == 1) {
$el = $els->item(0);
$hClasses = mfNamesFromElement($el, 'h-');
// no nested h-
if (empty($hClasses)) {
if ($el->tagName == 'img') {
return $el->getAttribute('src');
} else if ($el->tagName == 'object' && $el->hasAttribute('data')) {
return $el->getAttribute('data');
}
} // no nested h-
}
}
// no implied photo
return false;
}
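The "exactly one matching child" rule used for the implied photo can be tried in isolation. The following sketch assumes a hypothetical `h-card` snippet and covers only the first XPath (`./img`); the markup and variable names are illustrative, not taken from the library's tests.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div class="h-card"><img src="/me.png" alt="Me"></div>');
$xpath = new DOMXPath($doc);

// Locate the root element of the (assumed) microformat.
$root = $xpath->query('//*[contains(concat(" ", @class, " "), " h-card ")]')->item(0);

// An implied photo is taken only when exactly one direct img child exists.
$imgs = $xpath->query('./img', $root);
$photo = ($imgs->length === 1) ? $imgs->item(0)->getAttribute('src') : false;
```

Returning `false` rather than an empty string lets the caller distinguish "no implied photo" from an `img` whose `src` is empty.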
/**
* Parse Rels and Alternatives
*
......@@ -916,7 +1165,7 @@ class Parser {
$alternates = array();
// Iterate through all a, area and link elements with rel attributes
foreach ($this->xpath->query('//*[@rel and @href]') as $hyperlink) {
foreach ($this->xpath->query('//a[@rel and @href] | //link[@rel and @href] | //area[@rel and @href]') as $hyperlink) {
if ($hyperlink->getAttribute('rel') == '')
continue;
......@@ -977,22 +1226,16 @@ class Parser {
*/
public function parse($convertClassic = true, DOMElement $context = null) {
$mfs = array();
$mfElements = $this->getRootMF($context);
if ($convertClassic) {
$this->convertLegacy();
}
$mfElements = null === $context
? $this->xpath->query('//*[contains(concat(" ", @class), " h-")]')
: $this->xpath->query('.//*[contains(concat(" ", @class), " h-")]', $context);
// Parse microformats
foreach ($mfElements as $node) {
// For each microformat
$result = $this->parseH($node);
$is_backcompat = !$this->hasRootMf2($node);
// Add the value to the array for this property type
$mfs[] = $result;
if ( $convertClassic && $is_backcompat ) {
$this->backcompat($node);
}
$mfs[] = $this->parseH($node, $is_backcompat);
}
// Parse rels
......@@ -1003,8 +1246,9 @@ class Parser {
'rels' => $rels
);
if (count($alternates))
if (count($alternates)) {