PHP grammar notes

I’m refining an ANTLR grammar to parse PHP, which runs into the issue that (so far as I can determine) there’s no formal definition of the PHP language other than what the Zend PHP engine accepts. There is documentation at php.net, which is a great help, but it is often incomplete, and it sometimes muddles together discussions of PHP 4 and 5, whose syntax can be substantially different.

So in this post I’m recording some notes on a number of finer points of PHP syntax which seem to be undocumented. In particular:

  • Rules for composing a valid variable or function (or object…member variable/function) identifier “path”.
  • Variable names — not limited to simple strings
  • Variable names can be reserved words
  • The supposed “reference operator”
  • Can some statement keywords (like print) use function syntax?

Preliminaries

  • Based on PHP 5.2.x (ie: pre 5.3). I think 5.3 is mostly (entirely?) a superset of 5.2.x

Accessing variables, functions, objects, methods in PHP

The following table shows a variety of variable, function and member cases. Blue brackets show the precedence order and association of the parts of the expression. (See key below for italicized words.)

Case                                                               | Global or local                        | Object member
-------------------------------------------------------------------+----------------------------------------+--------------
Constant [Note 1]                                                  | php-v01                                | Not applicable
Variable                                                           | php-v02                                | php-mv02
Variable; name from expression [Note 2]                            | php-v03                                | php-mv03
Variable; name from variable                                       | php-v04                                | php-mv04
(…equivalent to…)                                                  | php-v05                                | php-mv05
Function                                                           | php-f01                                | php-mf01
Function; name from expression                                     | php-f02 (Not possible: illegal syntax) | php-mf02
Function; name from variable                                       | php-f03                                | php-mf03
(…equivalent to…)                                                  | php-f04 (Not legal syntax)             | php-mf04
Function; name from variable twice                                 | php-f05                                | php-mf05
(…equivalent to…)                                                  | php-f06                                | php-mf06
Array element [Note 3]                                             | php-a01                                | php-oa01
Variable; name from array element                                  | php-a02                                | php-oa02
Array element; array name from variable                            | php-a03                                | php-oa03
Function; name from array element                                  | php-a04                                | php-oa04 (Ordinary function selected by array in object)
                                                                   |                                        | php-oa05 (Method in object selected by ordinary array)
Element of array returned from function (apparently not available) | php-a05 (Not legal syntax)             |

Key

Item | Description
-----+------------
expr | Expression, such as ‘something’, or a more elaborate calculation, such as ‘some’ . ‘thing’, or a numeric calculation (for array indexes). (See also Note 1 regarding unquoted strings.)
obj  | Any expression returning an object. Could be simply $someobject, or could be something more elaborate, like a function returning an object: myobjfunc( ).
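
A minimal sketch of the kinds of forms the table lists, for readers who want something runnable. All names here (LIMIT, greet, Box, $count and so on) are hypothetical, and the mapping to individual table rows is an assumption rather than a copy of the original cells. Explicit braces are used wherever a variable name is computed, because PHP versions differ on how an unbraced form such as $$a['x'] is grouped.

```php
<?php
// Hypothetical names throughout; a sketch, not the original table contents.
define('LIMIT', 10);

function greet() { return 'hello'; }

class Box {
    public $size = 3;
    public $items = array('a' => 'apple');
    function area() { return $this->size * $this->size; }
}

$count   = 5;
$name    = 'count';   // holds the name of a variable
$fname   = 'greet';   // holds the name of a function
$items   = array('a' => 'avocado');
$arrname = 'items';   // holds the name of an array
$box     = new Box();
$prop    = 'size';    // holds the name of a member variable
$meth    = 'area';    // holds the name of a method
$names   = array('v' => 'count', 'f' => 'greet', 'm' => 'area');

$r = array();

// Global or local
$r['constant']          = LIMIT;                 // constant
$r['variable']          = $count;                // variable
$r['var_from_expr']     = ${'cou' . 'nt'};       // variable; name from expression
$r['var_from_var']      = $$name;                // variable; name from variable
$r['var_from_var_alt']  = ${$name};              // equivalent, with braces
$r['function']          = greet();               // function
$r['func_from_var']     = $fname();              // function; name from variable
$r['array_element']     = $items['a'];           // array element
$r['var_from_element']  = ${$names['v']};        // variable; name from array element
$r['array_from_var']    = ${$arrname}['a'];      // array element; array name from variable
$r['func_from_element'] = $names['f']();         // function; name from array element

// Object member
$r['member']            = $box->size;            // member variable
$r['member_from_expr']  = $box->{'si' . 'ze'};   // member; name from expression
$r['member_from_var']   = $box->$prop;           // member; name from variable
$r['method']            = $box->area();          // method
$r['method_from_expr']  = $box->{'ar' . 'ea'}(); // method; name from expression
$r['method_from_var']   = $box->$meth();         // method; name from variable
$r['member_element']    = $box->items['a'];      // array element in an object
$r['method_from_elem']  = $box->{$names['m']}(); // method selected by an ordinary array

print_r($r);
```

The braces in ${$names['v']} and ${$arrname}['a'] make the intended grouping explicit, which keeps the sketch independent of the engine version.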

Notes

1. PHP semi-allows unquoted strings. That is to say, if PHP sees a bare series of characters, it attempts to look up a constant by that name. If none is found, PHP “assumes” the series of characters to be a string (apparently single-quote style). If error_reporting(E_ALL) is in effect, PHP issues a notice (“Notice: Use of undefined constant xxx – assumed ‘xxx’ in […]”).

2. Curly braces enclose expression to be evaluated first, with the resulting string then used as part of the identifier. (But see also braces used for subscript, next Note.)

3. Array index(es):

  • Array index may be enclosed in either square brackets [ ] or braces { }, though the former is far more widely used. (Brace subscripts were later deprecated in PHP 7.4 and removed in PHP 8.0.)
  • Multi-dimensional array access: Use multiple sets of brackets/braces: $multiarray[1][4] etc
  • If the variable is a string rather than an array, then the square brackets or braces can be used to access a single character in the string. (Ie: the string is treated in this case as an array of characters).
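
The subscript rules in Note 3 can be sketched as follows. The data is hypothetical, and only square brackets are used, since brace subscripts are not accepted by newer PHP versions.

```php
<?php
// Hypothetical data for illustration.
$multiarray = array(
    1 => array(4 => 'deep value'),
);
$word = 'grammar';

$element   = $multiarray[1][4]; // one set of brackets per dimension
$firstChar = $word[0];          // a string indexed like an array of characters
$thirdChar = $word[2];

echo $element, ' ', $firstChar, ' ', $thirdChar, "\n";
```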

Possible misunderstanding involving { } and unquoted strings

The { } braces used to enclose expressions (as shown above) appear to be undocumented on php.net. Consequently there are opportunities for them to be misunderstood. This is especially the case when one experiments and sees that:
$a = $blah->afunc();
$b = ${blah}->{afunc}();

… results in $a and $b getting the same result. Perhaps, one might infer, the braces in this context are some sort of grouping mechanism for pieces of an identifier path. What is missed in this scenario is that in the first statement, PHP sees the identifiers $blah and afunc, while in the second statement PHP sees blah and afunc as unquoted strings, and internally revises it like this:
$b = ${'blah'}->{'afunc'}();
Now we can see that { } perform their function enclosing an expression, in this case a couple of simple strings. (This revision process is reported as a warning note if error_reporting(E_ALL) is in effect.)
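
A small sketch of the same point with the strings quoted explicitly, so that no constant lookup is involved. The class Blah and its method are hypothetical stand-ins for whatever $blah refers to in the example above.

```php
<?php
// Hypothetical class standing in for the object behind $blah.
class Blah {
    function afunc() { return 'called afunc'; }
}
$blah = new Blah();

$a = $blah->afunc();                   // plain identifiers
$b = ${'blah'}->{'afunc'}();           // quoted strings inside the braces
$c = ${'bl' . 'ah'}->{'a' . 'func'}(); // any string expression will do

var_dump($a === $b && $b === $c);
```

The third line is the one that shows the braces are not a grouping device: whatever sits inside them is evaluated first, and the resulting string becomes the name.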

Additional use of curly braces in string interpolation

In double-quoted strings PHP is able to “interpolate” variables — recognize a variable and substitute its value in its place in the string. This can be invoked using one of two syntaxes, “simple” and “complex”, as described here: http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.double. The simple case looks like this:
$a = "This is a string with $b in it";
The complex syntax requires curly braces around the section of the string to be interpolated, like this:
$a = "This is a string with {$b} in it";
This is useful if placing the variable directly in the string can’t be parsed properly, for example:
$a = "This is the $nth item"; // $n next to th: PHP reads this as the variable $nth, not $n
$a = "This is the {$n}th item"; // better

Note that the syntax that PHP recognizes in this case is actually {$ … }. That is to say, the curly braces are only recognized as interpolation if the opening brace is immediately followed by the dollar sign prefix of a variable.

This gets potentially puzzling if a user sometimes sees something like this:
$a = "This is a string with ${b} in it";
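In that example the brace follows the dollar sign, so this is not the {$ … } form. Here the curly braces enclose an expression that supplies the variable name, much as described above in “Possible misunderstanding involving { } and unquoted strings”. One difference, so far as I can tell: inside a double-quoted string a bare label such as b is taken directly as the variable name, with no constant lookup and no notice.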

In short, within a double-quoted string, {$b} and ${b} employ curly braces in two quite different ways. The first encloses an expression for interpolation, the second encloses an expression which will form a variable name. And they could be used together:
$a = "This is a string with {${b}} in it";
… though there’s not much point in this example.
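
The simple and complex interpolation forms can be compared in a short sketch. The variable names and the stdClass object are hypothetical. The ${b} form is left out on purpose, since newer PHP versions flag it as deprecated inside strings.

```php
<?php
// Hypothetical variables for illustration.
$n   = 4;
$nth = 'WRONG';
$obj = new stdClass();
$obj->items = array('a' => 'apple');

$simple  = "This is the $nth item";          // PHP reads $nth here, not $n
$complex = "This is the {$n}th item";        // braces delimit the variable
$path    = "First item: {$obj->items['a']}"; // a longer path needs the complex form
$literal = "Braces { $n} stay literal";      // "{" not directly followed by "$"

echo $simple, "\n", $complex, "\n", $path, "\n", $literal, "\n";
```

The last line illustrates the {$ rule: with a space after the opening brace, both braces are printed as ordinary characters and $n is interpolated by the simple syntax.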

Variable names: Not limited to simple strings

The standard description is here: http://php.net/manual/en/language.variables.basics.php

“Variable names follow the same rules as other labels in PHP. A valid variable name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus:”

[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
Ie: first letter must be in [A-Z] , [a-z], underscore or characters 0x7f-0xff , and subsequent characters can be from that same set or digits. Note no spaces or punctuation, for example. However, this is really just a limitation on what PHP will digest in the form: $myvarname and related.

If instead you write:
${'my evil %^&# varname'}
PHP will happily digest pretty much any string as a variable name (or member variable name). (There may be a way to employ evil function names also, but I’ve not seen a way to create the evilly-named function in the first place.)
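
A minimal sketch of such names in use, for both an ordinary variable and a member variable. The names are arbitrary examples, and stdClass is used only as a convenient empty object.

```php
<?php
// An ordinary variable whose name could never follow a plain "$".
${'my evil %^&# varname'} = 'stored';
$value = ${'my evil %^&# varname'};

// The same trick for a member variable.
$o = new stdClass();
$o->{'another evil name!'} = 42;
$member = $o->{'another evil name!'};

// The name really is in the symbol table.
$known = array_key_exists('my evil %^&# varname', get_defined_vars());

var_dump($value, $member, $known);
```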

Variable names: Can be reserved words

Despite the warning here: http://php.net/manual/en/reserved.php

“None of the identifiers listed here should be used as identifiers in any of your scripts. These lists include keywords and predefined variable, constant, and class names. These lists are neither exhaustive or complete.”

… in actuality some (perhaps most or even all) of these reserved words can be used as variable names (and member variable names) without PHP objecting. This seems to be because there is no difficulty distinguishing the reserved usage from the variable name usage (due to the dollar sign etc). However, the reserved words trigger an error when code uses them to declare or call a function or method.
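
A short sketch of reserved words serving as variable and member variable names. The particular words chosen are just a sample, and stdClass is used only as a convenient empty object.

```php
<?php
// Reserved words after a dollar sign are ordinary variable names.
$class    = 'a';
$function = 'b';
$while    = 'c';

// The same holds after the -> of a member access.
$o = new stdClass();
$o->for  = 'd';
$o->echo = 'e';

echo $class, $function, $while, $o->for, $o->echo, "\n";
```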

References and the reference “operator”

There is much discussion on php.net about PHP “references”. Some of this discussion is quite confusing because the syntax and behavior for references was considerably revised between PHP4 and PHP5, with the latter taking a more straightforward approach.

The manual pages starting here: http://php.net/manual/en/language.references.php, at least as of early August 2009, are a muddle in this respect.

In PHP, “reference” means what other languages call “alias”

PHP’s reference idea is not what other languages (C++, Delphi etc) call a reference. Instead, the PHP reference is more like what others call an “alias”. It is not a variable that points to another variable. Instead, it’s a completely interchangeable second (or subsequent) name for the same variable, or rather for the memory slot where the variable’s value is stored. (This is described more fully as a “symbol table alias”.)

Thus:
$a = 57;
$b =& $a; // create a reference (alias) to $a
$b = 25; // the memory slot referred to by $a and $b now contains 25
$c = $b; // The value of that slot is copied to $c

Now both $a and $b have the value 25, because both $a and $b refer to the same memory slot. At this juncture, $a and $b are completely equivalent: there’s nothing special about the one introduced first, or last for that matter.

Because this idea is really that of “alias”, there’s no syntax to “de-reference” such a variable. In addition, ordinary assignment ($c = $b;) simply copies the value as usual, because there’s nothing special about $b — it’s just a variable.
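
The symmetry between the two names can be sketched by removing the “original” one. This extends the example above with an unset() call, which is an addition for illustration only.

```php
<?php
$a = 57;
$b =& $a;  // $b is a second name for the same slot
$b = 25;   // visible through $a as well
$c = $b;   // plain assignment: the value is copied
$c = 99;   // changes $c only

$viaA = $a; // 25, read through the first name

unset($a);  // removes the name $a only; the slot lives on under $b

var_dump($viaA, $b, $c, isset($a));
```

After the unset(), $b still holds 25, which fits the description of an alias: neither name owns the value.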

Relationship between PHP objects and references

Somewhat confusingly, it is sometimes said that variables holding PHP objects operate as references. To clarify, in PHP 5, they operate like references in other languages, but not like the PHP references (aka aliases) just discussed. Example:
class G {
var $h;
function f( ) { /* ... */ }
}
$j = new G( );
$k = $j; // second variable refers to same object
$m = clone $k; // $m gets a copy of the object

In actuality, a variable referring to an object contains a flag to tell that it’s an object, and its value is an object id, by which PHP can find the object in memory. In the sample code, the new operator passes such an object id to $j (with no “&” required). Similarly, in ($k = $j) the object id is copied, so that $k now refers to the same object, and the object itself is not duplicated. To create an actual copy, one can use the clone keyword.
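
The difference between a copied object id and a PHP reference (alias) can be sketched as follows. The class G here is a cut-down, hypothetical version of the one above, with a default value added so the effect is visible.

```php
<?php
// Hypothetical cut-down class for illustration.
class G {
    public $h = 1;
}

$j = new G();
$k = $j;        // copies the object id; both variables reach one object
$m = clone $j;  // a separate object with the same contents

$k->h = 2;      // visible through $j, but not through the clone $m
$shared    = ($j->h === 2);
$cloneKept = ($m->h === 1);

$k = null;      // clears $k only; $j still holds the object id
$stillObject = ($j instanceof G);

$p =& $j;       // a PHP reference: $p and $j are one variable
$p = null;      // so this clears $j as well
$aliasCleared = ($j === null);

var_dump($shared, $cloneKept, $stillObject, $aliasCleared);
```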

In PHP 4, matters were more convoluted; PHP 4 objects involved using the “reference” idea, and the “&” operator was involved in awkward ways. In short, PHP 5 made a substantial improvement, no longer entailing the “&” operator. So one must avoid being thrown off by outdated documentation that harkens back to the days of PHP 4.

Back to the main point: where do references, and “&”, occur?

We can concern ourselves only with how references are involved in ordinary variables (ie: not object variables). In this realm, references, and the ampersand, appear in three different roles:

  • Create a reference (alias) of an existing variable
    • $a =& $b;
  • Declare that a function’s argument(s) should be passed by reference (and can thus also return a value)
    • function myfunc(& $anarg, $otherarg) {...}
      Here $anarg is to be passed by reference, that is to say $anarg is an alias to whatever variable is passed in.
  • Declare that a function returns a reference (alias) to a value.
    • function & myfunc() {...}
      $x =& myfunc();

      Here myfunc() is declared to return a value by reference. If the invocation receives the return value using the reference “operator”, then the receiving variable becomes an alias to the value from the function. (Here $x becomes an alias to the value created within myfunc()).
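
The three roles listed above can be sketched together. The function add_one and the class Counter are hypothetical. A method is used for the return-by-reference case only because it gives the returned reference an obvious place to point.

```php
<?php
// Role 2: an argument passed by reference.
function add_one(&$n) {
    $n = $n + 1;
}

// Role 3: a function (here a method) that returns a reference.
class Counter {
    public $value = 0;
    function &valueRef() {
        return $this->value;
    }
}

// Role 1: create an alias of an existing variable.
$b = 1;
$a =& $b;
add_one($a);                // changes the slot shared by $a and $b

$counter = new Counter();
$x =& $counter->valueRef(); // "=&" on the receiving side: $x is an alias
$x = 10;                    // so this writes to $counter->value

$y = $counter->valueRef();  // without "=&" the value is merely copied
$y = 99;                    // $counter->value is unaffected

var_dump($b, $counter->value, $y);
```

Both halves of the return case are needed: the “&” in the declaration and the “=&” at the call site. Dropping either one leaves an ordinary copy.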

Is “&” an operator?

Though “&” is often referred to as the reference operator, I claim that it is not really an operator. Where it is used in function declaration, it is simply an additional flag on the function or argument specification, so not an operator there. But is it an operator in the “=&” cases?

On the one hand, it is possible to write:
$a = & $b;
The separation of the & with whitespace perhaps gives the impression of it being a separate element. However, this:
$a = (& $b);
… results in an error message. That is to say, there does not appear to be a separately available “result” from the application of the “&” by itself. So in my view, the combination “=&” is best thought of as a single “assign alias” operator, regardless of any whitespace between the two characters.

Can some statement keywords (like print) use function syntax?

It may be noted that the print statement returns a value and behaves something like a function:
print 'something'; // normal usage
print('something'); // usage that looks like a function
$x = print('something'); // Returns a value: print always returns 1, so $x = 1.
$x = print 'something' ; // Perhaps unexpectedly, does the same thing without "function syntax"

Here’s the doc on php.net: http://us.php.net/manual/en/function.print.php

However, the line of thinking displayed above and implied by the docs is very misleading. So far as I can tell, there is no version of the print statement that incorporates the brackets in the function syntax. Instead, if brackets are used they merely act as enclosing brackets around the expression provided as an argument to print — an entirely different meaning, and misleading to boot.

Here’s a case where this makes a big difference:
$x = print('something') + 37;
If print( ) was a function, then this should print the string ‘something’, and give $x the value 1+37 =38. Instead, the “+ 37” causes PHP to interpret (‘something’) as a number, having value zero. So the expression provided to print is 0 + 37, causing print to print the number 37. The result value from print is 1 as usual, assigned to $x.

The example’s likely intended effect could be achieved with this syntax:
$x = (print 'something') + 37;
Now print will print ‘something’, and the result, 1 gets added to 37, giving 38 to $x.
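
The two groupings can be checked with output buffering. This sketch swaps in the numeric string '5' for 'something', an assumption made so that the addition is well defined on newer PHP versions, which reject adding a non-numeric string to a number.

```php
<?php
// Brackets directly after print: they wrap '5' only, and print receives '5' + 37.
ob_start();
$x1 = print('5') + 37;
$out1 = ob_get_clean();  // what print actually wrote

// Brackets around the whole print expression: print receives '5' alone.
ob_start();
$x2 = (print '5') + 37;
$out2 = ob_get_clean();

var_dump($out1, $x1, $out2, $x2);
```

In the first case print writes 42 and $x1 is 1. In the second it writes 5 and $x2 is 38.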

