MANUAL of the strStream class

This is a short but descriptive manual and reference for strStream.
It also describes the basic principles and design considerations behind the strStream class.

Copyright terms

Each file related to my recent project begins with an HTML comment similar to the following one. It was placed in the HTML scope rather than the PHP scope because that seemed a more effective way to give it publicity. At first, strStream was used only internally, so nobody had a chance to get familiar with it. Now the class has been published, but the comment remains in the HTML scope for historical reasons.
If you would like to change the scope of the comment, you are welcome to; the trailing hash marks make that easy. :)
As you can see, the copyright terms are a slightly modified version of PostgreSQL's. I hope that copying copyright terms is not prohibited by any law or authority and that I have not broken any rule. If that does not hold where you apply this product or its parts, you are kindly asked to remove the last two paragraphs and the reference to them from the preceding paragraph.

Bruce Momjian, one of the PostgreSQL core developers and currently also the vice president of Great Bridge, a PostgreSQL support company, said:

"I personally find that the fewer restrictions, the easier things are to understand, the better. I think that a lot of people don't really understand the implications of, for example, the GPL license because it's a long document. It's very complicated, too, it really has a lot of things, it doesn't cover completely so you are left not understanding what's legal to do, or what's not legal to do. "
I think he is absolutely right. That is why I dare to use their copyright terms for my own purposes.

By the way, PostgreSQL is one of my favourite development tools (not only among RDBMSs); you should try it if you have any spare time.

<!--#--------------------------------------------------------------------------
# This is the file where strStream class lives
# written by	: Gyozo Papp @: pgerzson@freestart.hu, gerzson17@freemail.hu
# last modified	: 2001.04.30
#--COPYRIGHT-------------------------------------------------------------------
# strStream is Copyright © by all of us (created by Gyozo Papp) 
# Permission to use, copy, modify, and distribute this software and 
# its documentation for any purpose, without fee, and without a written 
# agreement is hereby granted, provided that the above copyright notice 
# and this paragraph and the following two paragraphs appear in all copies. 
# 
# IN NO EVENT SHALL i BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, 
# INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, 
# ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, 
# EVEN IF i HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
#
# i SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, 
# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 
# PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS-IS" BASIS, 
# AND i HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, 
# ENHANCEMENTS, OR MODIFICATIONS.
# BUT i LIKE TO VERY MUCH.
#--------------------------------------------------------------------------#-->

Manual

This class was created to make flat text files with arbitrary structure easy to parse. Note that it supports stream reading rather than writing, because its purpose is parsing such files.

First, you have to put() the string to be parsed (the input) into the internal buffer of the class.
The internal buffer is currently a plain PHP string, but this may change in the future if new features require it.

After you load something into it, you can retrieve classified information from the buffer. Several methods are already implemented with which you can get either the leftmost word or a parenthesized block from the buffer. The names of these methods start with 'get' and are listed below under 'Stream Reading'.
I tried to make these functions flexible enough to be used in various fields.

There are two reading modes defined for this class: GET and FETCH. In GET mode, the piece of the buffer you retrieve is transparently removed from it, while FETCH mode always leaves the buffer untouched and only returns copies.

If you find that the block stripped from the buffer is superfluous, you can push it back by calling unget(). So you need not worry about pulling items in GET mode; you can easily put them back. (So far there is very limited support for writing to a stream; this class is mainly for reading, parsing and slicing its string content.)
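The relationship between the two modes and unget() can be shown with a tiny model. The following is a minimal Python sketch, not the PHP class itself: the constants GET and FETCH, the class ModeStream and its take() helper are all hypothetical. It illustrates that peeking in FETCH mode and consuming in GET mode followed by unget() leave the buffer in the same state.

```python
GET, FETCH = 0, 1  # assumed values; the PHP class defines its own constants


class ModeStream:
    """Hypothetical model of a string buffer read in GET or FETCH mode."""

    def __init__(self, text=""):
        self.buf = text

    def take(self, n, mode=GET):
        """Return the first n characters; GET also removes them (hypothetical helper)."""
        piece = self.buf[:n]
        if mode == GET:
            self.buf = self.buf[n:]  # GET consumes the piece
        return piece                 # FETCH only returns a copy

    def unget(self, text):
        self.buf = text + self.buf   # push text back to the front


a = ModeStream("BEGIN block END")
peek = a.take(5, FETCH)  # look ahead without consuming

b = ModeStream("BEGIN block END")
word = b.take(5)         # consume ...
b.unget(word)            # ... then undo

print(peek == word, a.buf == b.buf)  # True True
```

In other words, a GET followed by an unget() acts like a FETCH, which is why pulling items in GET mode is safe.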

Examples

To be posted soon! Please be patient.

Reference

Initializing & Reinitializing

Stream Length

Stream Reading

Internal Buffer Manipulation

These methods change the whole content of the buffer regardless of the current retrieval mode, since they have no mode argument.

Private

Note: These functions are intended to be used internally, from other methods of this class and from derived classes.
For further details, see the source code.

strStream(string)

argument:
  1.  $str (string, default = '')
returning value(s):
-
description:
This function is the constructor of the strStream class. If $str[1] is supplied, it is copied into the internal buffer.
Some additional, minor initialization is also done here.

put(string)

argument:
  1.  $str (string, required)
returning value(s):
description:
This function appends $str[1] to the end of the internal buffer and clears the state flags.
It is most commonly used to reinitialize an empty strStream.

done()

argument:
-
returning value(s):
-
description:
This function clears the internal buffer and the state flags.
It is most commonly used to obtain an empty stream before putting anything into it.
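The constructor, put() and done() together form an initialize/reinitialize cycle. Below is a minimal Python sketch of that cycle under stated assumptions: the class StreamState is a hypothetical model, and the flag names normalized and stripped are invented here, since the manual only says that normalize_space() and strip_space() influence the buffer's state.

```python
class StreamState:
    """Hypothetical model of strStream's buffer plus its state flags."""

    def __init__(self, text=""):
        self.buf = text            # constructor copies the optional argument
        self._clear_flags()

    def _clear_flags(self):
        self.normalized = False    # assumed flag, set by normalize_space()
        self.stripped = False      # assumed flag, set by strip_space()

    def put(self, text):
        self.buf += text           # append to the end of the buffer
        self._clear_flags()

    def done(self):
        self.buf = ""              # empty the buffer
        self._clear_flags()


s = StreamState("first")
s.normalized = True                # pretend normalize_space() ran
s.done()                           # empty stream, flags cleared
s.put("second input")              # reinitialize with new input
print(repr(s.buf), s.normalized)   # 'second input' False
```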

enough(int)

argument:
  1.  $internal (int, default = 1)
returning value(s):
description:
This function checks whether the internal buffer contains at least as many characters as specified in $internal[1].

eos(string, bool)

argument:
  1.  $what (string, default = '')
  2.  $whole (bool, default = false)
returning value(s):
description:
This function checks whether any of the characters given in $what[1] occur in the internal buffer. In other words, it checks whether the stream ends before ... If $what is empty, it returns whether the internal buffer is empty.

getc(int, int)

argument:
  1.  $count (int, default = 1)
  2.  $mode (int, default = GET)
returning value(s):
description:
It returns a substring of length $count[1] from the beginning of the stream. If the buffer holds too few characters (see enough()), the function fails and returns false.
The $mode[2] specifies whether the substring is removed from the buffer (GET) or only copied as the return value, leaving the buffer untouched (FETCH).
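getc() relies on enough() to decide whether it can succeed. A minimal Python model of the pair follows; it is a sketch, not the PHP source, and the class name CharStream and the numeric values of GET and FETCH are assumptions.

```python
GET, FETCH = 0, 1  # assumed mode constants


class CharStream:
    """Hypothetical model of enough() and getc() on a plain string buffer."""

    def __init__(self, text=""):
        self.buf = text

    def enough(self, n=1):
        # True if the buffer holds at least n characters
        return len(self.buf) >= n

    def getc(self, count=1, mode=GET):
        if not self.enough(count):
            return False                 # insufficient characters: fail
        piece = self.buf[:count]
        if mode == GET:
            self.buf = self.buf[count:]  # GET removes the piece
        return piece


s = CharStream("key=value")
print(s.getc(3, FETCH))  # key   (buffer unchanged)
print(s.getc(3))         # key   (buffer is now "=value")
print(s.getc(10))        # False (only 6 characters left)
```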

getw(string, int, bool)

argument:
  1.  $dlmtrs (string, default = S)
  2.  $mode (int, default = GET)
  3.  $skipquote (bool, default = true)
returning value(s):
description:
It returns the leftmost word from the internal buffer. $dlmtrs[1] contains the characters treated as word delimiters: the return value is the substring of the internal buffer before the first occurrence of any character of $dlmtrs. If the buffer is already empty or contains only such characters, the function fails and returns false. $mode[2] specifies whether the substring is removed from the buffer (GET) or only copied as the return value, leaving the buffer untouched (FETCH).
note:
$skipquote is NOT implemented yet.
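A minimal Python sketch of the word-splitting rule described above. Two details are assumptions, because the manual does not state them: leading delimiter characters are skipped before the word starts, and in GET mode the delimiter that ends the word stays in the buffer. S stands for the whitespace of the XML S production; WordStream is a hypothetical name. Like the real method, the sketch ignores $skipquote.

```python
GET, FETCH = 0, 1   # assumed mode constants
S = " \t\r\n"       # XML S production: space, tab, CR, LF


class WordStream:
    """Hypothetical model of getw() on a plain string buffer."""

    def __init__(self, text=""):
        self.buf = text

    def getw(self, dlmtrs=S, mode=GET):
        # Assumption: leading delimiters are skipped before the word starts.
        start = 0
        while start < len(self.buf) and self.buf[start] in dlmtrs:
            start += 1
        if start == len(self.buf):
            return False  # buffer empty or only delimiter characters
        end = start
        while end < len(self.buf) and self.buf[end] not in dlmtrs:
            end += 1
        word = self.buf[start:end]
        if mode == GET:
            # Assumption: the terminating delimiter stays in the buffer.
            self.buf = self.buf[end:]
        return word


s = WordStream("  alpha beta,gamma")
print(s.getw())                # alpha
print(s.getw(S + ",", FETCH))  # beta
print(s.getw(S + ","))         # beta
print(repr(s.buf))             # ',gamma'
```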

getp(string, int, bool)

argument:
  1.  $op (one of the following characters: opening parenthesis (, bracket [, brace {, less-than <; default = '(')
  2.  $mode (int, default = GET)
  3.  $skipquote (bool, default = true)
returning value(s):
description:
This function searches for a character block enclosed by the pair of brackets specified by its opening character in $op[1]. In between there may be any characters, except that parentheses (( )), brackets ([ ]) and braces ({ }) must always occur in matching pairs and may be nested. Single (') and double quotes (") must also occur in matching pairs, and the characters between them are parsed as a string. It may skip some characters at the beginning of the buffer in order to find the first opening bracket.
The function keeps track of brackets; in other words, it can handle nested blocks. On success, the return value is an array containing the largest block without its opening and closing brackets (at index 0) and the string that precedes the block in the buffer, i.e. the skipped characters (at index 1).
The $skipquote[3] argument decides whether quotations (substrings between " "s or ' 's) are skipped and not parsed for brackets. A lone quotation mark is treated as an ordinary character, not as the beginning of a quotation. Because of the lightweight implementation of this tricky behaviour, the quotation mark itself must not appear inside the quotation, even if it is escaped by a '\'. The other quotation mark (' inside " ", and vice versa) may appear inside a quotation.
If the brackets in the buffer are not balanced, i.e. more opening brackets are found than closing ones, the function fails and returns -2. If $op[1] is not valid (see the values listed above), it also fails and returns -1.
The $mode[2] specifies whether the returned values are removed from the buffer (GET) or only copied, leaving the buffer untouched (FETCH).
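The bracket tracking and quote skipping described above can be sketched as a depth counter. This Python model is an illustration under assumptions, not the PHP implementation: it counts only the requested bracket pair, it does not skip quotations in the prefix before the first opening bracket, and it returns False (an assumed value) when $op does not occur at all. BlockStream and PAIRS are hypothetical names.

```python
GET, FETCH = 0, 1  # assumed mode constants
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


class BlockStream:
    """Hypothetical model of getp(): nested brackets, optional quote skipping."""

    def __init__(self, text=""):
        self.buf = text

    def getp(self, op="(", mode=GET, skipquote=True):
        if op not in PAIRS:
            return -1                      # invalid opening bracket
        cl = PAIRS[op]
        start = self.buf.find(op)
        if start < 0:
            return False                   # assumption: no block at all
        depth, i = 0, start
        while i < len(self.buf):
            ch = self.buf[i]
            if skipquote and ch in "'\"":
                close = self.buf.find(ch, i + 1)
                if close >= 0:             # skip a complete quotation
                    i = close + 1
                    continue
                # a lone quote falls through as an ordinary character
            elif ch == op:
                depth += 1
            elif ch == cl:
                depth -= 1
                if depth == 0:             # outermost block is closed
                    block = self.buf[start + 1:i]
                    skipped = self.buf[:start]
                    if mode == GET:
                        self.buf = self.buf[i + 1:]
                    return [block, skipped]
            i += 1
        return -2                          # more openings than closings


s = BlockStream('f(a, (b), ")") rest')
print(s.getp())                       # ['a, (b), ")"', 'f']
print(repr(s.buf))                    # ' rest'
print(BlockStream("(a (b)").getp())   # -2
print(BlockStream("x").getp("x"))     # -1
```

The ")" inside the double quotes does not close the block because the whole quotation is skipped before brackets are counted.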

geta(string, int, bool)

argument:
  1.  $ops (sequence of opening bracket characters such as < ( { [, default = '<([{')
  2.  $mode (int, default = GET)
  3.  $skipquote (bool, default = true)
returning value(s):
description:
This function iterates over $ops[1], passing each character in turn as the opening bracket argument to getp(), until getp() finds a block.
note:
The name geta stands for 'get a block that starts with < ( [ or {'.
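The iteration in geta() can be sketched in Python on top of a simplified getp(). Both functions here are hypothetical models, not the PHP source: the getp() stand-in has no quote handling and no mode argument, returns the remaining buffer alongside its result, and the final False for "no bracket type matched" is an assumption. Because $ops is tried in order, a later bracket type in the buffer can win over an earlier one.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def getp(buf, op):
    """Simplified stand-in for getp() without quote handling.

    Returns ([block, skipped], rest) on success, or (code, buf) on failure:
    -1 for an invalid bracket, -2 for unbalanced brackets, and (assumed)
    False when op does not occur at all.
    """
    if op not in PAIRS:
        return -1, buf
    start = buf.find(op)
    if start < 0:
        return False, buf
    depth = 0
    for i in range(start, len(buf)):
        if buf[i] == op:
            depth += 1
        elif buf[i] == PAIRS[op]:
            depth -= 1
            if depth == 0:
                return [buf[start + 1:i], buf[:start]], buf[i + 1:]
    return -2, buf


def geta(buf, ops="<([{"):
    """Try each opening bracket in ops, in order, until getp() finds a block."""
    for op in ops:
        result, rest = getp(buf, op)
        if isinstance(result, list):  # a block was found
            return result, rest
    return False, buf  # assumption: fail if no bracket type yields a block


print(geta("x [a] (b)"))  # (['b', 'x [a] '], '')
```

Here '(' is tried before '[' in the default $ops, so the block (b) is returned even though [a] appears earlier in the buffer.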

getb(string, string, int)

argument:
  1.  $op opening tag of the block (string, required)
  2.  $cl closing tag of the block (string, required)
  3.  $mode (int, default = GET)
returning value(s):
description:
Briefly, this function is intended to strip comments out of the stream (e.g. text between '/*' and '*/', or between '//' and '\n').
It searches the internal buffer for a block of text starting with $op[1] and ending with $cl[2]. It is similar to getw(...), but while getb() treats its arguments ($op, $cl) as whole strings, getw() treats its argument as a set of delimiter characters. Therefore, if you want to get a block whose opening tag is longer than one character, you have to use this method.
It may skip some characters at the beginning of the buffer in order to find the first opening tag. On success, the return value is an array containing the largest block without its opening and closing tags (at index 0) and the string that precedes that block in the buffer (at index 1).
The $mode[3] specifies whether the substring is removed from the buffer (GET) or only copied as the return value, leaving the buffer untouched (FETCH).
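A minimal Python sketch of the multi-character tag search, applied to the comment-stripping use case above. It is a model under assumptions, not the PHP code: the block ends at the first closing tag after the opening one, False is returned when either tag is missing, and quoted strings containing '/*' are not handled. TagStream is a hypothetical name.

```python
GET, FETCH = 0, 1  # assumed mode constants


class TagStream:
    """Hypothetical model of getb(): blocks delimited by whole-string tags."""

    def __init__(self, text=""):
        self.buf = text

    def getb(self, op, cl, mode=GET):
        start = self.buf.find(op)
        if start < 0:
            return False                          # assumed failure value
        # Assumption: the block ends at the first closing tag after op.
        end = self.buf.find(cl, start + len(op))
        if end < 0:
            return False
        block = self.buf[start + len(op):end]
        skipped = self.buf[:start]
        if mode == GET:
            self.buf = self.buf[end + len(cl):]   # drop prefix, block and tags
        return [block, skipped]


s = TagStream("x = 1; /* note */ y = 2; // tail\nz = 3")
print(s.getb("/*", "*/"))  # [' note ', 'x = 1; ']
print(s.getb("//", "\n"))  # [' tail', ' y = 2; ']
print(repr(s.buf))         # 'z = 3'
```

Collecting the index 1 values while repeatedly calling getb() in GET mode yields the code with the comments removed.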

get(int)

argument:
  1.  $mode (int, default = GET)
returning value(s):
description:
Briefly, this function returns the whole content of the internal buffer.
The $mode[1] specifies whether the buffer is cleared (GET) or only copied as the return value, leaving the buffer untouched (FETCH).

unget(string, bool)

argument:
  1.  $str string to put at the beginning of the stream (string, required)
  2.  $trusted state of $str (bool, default = false)
returning value(s):
description:
This function places $str[1] at the beginning of the internal buffer.
The most common use of this method is as an 'undo': if, after calling any get method in GET mode, you realize that its result is not what you expected, you can push it back into the buffer and reuse it. The $trusted[2] argument tells the object that the string being placed is in the same state as the buffer. (The state of the buffer is influenced by normalize_space() and strip_space(); see below.)

normalize_space()

argument:
-
returning value(s):
description:
This function returns the internal buffer with its whitespace normalized: leading and trailing whitespace is stripped, and sequences of whitespace characters are replaced by a single space. Whitespace characters are those allowed by the S production in XML (namely space, \t, \r and \n).
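The normalization rule can be expressed with one regular expression. This is a Python sketch operating on a plain string, not the PHP method (which also updates the buffer's state). The character set is written out explicitly because Python's default str.strip() would also remove characters outside the XML S production, such as \v and \f.

```python
import re

S_CHARS = " \t\r\n"  # XML S production: #x20, #x9, #xD, #xA


def normalize_space(buf):
    """Strip leading/trailing S whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", buf).strip(S_CHARS)


print(repr(normalize_space("  a\t\tb\r\n c  ")))  # 'a b c'
```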

strip_space(string)

argument:
  1.  $dlmtrs (string, default = '|,*?+()')
returning value(s):
description:
This function returns the internal buffer with whitespace stripped around the characters given in $dlmtrs[1]. Whitespace characters are those allowed by the S production in XML (namely space, \t, \r and \n).
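A minimal Python sketch of the stripping rule, working on a plain string rather than the PHP object. The delimiters are escaped with re.escape() so that characters such as '|', '*' or '-' are taken literally inside the character class; the early return for an empty $dlmtrs is an assumption made here to avoid building an empty character class.

```python
import re


def strip_space(buf, dlmtrs="|,*?+()"):
    """Remove S whitespace (space, tab, CR, LF) on both sides of each delimiter."""
    if not dlmtrs:
        return buf  # assumption: nothing to strip around
    pattern = r"[ \t\r\n]*([" + re.escape(dlmtrs) + r"])[ \t\r\n]*"
    return re.sub(pattern, r"\1", buf)


print(strip_space("( a | b , c )*"))  # (a|b,c)*
```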