Amazium bvba, your online partner
PHP in the Dark: Input/Output
2011-09-03 19:37

input_output, getopt, cli, console, php

When you need data input in a web context, you send a GET/POST request to your script. On the command line, things work differently. In this blog post, we will talk about input and output in php-cli.

You can find more information on this series in the PHP in the Dark blog post.

Handling Arguments

If you have worked on the command line before, you know that one typical way of passing information to a script is using arguments. A quick example:

./ping.php --host=somehost.com

PHP and different PHP libraries allow you to process this information in an easy way.

Caveman way

The $_SERVER superglobal has elements with the keys argc and argv. The first holds the number of arguments passed to the script, the second is an array containing the arguments themselves. It's important to note that the first argument is the script's filename, so argc will always be at least 1. If we execute the following script (01-basic-input.php):

./01-basic-input.php -monthly -test=yes -v

We get the following output:

$_SERVER["argc"] = 4
$_SERVER["argv"] = Array
(
    [0]=> 01-basic-input.php
    [1]=> -monthly
    [2]=> -test=yes
    [3]=> -v
)

On its own, this is not really useful. Sure, you get the argument information, but it's raw data. Luckily for us, there are ways to parse the input into something more useful.
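To show what working with the raw data involves, here is a minimal sketch of parsing the argument list by hand. The function name parseRawArguments is hypothetical, and the convention that everything starting with a dash is an option with an optional =value part is an assumption made for this example only:

```php
<?php
/**
 * Hypothetical helper: turn a raw argv array into options and plain arguments.
 * Assumes "-name" or "-name=value" syntax; everything else is a plain argument.
 */
function parseRawArguments(array $argv)
{
    $options   = array();
    $arguments = array();

    // Skip element 0, which holds the script's filename
    foreach (array_slice($argv, 1) as $arg) {
        if (strlen($arg) > 1 && $arg[0] === '-') {
            // Strip the leading dashes and split on the first "=" if present
            $parts = explode('=', ltrim($arg, '-'), 2);
            $options[$parts[0]] = isset($parts[1]) ? $parts[1] : true;
        } else {
            $arguments[] = $arg;
        }
    }

    return array($options, $arguments);
}

// Same arguments as in the example call above
list($options, $arguments) = parseRawArguments(
    array('01-basic-input.php', '-monthly', '-test=yes', '-v')
);
print_r($options);
```

Even this small version has to decide how dashes, values and plain arguments are told apart, which is exactly the work the libraries below take off your hands.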

PHP getopt()

PHP has a function called getopt that will parse the options passed to the script. It's nothing new, since similar functionality exists in, for instance, C. The function can take 2 arguments.

array getopt ( string $options [, array $longopts ] )

The $options argument is a string that contains the short options (e.g. -u). In the string you don't add the dash, only the letter you want as a short option. You can optionally follow the letter with either a colon or a double colon. In the first case (:) you tell getopt that this short option requires a value (e.g. -u jkeppens). In the second case (::) you tell getopt that the value is optional.

Whenever you go for the double colon, remember that the value must be attached to the short option (-psecret). This is the only way for getopt to know that the value belongs to the option it follows. In the case of the single colon, the value can be attached or separated by a space.

The second argument in the function is an array containing the long options. The long options can in a similar way be followed by a colon or double colon.

// Define short options
$shortopts  = "";
$shortopts .= "hv";  // No values
$shortopts .= "u:";  // Required value
$shortopts .= "p::"; // Optional value

// Define long options
$longopts = array(
    "user:",       // Required value
    "password::",  // Optional value
    "help",        // No value
    "verbose",     // No value
);

// Get the options (5.3 style)
$options = getopt($shortopts, $longopts);

// Show it
print_r($options);

If we run the script (02-basic-getopt.php), we get the following results:

./02-basic-getopt.php -u jkeppens -psecret -v

Array
(
    [u] => jkeppens
    [p] => secret
    [v] =>
)

Using long options, this gives:

./02-basic-getopt.php --user=jkeppens --password=secret --verbose
Array
(
    [user] => jkeppens
    [password] => secret
    [verbose] => 
)

It's important to note that while you can use short and long options, for getopt() these are two unrelated options. It will not be able to match these in a smart way, something you can do with other libraries.
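If you want a short and a long option to behave as aliases, you have to merge them yourself. A minimal sketch, assuming a hypothetical helper called optionValue and the option definitions used in the script above:

```php
<?php
/**
 * Hypothetical helper: return the value of an option that may have been
 * passed in its short or its long form, or $default when it is absent.
 */
function optionValue(array $options, $short, $long, $default = null)
{
    // The long form wins when both were passed; this is an arbitrary choice
    foreach (array($long, $short) as $name) {
        if (array_key_exists($name, $options)) {
            return $options[$name];
        }
    }
    return $default;
}

// getopt() can return false on failure, so fall back to an empty array
$options = getopt('hvu:p::', array('user:', 'password::', 'help', 'verbose')) ?: array();

$user = optionValue($options, 'u', 'user', 'anonymous');
// Options that take no value are present with the value false
$verbose = optionValue($options, 'v', 'verbose') !== null;

echo 'User: ' . $user . ($verbose ? ' (verbose)' : '') . PHP_EOL;
```

The default value of null doubles as the "not passed" marker, which works because getopt() never returns null as an option value.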

Now, let's see what happens if we put a space before the value next to -p:

./02-basic-getopt.php -u jkeppens -p secret -v

Array
(
    [u] => jkeppens
    [p] => 
)

That was unexpected... it even forgot about the -v! The reason for this is that the parsing of options will stop at the first non-option found. In this case the word "secret".

And when we forget the value next to -u, the text next to it is treated as its value:

./02-basic-getopt.php -u -psecret -v
Array
(
    [u] => -psecret
    [v] => 
)

As you can see, getopt() is already an improvement over using argc/argv, but it still has some shortcomings. There is no validation and the parser isn't very smart. Luckily, PEAR and Zend Framework offer some alternatives.
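If you stick with the built-in function, the validation is up to you. A minimal sketch of such a check, assuming a hypothetical helper called missingOptions that also rejects a value that looks like another option (the -u -psecret case shown above):

```php
<?php
/**
 * Hypothetical helper: return the long names of required options that are
 * missing or unusable. $required maps each short name to its long alias.
 */
function missingOptions(array $options, array $required)
{
    $missing = array();
    foreach ($required as $short => $long) {
        if (isset($options[$long])) {
            $value = $options[$long];
        } elseif (isset($options[$short])) {
            $value = $options[$short];
        } else {
            $value = null;
        }
        // Reject absent and empty values, values that look like another
        // option, and (in this sketch) options that were passed more than once
        if (!is_string($value) || $value === '' || $value[0] === '-') {
            $missing[] = $long;
        }
    }
    return $missing;
}

$options = getopt('u:p::v', array('user:', 'password::', 'verbose')) ?: array();
$missing = missingOptions($options, array('u' => 'user'));

echo $missing
    ? 'Missing required option(s): ' . implode(', ', $missing) . PHP_EOL
    : 'All required options present' . PHP_EOL;
```

Treating every value that starts with a dash as invalid is a simplification: it would also reject a legitimate value such as a negative number.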

PEAR GetOpt

PEAR has a getopt implementation called Console_Getopt. It's easy to install if you have PEAR on your machine:

pear install Console_Getopt

The package contains 3 functions that are important:

  • getopt & getopt2: parse the arguments and return the options together with the remaining non-option arguments
  • readPHPArgv: reads the $argv array across different PHP configurations

The difference between getopt and getopt2 lies in the fact that the first expects the filename as first argument and the second doesn't.

One of the big differences with the PHP getopt() function is that the PEAR version expects you to pass in the arguments yourself.

array getopt(array $args, string $shortopts, array $longopts = null)

The format for the long options is different than what we saw before. Instead of colons, equal signs are used. For the short options, colons are still the way to go.

Let's take a look at an example script (03-pear-getopt.php).

// Define short options
$shortopts  = "";
$shortopts .= "hv";  // No values
$shortopts .= "u:";  // Required value
$shortopts .= "p::"; // Optional value

// Define long options
$longopts = array(
    "user=",       // Required value
    "password==",  // Optional value
    "help",        // No value
    "verbose",     // No value
);

// Get an instance of Console_Getopt
require_once 'Console/Getopt.php';
$getopt = new Console_Getopt();

// Get the arguments & remove the filename from the list
$args = $getopt->readPHPArgv();
array_shift($args);

// Parse the options
$options = $getopt->getopt2($args, $shortopts, $longopts);
if (PEAR::isError($options)) {
    echo 'Got error: ' . $options->getMessage() . PHP_EOL;
} else {
    print_r($options);
}

If we run this, we get a different output from before:

Array (
    [0] => Array (
            [0] => Array (
                    [0] => --user
                    [1] => jkeppens
                )
            [1] => Array (
                    [0] => --password
                    [1] => secret
                )
            [2] => Array (
                    [0] => v
                    [1] => 
                )
        )
    [1] => Array (
            [0] => some
            [1] => other
            [2] => text
        )
)

So far, nothing big, except that we got something extra: the non-option arguments are returned as well.
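The nested result is less convenient to work with than the associative array returned by PHP's getopt(). A minimal sketch of flattening it, assuming a hypothetical helper called flattenOptions and the result structure shown above (the sample data is copied from that output, so the sketch runs without PEAR installed):

```php
<?php
/**
 * Hypothetical helper: convert the option list returned by Console_Getopt
 * (pairs of name and value) into an associative array keyed by option name.
 */
function flattenOptions(array $parsed)
{
    $flat = array();
    foreach ($parsed as $pair) {
        list($name, $value) = $pair;
        // Long options are returned with their leading dashes; strip them
        $flat[ltrim($name, '-')] = $value;
    }
    return $flat;
}

// Same structure as the output above: element 0 holds the options,
// element 1 the non-option arguments
$result = array(
    array(
        array('--user', 'jkeppens'),
        array('--password', 'secret'),
        array('v', null),
    ),
    array('some', 'other', 'text'),
);

list($parsed, $rest) = $result;
print_r(flattenOptions($parsed));
```

When an option is passed more than once, only the last value survives in this sketch.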

What makes pear getopt even more interesting is the error handling. If you don't add a required value, you will get a PEAR error to inform you of this:

./03-pear-getopt.php --user --password
Got error: Console_Getopt: option requires an argument -user

Can we do better? Yes, we can!

Zend Framework Getopt

My absolute favourite is the GetOpt implementation of Zend Framework. It's so much better, that I would recommend using it as a separate component if your application is not based on the framework.

Some cool additions:

  • No need to get the arguments yourself
  • Short and long options are aliases for each other and the relation is known
  • You can define a description for the options, so you can generate a usage message (help)
  • OO implementation: returns an object with the long option names as properties
  • You can define the type of an option: string, word, integer,...
  • An exception is thrown in case of an unknown/wrongly typed option, missing value,...
  • You can convert the output to a string, an array, xml or json
  • You can retrieve the non-option arguments
  • And much more...

One big difference with the short options is that you don't have to (actually: can't) attach the value. The class is smart enough to handle this for you.

For more information, I suggest you have a look at the reference documentation. It contains a lot of information and examples for each bit of functionality.

http://framework.zend.com/manual/en/zend.console.getopt.html

Example code (04-zend-getopt.php) :

// Define the options: name|alias, value type and description
$config = array(
    'help|h'        => 'Show help',
    'verbose|v'     => 'Verbose mode',
    'user|u=s'      => 'Username (string)',
    'password|p-s'  => 'Password (string) (optional)'
);

try {
    $options = new Zend_Console_Getopt($config);
    $options->parse();
    if (!empty($options->help)) {
        echo $options->getUsageMessage();
    } else {
        $username = $options->user; // other aliases: $options->u & $options->getOption('user')
        $password = $options->password;
        $verbose  = !empty($options->verbose);
        echo 'Log in [' . $username . ' / ' . $password . ']';
        if ($verbose) {
            echo ' in verbose mode';
        }
        echo PHP_EOL;
    }
} catch (Zend_Console_Getopt_Exception $e) {
    echo $options->getUsageMessage();
}

When running the script, this looks like:

./04-zend-getopt.php -u jkeppens -p test -v
Log in [jkeppens / test] in verbose mode

./04-zend-getopt.php --help
Usage: ./04-zend-getopt.php [ options ]
--help|-h                  Show help
--verbose|-v               Verbose mode
--user|-u <string>         Username (string)
--password|-p [ <string> ] Password (string) (optional)

While this is a great implementation, there is still room for improvement for future versions. Just have a look at this example:

./04-zend-getopt.php --user --password -v

Log in [--password / ] in verbose mode

It's the same behaviour as we saw before where --password is taken as value for --user.

File Descriptors

Using file descriptors

When using the command line, php-cli defines 3 constants that hold the standard streams (file descriptors 0, 1 and 2) to handle your I/O.

  • STDIN: Input stream (0)
  • STDOUT: Output stream (1)
  • STDERR: Error stream (2)

You can read from STDIN using fgets or write to STDOUT and STDERR using fwrite. Let's see in an example how they work:

// Write out text to the output stream & capture name as input
fwrite(STDOUT, PHP_EOL . 'Please enter your name: ');
$name = trim(fgets(STDIN));

// Display "Hello <name>" to output and "I don't know <name>" to error
fwrite(STDOUT, PHP_EOL . 'Hello ' . $name);
fwrite(STDERR, PHP_EOL . 'I don\'t know ' . $name);

Let's run the script (05-file-descriptors.php) in a simple way first:

./05-file-descriptors.php 

Please enter your name:

The script waits until you type in something followed by enter.

Please enter your name: Jeroen

Hello Jeroen
I don't know Jeroen

As you can see, both streams have written to the screen. What if we wanted the error message to go to an error log instead? You can do this by appending 2>error.log or 2>>error.log when calling the script. If the logfile you specified exists, a single > will replace it when you call the script, while a double >> will append to the file.

./05-file-descriptors.php 2>> error.log

Please enter your name: Jeroen

Hello Jeroen

The second line is no longer present and when we look on the file system we see the logfile:

$ ls *.log
error.log

$ cat error.log
I don't know Jeroen

You can do the same with the output, but then you append >output.log or >>output.log. In our example this doesn't make much sense, since we wouldn't see the question requesting our name.
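Because STDIN, STDOUT and STDERR are ordinary stream resources, code that receives its streams as parameters works with any other stream as well. A minimal sketch, assuming a hypothetical function called greet; the php://memory streams merely stand in for the real descriptors so that the result can be inspected:

```php
<?php
/**
 * Hypothetical function: ask for a name on $in, greet on $out and complain
 * on $err. On the command line you would pass STDIN, STDOUT and STDERR.
 */
function greet($in, $out, $err)
{
    fwrite($out, 'Please enter your name: ');
    $name = trim((string) fgets($in));

    fwrite($out, PHP_EOL . 'Hello ' . $name);
    fwrite($err, PHP_EOL . 'I don\'t know ' . $name);
}

// In-memory streams stand in for the three standard streams
$in  = fopen('php://memory', 'w+');
$out = fopen('php://memory', 'w+');
$err = fopen('php://memory', 'w+');

// Simulate the user typing "Jeroen" followed by enter
fwrite($in, "Jeroen\n");
rewind($in);

greet($in, $out, $err);

// Show only what ended up on the "error" stream
rewind($err);
echo stream_get_contents($err) . PHP_EOL;
```

Passing the streams in explicitly also makes the function easy to check in an automated test, where nobody is around to type a name.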

Special behaviour

When working with daemons, one of the steps you have to take is to replace the current file descriptors. If you didn't do that, your daemon would attempt to write its output to the screen.

It is very easy in PHP to point the file descriptors where you want. You start by closing the current file descriptors and then open new ones. The first streams opened after closing the file descriptors take the descriptor numbers that were freed, so they will "fill in the blanks" (06-replacing-file-descriptors.php).

// log file to write to
$logfile = '/tmp/some-log-file.log';

// Close standard I/O descriptors
fclose(STDOUT);
fclose(STDERR);

// The first descriptors opened after closing fill up the blanks
$fdOUT = fopen($logfile, 'a');
$fdERR = fopen('php://stdout', 'a');

// STDOUT writes to $logfile
// STDERR writes to STDOUT, so also to $logfile

Interactive Input

As we saw with the file descriptors, you can read data from the prompt in the midst of your script. This basic functionality is useful, but with the readline functions we can do so much more. The readline functions implement an interface to the GNU Readline library and provide you with editable command lines. This includes using the up/down arrow keys to browse through your command history, auto-completion and much more.

Readline : the basics

The readline function reads a single line from the command line. You can provide an optional argument that is written to the screen as the prompt. Most people who use readline only know it as a replacement for reading from the prompt (07-readline-basics.php):

// Read the name from the command line, using the provided text as prompt.
$name = readline('Please enter your name: ');

// Display "Hello <name>"
echo 'Hello ' . $name . PHP_EOL;

TIP: If you run this script locally, try to press TAB twice when you get the prompt. You will notice that it will list the files in the current directory.

Readline : history

With readline it is possible to keep a command history file. This enables you to browse through previously entered commands. This history can be persistent, so you could set it up so that you have access to commands you entered 2 weeks ago. The readline history related functions are:

  • readline_add_history: adds a line to the history
  • readline_clear_history: clears the history
  • readline_list_history: returns an array with all commands in the history
  • readline_read_history: reads the history stored in a file into memory
  • readline_write_history: write the history in memory to a file

A typical program implementing the history functionality, will first try to read in the previous history from a file using readline_read_history. Usually you will do your readline next, followed by adding the received command to the memory using readline_add_history. When you're done, you write the history in memory back to your history file on disk using readline_write_history.

In the example script below, I added commands to list and clear the history as well:

// If we have a history file, read it in; otherwise blank history
$historyFile = './readline.hist';
if (is_file($historyFile)) {
    readline_read_history($historyFile);
}

// Endless loop, keep the commands coming
while (true) {
    $line = strtolower(trim(readline("Command: ")));
    echo 'Received [ ' . $line . ' ]' . PHP_EOL;
    readline_add_history($line); // add command to history
    switch ($line) { // check special actions
        case 'clear': // clear the history
            readline_clear_history();
            break;
        case 'history': // print out the history
            print_r(readline_list_history());
            break;
        case 'quit':
        case 'exit': // quit / exit
            break 2;
        default:
            break;
    }
}

readline_write_history($historyFile);

Running this script (08-readline-history.php) gives the following results:

./08-readline-history.php 
Command: test
Received [ test ]
Command: do stuff
Received [ do stuff ]
Command: history
Received [ history ]
Array
(
    [0] => test
    [1] => do stuff
    [2] => history
)
Command: clear
Received [ clear ]
Command: history
Received [ history ]
Array
(
    [0] => history
)
Command: quit
Received [ quit ]

Readline : Auto-Completion

If you want to override the default auto-completion behaviour (filenames in the working directory), you can register your own completion function using readline_completion_function. You create a function that returns an array of possible auto-completions. While you can do the checks yourself and return only the valid elements, readline is smart enough to do this work for you. When I first used this, I didn't know that, and my function looked like this:

/**
 * Callback function to do auto completion with readline
 *
 * @param string $string text typed so far
 * @param int    $index  position of the text in the input line
 * @return array possible completions
 */
function readlineCompletion($string, $index)
{
    // words available for auto completion
    $completion = array(
        'history',
        'quit',
        'exit',
        'clear',
        'clean',
        'cls'
    );

    // if nothing was typed yet, return all options
    if (empty($string)) {
        return $completion;
    }

    // determine which words can be autocompleted based on what was typed
    $matches = array();
    foreach ($completion as $c) {
        if (strpos($c, $string) === 0) {
            $matches[] = $c;
        }
    }
    return $matches;
}

// Define the function used for auto completion
readline_completion_function('readlineCompletion');

// Readline with auto-complete functionality
$line = readline("Command: ");

But this is overkill. A much easier and cleaner version of the same code is (see 09-readline-autocomplete.php):

/**
 * Callback function to do auto completion with readline
 *
 * @param string $string text typed so far
 * @param int    $index  position of the text in the input line
 * @return array possible completions
 */
function readlineCompletion($string, $index)
{
    // words available for auto completion
    return array(
        'history',
        'quit',
        'exit',
        'clear',
        'clean',
        'cls'
    );
}

// Define the function used for auto completion
readline_completion_function('readlineCompletion');

// Readline with auto-complete functionality
$line = readline("Command: ");

Conclusion

When you have to execute a command line script and want some basic data input, you can pass arguments to your script. In your script you can parse them using getopt. While the basic getopt does the trick, there are far better implementations, with a special recommendation for Zend Framework's Zend_Console_Getopt.

When you want to interact with your user, you can use the readline functions for a real shell experience complete with auto-completion and command history.

Resources

The code examples for this article can be found on GitHub. If you want to execute them, you will have to make them executable (chmod +x) or pass them to php (php some-script.php).

https://github.com/Amazium/PHP-In-The-Dark/tree/master/1-Input-Output