[Part 1] PHP Best Practices and how to Properly Filter Input and Escape Output Data

Overview of Handling Data with Best Practices in Mind

Handling user-supplied data is a big part of many web applications, and it's critical that this is done properly to prevent security holes. There are a number of best practices and principles we can follow when handling data, though I'll just be covering the ones I feel to be the most important:




Practices:
  1. Treat all user data as tainted until it has been validated; never assume the integrity of such data (guilty until proven innocent).
  2. Make your users abide by your validation rules. That is to say, do not attempt to correct invalid data, because doing so opens the door to security vulnerabilities.
  3. Keep track of data as it enters and exits parts of your application. This is critical in order to be able to tell what data is potentially tainted, and what data has been validated and is safe to use.
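
Practices 1 and 3 are often combined by keeping validated values in an array of their own, so that anything outside that array is still considered tainted. Here is a minimal sketch of the idea. The $input array (standing in for $_POST), the $clean and $errors names, and the rules for each field are all assumptions made for illustration:

```php
<?php

// Hypothetical raw input; in a real request this would come from $_POST.
$input = array('username' => 'alice01', 'age' => '17x');

$clean  = array(); // only validated values are ever stored here
$errors = array(); // rejected fields are reported, never corrected

if (isset($input['username']) && ctype_alnum($input['username'])) {
    $clean['username'] = $input['username'];
} else {
    $errors[] = 'username';
}

if (isset($input['age']) && ctype_digit($input['age'])) {
    $clean['age'] = (int) $input['age'];
} else {
    $errors[] = 'age'; // '17x' is rejected rather than "fixed" to 17
}
```

From this point on, the rest of the script would only ever read from $clean.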

Principles:
  1. Minimise exposure of sensitive data. This covers not storing passwords in cookies, not using the HTTP GET method as a way of requesting passwords, not storing configuration files in the document root, and so on.
  2. Defence in Depth - the practice of using redundant safeguards. This can improve the security of a web application by keeping additional layers of protection in place (layers that should never have to be used, but are there just in case).


Filtering Input

Filtering input should be done whenever applicable to prevent junk data from entering a web application. It is performed on data as it comes into an application, and it is where the validity of that data is inspected. There are a number of ways we can filter our users' input, though the method you choose will depend upon the kind of input you're inspecting. As such, I'll be running through just a few commonly used functions and libraries to give you more of an idea of how this inspection process works. I'll (try to) explicitly reference the practices and principles stated above when I use them.


The Character Type Functions (ctype_)



The character type functions are from the Ctype extension, which is full of handy functions that can be used to validate user input. It does this by checking the characters of a string to see if they're of an appropriate type, much like a simplistic regular expression. All of the Ctype functions are known as predicate functions because they only return a boolean value (TRUE or FALSE). Here's a list of the Ctype functions:

ctype_alnum() — Checks for alphanumeric character(s)
ctype_alpha() — Checks for alphabetic character(s)
ctype_cntrl() — Checks for control character(s)
ctype_digit() — Checks for numeric character(s)
ctype_graph() — Checks for any printable character(s) except space
ctype_lower() — Checks for lowercase character(s)
ctype_print() — Checks for printable character(s)
ctype_punct() — Checks for any printable character which is not whitespace or an alphanumeric character
ctype_space() — Checks for whitespace character(s)
ctype_upper() — Checks for uppercase character(s)
ctype_xdigit() — Checks for character(s) representing a hexadecimal digit


Tip:
Ensure that you're always passing in strings to these functions, even if the values are numeric. This is because the PHP manual states:
"If an integer between -128 and 255 inclusive is provided, it is interpreted as the ASCII value of a single character (negative values have 256 added in order to allow characters in the Extended ASCII range). Any other integer is interpreted as a string containing the decimal digits of the integer."
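
To make the tip concrete, here is a short sketch of a few Ctype checks on string values. The variable names and sample values are my own and are only there for illustration:

```php
<?php

// $_GET and $_POST values arrive as strings, so these calls are safe as-is.
$isDigits = ctype_digit('2024');    // true: every character is a digit
$hasDot   = ctype_digit('20.24');   // false: '.' is not a digit
$isEmpty  = ctype_digit('');        // false: an empty string never passes
$isHex    = ctype_xdigit('ff00aa'); // true

// A value that may be an integer is cast to a string first, as the tip advises.
$count = 65;
$countIsDigits = ctype_digit((string) $count); // true: checks '6' and '5'
```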



filter_var

The filter_var function accepts three arguments: the variable to validate, the filter to apply (a constant), and any optional flags to be set on the filter used. Two simple scenarios where you're going to want to use this function are validating URLs and validating e-mail addresses. Validating them with regular expressions is not a good idea, even if you know your way around them.

The function has two primary types of filtering: validation and sanitisation. Validation filtering checks the data against the filter and returns FALSE if it fails; upon success, the data itself is returned. Sanitisation filtering will attempt to remove or replace any invalid characters and return the sanitised string (according to the filter used - this does not mean it is safe to output from your application without further escaping).

Here are a few simple and common use-cases:


Validation filtering:

  
<?php

$email = 'valid@email.com';
if(filter_var($email, FILTER_VALIDATE_EMAIL) !== FALSE) {
    // valid email
}

$url = 'http://domain.tld';
if(filter_var($url, FILTER_VALIDATE_URL) !== FALSE) {
    // valid URL
}

$age = 20;
$options = array('options' => array('min_range' => 18, 'max_range' => 100));
if(filter_var($age, FILTER_VALIDATE_INT, $options) !== FALSE) {
    // valid age
}
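
The third argument can also be a plain flag rather than an options array. The following sketch shows two flags that I find useful; the sample values and variable names are assumptions chosen for the example:

```php
<?php

// A flag narrows what the filter accepts: here, IPv4 addresses only.
$ipv4 = filter_var('192.168.0.1', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4); // '192.168.0.1'
$ipv6 = filter_var('::1', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);         // false

// FILTER_NULL_ON_FAILURE lets a boolean filter tell "false" apart from "invalid".
$on      = filter_var('yes', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);   // true
$off     = filter_var('off', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);   // false
$invalid = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE); // null
```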



Sanitisation filtering:

  
<?php

$output = 'Protecting against XSS: <script>alert(0)</script>';
echo filter_var($output, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

$int = 3.3;
echo filter_var($int, FILTER_SANITIZE_NUMBER_INT); // 33
// Note that it omits invalid characters, rather than truncating the input like other integer-validating functions




It's all about Type

It's a well-known fact that PHP is a loosely-typed language. Data types do not need to be stated when defining variables or function parameters, and method signatures do not need to specify types either. That's not to say variable type is unimportant, though.

Tip:
It's always best practice to perform strict comparisons because of the loosely-typed nature of PHP.
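
Here is a small sketch of why the strict operators matter when checking the result of a validation call. The variable names are mine, chosen only for this illustration:

```php
<?php

// '0' is a valid integer, and filter_var() returns it as int(0).
$result = filter_var('0', FILTER_VALIDATE_INT);

$looseSaysInvalid  = ($result == false);  // true: 0 loosely equals false
$strictSaysInvalid = ($result === false); // false: int(0) is not the boolean false

// Numeric strings are compared as numbers by the loose operator.
$looseMatch  = ('1e3' == '1000');  // true
$strictMatch = ('1e3' === '1000'); // false
```

With the loose comparison, a perfectly valid zero would have been treated as a validation failure.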



Type-Checking

Type-checking in PHP can be done with the is_ functions - a set of predicate functions that return TRUE if the type is correct, or FALSE otherwise. The following is a list of these functions:


is_ functions

  
is_array
is_bool
is_callable
is_double
is_float
is_int
is_integer
is_long
is_null
is_numeric
is_object
is_real
is_resource
is_scalar
is_string


There is also the instanceof operator, which checks whether the object on its left-hand side is an instance of the class or interface named on its right-hand side (this includes parent classes and implemented interfaces).
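
As a quick sketch of both kinds of check, consider the following. The Shape interface and the Circle and Wheel classes are hypothetical and exist only for this example:

```php
<?php

// Hypothetical interface and classes, used only for this illustration.
interface Shape {}
class Circle implements Shape {}
class Wheel extends Circle {}

$wheel = new Wheel();

$isWheel       = $wheel instanceof Wheel;       // true: same class
$isCircle      = $wheel instanceof Circle;      // true: parent class
$isShape       = $wheel instanceof Shape;       // true: implemented interface
$isArrayObject = $wheel instanceof ArrayObject; // false: unrelated class

// Request data arrives as strings, so is_int() rejects it while is_numeric() accepts it.
$fromQuery = '42';
$isInt     = is_int($fromQuery);     // false
$isNumeric = is_numeric($fromQuery); // true
```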


Type-hinting


Support for type-hinting was first introduced in PHP 5, and has been a much-loved feature in the PHP community. Method parameters should take advantage of type hinting when possible because of the improved maintainability it provides, along with the less error-prone (and partially self-documenting) code it produces. PHP 5 supports the following type hints: objects (by class or interface name, which covers iterators), arrays (as of PHP 5.1), and callables (as of PHP 5.4). If a value of the incorrect type is passed as an argument to a function, then a catchable fatal error is produced (PHP 7 and later throw a TypeError instead).


Type hints are used like so:

  
<?php

class TypeHinting
{
    private $carObject;
    private $accessories = array();

    public function __construct(CarObjectInterface $carObject)
    {
        $this->carObject = $carObject;
    }

    public function addAccessories(array $accessories)
    {
        $this->accessories = $accessories;
    }
}



The above is a much cleaner and more legible snippet than the following (which does not take advantage of type-hinting):

  
<?php

class TypeHinting
{
    private $carObject;
    private $accessories = array();

    public function __construct($carObject)
    {
        if(!($carObject instanceof CarObjectInterface)) {
            trigger_error('Fatal error: wrong type!', E_USER_ERROR);
        }

        $this->carObject = $carObject;
    }

    public function addAccessories($accessories)
    {
        if(!is_array($accessories)) {
            trigger_error('Fatal error: wrong type!', E_USER_ERROR);
        }

        $this->accessories = $accessories;
    }
}

PHP 5 does not, however, support type-hinting for scalars (string, int, boolean, float), or for the resource and trait types. Scalar type declarations were only added to the language in PHP 7.0. For PHP 5, there are solutions to support scalars in the comments section on PHP.net, though beware that some may slow down the performance of your PHP applications.
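
On PHP 7.0 or later, scalar types can be declared directly in the signature. The sketch below assumes such a version, and squareOf() is a hypothetical helper written just for this example:

```php
<?php

// Requires PHP 7.0 or later. squareOf() is a hypothetical helper.
function squareOf(int $number): int
{
    return $number * $number;
}

$ok = squareOf(4); // 16

try {
    squareOf('four'); // a non-numeric string cannot be used as an int
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true; // PHP throws a TypeError instead of running the function body
}
```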


Type Casting


When we perform a type cast operation in PHP, we convert a value from its current type to another type. PHP supports the $var = (type) $var; syntax (similar to C and Java), where (type) can be any one of the following:

  
(int), (integer) - cast to integer
(bool), (boolean) - cast to boolean
(float), (double), (real) - cast to float
(string) - cast to string
(array) - cast to array
(object) - cast to object
(unset) - cast to NULL



Type casting is commonly done as a method of validation for integers from user input:

  
<?php

if(isset($_GET['id'])) {
    $id = (int) $_GET['id']; // ensure that the id from the HTTP GET method is of an integer type
} 

We can also use the settype() function to force a variable to a particular type.
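
It's worth seeing what a cast does with input that isn't a clean integer, since a cast never fails and so cannot tell you whether the input was valid. A small sketch, with sample values of my own choosing:

```php
<?php

$a = (int) '42';    // 42
$b = (int) '42abc'; // 42: trailing junk is silently dropped
$c = (int) 'abc';   // 0: no leading digits at all

// settype() converts the variable in place and returns TRUE on success.
$id = '123';
$converted = settype($id, 'integer'); // $id is now int(123)
```

Because the junk is dropped rather than rejected, a cast is better thought of as a safeguard than as a validation step on its own.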


The Whitelist Approach

Whitelisting assumes that there will be a limited scope of validity in the data (such as an image uploader, where the file type is limited to that of images). We provide the only possibilities that the data can be, and anything else is discarded as invalid. This is commonly done with an array, where the in_array() function checks that a value exists within the array, and is therefore valid.


<?php

$languages = array('PHP', 'JavaScript', 'Ruby', 'Elixir');
$inputLanguage = 'VB.net';

if(in_array($inputLanguage, $languages, TRUE)) {
    // valid language
}else{
    // invalid language
}

We could also use an if/elseif/else or switch statement - though these are more commonly used for flow control logic with simple comparisons, rather than for whitelisting potential values.


Tip:
Always give the third argument to the in_array() function (as TRUE) to perform a strict comparison, unless a loose comparison is absolutely necessary. Performing a strict comparison (equivalent to the three-character comparison operators: ===, !==), which checks type as well as value, is important to prevent strange things from happening (check out the "The Mystery of Value Appearance" section of this article).
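
Here is one of those strange things, as a minimal sketch. The $allowedCodes whitelist and its values are hypothetical:

```php
<?php

// Hypothetical whitelist of string codes.
$allowedCodes = array('1e1', 'abc');

$loose  = in_array('10', $allowedCodes);       // true: '10' == '1e1' when compared as numbers
$strict = in_array('10', $allowedCodes, TRUE); // false: the strings differ
```

Without the third argument, a value that was never whitelisted is accepted.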

The opposite of the whitelist approach is to provide a blacklist of all unwanted values. This is appropriate only when you know exactly which values aren't allowed, such as with an IP address blacklist.


Regular Expressions

Regular expressions, or regex, are used for checking the format of input data and matching complex patterns. They should be used sparingly since they come at a cost to performance, but they are a powerful and concise DSL (Domain-Specific Language) when used. They do require good knowledge of PCRE regex, and the patterns used should always be extensively tested before being deployed, since their complexity makes it easy to slip up.
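
As a small taste, here is a sketch of a format check with preg_match(). The username rule (3 to 16 letters, digits or underscores) is an assumption made up for the example:

```php
<?php

// Hypothetical rule: 3 to 16 characters, letters, digits or underscores only.
$pattern = '/\A[A-Za-z0-9_]{3,16}\z/';

$valid   = preg_match($pattern, 'php_dev_01');        // 1
$invalid = preg_match($pattern, 'php dev');           // 0: the space is not allowed
$tooLong = preg_match($pattern, str_repeat('a', 17)); // 0

// preg_match() returns 1, 0, or FALSE on error, so compare strictly against 1.
$isValid = ($valid === 1);

// An easy slip-up: $ also matches just before a trailing newline, whereas \z does not.
$dollar = preg_match('/^[a-z]+$/', "abc\n");   // 1
$strict = preg_match('/\A[a-z]+\z/', "abc\n"); // 0
```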


Due to the amount of content there is to cover when teaching regex, it will have to be done in another tutorial. But for now, if you'd like to check out how to use regular expressions, then I'd recommend the following websites:

Escaping Output



Escaping output preserves data as it exits an application, so that the system receiving it treats it as plain data rather than interpreting it as code. There are two primary exits of data from an application: to the browser as client-side code, and to the database inside queries.
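
For the browser exit, a minimal sketch using htmlspecialchars() looks like this. The $comment value is a hypothetical piece of user input, and the database exit is not shown here:

```php
<?php

// Hypothetical comment submitted by a user.
$comment = '<script>alert(0)</script>';

// Escape at the point of output so the browser shows the markup as text.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
// $escaped is '&lt;script&gt;alert(0)&lt;/script&gt;'

echo '<p>' . $escaped . '</p>';
```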


