Overview of Handling Data with Best Practices in Mind
Handling user-supplied data is a big part of many web applications, and it's critical that this is done properly to prevent security holes. There are a number of best practices and principles we can follow when handling data, though I'll just be covering the ones I feel to be the most important:
Practices:
- Treat all user data as tainted until it has been validated; never assume the integrity of such data (guilty until proven innocent).
- Make your users abide by your validation rules. That is to say, do not attempt to correct invalid data, because doing so can introduce security vulnerabilities.
- Keep track of data as it enters and exits parts of your application. This is critical in order to be able to tell what data is potentially tainted, and what data has been validated and is safe to use.
Principles:
- Minimise exposure of sensitive data. This covers not storing passwords in cookies, not using the HTTP GET method as a way of requesting passwords, not storing configuration files in the document root, and so on.
- Defence in Depth - the practice of using redundant safeguards. Having additional layers of safeguards in place (that should never have to be used, but are there just in case) can help to improve the security of a web application.
Filtering Input
Filtering input should be done whenever applicable to prevent junk data from entering a web application. It is performed upon the data coming into an application, where its validity is inspected. There are a number of ways we can filter our users' input, though the method you choose will depend upon the input data you're looking to manipulate. As such, I'll be running through just a few commonly used functions and libraries to give you more of an idea of how this inspection process works. I'll (try to) explicitly reference the practices and principles stated above when I use them.
The Character Type Functions (ctype_)
The character type functions are from the Ctype extension, which is full of handy functions that can be used to validate user input. It does this by checking the characters of a string to see if they're of an appropriate type, much like a simplistic regular expression. All of the Ctype functions are known as predicate functions because they only return a boolean value (TRUE or FALSE). Here's a list of the Ctype functions:
- ctype_alnum() - Checks for alphanumeric character(s)
- ctype_alpha() - Checks for alphabetic character(s)
- ctype_cntrl() - Checks for control character(s)
- ctype_digit() - Checks for numeric character(s)
- ctype_graph() - Checks for any printable character(s) except space
- ctype_lower() - Checks for lowercase character(s)
- ctype_print() - Checks for printable character(s)
- ctype_punct() - Checks for any printable character which is not whitespace or an alphanumeric character
- ctype_space() - Checks for whitespace character(s)
- ctype_upper() - Checks for uppercase character(s)
- ctype_xdigit() - Checks for character(s) representing a hexadecimal digit
Tip:
Ensure that you're always passing in strings to these functions, even if the values are numeric. This is because the PHP manual states:
"If an integer between -128 and 255 inclusive is provided, it is interpreted as the ASCII value of a single character (negative values have 256 added in order to allow characters in the Extended ASCII range). Any other integer is interpreted as a string containing the decimal digits of the integer."
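As a rough sketch of the tip about always passing strings, the snippet below casts a value to a string before handing it to ctype_digit(). The helper name isDigitsOnly() is hypothetical (it is not part of the Ctype extension) and exists only for illustration.

```php
<?php
// Hypothetical helper: returns TRUE only if $value consists solely of the digits 0-9.
function isDigitsOnly($value)
{
    // Only strings and integers are accepted in this sketch.
    if (!is_string($value) && !is_int($value)) {
        return false;
    }

    // Cast to string so that an integer is never interpreted as an ASCII code.
    return ctype_digit((string) $value);
}

var_dump(isDigitsOnly('12345')); // bool(true)
var_dump(isDigitsOnly(5));       // bool(true): checked as the string '5', not as ASCII code 5
var_dump(isDigitsOnly('12.5'));  // bool(false): '.' is not a digit
var_dump(isDigitsOnly(''));      // bool(false): an empty string never passes
```

Wrapping the cast in one place like this means callers do not have to remember the rule each time.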
filter_var
The filter_var function accepts three arguments: the variable to validate, the filter to apply (a constant), and any optional flags to be set on the filter used. Some simple scenarios where you're going to want to use this function are validating URLs and e-mail addresses. Validating them with regular expressions is not a good idea, even if you know your way around them.
The function has two primary types of filtering: validation and sanitisation. Validation filtering checks the data for invalidity: FALSE is returned if the data is invalid, and the data itself is returned upon success. Sanitisation filtering will attempt to remove or encode any invalid characters and return the sanitised string (according to the filter used - this does not mean it is safe to output from your application without further sanitising).
Here are a few simple and common use-cases:
<?php
$email = 'valid@email.com';
if (filter_var($email, FILTER_VALIDATE_EMAIL) !== FALSE) {
    // valid email
}

$url = 'http://domain.tld';
if (filter_var($url, FILTER_VALIDATE_URL) !== FALSE) {
    // valid URL
}

$age = 20;
$options = array('options' => array('min_range' => 18, 'max_range' => 100));
if (filter_var($age, FILTER_VALIDATE_INT, $options) !== FALSE) {
    // valid age
}
<?php
$output = 'Protecting against XSS: <script>alert(0)</script>';
echo filter_var($output, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

$int = 3.3;
echo filter_var($int, FILTER_SANITIZE_NUMBER_INT); // 33
// Note that it omits invalid characters, rather than truncating the input like other integer-validating functions
It's all about Type
It's a well-known fact that PHP is a loosely-typed language. Data types do not need to be explicitly stated before variable definitions or function parameters, and method signature types do not need to be specified either. That's not to say that variable types are unimportant, though.
Tip:
It's always best practice to perform strict comparisons because of the loosely-typed nature of PHP.
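To illustrate why strict comparisons matter, here is a small sketch using filter_var(). The input value '0' is made up for the example; the point is that a perfectly valid result can still be falsy.

```php
<?php
$input = '0'; // made-up user input

$result = filter_var($input, FILTER_VALIDATE_INT);

var_dump($result); // int(0): the input is a valid integer

// Loose check: int(0) is falsy, so valid input would be wrongly rejected.
$looseCheckPassed = (bool) $result;

// Strict check: only the boolean FALSE signals invalid input.
$strictCheckPassed = ($result !== false);

var_dump($looseCheckPassed);  // bool(false)
var_dump($strictCheckPassed); // bool(true)
```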
Type-Checking
Type-checking in PHP can be done with the is_ functions - a set of predicate functions that return TRUE if the type is correct, or FALSE otherwise. The following is a list of these functions:
- is_array
- is_bool
- is_callable
- is_double
- is_float
- is_int
- is_integer
- is_long
- is_null
- is_numeric
- is_object
- is_real
- is_resource
- is_scalar
- is_string
There is also the instanceof operator, which checks whether the object on its left-hand side is an instance of the class or interface named on its right-hand side (this includes parent classes and implemented interfaces).
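A short sketch of both kinds of check follows. The Shape, Circle and Ring names are hypothetical and exist only for this example.

```php
<?php
// Hypothetical interface and classes, used only for illustration.
interface Shape {}
class Circle implements Shape {}
class Ring extends Circle {}

$value = '42'; // e.g. a value taken from a query string always arrives as a string

var_dump(is_string($value));  // bool(true)
var_dump(is_int($value));     // bool(false): the type is string, whatever it contains
var_dump(is_numeric($value)); // bool(true): this one inspects the content as well

$ring = new Ring();

var_dump($ring instanceof Ring);   // bool(true)
var_dump($ring instanceof Circle); // bool(true): parent class
var_dump($ring instanceof Shape);  // bool(true): interface implemented by the parent
var_dump($value instanceof Shape); // bool(false): non-objects never match
```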
Type-hinting
Support for type-hinting was first introduced in PHP 5, and has been a much-loved feature in the PHP community. Method parameters should take advantage of type hinting when possible because of the improved maintainability it provides, along with the less error-prone (and partially self-documenting) code it produces. PHP supports the following types: objects (by class or interface name), arrays (as of PHP 5.1), callables (as of PHP 5.4), and iterators. If a value of the incorrect type is passed as an argument to a function, then a catchable fatal error is produced (as of PHP 7, a TypeError is thrown instead).
<?php
class TypeHinting
{
    private $carObject;
    private $accessories = array();

    public function __construct(CarObjectInterface $carObject)
    {
        $this->carObject = $carObject;
    }

    public function addAccessories(array $accessories)
    {
        $this->accessories = $accessories;
    }
}
Without type hints, the same checks would have to be written by hand:
<?php
class TypeHinting
{
    private $carObject;
    private $accessories = array();

    public function __construct($carObject)
    {
        if (!($carObject instanceof CarObjectInterface)) {
            trigger_error('Fatal error: wrong type!', E_USER_ERROR);
        }

        $this->carObject = $carObject;
    }

    public function addAccessories($accessories)
    {
        if (!is_array($accessories)) {
            trigger_error('Fatal error: wrong type!', E_USER_ERROR);
        }

        $this->accessories = $accessories;
    }
}
PHP 5 does not, however, support type-hinting for scalars (string, int, boolean, float), or the resource and trait types. Scalar type declarations were added in PHP 7.0. For earlier versions, there are solutions to support scalars in the comments section on PHP.net, though beware that some may slow down the performance of your PHP applications.
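On PHP 7.0 and later, scalar parameters can be declared directly. The sketch below assumes PHP 7.0+ and the default (non-strict) typing mode; the describeUser() function is hypothetical.

```php
<?php
// Hypothetical function; requires PHP 7.0+ for the scalar type declarations.
function describeUser(int $id, string $name)
{
    return sprintf('#%d %s', $id, $name);
}

echo describeUser(7, 'alice'), "\n"; // #7 alice

try {
    // 'seven' is not a numeric string, so it cannot be coerced to an int.
    describeUser('seven', 'alice');
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($rejected); // bool(true)
```

Catching TypeError like this is mainly useful at the boundary of an application; inside it, a wrong type usually indicates a bug that should be fixed rather than handled.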
Type Casting
- (int), (integer) - cast to integer
- (bool), (boolean) - cast to boolean
- (float), (double), (real) - cast to float
- (string) - cast to string
- (array) - cast to array
- (object) - cast to object
- (unset) - cast to NULL
<?php
if (isset($_GET['id'])) {
    $id = (int) $_GET['id']; // ensure that the id from the HTTP GET method is of an integer type
}
We can also use the settype() function to force a variable to a particular type.
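Casting forces a type but does not validate anything, so it is worth knowing what a cast does to unexpected input. A minimal sketch, with made-up input values:

```php
<?php
// Made-up input values to show how (int) treats different strings.
var_dump((int) '42');    // int(42)
var_dump((int) '42abc'); // int(42): leading digits are kept, the rest is dropped
var_dump((int) 'abc');   // int(0): there are no leading digits at all
var_dump((int) '3.9');   // int(3): the fractional part is truncated, not rounded

// settype() converts the variable in place and returns TRUE on success.
$id = '15';
$converted = settype($id, 'integer');

var_dump($converted); // bool(true)
var_dump($id);        // int(15)
```

Because an invalid string silently becomes 0, a cast is best combined with a validity check (a range check, for example) rather than relied upon by itself.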
The Whitelist Approach
Whitelisting assumes that there will be a limited scope of validity in the data (such as an image uploader, where the file type is limited to that of images). We provide the only possibilities that the data can be, and anything else is discarded as invalid. This is commonly done with an array, where the in_array() function checks that a value exists within the array, and is therefore valid.
<?php
$languages = array('PHP', 'JavaScript', 'Ruby', 'Elixir');
$inputLanguage = 'VB.net';

if (in_array($inputLanguage, $languages, TRUE)) {
    // valid language
} else {
    // invalid language
}
We could also use an if/elseif/else or switch statement - though these are more commonly used for flow control logic with simple comparisons, rather than for whitelisting potential values.
Tip:
Always pass TRUE as the third argument to the in_array() function to perform a strict comparison, unless a loose comparison is absolutely necessary. A strict comparison (equivalent to using the === and !== operators) checks the type as well as the value, which is important to prevent strange things from happening (check out the "The Mystery of Value Appearance" section of this article).
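Here is a hedged sketch of the kind of surprise a loose in_array() check can produce. The whitelist and the input value are made up for the example.

```php
<?php
$allowedQuantities = array('10', '20', '50'); // made-up whitelist
$input = '1e1'; // made-up user input: a numeric string that equals 10

// Loose comparison: numeric strings are compared as numbers, so '1e1' == '10'.
$looseMatch = in_array($input, $allowedQuantities);

// Strict comparison: the strings must be identical.
$strictMatch = in_array($input, $allowedQuantities, true);

var_dump($looseMatch);  // bool(true)
var_dump($strictMatch); // bool(false)
```

With the loose check, a value that was never in the whitelist is accepted and carried further into the application.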
The opposite of the whitelist approach is to provide a blacklist of all unwanted values. This should be done only when you know exactly which values aren't allowed, such as with an IP address blacklist.
Regular Expressions
Regular expressions, or regex, are used for checking the format of input data and matching complex patterns. They should be used sparingly since they come at a cost to performance, but they are a powerful and concise DSL (Domain-Specific Language) when used. They do require good knowledge of PCRE regex, and the patterns used should always be extensively tested before being deployed, since their complexity makes it easy to slip up.
Due to the amount of content there is to cover when teaching regex, it will have to be done in another tutorial. But for now, if you'd like to check out how to use regular expressions, then I'd recommend the following websites:
- Introduction to PHP Regex (a step-by-step tutorial)
- Rexegg (for detailed information on regex)
- Regular Expressions (another tutorial-based website)
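As a small taste before moving on, here is a sketch of validating a username with preg_match(). The isValidUsername() function and the rule it enforces are assumptions made up for this example, not a recommendation for what a username should look like.

```php
<?php
// Hypothetical rule: 3 to 16 characters; lowercase letters, digits or underscores only.
function isValidUsername($username)
{
    // \A and \z anchor the pattern to the whole string
    // (unlike $, \z never matches before a trailing newline).
    return is_string($username)
        && preg_match('/\A[a-z0-9_]{3,16}\z/', $username) === 1;
}

var_dump(isValidUsername('john_doe')); // bool(true)
var_dump(isValidUsername('jd'));       // bool(false): too short
var_dump(isValidUsername('John Doe')); // bool(false): uppercase letters and a space
var_dump(isValidUsername("john\n"));   // bool(false): the trailing newline is rejected
```

Note the strict comparison against 1: preg_match() returns 0 when there is no match and FALSE on error, so both cases are treated as invalid here.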
Escaping Output
Escaping output is a method of preserving data so that it is not interpreted as code by whatever receives it, and it is carried out upon data exiting an application. There are two primary exits of data from an application: to the browser as client-side code, and to the database inside queries.