Network Security Tools/Modifying and Hacking Security Tools/Extending Code Analysis to the Webroot

From WikiContent

< Network Security Tools | Modifying and Hacking Security Tools
Revision as of 22:54, 11 March 2008 by Docbook2Wiki (Talk)
(diff) ←Older revision | Current revision (diff) | Newer revision→ (diff)
Jump to: navigation, search
Network Security Tools

Few static source code analysis tools target security vulnerabilities in popular web application programming languages such as Java/JSP, VB.NET, C#, VBScript (i.e., Active Server Pages), PHP, and Perl. The same tools are common for more traditional languages such as C and C++. Each tool might differ in analysis-engine complexity and ruleset definitions, but the end goal is always the same: to find software flaws. These flaws can comprise poorly written code that results in low-quality software, or insecurely written code that results in security vulnerabilities. Tools designed to detect these flaws often support a single programming language and rely on a default set of rules. Unfortunately, most of these default rules provide little value, given the heterogeneous and custom nature of most production web applications. Typically, production webroots are littered with code written in a variety of scripting languages and contain code developed on object-oriented platforms such as J2EE and .NET. Few tried and true static analysis tools are available for scanning these languages, and those that do exist have few rules (if any) for identifying common web application vulnerabilities.

Despite these shortcomings, this chapter aims to show how you can leverage existing code analysis tools when performing web application security code reviews. To accomplish this goal, the chapter describes a testing approach driven by the identification of symptom code, investigates the freely available static source code analysis tool PMD (http://pmd.sourceforge.net/), and offers suggestions for adapting PMD to perform web application security code reviews.

Contents

Attacking Web Applications at the Source

Historically, network- and operating system-level vulnerabilities have been the sweet spot for attackers. These days, though, hardened firewalls, patched systems, and secure server configurations make these vulnerabilities less desirable than web applications. By their nature, web applications are designed to be convenient for the end user, and security is either overlooked or built in as an afterthought. Web developers lack the real-world security experience of battle-tested firewall and network administrators, who have been targeted by attackers for years. With little or no security experience, developers are unaware of the insecure coding practices that result in web application vulnerabilities. The solution is to test for these vulnerabilities before attackers find them.

The following are two of the most common testing approaches:

Black box
Via the user interface or any other external entry point, this approach pursues the attack vector that provides most of the unauthorized access to the application and/or underlying systems.
White box
Via access to application source code only, this approach identifies insecure coding practices.

The ideal strategy combines identifying insecure code at the source and verifying whether the identified code is exploitable through the user interface. This approach illustrates both the impact and cause of web application vulnerabilities. Access to the application source allows the tester to view the application's "true" attack surface.

Tip

In general, an application'sattack surface is any interface exposed by default. Attack surface in the context of a web application is any accessible page or file in the webroot, including all parameters accepted by that page or file.

When testing from the source code perspective, it's possible to identify every file, page, and page parameter available to attack. If you're testing solely through the user interface, you might miss a page parameter that is not part of the normal user experience and that provides privileged application access to those who know it exists. With access to the application source, you can more easily identify such back doors and remove them from the code base. In addition, the source answers questions about functionality not readily available through the user interface. For example, when fuzzing page parameters through the user interface, the application might respond with unanticipated behavior.

Tip

In the context of a web application, fuzzing entails systematically sending HTTP requests with header, cookie, and parameter values containing unexpected strings of varying character combinations, types, and lengths. Of interest are the HTTP response codes and the page content returned by the web application.

The tester can choose to spend time investigating application responses through the user interface, or dive straight into the source to reveal the actual code implementation. Both techniques might eventually yield the same result—a verified vulnerability. The latter of the two techniques has the added advantage of quickly finding not only the vulnerability but also its root cause.

On the other hand, access to a live instance of the application provides a means for verifying whether a piece of code is vulnerable, and more important, whether it is actually exploitable. This level of access provides other testing benefits as well. If you have access to only the application source, it can be difficult to know where to start looking for vulnerabilities. With access to the live application, you can build an initial map of the user experience. Typically, you do this by crawling through the web application using local proxy tools to log every request and response. With a map of the user experience defined by the request and response log, the tester can return to the source and more intelligently target specific areas of code. For example, the proxy logs contain URLs that often map to specific files and classes, providing a starting point for targeting the most relevant code.

When testing with access to the application source, a disciplined approach is required. Large webroots can swamp even the most experienced testers. An initial test plan that first targets high-risk code helps to avoid incomplete results and missed vulnerabilities. The next section of this chapter outlines a repeatable and measurable approach to source code analysis that strives to accomplish the following goals, in the order shown:

  1. Identify as many potential vulnerabilities/exposures as possible in the allotted time period.
  2. Target high-risk vulnerabilities first.
  3. Confirm the vulnerability through exploitation.

Before delving into details of how to satisfy these objectives, it's important to understand the architecture and code commonly seen when testing web applications.

Scope of a Web Application

Depending on its architecture and size, a production web application can reside on a single server or span across many different servers and tiers, as shown in Figure 6-1. Ideally, a production web application's source is grouped logically into presentation, business, and data layers and is separated physically across tiers. Anyone with experience testing web application security knows this is rarely the case. Table 6-1 provides a brief description of the types of code commonly found at each tier.

Table 6-1. Typical web application architecture

Tier Code description Example code
Client Client-side/mobile code. JavaScript, VBScript, ActiveX, Java applets
Frontend Hosts the user interface (UI)/presentation code. Can also contain business logic and data access code. ASP (VBScript), ASPX (C#/VB.NET), Java/JSP, PHP, Perl
Middle tier Hosts code implementing a company's business logic and data access code. C, C++, C#, VB.NET, Java
Backend Hosts code for the retrieval and storage of application data. Code can also implement business logic rules. T-SQL, PL/SQL, MySQL dialect


Figure 6-1. Typical web application architecture

Typical web application architecture

Symptomatic Code Approach

Given the complexity, size, and custom nature of production web applications, the previously outlined testing objectives might seem daunting. However, a defined test plan driven by the identification of symptom code provides the tester with a solid foundation for identifying initial high-risk code. As the tester becomes more familiar with the source, he can change the initial test plans to target discovered instances of insecure code. From the tester's perspective, knowing the types of code that result in one or more security vulnerabilities is the key to finding the causes of those vulnerabilities. The symptomatic code approach relies on the tester understanding not just the common web application vulnerabilities, but more importantly, the insecure coding practices that cause them.

Symptom Code

As the name of the approach implies, insecure coding practices or techniques that result in web application vulnerabilities are called symptoms or, more specifically,symptom code . To avoid confusion, the terms symptom codeand vulnerability are defined as follows:

Symptom code
Insecure code or coding practices which often lead to exposures or vulnerabilities in web applications. A symptom is not necessarily exploitable. A particular symptom can lead to single or multiple vulnerabilities.
Vulnerability
An exploitable symptom that allows an attacker to manipulate the application in a fashion that was not intended by the developer.

Table 6-2 provides a list of example symptoms and the potential vulnerabilities/attacks that stem from them. This list assumes the reader is already familiar with common web application vulnerabilities and attacks.

Table 6-2. Symptoms of common web application vulnerabilities/attacks

Symptom Vulnerability/attack
Dynamic SQL SQL injection
Dangerous functions Buffer overflows
Methods for executing commands Command injection
File I/O methods Arbitrary filesystem interaction (i.e., creation/deletion/modification/reading of any file)
Writing inline request objects Cross-site scripting
Cookie access methods Broken access control
Hardcoded plain-text passwords Unauthorized access, information leakage


The presence of a symptom doesn't guarantee the code has a particular vulnerability. Once you identify a symptom, you need to analyze the surrounding code to determine whether it is used in an insecure manner. For example, the presence of file I/O methods in the application source doesn't necessarily mean arbitrary filesystem interaction is possible. However, if the code uses a path location from an external input source to access the local filesystem, it will likely result in an arbitrary filesystem interaction vulnerability. With access to a live instance of the application, you can further verify the exploitability of this vulnerability.

The strength of an experienced tester is knowledge of symptom code and poor coding techniques that lead to application vulnerabilities. A skilled tester works with a defined set of insecure code instances, techniques, and conditions (similar to those shown in Table 6-1), which should be flagged at the beginning of a review. This list provides the tester with an initial test plan for quickly identifying easily exploited vulnerabilities. Then the tester can concentrate on less common vulnerabilities specific to the current application.

User-Controllable Input

Most web application vulnerabilities stem from poorly validated, user-controllable input—or any data accepted into the application, regardless of method or source. Typically, the data is sent between client and server in either direction and is completely controllable by the user, regardless of where in the HTTP(S) request it is found (GET/POST parameters, headers, etc.). When testing from the source, we might consider identifying each potential user input and tracing its data path through the code. Once the application accepts the input data it typically reassigns it to variables, carries it across multiple layers of code, and uses it in some transaction or database query. Eventually, the data might return to the user on a similar or alternative data path. The problem is that some paths might lead to symptom code, and others might not. In addition, applications with a large number of inputs increase the likelihood for multiple complex data paths, so tracing data paths from the point of input is inefficient. Given time-constrained testing windows, a more efficient approach is to target symptom code first and trace the paths of any related data out to sources of user-controllable input.

Toolkit 101

The symptomatic code approach requires a combination of manual and automated testing tools. At a minimum, these tools must include the following:

Source code viewer
The tester uses this tool, which typically is a text editor, to browse through the source or drill down a particular piece of code flagged by static analysis tools. When available, an Integrated Development Environment (IDE) is a powerful tool for quickly navigating through sources and tracing method-call hierarchies.
Vulnerability tracking database
This isn't a testing tool, but no discussion of source code analysis is complete without mentioning the need to track identified vulnerabilities. Tracking can range from recording issues in a simple text file to logging them in a bug-tracking database such as Bugzilla. At a minimum, the database should provide a place to document the vulnerability, including file location and line number of the insecure code, and steps for reproducing the vulnerability. Documenting this type of information can be a nuisance. You realize its true value only when presenting findings to management or developers.
Static analysis tools
These tools assist the tester by pointing to specific lines of code, which can be examined more closely within the source code viewer. Database-driven scanning tools that have plug-ins for popular IDEs are ideal. From the IDE console, they allow the tester to launch and view scan results as well as drill down on individual instances of flagged code with a single click.

Static analysis tools are the core component of the tester's toolkit. At a minimum, these tools employ pattern-matching technology common to utilities such as grep , and most database-driven source code scanning tools such as Flawfinder and RATS. Patterns constructed for these tools can represent a simple string or a complex regular expression. The primary benefit of a utility such as grep is ad hoc searches of the source, whereas scanning tools provide a default set of rules for identifying insecure code. Some scanning tools have knowledge about the semantics of the target code, allowing for more intelligent analysis than traditional pattern-matching utilities. grep is valuable when database-driven scanning tools are not available for the target source. This is often the case for web application scripting technologies such as Active Server Pages (VBScript).

The output from static analysis tools produced at the beginning of the review provide an initial road map for identifying known or suspected patterns of insecure code. These tools facilitate tracking down instances of custom code that the tester might otherwise notice only once he's familiar with the source. Compiling a robust symptom code database improves the effectiveness of static analysis tools.

Symptom Code Databases

A symptom code database serves as an initial test plan at the start of each code review and can be continuously updated as new symptoms are discovered. How you construct symptom code depends on which static analysis tool you use and the programming languages it supports. Pattern-matching tools describe symptom code as a combination of regular expressions, and you can build custom regular expressions for any programming language (VBScript, C#, VB.NET, Java, PHP, etc.). Table 6-3 is an updated version of Table 6-2 that includes examples of Perl 5 regular expressions representing potential Java symptom code.

This is not a complete list of potential symptom code regular expressions. In fact, some of these examples might produce false positives, and others might produce false negatives. All special characters that are to be treated as literals are escaped with the \ character.

Table 6-3. Java symptom code

Symptom Perl 5 regexes for Java code Vulnerability/attack
Dynamic SQL
select.+from
insert.+into
update.+set
SQL injection
Methods for executing commands
(Runtime|getRuntime\(\)){0,1}\.exec
Command injection
File I/O methods
new\s+(java\.io\.){0,1}File\s*\(
new\s+(java\.io\.){0,1}FileReader\s*\(
Arbitrary file creation, reading
Writing inline request objects
\<\s*%\s*=.+request
Cross-site scripting
Cookie access methods
getCookies
addCookie
Broken access control
Plaintext database connection strings
jdbc\:
Information leakage, unauthorized access


You should also build regular expressions to flag code that might indicate secure coding practices, such as possible sanitization attempts. By quickly identifying possible sanitization techniques, you might save time overall by avoiding blind exploitation attempts and tailoring attacks to subvert known validation logic. An example of this might be the inclusion of a single JSP file that houses methods for certain input validation routines:

\@\s*include\s+file\s*=\s*\"validate\.jsp\"

As you become more familiar with the code base during the review, you can tune the regular expressions to more accurately capture symptom code. For example, if the code is well documented, it might be useful to search for all instances of a particular developer's name. The analysis tool can run multiple times against the same source tree, revealing new symptom code on each pass. A systematic and iterative approach to source code analysis ensures greater code coverage, increased symptom code detection, and ultimately, real vulnerability identification.

Source code analysis tools and symptom code databases are just components of the symptomatic code approach and they can't find all vulnerabilities. The tool is only as good as its symptom database and the tester's ability to construct meaningful regular expressions. It's important to remember that source code analysis tools and symptom code databases are intended to equip and enable the tester, not to provide a complete solution.

PMD

Anyone who has ever performed a code review of C or C++ code is probably familiar with tools such as Flawfinder and RATS, which rely on pattern matching and have some understanding of the target code. Unfortunately, these tools have vulnerability databases geared primarily toward C and C++ and they are limited in scope.[1]

PMD is a static source code analysis tool for Java maintained by Tom Copeland at http://pmd.sourceforge.net. It performs a number of checks for poor coding practices, but it doesn't provide any rules for identifying common web application vulnerabilities. A detailed explanation of how PMD works is outside the scope of this chapter. Besides, Tom has already done a good job of it (see http://www.onjava.com/pub/a/onjava/2003/02/12/static_analysis.html). PMD's analysis engine converts each Java source file into a nodelike tree structure called an Abstract Syntax Tree (AST). Then rules can traverse or "visit" the AST using the Visitor pattern, looking for object patterns that represent problems in the code. The advantage of this technique over pattern-matching tools is that the source is broken into logical chunks or tokens , allowing for intelligent automated analysis of surrounding code.

PMD Rulesets

PMD comes prepackaged with a number of rules, but this tool's real strength is the ease with which you can create custom rules. The prepackaged rulesets deal primarily with software quality issues and include the following categories:

Basic Braces
Naming Code Size
Unused Code JavaBeans©
Design Finalizers
Import Statements Coupling
JUnit Tests Strict Exceptions
Strings Controversial


The next section builds an example rule to identify code symptomatic of SQL injection vulnerabilities. Although the focus is PMD, the important point is that any static analysis tool that supports custom rule creation can be extended in a similar way. The tester can leverage the existing analysis engine and rules of a particular tool and simply extend the rule base to incorporate web application code signatures. Ideally, you can add to the rule base (i.e., symptom code database) any code that causes application security issues by describing it in the tool's rule definition syntax.

A PMD ruleset is a XML file that consists of one or more rule elements. Each rule element consists of attributes and child elements, such as the following:

  • Name
  • Message
  • Class
  • Description
  • Priority
  • Example

The Class attribute points to the implementation of the rule logic, which can be written as a Java class file or as an XPath expression. A discussion of xpath is outside the scopt of this chapter, but plenty of good xpath resources are available on the internet. The other elements and attributes are informational and can be included in the resulting report. The following example describes a ruleset looking for dynamic SQL:

<?xml version="1.0"?>

<ruleset name="Dynamic SQL Ruleset">
  <desciption>
This ruleset contains a collection of rules that find instances of 
potentially exploitable dynamic SQL.
  </description>

  <rule name="DynamicSqlSelectStmts"
        message="DYNAMIC SQL ''{0}'' DETECTED"
        class="net.sourceforge.pmd.security.web.DynSqlSelectStmts">
    <description>
Dynamic SQL or "string building" techniques that rely on unsanitized input 
are potentially vulnerable to SQL Injection.
    </description>
      <priority>1</priority>
    <example>
<![CDATA[
...
int id = request.getParameter("id");
...
String sql = "select * from employees where employeeid = " + id;
...
]]>
    </example>
  </rule>

<!-- MORE RULES -->

</ruleset>

We'll visit this example rule in more detail in the Section 6.4 later in this chapter.

Installing and Running PMD

PMD runs on any Windows or *nix system with the following installed:

  • JDK 1.3 or higher
  • WinZip or the Unix zip utility from Info-ZIP

You can download PMD as either a binary or a source distribution at http://sourceforge.net/project/showfiles.php?group_id=56262. To install PMD from the command line:

C:\>unzip -q pmd-src-x.y.zi
                  p
C:\>cd pmd-x.y
               

To test PMD from the command line:

C:\pmd-x.y>cd etc
C:\pmd-x.y\etc> pmd ..\mysourcefile.java html ..\rulesets\basic.xml > out.html
               

You can also install PMD as a plug-in to many popular IDEs. Refer to the PMD SourceForge.net home page for a current list of supported IDEs. It's advantageous to run PMD within an IDE because the tester can immediately jump to vulnerable code, whereas from the command line PMD shows the line number and description of the offending code.

Extending PMD

PMD's default rulesets serve as a solid foundation for developing new rulesets to find common insecure coding practices. Initial rules should target code that's high-risk and easily exploitable. Dynamically building SQL statements with user-controllable input is a good example of high-risk code commonly vulnerable to SQL injection. Rule implementations should be adaptable to new or previously unseen custom code. This is an important feature, as web applications differ in how they implement common functionality, such as authentication, authorization, and data access.

What follows is a walkthrough of a web application security rule that flags symptom code commonly vulnerable to SQL injection—SQL select statements concatenated with user-controllable input. The DynSqlSelectStmts class implements the rule logic and is located in the net.sourceforge.pmd.rules.web.security package. This implementation doesn't cover every potential instance of dynamic SQL. It serves only as a guide for writing future security rules that target a variety of symptom code.

Objectives

The primary objective of DynSqlSelectStmts is to identify and report dynamically built SQL statements embedded in Java code. For each instance of dynamic SQL, the class identifies and reports any concatenated expressions, such as variables and method calls that return data. Then the rule traces each expression to determine whether they are linked to sources of user-controllable input. Ultimately, the rule generates a list of PMD security violations that serve as a test plan for manually verifying SQL Injection vulnerabilities.

For the purposes of this rule, method arguments/parameters in the source are considered user-controllable input. Therefore, data tracing from identified symptom code to sources of user-controllable input is limited to the scope of a single method.

Consider the simple doGet method in Example 6-1. Based on the outlined objectives, the class should accomplish the following, in the order shown:

  1. Identify and report the dynamic SQL statement.
  2. Identify the concatenated variable id, and trace it back to the request object (i.e., user-controllable input).
  3. Halt the trace upon reaching the request object and report the finding.

Tip

The request object is a source of user-controllable input and therefore warrants close manual inspection, especially if it's related to a security violation PMD has reported. In the big picture of code review, do not forget to investigate the response object for potential security vulnerabilities.

Example 6-1. Simplified doGet method

                     public void doGet(HttpServletRequest request, HttpServletResponse response)
                               throws ServletException, IOException {
 
    ...  
    String id = request.getParameter("id");
    ...  
    String strSql = "SELECT * FROM USERS WHERE ID = '" + id + "'";

Code Walkthrough

The following code walkthrough includes only the code most relevant to the functionality of the DynSqlSelectStmts class. To view the source code in its entirety, see Example 6-6 later in this chapter.

The SqlInjectionExample.java example source file (refer to Example 6-5) is referenced throughout the code walkthrough to help you better understand the class implementation of DynSqlSelectStmts.

The DynSqlSelectStmts class imports the net.sourceforge.pmd.ast package, much like other PMD rules. Where this rule differs is the org.apache.regexp package, which provides an API for building regular expressions.

                  package net.sourceforge.pmd.rules.web.security;

import net.sourceforge.pmd.ast.*;

import org.apache.regexp.*;
import java.util.*;
import java.text.MessageFormat;

Tip

You can download the regexp package from the Apache Jakarta Project web site at http://jakarta.apache.org. You can substitute this package for any other available Java API supporting regular expressions.

Similar to other PMD rules, DynSqlSelectStmts extends the AbstractRule base class. The debug class variable controls the printing of debug statements to standard out at runtime:

                  public class DynSqlSelectStmts extends AbstractRule {
    
    private static boolean debug = true;

Tip

If you're working with PMD for the first time, you'll want to leave debug set to true.

The PATTERN variable is initialized with the select.+from string—a regular expression to identify SQL select statements. The syntax of this regular expression matches strings containing the words select and from, with one or more of any character in between. You can adapt this regular expression to match other types of dynamic SQL, such as insert into or update statements.


                  private static final String PATTERN = "select.+from";

At first glance, the select.+from pattern seems like it will generate a large number of false positives. For example, commented code or HTML could contain strings with a similar pattern. Unlike pattern-matching tools, PMD can minimize these false positives. By generating an AST from the source file, our class can drill down on specific code (such as string concatenation), look at surrounding nodes for additional symptoms, and more accurately report whether a pattern is potentially vulnerable to SQL injection.

The next set of variables store data about the current method being visited in the AST. For simplicity these are prefixed with the description currMeth:


                  private String currMethName;
    private int currMethXsVis;
    private Map currMethParams;
    private String currMethSymptomCode;
    private List currMethExprsToChase;
    private List currMethVarsChased;

Here is a brief description of each variable:

currMethName
String representing the method name.
currMethXsVis
Maintains the number of visits to the method.
currMethParams
Stores the name and type of each method parameter as a set of key/value pairs.
currMethSymptomCode
String representing the SQL select statement potentially vulnerable to SQL Injection.
currMethExprsToChase
List of expressions (variables, method calls returning data, etc.) concatenated to the identified SQL statement.

currMethVarsChased
Maintains an ordered list of expressions that are initialized when tracing from symptom code to sources of user-controllable input.

Refer back to the doGet method in Example 6-1. After tracing from the SQL select statement to request.getParameter (user-controllable input), currMethVarsChased would contain the following:


Position Value
0
request.getParameter
1
id


The DynS qlSelectStmts class implements visit methods for each node of interest when traversing the AST. The remaining code walkthrough focuses on the visit method for each of the following AST nodes:

  • ASTCompilationUnit
  • ASTClassBodyDeclaration
  • ASTMethodDeclaration
  • ASTMethodDeclarator
  • ASTAdditiveExpression

Relevant sections of the SqlInjectionExample.java AST are shown with each visit method to help you follow the logic (refer to Example 6-5 for the full source code of this file). In addition, you might want to have the following:

  • EBNF grammar reference[2] (useful for understanding the structure of the AST)
  • AST Viewer and Designer utilities included with your downloaded PMD distribution (paste the Java source into these utilities to see the full AST)
  • PMD API documentation

ASTCompilationUnit

Here is the relevant node for ASTCompilationUnit from the SqlInjectionExample.java AST:

     CompilationUnit

The ASTCompilationUnit node is always the first node and the code contained in its visit method is executed for each source file scanned:

public Object visit(ASTCompilationUnit node, Object data)
{
         getInfo(node);
         printDebug("Rule: " + this.getName( ) + "\n\n");
         return super.visit(node,data);
}

Every visit method in the class begins with a call to getInfo, which retrieves the class name and scope of the node and prints this information to standard out. The printDebug method prints debug statements to standard out (if debug is set to true). To programmatically enable and disable debugging code, the setDebug convenience method is provided. By returning with a call to visit of the super class, DynSqlSelectStmts continues analyzing child nodes.

Tip

To make the code more readable, subsequent debug statements have been removed.

ASTClassBodyDeclaration

Here are the relevant nodes for ASTClassBodyDeclaration from the SqlInjectionExample.java AST:

     ClassBodyDeclaration
      MethodDeclaration:(public)

Because the identification of symptom code is on a per-method basis, visit looks at the immediate child node of ASTClassBodyDeclaration to check whether it is an instance of ASTMethodDeclaration. If it is not a method declaration, it returns null to avoid unnecessary visits to children nodes.


                     public Object visit(ASTClassBodyDeclaration node, Object data) 
    {
        getInfo(node);
        if (!(node.jjtGetChild(0) instanceof ASTMethodDeclaration))
        {
            return null;
        } 
        this.init( );

Tip

When traversing the AST, calls to getInfo are handy if you're unsure of the current node. To see this information printed to standard out, you must set debug to true.

Otherwise it continues with a call to init in preparation for the upcoming method. This init( ) method is shown in Example 6-2.

Example 6-2. init( ) method

private void init ( )
{
currMethName = "";
    currMethXsVis = 0;
    currMethParams = new HashMap( );
    currMethSymptomCode = "";        
    currMethExprsToChase = new ArrayList( );    
    currMethVarsChased = new LinkedList( );
}

By returning with a call to visit of the super class, DynSqlSelectStmts continues analyzing child nodes. The previous conditional statement guarantees the next visit is to ASTMethodDeclaration:


                     return super.visit(node,data);
    }

ASTMethodDeclaration

Here are the relevant nodes for ASTMethodDeclaration from the SqlInjectionExample.java AST:

      MethodDeclaration:(public)
       ResultType
       MethodDeclarator:doGet
        FormalParameters
         FormalParameter:(package private)
          Type:
           Name:HttpServletRequest
          VariableDeclaratorId:request
         FormalParameter:(package private)
          Type:
           Name:HttpServletResponse
          VariableDeclaratorId:response

The ASTMethodDeclaration node marks the beginning of the current method in the source:

                     public Object visit(ASTMethodDeclaration node, Object data) 
{
        getInfo(node);

On every visit to this node, currMethXsVis is incremented by 1. If it's the first visit the name of the current method is retrieved:

        currMethXsVis++;
        
        if (currMethXsVis == 1)
        {
            currMethName = ((ASTMethodDeclarator)node.jjtGetChild(1)).getImage( );
        }

When currMethXsVis is greater than 1, the class is in the midst of its data-chasing logic—i.e., tracing from symptom code to sources of user input. To avoid confusion, let's assume this is the first visit to the current method and defer an explanation of the else block until later:


                     else 
        {    
            List locVarDecList =                 
        (ArrayList)node.findChildrenOfType(ASTLocalVariableDeclaration.class);
            for (Iterator j = locVarDecList.iterator( ); j.hasNext( );) 
            {
                if (currMethExprsToChase.size( ) > 0)
                   chkLocVarsForUCI((ASTLocalVariableDeclaration)j.next( ),data);
                else
                     break;
            }
            
            return null;
        }

As mentioned before, a call to visit of the super class ensures our class traverses the children of this node:

        return super.visit(node,data);
    }

ASTMethodDeclarator

Here are the relevant nodes for ASTMethodDeclarator from the SqlInjectionExample.java AST:

       MethodDeclarator:doGet
        FormalParameters
         FormalParameter:(package private)
          Type:
           Name:HttpServletRequest
          VariableDeclaratorId:request
         FormalParameter:(package private)
          Type:
           Name:HttpServletResponse
          VariableDeclaratorId:response

On the first visit to ASTMethodDeclarator, visit retrieves a list of method arguments/parameters with a call to getCurrMethParams:


                     public Object visit(ASTMethodDeclarator node, Object data) 
    {
        getInfo(node);
        
        if (currMethXsVis == 1)
        {
            getCurrMethParams(node);
        }

getCurrMethParams begins with a call to getParameterCount, which returns the parameter count for the method in scope. If this number is greater than zero, the code retrieves each parameter represented by the ASTFormalParameter class and stores the name and type as key/value pairs in currMethParams. This list of parameters represents the sources of user-controllable input for the current method.

private void getCurrMethParams (ASTMethodDeclarator node)
{ 
 if (node.getParameterCount( ) > 0) 
 {
  List methodParams = node.findChildrenOfType(ASTFormalParameter.class);
  for (Iterator i = methodParams.iterator( );i.hasNext( );) 
  {
   ASTFormalParameter p = (ASTFormalParameter)i.next( );    
   ASTName pType = (ASTName)p.jjtGetChild(0).jjtGetChild(0);
   ASTVariableDeclaratorId pName = (ASTVariableDeclaratorId)p.jjtGetChild(1);
currMethParams.put(pName.getImage( ),pType.getImage( ));
  }
 }
}

After calling getCurrMethParams, the visit method resumes execution with a call to visit of the super class:


                     return super.visit(node,data);
    }

With a list of all user-controllable input for the current method, the class could trace each parameter through the AST. The data paths for some of the parameters might lead to symptom code. Methods with a large number of parameters and the likelihood for multiple complex data paths make this approach inefficient. Instead, the class takes a more direct approach by targeting indicators of high-risk code first—i.e., the symptomatic code approach . It navigates down the AST, visiting ASTAdditiveExpression nodes, because these are indicators of string concatenation and, more specifically, the dynamic building of SQL statements. Further analysis is required to confirm this assumption as well as to chase any expressions concatenated to sources of user-controllable input. A benefit of this approach is that methods without parameters are still analyzed for the presence of dynamic SQL.

Pursuing indicators of high-risk code, as opposed to vulnerabilities themselves, enables the tester (and not the tool) to decide if code is vulnerable and exploitable. The added functionality of tracing from symptom code to user-controllable input is valuable because:

  • It saves the tester from having to perform this task manually.
  • It provides information about the data path to further assist the tester when making decisions about the exploitability of a piece of code.

ASTAdditiveExpression

Here are relevant nodes for ASTAdditiveExpression from the SqlInjectionExample.java AST:

AdditiveExpression:+
  PrimaryExpression
   PrimaryPrefix
    Literal:"SELECT * FROM USERS WHERE ID = '"
  PrimaryExpression
   PrimaryPrefix
    Name:id
  PrimaryExpression
   PrimaryPrefix
    Literal:"'"

To hone in on dynamic SQL, the class visits ASTAdditiveExpression nodes and its children ASTLiteral and ASTName nodes. The visit method begins by searching down the AST for ASTLiteral nodes because<tt/>they are likely to contain SQL strings. If it finds any, the code extracts the string stored by the node and passes this value to isMatch:

                     public Object visit(ASTAdditiveExpression node, Object data) {
        
        getInfo(node);
        
        List literals = node.findChildrenOfType(ASTLiteral.class);
        
        for (Iterator l = literals.iterator( ); l.hasNext( );) 
        {
            ASTLiteral astLiteral = (ASTLiteral)l.next( );
            String literal = astLiteral.getImage( );
            if (literal != null && isMatch(literal))
            {

The isMatch method relies on the regular expression stored in the PATTERN variable to detect the presence of SQL select statements. The org.apache.regexp.RE class creates the regular expression and matches it against each literal. The method sets the case-independent flag because the case sensitivity of SQL statements often varies with code implementation. A successful match returns the Boolean true, indicating the existence of dynamic SQL in the source:

private boolean isMatch(String literal)
{
    boolean match = false;
     
    RE sql = new RE(PATTERN);
     
    sql.setMatchFlags(RE.MATCH_CASEINDEPENDENT);
        
    return sql.match(literal);
}

When isMatch returns true, the class prepares to add a security violation to the PMD report. The SQL literal is stored for future reference and is added to the message of the current security violation:

                    RuleContext ctx = (RuleContext) data;
                    currMethSymptomCode = literal;
                    String msg = MessageFormat.format(getMessage( ), new
                        Object[]{"SQL select statement detected: " + 
                            currMethSymptomCode});

The format method of java.text.MessageFormat customizes the generic message in dynamicsql.xml , as in Example 6-3, by including the identified symptom code, which in this case is an SQL select statement.

Example 6-3. Snippet from dynamicsql.xml

<ruleset>
    <rule name="DynSqlSelectStmts" message="'' {0} ''" class="net.sourceforge.pmd.
rules.web.security.DynSqlSelectStmts">
...

The next line of code actually adds the security violation to the PMD report:

ctx.getReport( ).addRuleViolation(createRuleViolation(ctx, 
   astLiteral.getBeginLine( ), msg));

At this point the class implementation satisfies its primary objective: to identify and report dynamically built SQL statements. The next task is to identify expressions concatenated to the dynamic SQL and determine whether they contain user-controllable input. Examples of these expressions include method parameters, local variables, and methods calls that return data, each a potential source of user-controllable input. Examples from the doGet method (Example 6-1) include the following:

Method parameter
Request
Local variable
id
Method that returns data
request.getParameter


In the AST, ASTName nodes represent these expressions and are therefore retrieved for analysis:

List names = node.findChildrenOfType(ASTName.class);

If the list size is greater than zero, the entire list is passed to chkForUCI to determine whether any of the expressions are a source of user-controllable input:

                     if ( names.size( ) > 0 )
{
     ArrayList uci = chkForUCI(names);

The chkForUCI method, shown in Example 6-4, compares each ASTName node to those stored in the currMethParams class variable. Although the nodes can refer to the same instance of an object, they are not always identical expressions. For example, a method parameter named request of type HttpServletRequest could appear in an ASTName node in these forms: request.getParameter, request.getQueryString, request.getCookies, request.getHeader, etc. To determine whether these represent sources of user-controllable input, the class could compare them against a list of HttpServletRequest methods known to retrieve user-controllable input from an HTTP request. While exact-match comparisons are ideal for well-known objects (such as HttpServletRequest), the technique falls short when looking for representations of user-controllable input with unfamiliar or custom objects. Instead, the org.apache.regexp.RE regular expression evaluator class is used to compare method parameters (i.e., request) to specific uses of those objects (i.e., request.getParameter). The most effective approach is a combination of exact match and regular expression comparisons.

Example 6-4. chkForUCI( )

private ArrayList chkForUCI(List names) 
{ 
 ArrayList uci = new ArrayList( );
 for (Iterator i = names.iterator( );i.hasNext( );) 
 {
  ASTName name = (ASTName)i.next( );
  for (Iterator j = currMethParams.keySet( ).iterator( );                                 
    j.hasNext( );) 
  {
   String currMethParam = (String)j.next( );
   RE re = new RE (currMethParam);
   if ( re.match(name.getImage( )) ) 
   {
    uci.add(name);
    break;
   }
  }
 }
  
 return uci;
}

chkForUCI returns a list of ASTName nodes that represent user-controllable input linked to the previously identified SQL select statement. These symptoms point to the existence of a potentially exploitable SQL Injection vulnerability in the source.

Next, the code reports the security violation along with the appended user-controllable input, which is similar to that already described. At this point, the rule has satisfied the objective: to identify and report user-controllable input concatenated to dynamic SQL statements.

                     if ( ! uci.isEmpty( ) )
{
  // Report the violation
}

Tip

The following SQL statement would be reported as a potentially exploitable SQL Injection vulnerability.

String strSql = "select * from user where USER_ID = '" + request.getParameter("id") + "'";

If chkForUCI returns an empty list, none of the expressions concatenated to the SQL statement represents immediate sources of user-controllable input (for example, id in the previous AST). However, these expressions might be on a data path that traces back to user-controllable input. To kick off the data-tracing logic, the code stores the expressions (ASTName nodes) into currMethExprsToChase and revisits the ASTMethodDeclaration node (refer to the next section, Section 6.4.2.6, to step through this code):


                     else 
 {
   currMethExprsToChase = new ArrayList(names);
   visit( (ASTMethodDeclaration)         
node.getFirstParentOfType(ASTMethodDeclaration.class),data);

When visit returns, the data-chasing logic is complete for the expressions in currMethExprsToChase. The calls to init() and super.visit mark the end of analysis for the method in scope and allow the class to visit the next available ASTClassBodyDeclaration node:


                     this.init( );
                        }
                    }
                }
            }
        }
        
        return 
                     super.visit(node,data);
    }

Data tracing

The data-tracing logic presented in this section follows data paths that are linked by consecutive variable initializations. Demonstrating this technique should give you an idea of how to implement data tracing for other potential scenarios.

Here are the relevant nodes from the SqlInjectionExample.java AST:

 LocalVariableDeclaration:(package private)
  Type:
   Name:String
  VariableDeclarator
   VariableDeclaratorId:id
   VariableInitializer
    Expression
     PrimaryExpression
      PrimaryPrefix
       Name:request.getParameter

Revisiting the ASTMethodDeclaration method diverts execution to the else block, which retrieves a list of ASTLocalVariableDeclaration nodes and passes each to chkLocVarsForUCI:


                     public 
                     void chkLocVarsForUCI(ASTLocalVariableDeclaration node, Object data) 
    {

This method retrieves the name of the local variable declaration from ASTVariableDeclaratorId (id in the AST) and stores it in varName:

        ASTVariableDeclarator varDec = (ASTVariableDeclarator)
                                                 node.jjtGetChild(1);
        String varName =
                ((ASTVariableDeclaratorId)varDec.jjtGetChild(0)).getImage( );

Then the code looks for the expression initializing the local variable (such as request.getParameter in the AST). If an ASTName node is found, the method stores the expression into initExp; otherwise, it returns to visit to analyze the remaining ASTLocalVariableDeclaration nodes:

        ASTVariableInitializer varInit =
                         (ASTVariableInitializer)varDec.jjtGetChild(1);
        
        If (varInit.findChildrenOfType(ASTName.class).get(0) instanceof
                                                            ASTName) 
        {
            ASTName initExp = (ASTName) 
                           varInit.findChildrenOfType(ASTName.class).get(0);
        } else {
            return;
        }

Assuming an ASTName node is retrieved, the code iterates over currMethExprsToChase (which would contain id after visiting ASTAdditiveExpression), comparing each expression to the local variable stored in varName (id in this case). A match means the class found the initialization of the expression concatenated to the dynamic SQL:


                     boolean chase = false;
        boolean srcOfUCI = false;
        int cnt = 0;
        int index = 0;
        for (Iterator i = currMethExprsToChase.iterator( ); i.hasNext( );) 
        {
            ASTName currNode = (ASTName)i.next( );
            if ( currNode.getImage( ).matches(varName) )
            {

Tip

The chase Boolean variable controls whether additional data tracing is required (i.e., the initializing expression for the local variable is not user-controllable input) and srcOfUCI triggers the reporting code if the initializing expression is a source of user-controllable input. The cnt integer tracks the current position in the currMethExprsToChase array. index stores the value of cnt when either chase or srcOfUCI is set to true.

If varName matches the name of an expression in currMethExprsToChase, the variable is added to the end of currMethVarsChased and the initializing expression initExp (request.getParameter in this case) is checked as a source of user-controllable input. This implementation of chkForUCI is an overloaded version of the previously discussed chkForUCI. It takes a single ASTName node as an argument and returns a string containing the user-controllable input, if the passed-in node matches one in currMethParams (as a result of visiting ASTMethodDeclarator, currMethParams would contain the request object and match the initializing expression request.getParameter identifying it as a source of user-controllable input):

                ((LinkedList)currMethVarsChased).addLast(currNode.getImage( ));
                String uci = chkForUCI(initExp);

Given that uci is not null, srcOfUCI is set to true, triggering the following block of code that reports initExp as user-controllable input. The index integer stores the current position in the currMethExprsToChase array so that the previously matched expression (id in this case) can be removed, as it no longer needs to be chased. The break keyword exits the loop.


                     if (uci != null)
                {        
                    srcOfUCI = true;
                    index = cnt;
                    break;
                }

If uci is null (i.e., initExp is not a source of user-controllable input), chase is set to true, which repeats the data-tracing code for initExp. Similar to the preceding if block, the index integer stores the current position in the currMethExprsToChase array so that its contents can be replaced with initExp, as this initializing expression now needs to be chased. The break keyword exits the loop.


                     else 
                {
                    chase = true;
                    index = cnt;
                    break;                }
            }
        }
        cnt++;
    }

If srcOfUCI is true, the local variable initialized with initExp is removed from currMethExprsToChase and initExp is added to the end of currMethVarsChased. The initializing expression is also added to the PMD report as a source of user-controllable input, making the previously identified dynamic SQL statement a likely SQL Injection candidate.

Tip

Remember, you can verify this vulnerability with access to a live instance of the application.


                     if (srcOfUCI)
    {
        ((ArrayList)currMethExprsToChase).remove(index);
                
        ((LinkedList)currMethVarsChased).addLast(initExp.getImage( ));
        
        // Report the violation
        
        currMethVarsChased = new LinkedList( );
            
    }

If chase is true, currMethExprsToChase is updated with initExp and the data-chasing logic is repeated with a new call to the ASTMethodDeclaration visit method. This last method call of the data-chasing routine ensures that the rule continues to trace variable initializations until the original source of user-controllable input is found.


                     else 
                     if (chase)
    {
        ((ArrayList)currMethExprsToChase).remove(index);

        ((ArrayList)currMethExprsToChase).add(index,initExp);

        visit( 
            (ASTMethodDeclaration)node.getFirstParentOfType
                                (ASTMethodDeclaration.class), data);
    }
}

To illustrate this new rule in action, Figure 6-2 shows the report PDM generated when scanning SqlInjectionExample.java.

Figure 6-2. PMD report for SqlInjectionExample.java

PMD report for SqlInjectionExample.java

In summary, the DynSqlSelectStmts class is designed to help testers find exploitable SQL Injection vulnerabilities by flagging instances of dynamic SQL and tracing backward to determine whether the symptom code is tied to sources of user-controllable input. The concepts, ideas, and code examples provided in this chapter should supply the groundwork for building future security rules that target a variety of symptom code, regardless of the static code analysis tool you use.

SqlInjectionExample.java

Example 6-5 provides the full source code of the SqlInjectionExample.java example discussed in this chapter.

Example 6-5. Source code for SqlInjectionExample.java

import java.io.*;
import java.sql.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class SqlInjectionExample extends HttpServlet {

  public void doGet(HttpServletRequest request, HttpServletResponse response)
                               throws ServletException, IOException {
    Connection con = null;
    Statement stmt = null;
    ResultSet rs = null;

    response.setContentType("text/html");
    PrintWriter out = response.getWriter( );

    String id = request.getParameter("id");

    try {

      Class.forName("oracle.jdbc.driver.OracleDriver");

      con = DriverManager.getConnection(
        "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "passwd");

      String strSql = "SELECT * FROM USERS WHERE ID = '" + id + "'";

      stmt = con.createStatement( );

      rs = stmt.executeQuery(strSql);

      out.println("<HTML><HEAD><TITLE>SqlInjectionExample</TITLE></HEAD>");
      out.println("<BODY>");
      while(rs.next( )) {
        out.println(rs.getString("firstname") + "&nbsp;" + rs.getString("lastname"));
      }
      out.println("</BODY></HTML>");
    }
    catch(ClassNotFoundException e) {
      out.println("Couldn't load database driver: " + e.getMessage( ));
    }
    catch(SQLException e) {
      out.println("SQLException caught: " + e.getMessage( ));
    }
    finally {

      try {
        if (con != null) con.close( );
      }
      catch (SQLException ignored) { }
    }
  }
}

DynSqlSelectStmts.java

Example 6-6 provides the full source code of the DynSqlSelectStmts.java example discussed in this chapter.

Example 6-6. Source code for DynSqlSelectStmts.jav

package net.sourceforge.pmd.rules.web.security;

import net.sourceforge.pmd.AbstractRule;
import net.sourceforge.pmd.ast.*;
import net.sourceforge.pmd.RuleContext;
import org.apache.regexp.*;
import java.util.*;
import java.text.MessageFormat;

public class DynSqlSelectStmts extends AbstractRule {

  private static boolean debug = true;

  private static final String PATTERN = "select.+from";

  private String currMethName;
  private int currMethXsVis;
  private Map currMethParams;
  private String currMethSymptomCode;
  private List currMethExprsToChase;
  private List currMethVarsChased;

  private void init ( )
  {
    currMethName = "";
    currMethXsVis = 0;
    currMethParams = new HashMap( );
    currMethSymptomCode = "";
    currMethExprsToChase = new ArrayList( );
    currMethVarsChased = new LinkedList( );
  }

  public void setDebug (boolean x)
  {
    debug = x;
  }

  public void printDebug (String str)
  {
    if (debug)
      System.out.print(str + "\n");
  }
  public Object visit(ASTCompilationUnit node, Object data)
  {
    getInfo(node);
    printDebug("Rule: " + this.getName( ) + "\n\n");
    return super.visit(node,data);
  }
  public Object visit(ASTClassBodyDeclaration node, Object data)
  {
    getInfo(node);

    if (!(node.jjtGetChild(0) instanceof ASTMethodDeclaration))
    {
      return null;
    }

    this.init( );

        return super.visit(node,data);
  }

  public Object visit(ASTMethodDeclaration node, Object data)
  {
      getInfo(node);
      currMethXsVis++;
      printDebug ("Number of visits to " + node.getClass( ).getName( ) + ": " + currMethXsVis +
        "\n");

      if (currMethXsVis == 1)
      {
        currMethName = ((ASTMethodDeclarator)node.jjtGetChild(1)).getImage( );
        printDebug ("Current Method: " + currMethName + "\n");
      }

      else
      {
        List locVarDecList = (ArrayList)node.findChildrenOfType
           (ASTLocalVariableDeclaration.class);
        for (Iterator j = locVarDecList.iterator( ); j.hasNext( );)
        {
          if (currMethExprsToChase.size( ) > 0)
            chkLocVarsForUCI((ASTLocalVariableDeclaration)j.next( ),data);
          else
            break;
        }

        return null;
      }

      return super.visit(node,data);
    }

  public Object visit(ASTMethodDeclarator node, Object data)
  {
    getInfo(node);

    if (currMethXsVis == 1)
    {
      getCurrMethParams(node);
      printCurrMethParams( );
    }
    return super.visit(node,data);
  }


  public Object visit(ASTAdditiveExpression node, Object data)
  {

    getInfo(node);

    List literals = node.findChildrenOfType(ASTLiteral.class);

      for (Iterator l = literals.iterator( ); l.hasNext( );)
      {
        ASTLiteral astLiteral = (ASTLiteral)l.next( );
        String literal = astLiteral.getImage( );
        printDebug("Literal: " + literal + "\n");

        if (literal != null && isMatch(literal))
        {
          RuleContext ctx = (RuleContext) data;
          currMethSymptomCode = literal;
          String msg = MessageFormat.format(getMessage( ), new Object[]
            {"SQL select statement detected: " + currMethSymptomCode});
          printDebug("Report message: " + msg + "\n");
          ctx.getReport( ).addRuleViolation(createRuleViolation
            (ctx, astLiteral.getBeginLine( ), msg));

          // Look for expression(s) other than literals appended to SQL
          List names = (ArrayList) node.findChildrenOfType(ASTName.class);
          if ( names.size( ) > 0 )
          {
            // Check whether the appended expression(s) are UCI
            List uci = chkForUCI(names);
            if ( ! uci.isEmpty( ) )
            {
              for (Iterator i = uci.iterator( );i.hasNext( );)
              {
                ASTName n = (ASTName)i.next( );
                msg = MessageFormat.format(getMessage( ), new Object[]
                 {"SQL select statement detected with UCI: " + n.getImage( )});
                printDebug("Report message: " + msg + "\n");
                ctx.getReport( ).addRuleViolation
                  (createRuleViolation(ctx, astLiteral.getBeginLine( ), msg));
              }
            }

            /*
             * Expression(s) appended to SQL are not immediate source of UCI
             * Re-visit method declaration to begin logic for finding initializer of UCI
             */

            else
            {
              printDebug ("Expression(s) appended to SQL are not immediate source of 
                UCI\n\n");
              currMethExprsToChase = new ArrayList(names);
              printDebug("*** Begin expression chasing routine *** \n\n");
              visit( (ASTMethodDeclaration) node.getFirstParentOfType
                  (ASTMethodDeclaration.class),data);
              printDebug("... Exiting from visit - ASTAdditiveExpression ...\n");
              printDebug("*** Returning from expression chasing routine ... 
                        Done with this ASTAdditiveExpression ... any more?? ***\n\n");
              this.init( );
            }
          }

        }
      }

      return super.visit(node,data);
  }

  public void chkLocVarsForUCI(ASTLocalVariableDeclaration node, Object data)
  {
    getInfo(node);

    printCurrMethExprsToChase( );

    ASTVariableDeclarator varDec = (ASTVariableDeclarator)node.jjtGetChild(1);
    String varName = ((ASTVariableDeclaratorId)varDec.jjtGetChild(0)).getImage( );
    printDebug("Local Variable Name: " + varName + "\n");

    ASTVariableInitializer varInit = (ASTVariableInitializer)varDec.jjtGetChild(1);

    ASTName initExp = null;
    if (varInit.findChildrenOfType(ASTName.class).size( ) 
        > 0 && varInit.findChildrenOfType(ASTName.class).get(0) instanceof ASTName)
    {
      initExp = (ASTName) varInit.findChildrenOfType(ASTName.class).get(0);
      printDebug("Local Variable Initializer: " + initExp.getImage( ) + "\n");
    } else {
      return;
    }

    boolean chase = false;
    boolean srcOfUCI = false;
    int cnt = 0;
    int index = 0;
    for (Iterator i = currMethExprsToChase.iterator( ); i.hasNext( );)
    {
      ASTName currNode = (ASTName)i.next( );
      printDebug("Checking: " + currNode.getImage( ) + "\n");
      if ( currNode.getImage( ).matches(varName) )
      {
        printDebug("Loc var: " + varName + " matches '" + currNode.getImage( ) + "', which is
           an expression we are currently chasing\n");
        ((LinkedList)currMethVarsChased).addLast(currNode.getImage( ));
        String uci = chkForUCI(initExp);
        if (uci != null)
        {
          printDebug("Initializing expression: " + initExp.getImage( ) + " is a source of UCI:
            [" + uci + "]\n");
          srcOfUCI = true;
          index = cnt;
          break;
        }
        else
        {
          printDebug("Need to chase the local var initializer: '" 
                    + initExp.getImage( ) + "'\n");
          chase = true;
          index = cnt;
          break;
        }
      }
      cnt++;
    }

    if (srcOfUCI)
    {
      ((ArrayList)currMethExprsToChase).remove(index);

      /* Add uci - Appending the ASTLiteral node with the expectation that the source
       * of uci is from HttpServletRequest ( i.e. something like req.getParameter("id") ).
       * This will not always be the case, and so will have to make this 
         a little more generic.
       */

      ASTLiteral lit = (ASTLiteral)node.findChildrenOfType(ASTLiteral.class).get(0);
      ((LinkedList)currMethVarsChased).addLast(initExp.getImage( ) 
        + "(" + lit.getImage( ) + ")");
      String uciChased = printCurrMethVarsChased( );

      RuleContext ctx = (RuleContext) data;
      String msg = MessageFormat.format(getMessage( ), new Object[]
        {"SQL select statement detected with UCI: " + uciChased });
      printDebug("Report message: " + msg + "\n");
      ctx.getReport( ).addRuleViolation(createRuleViolation(ctx, lit.getBeginLine( ), msg));
      currMethVarsChased = new LinkedList( );

    } else if (chase)
    {
      ((ArrayList)currMethExprsToChase).remove(index);

      ((ArrayList)currMethExprsToChase).add(index,initExp);

      visit( (ASTMethodDeclaration)node.getFirstParentOfType
        (ASTMethodDeclaration.class),data);
      printDebug("... Exiting from chkLocVarsForUCI\n");
    }


  }

  public void getInfo (SimpleNode node)
    {
    printDebug ("\n====================");

    Object o = node;
    Class c = o.getClass( );
    printDebug ("Class Name: " + c.getName( ));

    int begLine = node.getBeginLine( );
    if (begLine != 0)
    {
      printDebug("Line #: " + begLine);
    }

    }

  private void getCurrMethParams (ASTMethodDeclarator node)
  {
   if (node.getParameterCount( ) > 0)
   {
    List methodParams = node.findChildrenOfType(ASTFormalParameter.class);
    for (Iterator i = methodParams.iterator( );i.hasNext( );)
    {
     ASTFormalParameter p = (ASTFormalParameter)i.next( );
     ASTName pType =   (ASTName)p.jjtGetChild(0).jjtGetChild(0);
     ASTVariableDeclaratorId pName =   (ASTVariableDeclaratorId)p.jjtGetChild(1);
     currMethParams.put(pName.getImage( ),pType.getImage( ));
    }
   }
  }

  private void printCurrMethParams ( )
  {
    for (Iterator i = currMethParams.keySet( ).iterator( ); i.hasNext( );)
    {
        String key = (String)i.next( );
        String value = (String)currMethParams.get(key);
        printDebug ("Param Name: " + key + ", Param Type: " + value);
    }
  }

  private void printCurrMethExprsToChase ( )
  {
    printDebug ("Chasing the following expressions:\n");
    for (Iterator i = currMethExprsToChase.iterator( ); i.hasNext( );)
    {
        String value = ((ASTName)i.next( )).getImage( );
        printDebug (value + "\n");
    }
  }

  private String printCurrMethVarsChased ( )
  {
    printDebug ("Chased the following variables to UCI: " + currMethVarsChased.size( ) 
+ "\n");
    String str = "";
    for (Iterator i = currMethVarsChased.iterator( ); i.hasNext( );)
    {
        String value = (String)i.next( );
        if (i.hasNext( ))
        {
          str = str + (value + " --> ");
        }
        else
        {
          str = str + value;
        }
    }

    printDebug(str + "\n");
    return str;
  }

  private boolean isMatch(String literal)
  {
   boolean match = false;

   RE sql = new RE(PATTERN);

   sql.setMatchFlags(RE.MATCH_CASEINDEPENDENT);

   return sql.match(literal);

  }

  private List chkForUCI(List names)
  {
   List uci = new ArrayList( );
   for (Iterator i = names.iterator( );i.hasNext( );)
   {
    ASTName name = (ASTName)i.next( );
    for (Iterator j = currMethParams.keySet( ).iterator( ); j.hasNext( );)
    {
     String currMethParam = (String)j.next( );
     RE re = new RE (currMethParam);
     if ( re.match(name.getImage( )) )
     {
      uci.add(name);
      break;
     }
    }
   }
   return uci;
  }

    private String chkForUCI(ASTName name)
    {
      for (Iterator j = currMethParams.keySet( ).iterator( );                     
  j.hasNext( );)
      {
       String currMethParam = (String)j.next( );
       RE re = new RE (currMethParam);
       if ( re.match(name.getImage( )) )
       {
        return currMethParam;
       }
      }
      return null;
     }
}

dynamicsql.xml

Example 6-7 provides the rule file that is used with Example 6-6.

Example 6-7. Rule file used with DynSqlSelectStmts.java

<?xml version="1.0"?>

<ruleset name="Dynamic SQL Ruleset">
  <description>
This ruleset contains a collection of rules that find instances of potentially 
exploitable dynamic SQL.
  </description>

  <rule name="DynamicSqlSelectStmts"
        message="'' {0} ''"
        class="net.sourceforge.pmd.rules.web.security.DynSqlSelectStmts">
    <description>
Dynamic SQL or "string building" techniques that rely on unsanitized input values 
are potentially vulnerable to SQL Injection.
    </description>
      <priority>1</priority>
    <example>
<![CDATA[

int id = request.getParameter("id");

String sql = "select * from employees where employeeid = " + id;

]]>
    </example>
  </rule>

<!-- MORE RULES -->

</ruleset>


Notes

  1. In addition to C and C++, RATS also scans Perl, PHP, and Python code.
  2. You can review the EBNF grammar reference at http://cvs.sourceforge.net/viewcvs.py/pmd/pmd/etc/grammar/Java1.4-c.jjt?rev=1.3&content-type=text/vnd.viewcvs-markup.
Personal tools