Fuzzing: AFL and maybe JavanWartyPig
  1. Testing a system using inputs that are wholly or partially randomly generated
  2. Goal: find inputs that make the SUT (system under test) behave in interesting ways
  3. Oracles:
    • An oracle determines whether a test case passes or fails
    • Usually an oracle for correctness does not exist (beyond source-code assertions)
    • Derived oracles can be used for correctness testing instead (see compiler fuzzing below)
  4. Generation-based fuzzing:
    • Inputs are produced from scratch
  5. Mutation-based fuzzing:
    • Inputs are produced by modifying and/or combining existing inputs (both strategies are sketched below)
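A minimal sketch of the two strategies, assuming a SUT that consumes a byte buffer; the function names and the single-bit-flip mutation are illustrative choices, not taken from any particular fuzzer:

```c
#include <stdlib.h>
#include <string.h>

/* Generation-based: build an input from scratch (here random bytes;
   a smarter generator would follow a grammar or file-format spec). */
size_t generate_input(unsigned char *out, size_t max_len) {
    size_t len = 1 + (size_t)rand() % max_len;
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)rand();
    return len;
}

/* Mutation-based: copy an existing (seed) input and perturb it
   (here a single bit flip; real fuzzers apply many operators). */
size_t mutate_input(const unsigned char *seed, size_t len, unsigned char *out) {
    memcpy(out, seed, len);
    out[(size_t)rand() % len] ^= (unsigned char)(1u << (rand() % 8));
    return len;
}
```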
  6. Types of inputs the fuzzer may aim to produce:
    • Totally invalid inputs (random strings)
    • Somewhat malformed inputs (sequences of tokens)
    • Inputs with a degree of validity (sentences in the grammar, but not well-typed)
    • High-integrity inputs (well-formed inputs free from any undefined behaviour)
  7. Dumb Fuzzing:
    • Generates or mutates inputs without any knowledge of the SUT's input format
    • Used in the original fuzzing paper (Miller et al., 1990), which crashed or hung between a quarter and a third of the UNIX utilities tested
    • Could trigger stack overflows, division by zero, maybe an integer overflow (a harness sketch follows this item)
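A harness in the spirit of that 1990 experiment might pipe wholly random bytes into a utility's stdin and watch for abnormal exits; the ./sut name and the byte counts below are placeholders:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    for (int run = 0; run < 100; run++) {
        /* POSIX popen; ./sut is a placeholder for the utility under test */
        FILE *p = popen("./sut > /dev/null 2>&1", "w");
        if (!p) return 1;
        for (int i = 0; i < 4096; i++)
            fputc(rand() & 0xFF, p);   /* totally invalid input */
        int status = pclose(p);        /* nonzero status: crash or hang? */
        if (status != 0)
            printf("run %d: abnormal exit (status %d)\n", run, status);
    }
    return 0;
}
```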
  8. Smart Fuzzing:
    • Uses some information about the expected input format
    • More likely to get past input validation, so has a higher chance of triggering deeper bugs such as stack underflows and signed integer overflows
    • Can be helped by compiler sanitisers, which turn silent misbehaviour into detectable failures (see the example after this item)
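As an illustration, signed integer overflow is silent undefined behaviour in C, but building with -fsanitize=undefined (supported by both clang and gcc) makes it a loud runtime error, giving the fuzzer a crisp failure signal:

```c
#include <limits.h>

int scale(int x) {
    return x * 3;   /* undefined behaviour when x * 3 exceeds INT_MAX */
}

int main(void) {
    /* compile with: cc -fsanitize=undefined; UBSan reports the overflow */
    return scale(INT_MAX / 2) > 0;
}
```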
  9. Feedback-directed Fuzzing:
    • Can be smart or dumb
    • Input generation or mutation procedure is informed by observing the SUT on previous inputs
    • An input can be flagged if it makes the SUT do something noteworthy:
      • Cover new code
      • Consume more memory than usual
      • Log a previously unseen message
      • Execute a particular syscall
    • Once an input has been flagged:
      • Generation-based fuzzer tries to generate inputs with similar properties
      • Mutation-based fuzzer prioritises it over others when making a mutation
    • Intuition:
      • Input providing new coverage exercises the SUT in a new way
      • Related input may push the SUT further into the unknown
      • A previously unseen message suggests either new coverage or some property that is noteworthy for the SUT (a toy feedback loop is sketched after this item)
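A toy version of this loop, with everything hand-rolled: sut() stands in for a real target and "coverage" is three manually placed probes. The point it illustrates is that keeping flagged inputs turns finding the 4-byte trigger from a 2^32-style lottery into a few cheap single-byte discoveries:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INPUT_LEN 8
static int new_coverage;            /* set when a probe fires for the first time */

static void probe(int id) {
    static int seen[3];
    if (!seen[id]) { seen[id] = 1; new_coverage = 1; }
}

/* Toy SUT: crashes only on inputs starting with "FUZZ". */
static void sut(const unsigned char *in) {
    if (in[0] == 'F') { probe(0);
        if (in[1] == 'U') { probe(1);
            if (in[2] == 'Z') { probe(2);
                if (in[3] == 'Z') { puts("bug input reached"); abort(); }
            }
        }
    }
}

int main(void) {
    unsigned char corpus[INPUT_LEN], trial[INPUT_LEN];
    memset(corpus, 'A', INPUT_LEN);                    /* seed input */
    for (long i = 0; i < 1000000; i++) {
        memcpy(trial, corpus, INPUT_LEN);
        trial[rand() % INPUT_LEN] = (unsigned char)rand(); /* mutate one byte */
        new_coverage = 0;
        sut(trial);
        if (new_coverage)               /* flagged: prioritise this input */
            memcpy(corpus, trial, INPUT_LEN);
    }
    return 0;
}
```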
  10. Black-box fuzzing:
    • SUT is exercised in an unmodified form
    • Advantages:
      • Can be applied to closed-source SUTs
      • SUT runs at full speed, enabling higher rate of fuzzing
    • Disadvantages:
      • Feedback-directed fuzzing can only make use of externally-visible behaviour
      • No coverage-directed fuzzing -> may leave many execution paths uncovered
  11. Grey-box fuzzing:
    • Instrumentation applied to the SUT, to expose information about its inner workings
    • Typically exposes coverage information (an AFL-style sketch follows this item)
    • Instrumentation can be applied at compile time or at the binary level
    • Advantages:
      • A feedback-directed fuzzer has more information to work with
    • Disadvantages:
      • Instrumentation overhead -> Slower rate of fuzzing
      • Compile-time instrumentation requires access to the SUT sources
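For example, AFL's compile-time instrumentation injects something morally equivalent to the following at each basic block (the 64 KiB map and the shift trick follow AFL's documented design; everything else is simplified):

```c
#include <stdint.h>

#define MAP_SIZE (1 << 16)
uint8_t shared_map[MAP_SIZE];       /* shared with the fuzzer process in real AFL */
static uint16_t prev_location;

/* Called at every basic block; cur_location is a per-block
   compile-time random constant. */
void afl_style_edge_log(uint16_t cur_location) {
    shared_map[cur_location ^ prev_location]++; /* hash of the (prev, cur) edge */
    prev_location = cur_location >> 1;          /* makes A->B distinct from B->A */
}
```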
  12. White-box fuzzing:
    • Analysis via constraint solving is used to generate inputs that exercise different parts of the SUT
    • Can provably cover all execution paths (given enough time)
    • Known as symbolic or concolic execution
    • In many ways this is not really fuzzing
    • Advantages:
      • Constraint solvers generate inputs that reach hard-to-hit parts of the SUT (see the example after this item)
    • Disadvantages:
      • Slow (due to the overhead of constraint solving)
      • Heavyweight; requires a specialised toolchain
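To see why solving helps, consider the guard below: random 32-bit inputs satisfy it with probability about 2^-32, while a symbolic executor records the branch condition as a constraint and a solver returns x = 143 at once. The constants are arbitrary:

```c
#include <stdint.h>
#include <stdio.h>

void sut(int32_t x) {
    /* random fuzzing hits this branch with probability ~2^-32;
       a solver dispatches the constraint 7 * x == 1001 instantly */
    if ((int64_t)x * 7 == 1001)     /* 64-bit multiply avoids signed overflow UB */
        printf("hard-to-hit branch reached (x = %d)\n", x);
}
```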
  13. AFL:
    • Mutation-based, dumb, grey-box fuzzer
    • State of the art in security fuzzing
    • TODO: Read the paper on AFL
  14. Fuzzing compilers:
    • Challenges:
      • Absence of an oracle (for code generation)
      • Nondeterminism
      • Lack of clear language semantics
      • Undefined, unspecified and implementation-defined behaviour:
        1. Undefined - The standard imposes no requirement at all
        2. Unspecified - The standard defines a set of allowable behaviours; the implementation may choose any of them at each occurrence and need not document the choice
        3. Implementation-defined - The standard defines a set of allowable behaviours; the compiler has to choose one, document it, and behave consistently (small C examples follow this list)
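Small C examples of the three categories (the overflow line is commented out, since executing it would itself be undefined behaviour):

```c
#include <limits.h>
#include <stdio.h>

static int f(void) { return printf("f"); }
static int g(void) { return printf("g"); }

int main(void) {
    int x = INT_MAX;
    /* Undefined: signed overflow; the standard imposes no requirement.
       x = x + 1; */

    /* Unspecified: f() and g() may run in either order ("fg" or "gf");
       the compiler need not document or even be consistent about it. */
    int y = f() + g();

    /* Implementation-defined: the width of int; the implementation
       must pick a value and document it. */
    printf("\nint is %zu bits, y = %d\n", sizeof(int) * CHAR_BIT, y);
    return 0;
}
```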
    • Compilers are non-testable because:
      • There does not exist an oracle (for code generation)
      • It is theoretically possible, but practically too difficult to determine the correct output
    • Ways to solve the absence of an oracle for code generation:
      • Derived oracle -- not an absolute source of truth
      • Cross-check implementations:
        1. Method:
          1. Compile the same program with multiple compilers, run the resulting binaries, and compare their outputs
          2. Mismatches indicate bugs
          3. The majority result is probably correct
          4. Known as differential testing (a driver sketch follows this list)
        2. Requirements:
          1. Program must be deterministic
          2. Program must be free from undefined/unspecified behaviour
          3. Multiple compilers must exist
          4. Compilers must agree on implementation-defined behaviour
      • Cross-check equivalent programs:
        1. Method:
          1. Compile equivalent programs with one compiler, run them, and compare their outputs
          2. Mismatches indicate bugs
          3. The majority result is probably correct
          4. This is a form of metamorphic testing (see item 15)
        2. Requirements:
          1. Equivalent programs (obviously)
          2. Program must be deterministic
          3. Program must be free from undefined/unspecified behaviour
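A minimal differential-testing driver, assuming test.c is a deterministic, UB-free program and that gcc and clang are both installed; all names here are placeholders:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Run a command and capture (a prefix of) its stdout. */
static void run(const char *cmd, char *out, size_t cap) {
    FILE *p = popen(cmd, "r");
    size_t n = p ? fread(out, 1, cap - 1, p) : 0;
    out[n] = '\0';
    if (p) pclose(p);
}

int main(void) {
    char out_a[4096], out_b[4096];
    if (system("gcc   -O2 -o sut_gcc   test.c") != 0) return 1;
    if (system("clang -O2 -o sut_clang test.c") != 0) return 1;
    run("./sut_gcc",   out_a, sizeof out_a);
    run("./sut_clang", out_b, sizeof out_b);
    if (strcmp(out_a, out_b) != 0)
        puts("MISMATCH: at least one compiler miscompiled test.c");
    else
        puts("outputs agree");
    return 0;
}
```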
  15. Metamorphic testing:
    • Used when no reference implementation (oracle) exists
    • Exploit a known relation between inputs: transform an input so that the effect on the output is predictable (e.g. sin(x) == sin(x + 2π))
    • In short, find an input transformation whose effect on the output is deterministic (a sketch follows this item)
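For instance, a sine implementation can be tested metamorphically with no oracle for sin(x) itself, since sin(x) and sin(x + 2π) must agree up to rounding error; the tolerance and iteration count below are arbitrary choices:

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const double TWO_PI = 6.283185307179586; /* 2π; M_PI is not standard C */
    for (int i = 0; i < 1000000; i++) {
        double x = 100.0 * rand() / RAND_MAX;     /* random test input */
        double a = sin(x);
        double b = sin(x + TWO_PI);               /* metamorphic follow-up */
        if (fabs(a - b) > 1e-9)                   /* allow rounding error */
            printf("metamorphic violation at x = %.17g\n", x);
    }
    return 0;
}
```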
  16. Test case reduction:
    • Repeatedly transform (shrink) the input in such a way that the property of interest is preserved, e.g. "still crashes the compiler"
    • C-Reduce is a test case reducer for C programs (a simplified, delta-debugging-style sketch follows)
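C-Reduce itself works by source-level transformations; the sketch below shows the underlying idea at its simplest, greedy chunk deletion in the spirit of delta debugging, where interesting() is a stand-in for "still triggers the bug":

```c
#include <stdio.h>
#include <string.h>

/* Placeholder predicate: here the "bug" fires whenever both
   'a' and 'z' survive in the input. */
static int interesting(const char *in, size_t len) {
    return memchr(in, 'a', len) && memchr(in, 'z', len);
}

/* Try deleting chunks; keep any deletion that preserves the property
   of interest, halving the chunk size as we go. */
static size_t reduce(char *buf, size_t len) {
    for (size_t chunk = len / 2; chunk >= 1; chunk /= 2) {
        for (size_t i = 0; i + chunk <= len; ) {
            char saved[64];                    /* assumes small inputs */
            memcpy(saved, buf + i, chunk);
            memmove(buf + i, buf + i + chunk, len - i - chunk); /* delete */
            if (interesting(buf, len - chunk)) {
                len -= chunk;                  /* smaller input still works */
            } else {                           /* undo the deletion */
                memmove(buf + i + chunk, buf + i, len - i - chunk);
                memcpy(buf + i, saved, chunk);
                i += chunk;
            }
        }
    }
    return len;
}

int main(void) {
    char input[] = "qqqaqqqqzqqq";
    size_t len = reduce(input, strlen(input));
    printf("reduced to %zu bytes: %.*s\n", len, (int)len, input);
    return 0;
}
```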