Developer Guide

TODO: Project info, goals, etc.

Getting Started

Project Layout

TODO: Add the philosophy behind the project layout

Shell

The Shell package (pkg/shell) is a library of commands and utilities used in Workflow Engine. The standard library way to execute shell commands is the os/exec package, which offers a lot of features and flexibility. In our case, we want to restrict the ability to execute arbitrary shell commands by carefully selecting a subset of features for each command.

For example, if you look at the Syft CLI reference, you'll see dozens of commands and configuration options, all controlled by parsing flags from the command string. Workflow Engine is an opinionated security pipeline, so we don't need every feature Syft provides. The user shouldn't have to care that we're using Syft to generate an SBOM which is then scanned by Grype for vulnerabilities. The idea of Workflow Engine is that all of this is abstracted into the Security Analysis pipeline.

In the Shell package, each necessary command is abstracted into a native Go object. Only the features actually used for a given command are written into this package.

The shell.Executable wraps the exec.Cmd struct and adds some convenient methods for building a command.

syft version -o json

Running this command directly with the standard library's exec.Cmd looks like this:

cmd := exec.Command("syft", "version", "-o", "json")
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// some other options
err := cmd.Run()

There's also additional logic layered on top of the os/exec standard library package. Since Workflow Engine is built around executing external binaries, an internal library, pkg/shell, abstracts many of the complexities involved in handling async patterns, possible interrupts, and parameters.

Commands can be represented as functions.

func SyftVersion(options ...OptionFunc) error {
	o := newOptions(options...)
	cmd := exec.Command("syft", "version")
	return run(cmd, o)
}

The OptionFunc variadic parameter allows the caller to modify the behavior of the command with an arbitrary number of OptionFunc(s).

newOptions generates the default Options structure and then applies all of the passed-in functions. The o variable can now be used to apply parameters to the command before execution.

Returning the result of run hands off the execution phase of the command to another function, which bootstraps a lot of useful functionality without needing to write supporting code for each new command.

For example, if you only want to output what the command would run without actually running it:

dryRun := true
SyftVersion(WithStdout(os.Stdout), WithDryRun(dryRun))

This would log the final command without executing it.

The motivation behind this architecture is to simplify the methods for all sub-commands on an executable.

Implementing a new sub-command is trivial: just write a new function following the same pattern.

func SyftHelp(options ...OptionFunc) error {
	o := newOptions(options...)
	cmd := exec.Command("syft", "--help")
	return run(cmd, o)
}

If we wanted to build an OptionFunc for version that optionally writes JSON instead of plain text, it would go in the pkg/shell/shell.go file.

Since there aren't many commands, they all share the same configuration object Options.

func WithJSONOutput(enabled bool) OptionFunc {
	return func(o *Options) {
		o.JSONOutput = enabled
	}
}

Now the version function can reference this field and change the shell command accordingly:

func SyftVersion(options ...OptionFunc) error {
	o := newOptions(options...)
	cmd := exec.Command("syft", "version")
	if o.JSONOutput {
		cmd = exec.Command("syft", "version", "-o", "json")
	}
	return run(cmd, o)
}

See pkg/shell/docker.go for a more complex example of a command with a lot of parameters.

Pipelines

Concepts

Concurrency

Workflow Engine PR #26

This PR contains a detailed explanation of the concurrency pattern used in the pipeline definitions.

Documentation

Too Long; Might Read (TL;MR)

A collection of thoughts around design decisions made in Workflow Engine, mostly ramblings that some people may or may not find useful.

Why CI/CD Flexible Configuration is Painful

In a traditional CI/CD environment, you would have to parse strings to build the exact command you want to execute.

Local Shell:

syft version

GitLab CI/CD configuration lets us declare the execution environment by providing an image name:

syft-version:
  stage: scan
  image: anchore/syft:latest
  script:
    - syft version

What typically happens is configuration creep. If you need to print the version information in JSON (one of the many command options), you would have to define multiple jobs in GitLab that differ only in the script block, hiding each one behind an environment variable:

.syft:
  stage: scan
  image: anchore/syft:latest

syft-version:text:
  extends: .syft
  script:
    - syft version
  rules:
    - if: $SYFT_VERSION_JSON != "true"

syft-version:json:
  extends: .syft
  script:
    - syft version -o json
  rules:
    - if: $SYFT_VERSION_JSON == "true"

The complexity increases exponentially in a GitLab CI/CD file with each configuration option you wish to support, because every combination of on/off toggles needs its own job.