Announcing GUAC, a great pairing with SLSA (and SBOM)!

Supply chain security is at the forefront of the industry’s collective consciousness. We’ve recently seen a significant increase in software supply chain attacks, a Log4j vulnerability of catastrophic severity and breadth, and even a cybersecurity executive order.

It is against this background that Google is looking for contributors to a new open source project called GUAC (pronounced like the dip). GUAC, or Graph for Understanding Artifact Composition, is still in its early stages but is poised to change how the industry understands software supply chains. GUAC addresses a need created by the growing effort across the ecosystem to generate software build, security, and dependency metadata. True to Google’s mission to organize the world’s information and make it universally accessible and useful, GUAC is intended to democratize the availability of this security information by making it freely accessible and useful to every organization, not just those with enterprise-scale security and IT funding.

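Thanks to community collaboration in groups like OpenSSF, SLSA, SPDX, CycloneDX and others, organizations increasingly have easy access to software security metadata such as Software Bills of Materials (SBOMs), signed attestations about how software was built (e.g., SLSA provenance), and vulnerability databases (e.g., OSV).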

This data is useful in itself, but it is difficult to combine and synthesize the information to get a more comprehensive overview. The documents are spread across different databases and vendors, are linked to different ecosystem entities, and cannot be easily aggregated to answer higher-level questions about an organization’s software assets.

To help solve this problem, we’ve teamed up with Kusari, Purdue University, and Citi to create GUAC, a free tool for aggregating many different sources of software security metadata. We’re excited to share the project’s proof of concept, which lets you query a small dataset of software metadata, including SLSA provenance, SBOMs, and OpenSSF Scorecards.

Graph for Understanding Artifact Composition (GUAC) aggregates software security metadata into a high-quality graph database, normalizing entity identities and mapping standard relationships between them. Queries on this graph can drive higher-level organizational outcomes such as auditing, policy, risk management, and even developer assistance.

Conceptually, GUAC occupies the “aggregation and synthesis” layer of the software supply chain transparency logic model.

GUAC has four main areas of functionality:

  1. Collection
    GUAC can be configured to connect to a variety of sources of software security metadata. Some sources may be open and public (e.g., OSV); some may be first-party (e.g., an organization’s internal archives); some may be proprietary third-party sources (e.g., data providers).
  2. Ingestion
    From its upstream data sources, GUAC imports data about artifacts, projects, resources, vulnerabilities, repositories, and even developers.
  3. Collation
    After ingesting raw metadata from various upstream sources, GUAC assembles it into a coherent graph by normalizing entity identifiers, traversing the dependency tree, and recreating implicit entity relationships, e.g., project → developer; vulnerability → software version; artifact → source repository, and so on.
  4. Query
    Against the assembled graph, one can query for metadata attached to, or related to, entities in the graph. Querying for a given artifact can return its SBOM, provenance, build chain, project scorecard, vulnerabilities, and recent lifecycle events, along with those of its transitive dependencies.
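
To make the four areas more concrete, here is a minimal Python sketch of ingestion, collation, and querying over an in-memory graph. It is not GUAC's implementation or data model: the document shapes, the `normalize` rule, and the `MetadataGraph` class are all hypothetical and chosen only for illustration. Real SBOM, SLSA, and Scorecard documents are much richer, and real identifier normalization is more involved than lowercasing.

```python
from collections import defaultdict

# Hypothetical, heavily simplified metadata documents (assumption: these are
# NOT real SBOM, SLSA, or Scorecard formats).
DOCUMENTS = [
    {"type": "sbom", "subject": "pkg:PyPI/App@1.0", "depends_on": ["pkg:pypi/libfoo@2.1"]},
    {"type": "sbom", "subject": "pkg:pypi/libfoo@2.1", "depends_on": ["pkg:pypi/libbar@0.9"]},
    {"type": "provenance", "subject": "pkg:pypi/app@1.0", "builder": "ci.example.com"},
    {"type": "scorecard", "subject": "pkg:pypi/libbar@0.9", "score": 4.2},
]


def normalize(identifier: str) -> str:
    """Toy normalization: trim and lowercase so equivalent spellings match."""
    return identifier.strip().lower()


class MetadataGraph:
    def __init__(self):
        self.edges = defaultdict(set)      # artifact -> direct dependencies
        self.metadata = defaultdict(list)  # artifact -> attached documents

    def ingest(self, doc):
        """Attach a document to its subject and record any dependency edges."""
        subject = normalize(doc["subject"])
        self.metadata[subject].append(doc)
        for dep in doc.get("depends_on", []):
            self.edges[subject].add(normalize(dep))

    def transitive_dependencies(self, artifact):
        """Walk dependency edges depth-first, tolerating cycles."""
        start = normalize(artifact)
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for dep in self.edges.get(node, ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        seen.discard(start)
        return seen

    def query(self, artifact):
        """Map the artifact and each transitive dependency to its document types."""
        nodes = [normalize(artifact), *sorted(self.transitive_dependencies(artifact))]
        return {n: [d["type"] for d in self.metadata.get(n, [])] for n in nodes}


graph = MetadataGraph()
for doc in DOCUMENTS:
    graph.ingest(doc)

for node, doc_types in graph.query("pkg:pypi/app@1.0").items():
    print(node, doc_types)
```

Note how the first and third documents spell the same artifact differently; normalizing the identifier is what lets the SBOM and the provenance land on the same node.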

A CISO or compliance officer needs to be able to reason about their organization’s risk. An open source organization like the Open Source Security Foundation wants to identify critical libraries that need to be maintained and secured. Developers need richer and more reliable intelligence about the dependencies in their projects.

The good news is that, increasingly, the upstream supply chain is already being enriched with attestations and metadata that can drive higher-level reasoning and insights. The bad news is that today it is difficult or impossible for software consumers, operators, and administrators to aggregate this data into a unified view across their software assets.

To understand something as complex as the blast radius of a vulnerability, one needs to trace the relationship between a component and everything else in the portfolio, a task that can span thousands of metadata documents across hundreds of sources. In the open source ecosystem, the number of documents can reach into the millions.
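
As a rough illustration of what tracing a blast radius involves, the sketch below inverts a dependency map and walks it breadth-first from the vulnerable component. The component names and the `blast_radius` helper are hypothetical; this shows the general graph technique and is not a description of how GUAC computes it.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: each key depends directly on the listed values.
DEPENDS_ON = {
    "web-frontend": ["auth-lib", "logging-lib"],
    "billing-service": ["logging-lib"],
    "auth-lib": ["crypto-lib"],
    "logging-lib": [],
    "crypto-lib": [],
}


def blast_radius(depends_on, vulnerable):
    """Return every component that directly or transitively depends on `vulnerable`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for component, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(component)

    affected = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    affected.discard(vulnerable)  # only relevant if the graph contains a cycle
    return affected


print(sorted(blast_radius(DEPENDS_ON, "crypto-lib")))
```

With five components the walk is trivial; the difficulty described above comes from first assembling these edges out of thousands or millions of separate metadata documents.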

GUAC collects and synthesizes software security metadata at scale and makes it meaningful and actionable. With GUAC in hand, we will be able to answer questions at three important stages of software supply chain security:

• Proactive, e.g.,
  • What are the most used critical components in my software supply chain ecosystem?
  • Where are the weak points in my overall security posture?
  • How do I prevent supply chain compromises before they happen?
  • Where am I exposed to risky dependencies?
• Operational, e.g.,
  • Is there evidence that the application I am about to deploy complies with the organization’s policy?
  • Are all binaries in production traceable back to a securely managed repository?
• Reactive, e.g.,
  • What parts of my organization’s inventory are affected by new vulnerability X?
  • A suspicious project lifecycle event has occurred. Where is risk introduced to my organization?
  • An open source project is being phased out. How will I be affected?
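
To illustrate one of the operational questions, the following sketch checks whether every deployed binary can be traced to a trusted source repository. Everything in it is an assumption made for illustration: the digest-to-repository mapping stands in for real provenance attestations, and the repository URLs, digests, and `untraceable_binaries` helper are hypothetical.

```python
# Hypothetical policy: binaries must be built from repositories under this prefix.
TRUSTED_REPO_PREFIXES = ("https://git.example.com/org/",)

# Hypothetical provenance lookup: artifact digest -> source repository it was built from.
PROVENANCE = {
    "sha256:aaa": "https://git.example.com/org/app",
    "sha256:bbb": "https://github.com/someone/fork",
}

# Hypothetical list of artifact digests currently running in production.
PRODUCTION = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]


def untraceable_binaries(deployed, provenance, trusted_prefixes):
    """Return {digest: reason} for each deployed binary that fails the policy."""
    violations = {}
    for digest in deployed:
        source = provenance.get(digest)
        if source is None:
            violations[digest] = "no provenance found"
        elif not source.startswith(tuple(trusted_prefixes)):
            violations[digest] = f"built from untrusted source {source}"
    return violations


for digest, reason in untraceable_binaries(PRODUCTION, PROVENANCE, TRUSTED_REPO_PREFIXES).items():
    print(digest, "->", reason)
```

A real check would also verify attestation signatures and builder identity; this sketch only covers the lookup and comparison step.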

GUAC is an open source project on GitHub, and we’re excited to get more people involved and contributing (read the contributor guide to get started)! The project is still in its early stages, with a proof of concept that can load SLSA, SBOM, and Scorecard documents and support simple queries and exploration of software metadata. The next effort will focus on scaling the current capabilities and adding new document types for loading. We welcome help and contributions of code or documentation.

As the project will consume documents from many different sources and formats, we have assembled a group of “Technical Advisory Members” to help advise the project. These members include representatives from companies and groups such as SPDX, CycloneDX, Anchore, Aquasec, IBM, Intel, and many more. If you are interested in participating as a contributor or advisor representing the needs of end users (or the sources of metadata GUAC uses), you can register your interest in the relevant GitHub issue.

The GUAC team will showcase the project at KubeCon NA 2022 next week. If you’ll be there, come by our session and have a chat with us. We’d love to talk in person or virtually!

William
