Does your deployed infrastructure match what you defined?

In “How SQL can unify access to APIs” I made the case for SQL as a common environment in which to reason about data flowing from many different APIs. The key enabler of that scenario is Steampipe, a Postgres-based tool with a growing suite of API plugins that map APIs to foreign tables in Postgres.

These APIs were, initially, the ones provided by AWS, Azure, and GCP. Such APIs are typically made more accessible to developers by way of wrappers like boto3. A common SQL interface is arguably a better unifier of the sprawling API ecosystems within these clouds, and that’s inarguably true in multicloud scenarios. With Postgres under the hood, by the way, you’re not restricted to SQL: You can hook Python or JavaScript or another language to Postgres and leverage the common SQL interface from those languages too.

The Steampipe ecosystem then expanded with plugins for many other services including GitHub, Google Workspace, IMAP, Jira, LDAP, Shodan, Slack, Stripe, and Zendesk. Joining across these APIs is a superpower best proven by this example that joins Amazon EC2 endpoints with Shodan vulnerabilities in just 10 lines of very basic SQL.

  aws_ec2_instance a
left join
  shodan_host s on a.public_ip_address = s.ip
  a.public_ip_address is not null;

| instance_id         | ports    | vulns              |
| i-0dc60dd191cb84239 | null     | null               |
| i-042a51a815773780d | [80,22]  | null               |
| i-00cf426db9b8a58b6 | [22]     | null               |
| i-0e97f373db42dfa3f | [22,111] | ["CVE-2018-15919"] |

Files are APIs too

But what is an API, really? Must it always entail HTTP requests to service endpoints? More broadly APIs are data sources that come in other flavors too. Web pages are often, still, de facto APIs. I’ve done more web scraping than I care to think about over the years and the skill remains useful.

Files are also data sources: configuration files (INI, YAML, JSON), infrastructure-as-code files (Terraform, CloudFormation), data files (CSV). When plugins for these sources began to join the mix, Steampipe became even more powerful.

First came the CSV plugin, which unlocked all sorts of useful queries. Consider, for example, how we often pretend spreadsheets are databases. In doing so we can assume there’s referential integrity when really there isn’t. If you export spreadsheet data to CSV, you can use SQL to find those flawed assumptions. And that’s just one of the endless ways I can imagine using SQL to query the world’s leading file format for data exchange.

Then came the Terraform plugin, which queries Terraform files to ask and answer questions like: “Which trails are not encrypted?”

  and arguments -> 'kms_key_id' is null;

Using the AWS plugin’s aws_cloudtrail_trail table, we can ask and answer the same question for deployed infrastructure, and return a result set that you could UNION with the first one.

  arn as path
  kms_key_id is null;

Ideally the answers will always be the same. What you said should be deployed, using Terraform, should match what’s actually deployed if you query AWS APIs. In the real world, of course, maintenance and/or incident response can result in configuration drift. Given a common way to reason over defined and deployed infrastructure, we can manage such drift programmatically.

Belt and suspenders

For deployed infrastucture, Steampipe has long provided a suite of mods that layer security and compliance checks onto API-derived foreign tables. The AWS Compliance mod, for example, provides benchmarks and controls to check deployed infrastructure against eleven standards and frameworks including CIS, GDPR, HIPAA, NIST 800-53, and SOC 2.

steampipe aws cis v140 console IDG

With the advent of the Terraform plugin it became possible to create complementary mods, like Terraform AWS Compliance, that provide the same kinds of checks for defined infrastructure.

steampipe terraform aws compliance console output IDG

Does what you defined last month match what you deployed yesterday? A satisfactory answer requires the ability to reason over defined and deployed infrastructure in a common and frictionless way. SQL can’t remove all the friction but it’s a powerful solvent.

Copyright © 2022 IDG Communications, Inc.

Leave a Comment