Last Updated on January 18, 2024
Security professionals shouldn’t have to be data science geeks to get answers to security questions. But that’s often the way it is with today’s Security Information and Event Management (SIEM) solutions. Cost and operational barriers to consolidating and scaling diverse data for analysis within a SIEM tool can limit its effectiveness and leave security teams frustrated by a lack of visibility into potential threats.
How can security teams get better, faster, cheaper access to more data at scale?
On a recent episode of The Virtual CISO Podcast, Jack Naglieri, Founder and CEO at Panther Labs, talked about how serverless computing and cloud-based services are enabling the reinvention of SIEM, including a “loose coupling” of compute and storage elements within a hyperscale solution.
Data lakes and data warehouses
Most security pros don’t want to worry about big data storage. But it’s central to why Panther can analyze so much data so quickly.
“In the simplest terms, a data lake is a very, very, very big relational database,” Jack explains. “And there’s a lot of different ways to slice and dice it, because data lakes can be structured and unstructured. Data lakes are using things like generalized blob storage, like Amazon S3 (for Simple Storage Service), to store your data. Then a data warehouse is making it usable. Snowflake is trying to be ‘the’ de facto cloud data warehouse, where you can feed as much data as you want into this relational database. And then it’ll elastically scale because it has a separation of storage and compute. Typically, the bottleneck would be, like, the Splunk scenario—I have more data so I need more indexers.”
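The lake-versus-warehouse distinction Jack draws can be sketched in a few lines of Python: raw log blobs sit unstructured (as they might in S3), and loading them into a relational table (sqlite standing in for a cloud warehouse like Snowflake) is what makes them queryable. The log fields and values here are illustrative, not from any real product.

```python
import json
import sqlite3

# Hypothetical raw log blobs, as they might sit in object storage (e.g., S3).
# In a data lake they stay as-is; a warehouse is what makes them usable.
raw_blobs = [
    '{"user": "alice", "action": "login", "status": "ok"}',
    '{"user": "bob", "action": "login", "status": "failed"}',
    '{"user": "bob", "action": "login", "status": "failed"}',
]

# Load the semi-structured blobs into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user TEXT, action TEXT, status TEXT)")
for blob in raw_blobs:
    event = json.loads(blob)
    conn.execute(
        "INSERT INTO logins VALUES (?, ?, ?)",
        (event["user"], event["action"], event["status"]),
    )

# Once structured, a security question becomes simple SQL.
failed = conn.execute(
    "SELECT user, COUNT(*) FROM logins WHERE status = 'failed' GROUP BY user"
).fetchall()
print(failed)  # [('bob', 2)]
```

At warehouse scale the same idea holds; the difference is that loading and querying happen across elastic compute rather than a single process.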
Separating storage and compute
Snowflake and similar technologies store petabyte-scale data at rest in low-cost storage, but with some special efficiencies (e.g., storing data by columns rather than rows) to accelerate analytical queries. You then spin up compute instances to access the data, so “compute” is no longer a bottleneck on the storage end. Instead, the bottleneck becomes, “How much compute do you want?” How fast do you want your query to execute?
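The columnar efficiency mentioned above can be illustrated with a minimal Python sketch. The event fields and numbers are made up; the point is that an analytical aggregate over a column-oriented layout touches only the one field it needs, while a row-oriented layout drags every field along.

```python
# Row-oriented layout: each event is a record; an aggregate over one field
# still has to walk through whole records.
rows = [
    {"user": "alice", "bytes_sent": 120, "status": "ok"},
    {"user": "bob", "bytes_sent": 4096, "status": "failed"},
    {"user": "carol", "bytes_sent": 512, "status": "ok"},
]

# Column-oriented layout: each field is stored contiguously, so an
# analytical query reads (and compresses) only the column it needs.
columns = {
    "user": ["alice", "bob", "carol"],
    "bytes_sent": [120, 4096, 512],
    "status": ["ok", "failed", "ok"],
}

row_total = sum(r["bytes_sent"] for r in rows)  # scans entire records
col_total = sum(columns["bytes_sent"])          # scans one contiguous column
assert row_total == col_total == 4728
```

Columnar formats also compress far better, since values of the same type sit next to each other, which is part of why warehouse queries stay fast at petabyte scale.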
Panther supports stream processing, enabling near real-time security analysis of data feeds.
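A minimal sketch of the stream-processing idea: evaluate a detection rule against each event as it arrives, rather than batch-querying stored data later. The `rule` function, field names, and events below are illustrative assumptions, not Panther’s actual API.

```python
def rule(event: dict) -> bool:
    """Illustrative detection rule: flag failed logins."""
    return event.get("action") == "login" and event.get("status") == "failed"

def alerts(stream):
    """Yield an alert for each matching event as it flows through."""
    for event in stream:
        if rule(event):
            yield {"alert": "failed_login", "user": event.get("user")}

# Simulated feed; in practice this would be a live log stream.
events = iter([
    {"user": "alice", "action": "login", "status": "ok"},
    {"user": "mallory", "action": "login", "status": "failed"},
])
print(list(alerts(events)))  # [{'alert': 'failed_login', 'user': 'mallory'}]
```

Because the generator consumes events one at a time, alerts fire with the latency of the feed itself instead of waiting on a scheduled query.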
Jack adds: “What went into this technology is, we have a lot of data and we need to search it fast. So, what are the things we can do with compression and storage and compute to make that happen? If we were just using something like a Postgres or MySQL database, we’d be very limited with that tight coupling of storage and compute. Data lakes sort of decouple it and make it much more scalable, but there’s always going to be tradeoffs. Even with data lake solutions—like AWS has options; we have things in Google now like BigQuery and then you have Snowflake and Azure Databricks. It’s just really a way to operate at a quote-unquote cloud scale without having to worry about the storage and compute coupled together.”
What’s next?
Ready to listen to the show with Jack Naglieri, Panther CEO? Click here.
Hoping you can leverage SIEM to meet compliance requirements? Check out this related post: Leveraging SIEM Technology for Regulatory Compliance