Why File-Based Malware Dominates Cyberattacks

August 6, 2024

Last Updated on April 14, 2025

Files and documents are indispensable to computing and to running an organization. No wonder injecting malicious code into files is also a primary vector for cyberattacks. Even a phishing email with no attachments is a file-based attack, since emails themselves are files.

This article explains in business terms the anatomy of file-based attacks, why they remain so prevalent, and new file-based attacks to look out for.

What are the most popular file types for injecting malware?

Cybercriminals are incredibly inventive, but they also like to stick with what works. The top three file types for carrying malware payloads have been consistently popular for years:

Archive file formats, especially ZIP and RAR
These formats work well for propagating malware because they are often encrypted, which limits security tools’ ability to scan or disinfect them. In many instances they are passed directly through to users. Users also must open these files to view their contents, which makes the perfect opening move for a malware download.
Microsoft 365 documents
Files associated with Microsoft 365 applications (.DOCX for Word, .XLSX for Excel, etc.) are comparatively easy to weaponize. The typical approach uses macros, which travel in macro-enabled formats such as .DOCM and .XLSM or in legacy .DOC and .XLS files, to run scripts that execute remote code and download malware. Microsoft 365 files are shared ubiquitously in organizations, making them an ideal delivery mechanism to fool trusting users.
PDF documents
Like Microsoft 365 documents, PDF documents can slip past employees’ cyber threat awareness because they are so commonplace and ostensibly harmless. But PDFs can contain malicious JavaScript and other dangerous code. PDFs can also include convincing content and links to malicious websites that steal login credentials or deliver malware. Security controls may sniff out dangerous scripts, but malicious links are harder to detect.
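
For technically minded readers, the traits described above can be spotted with a few simple checks. The following is a minimal sketch, not a production scanner: the function name `triage` and the token list are illustrative assumptions, and the checks are heuristics that a determined attacker can evade (for example, by compressing or obfuscating PDF objects). It flags encrypted archive members, embedded macro projects in Office files (which are ZIP containers), and PDF keywords associated with active content.

```python
import io
import zipfile

# Byte markers that often accompany active content in PDFs (heuristic only).
PDF_ACTIVE_TOKENS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch")


def triage(data: bytes) -> list[str]:
    """Return human-readable flags for a file's raw bytes (illustrative heuristics)."""
    flags = []
    if data.startswith(b"%PDF"):
        for token in PDF_ACTIVE_TOKENS:
            if token in data:
                flags.append(f"pdf contains {token.decode()}")
        return flags
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                # Bit 0 of the general-purpose flags marks an encrypted member.
                if info.flag_bits & 0x1:
                    flags.append(f"encrypted member: {info.filename}")
                # Office Open XML files are ZIP containers; VBA macros
                # are stored in a part named vbaProject.bin.
                if info.filename.lower().endswith("vbaproject.bin"):
                    flags.append(f"vba macro project: {info.filename}")
    return flags
```

An empty result means only that none of these particular markers were found, not that the file is safe. Malicious links, for instance, pass straight through a check like this.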

Besides offering a technical basis for launching attacks, these “productivity” document types share a common profile: they are widely distributed, and users are comfortable creating them, opening them, sending them to others, etc. on an everyday basis without suspicion. Document sharing sites like Dropbox, OneDrive, and Box just add to the mix of delivery possibilities for hackers.

Programmatic file types like .EXE and .DLL can be weaponized in countless ways, but they are also more likely to put users on guard. A poisoned ZIP file can do just as much damage.

What is the most common file-based attack vector?

Many common cyberattacks, such as ransomware attacks, attacks using compromised credentials, and many social engineering attacks, start with system intrusion. The intrusion occurs when a hacker successfully accesses a victim’s system.

These attacks can be simple or elaborate. But they predominantly rely on file-based malware to gain unauthorized access to a server or endpoint. The hacker’s goal is almost always to exfiltrate and/or encrypt data for financial gain.

To cite but one example, hackers are still gainfully exploiting Log4j vulnerabilities years after the fact, often via malicious documents. According to Verizon’s Data Breach Investigations Report 2023, 8% of organizations still host Log4j vulnerabilities, with 22% of these having multiple instances.

Why do email-based attacks remain so prevalent?

Sending poisoned documents and files directly to users as email attachments has long been the most popular malware distribution method. The reason is simple: it continues to succeed.

Nearly everyone with a computer regularly sends and receives emails containing attachments from within and beyond their company’s walls. Any one of these communications could be a phishing attempt or other attack.

Even among users with ongoing cybersecurity awareness training, open/click rates for attachments and links in emails are rarely below 4%. With more targeted and sophisticated social engineering campaigns like business email compromise (BEC) or spear-phishing attacks, the rate is often much higher.
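
A quick back-of-the-envelope calculation shows why even a 4% click rate matters at scale. This sketch assumes each recipient decides independently, which is a simplification, and the function name is only illustrative.

```python
def chance_of_at_least_one_click(click_rate: float, recipients: int) -> float:
    """Probability that at least one recipient clicks, assuming independence."""
    return 1 - (1 - click_rate) ** recipients


for recipients in (10, 50, 100):
    p = chance_of_at_least_one_click(0.04, recipients)
    print(f"{recipients} recipients: {p:.0%} chance of at least one click")
```

Under that assumption, a campaign that reaches 10 trained users has roughly a one-in-three chance of getting at least one click, and one that reaches 100 users has about a 98% chance.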

Automated document processing is an emerging threat vector

Automated document processing (ADP) software leverages AI and sophisticated algorithms to reduce manual document processing and automate document creation. It classifies business documents and then extracts standard data elements from them, which it places into a template format.

ADP’s goal is to improve efficiency and productivity while reducing errors. Common document types for automated processing include reports, proposals, and contracts. Automated, cloud-based creation of prescriptions and other sensitive healthcare documents is also increasing.

An unwanted side effect of this automation is increased risk of file-based cyberattack. Many document processing platforms do not scan for file-based threats, and conventional scans fail to detect many such threats regardless.

Some of these threats are well established, such as accidentally processing documents that contain malware, which can lead to data corruption, encryption, and/or exfiltration. Ransomware attacks on data lakes assisted by automated document processing are an increasingly common occurrence.

QR code attacks are another rising trend

Quick Response (QR) code phishing attacks—dubbed “quishing”—are one of the latest email-based threats. Quishing attacks use QR codes to redirect victims to toxic websites or entice them to download malicious files. As with other file-based attacks, the goal is overwhelmingly to steal and monetize sensitive data.

QR codes are intended to store diverse data types including URLs, contact information, and product data. To launch a quishing attack, cybercriminals create a QR code that points to a malicious website. The QR code can then be embedded in emails, social media posts, printed documents, etc. along with social engineering content to tempt the unwary.

When they scan the poisoned QR code, victims are directed to a malicious site. Here they are prompted to enter sensitive data and/or download malware.

Because it is so new, quishing can bypass many traditional defense strategies, such as secure email gateways or other email filtering mechanisms. These solutions often register QR codes as just images and cannot resolve them. This leaves users to scan the QR code to see where it leads—possibly exposing them to malware or credential theft.
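
Once a QR code has been decoded, the embedded link can be checked like any other URL. The sketch below assumes the decoding step has already been done by a separate image-processing tool (not shown), and that the organization maintains an allowlist of trusted hosts. The names `TRUSTED_HOSTS` and `assess_qr_payload` are hypothetical, and the checks are examples rather than a complete policy.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would manage this centrally.
TRUSTED_HOSTS = {"example.com", "intranet.example.com"}


def assess_qr_payload(payload: str) -> list[str]:
    """Return reasons a decoded QR payload looks risky (empty list if none found)."""
    try:
        parts = urlsplit(payload.strip())
        host = parts.hostname or ""
    except ValueError:
        return ["malformed URL"]
    if parts.scheme not in ("http", "https"):
        return []  # not a web link (e.g. contact data); out of scope here
    reasons = []
    if parts.scheme == "http":
        reasons.append("unencrypted http link")
    try:
        ipaddress.ip_address(host)
        reasons.append("host is a raw IP address")
    except ValueError:
        pass  # host is a name, not an IP literal
    if any(label.startswith("xn--") for label in host.split(".")):
        reasons.append("punycode host (possible lookalike domain)")
    if host not in TRUSTED_HOSTS:
        reasons.append("host not on allowlist")
    return reasons
```

For example, `assess_qr_payload("http://192.0.2.10/pay")` reports an unencrypted link, a raw IP address, and a host that is not on the allowlist. A check like this helps only if something decodes the QR code before the user's phone does.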

What can businesses do to reduce risk from file-based threats?

Organizations have traditionally used several tools to block file-based attacks, either alone or as part of a layered defense strategy. These include:

Email filtering, which attempts to identify and reroute phishing emails and other email-based attacks so that users don’t come across them in their inboxes with no warning. The challenge here is keeping up with the endless waves of different attacks and blocking them. There is always a lag, and the tools miss many threats while flagging many legitimate communications. Email filtering also requires significant human intervention and decreases workforce productivity.
Sandboxing, a more advanced adjunct to email filtering, improves detection but has many of the same issues.
Content filtering, which functions analogously to a software firewall to block malware in Microsoft 365, ZIP, PDF, and other document types.
Security awareness training, to help users catch some of the attacks that filters miss. But humans are fallible, and it takes only one mistake to cause serious harm.
Endpoint protection, which aims to minimize the damage from a successful system intrusion.
Network segmentation, which limits malware’s ability to spread across the network.
And so on…

All these approaches work by focusing on the malicious content. A novel alternative is Votiro’s “zero trust content security and data detection and response platform.” This solution works not by scanning files for attack signatures, but by reconstructing each new file or document that enters the environment to preserve its safe, usable content. It does not block content, but instead removes all possible malware, even zero-day threats. Votiro is more successful at nullifying threats than traditional approaches, and users do not lose productivity intervening on false positives.
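
To illustrate the general difference between scanning for bad content and rebuilding from known-good content, here is a toy sketch. It is emphatically not Votiro's implementation, which this article does not describe in detail. It simply creates a new ZIP archive containing only members whose types appear on a hypothetical allowlist (`SAFE_SUFFIXES`), and drops everything else, including encrypted members it cannot inspect. Real products work at a much finer level inside each file format.

```python
import io
import zipfile

# Hypothetical allowlist of member types treated as passive content.
SAFE_SUFFIXES = (".txt", ".csv", ".png", ".jpg")


def rebuild_archive(data: bytes) -> bytes:
    """Build a new ZIP holding only allowlisted, readable members of the input."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as rebuilt:
        for info in source.infolist():
            # Skip directories and encrypted members that cannot be inspected.
            if info.is_dir() or info.flag_bits & 0x1:
                continue
            # Keep only member types on the allowlist; drop the rest.
            if not info.filename.lower().endswith(SAFE_SUFFIXES):
                continue
            rebuilt.writestr(info.filename, source.read(info))
    return output.getvalue()
```

The design choice worth noting is that nothing here tries to recognize malware. Anything not positively identified as acceptable never makes it into the rebuilt file. This toy version ignores practical concerns such as oversized members and unsafe member paths.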