analyzing-malicious-pdf-with-peepdf

// Perform static analysis of malicious PDF documents using peepdf, pdfid, and pdf-parser to extract embedded JavaScript,

$ git log --oneline --stat
stars:6.3Kforks:881updated:May 16, 2026 at 14:29
SKILL.md
readonly
nameanalyzing-malicious-pdf-with-peepdf
descriptionPerform static analysis of malicious PDF documents using peepdf, pdfid, and pdf-parser to extract embedded JavaScript,

name: analyzing-malicious-pdf-with-peepdf description: Perform static analysis of malicious PDF documents using peepdf, pdfid, and pdf-parser to extract embedded JavaScript, shellcode, and suspicious objects. domain: cybersecurity subdomain: malware-analysis tags:

  • malware-analysis
  • pdf
  • peepdf
  • pdfid
  • pdf-parser
  • static-analysis
  • reverse-engineering
  • dfir version: '1.0' author: mahipal license: Apache-2.0 nist_csf:
  • DE.AE-02
  • RS.AN-03
  • ID.RA-01
  • DE.CM-01

Analyzing Malicious PDF with peepdf

When to Use

  • When triaging suspicious PDF attachments from phishing emails
  • During malware analysis of PDF-based exploit documents
  • When extracting embedded JavaScript, shellcode, or executables from PDFs
  • For forensic examination of weaponized document artifacts
  • When building detection signatures for PDF-based threats

Prerequisites

  • Python 3.8+ with peepdf-3 installed (pip install peepdf-3)
  • pdfid.py and pdf-parser.py from Didier Stevens suite
  • Isolated analysis environment (VM or sandbox)
  • Optional: PyV8 for JavaScript emulation within peepdf
  • Optional: Pylibemu for shellcode analysis

Workflow

  1. Triage with pdfid: Scan PDF for suspicious keywords (/JS, /JavaScript, /OpenAction, /Launch, /EmbeddedFile).
  2. Interactive Analysis: Open PDF in peepdf interactive mode to explore object structure.
  3. Identify Suspicious Objects: Locate objects containing JavaScript, streams, or encoded data.
  4. Extract Content: Dump suspicious streams and decode filters (FlateDecode, ASCIIHexDecode).
  5. Deobfuscate JavaScript: Analyze extracted JS for shellcode, heap sprays, or exploit code.
  6. Check VirusTotal: Use peepdf vtcheck to cross-reference file hash with AV detections.
  7. Generate IOCs: Extract URLs, domains, hashes, and shellcode signatures.

Key Concepts

ConceptDescription
/OpenActionAutomatic action executed when PDF is opened
/JavaScript /JSEmbedded JavaScript code in PDF objects
/LaunchAction that launches external applications
/EmbeddedFileFile embedded within the PDF structure
FlateDecodezlib compression filter used to hide content
Object StreamsPDF objects stored in compressed streams

Tools & Systems

ToolPurpose
peepdf / peepdf-3Interactive PDF analysis with JS emulation
pdfid.pyQuick triage scanning for suspicious keywords
pdf-parser.pyDeep object-level PDF parsing
VirusTotalHash lookup and AV detection cross-reference
CyberChefDecode and transform extracted payloads

Output Format

Analysis Report: PDF-MAL-[DATE]-[SEQ]
File: [filename.pdf]
SHA-256: [hash]
Suspicious Keywords: [/JS, /OpenAction, etc.]
Objects with JavaScript: [Object IDs]
Extracted URLs: [List]
Shellcode Detected: [Yes/No]
Embedded Files: [Count and types]
VirusTotal Detections: [X/Y engines]
Risk Level: [Critical/High/Medium/Low]