Parser System
The parser system (src/parsers/) is a modular architecture for processing markdown content through configurable pipelines.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│                              CONTENT-TYPES                              │
│                         DocsParser / BlogParser                         │
│            (Orchestrates the entire flow, provides context)             │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                                  CORE                                   │
│                           ProcessingPipeline                            │
│                   (Manages the sequential execution)                    │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
 ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
 │  PREPROCESSORS  │     │    RENDERERS    │     │ POSTPROCESSORS  │
 │                 │ ──▶ │                 │ ──▶ │                 │
 │ • code-protect  │     │ • marked.ts     │     │ • heading-ids   │
 │ • asset-embed   │     │   (MD → HTML)   │     │ • external-links│
 └─────────────────┘     └─────────────────┘     └─────────────────┘
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │  TRANSFORMERS   │
                                                 │                 │
                                                 │ • callout       │
                                                 │ • tabs          │
                                                 │ • collapsible   │
                                                 │ (Custom tags)   │
                                                 └─────────────────┘
File Structure
src/parsers/
├── index.ts                  # Main exports + parser factory
├── types.ts                  # Type definitions
│
├── core/                     # Pipeline orchestration
│   ├── index.ts
│   ├── pipeline.ts           # ProcessingPipeline class
│   └── base-parser.ts        # BaseContentParser abstract class
│
├── content-types/            # Content-specific parsers
│   ├── index.ts
│   ├── docs.ts               # DocsParser (XX_ prefix)
│   └── blog.ts               # BlogParser (date prefix)
│
├── preprocessors/            # Before rendering
│   ├── index.ts
│   ├── code-protect.ts       # Protect code blocks
│   └── asset-embed.ts        # [[path]] embedding
│
├── renderers/                # Markdown → HTML
│   ├── index.ts
│   └── marked.ts             # Marked library wrapper
│
├── postprocessors/           # After rendering
│   ├── index.ts
│   ├── heading-ids.ts        # Add IDs to headings
│   └── external-links.ts     # Security attrs for external links
│
└── transformers/             # Custom tag transformation
    ├── index.ts
    └── registry.ts           # TagTransformerRegistry class
The 6 Components
| # | Folder | Role | When |
|---|---|---|---|
| 1 | core/ | Pipeline orchestration + BaseContentParser | Controls flow |
| 2 | preprocessors/ | Transform raw markdown | Before rendering |
| 3 | transformers/ | Custom tag → HTML (<callout>, <tabs>) | After rendering (as postprocessor) |
| 4 | renderers/ | Markdown → HTML (Marked) | Middle stage |
| 5 | postprocessors/ | Enhance HTML (IDs, links) | After rendering |
| 6 | content-types/ | DocsParser/BlogParser (filename parsing, asset paths) | Entry point |
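To make the "filename parsing" role of content-types/ more concrete, here is a minimal sketch of splitting a position prefix (docs) and a date prefix (blog) from a filename. The exact formats (two digits plus underscore, and YYYY-MM-DD plus hyphen) and both function names are assumptions for illustration; the real DocsParser and BlogParser may apply different rules.

```typescript
// Hypothetical helpers, not the project's actual API.
interface DocsName { position: number | null; slug: string }
interface BlogName { date: string | null; slug: string }

// Assumed docs format: "01_getting-started.md" -> position 1, slug "getting-started"
function parseDocsFilename(filename: string): DocsName {
  const base = filename.replace(/\.(md|mdx)$/, "");
  const match = /^(\d{2})_(.+)$/.exec(base);
  return match
    ? { position: Number(match[1]), slug: match[2] }
    : { position: null, slug: base };
}

// Assumed blog format: "2024-01-15-hello-world.md" -> date "2024-01-15", slug "hello-world"
function parseBlogFilename(filename: string): BlogName {
  const base = filename.replace(/\.(md|mdx)$/, "");
  const match = /^(\d{4}-\d{2}-\d{2})-(.+)$/.exec(base);
  return match
    ? { date: match[1], slug: match[2] }
    : { date: null, slug: base };
}
```

A file without the expected prefix falls back to a null position or date, which is where an option such as requirePositionPrefix could decide whether to reject it.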
Key Insight
Transformers are actually postprocessors: the TagTransformerRegistry exposes a createProcessor() method that wraps the registry as a Processor, which is then added to the pipeline's postprocessor chain. This is why custom tags are handled after rendering.
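The wrapping idea can be sketched roughly as follows. Only the createProcessor() name comes from this document; the Processor shape, the register() method, the regex-based tag matching, and the class name SketchTagRegistry are assumptions made for illustration.

```typescript
// Assumed shapes; the real definitions live in src/parsers/types.ts and may differ.
interface ProcessContext { filePath: string }
interface Processor {
  name: string;
  process(content: string, context: ProcessContext): string;
}
type TagTransformer = (inner: string, attrs: string) => string;

class SketchTagRegistry {
  private transformers = new Map<string, TagTransformer>();

  // Tag names are assumed to be plain identifiers (no regex metacharacters).
  register(tag: string, transformer: TagTransformer): void {
    this.transformers.set(tag, transformer);
  }

  // Wraps the registry as a Processor so it can join the postprocessor chain.
  createProcessor(): Processor {
    return {
      name: "tag-transformers",
      process: (content) => {
        let html = content;
        this.transformers.forEach((transform, tag) => {
          const pattern = new RegExp(`<${tag}\\b([^>]*)>([\\s\\S]*?)</${tag}>`, "g");
          html = html.replace(pattern, (_m: string, attrs: string, inner: string) =>
            transform(inner, attrs.trim()));
        });
        return html;
      },
    };
  }
}
```

Because the result is an ordinary Processor, the pipeline needs no special case for custom tags; it simply runs one more postprocessor.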
Processing Flow
The flow in base-parser.ts:
const content = await this.pipeline.process(rawContent, context, this.render);
Which executes in pipeline.ts:
- preprocess() → runs all preprocessors sequentially
- render() → converts markdown to HTML
- postprocess() → runs all postprocessors + transformers
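The three stages can be sketched as a small class. This is a simplified illustration, not the contents of pipeline.ts: the Step type, the constructor arguments, and the name SketchPipeline are assumptions; only the process(rawContent, context, render) call shape is taken from the line quoted above.

```typescript
// Assumed types for illustration; the project's real definitions may differ.
type Context = Record<string, unknown>;
type Step = (content: string, context: Context) => string | Promise<string>;
type Render = (markdown: string) => string | Promise<string>;

class SketchPipeline {
  constructor(
    private preprocessors: Step[] = [],
    private postprocessors: Step[] = [],
  ) {}

  async process(raw: string, context: Context, render: Render): Promise<string> {
    let content = raw;
    // preprocess: each step receives the previous step's output
    for (const step of this.preprocessors) {
      content = await step(content, context);
    }
    // render: markdown -> HTML
    content = await render(content);
    // postprocess: the transformer processor would sit in this chain too
    for (const step of this.postprocessors) {
      content = await step(content, context);
    }
    return content;
  }
}
```

Running the steps strictly in order matters here: a preprocessor such as code-protect only works if it runs before rendering, and heading IDs can only be added once HTML exists.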
Data Loader Integration
The data loader (src/loaders/data.ts) uses the parser system:
import { loadContent } from '@loaders/data';

// Load docs with DocsParser
const docs = await loadContent('docs', 'docs', {
  pattern: '**/*.{md,mdx}',
  sort: 'position',
  requirePositionPrefix: true,
});

// Load blog with BlogParser
const posts = await loadContent('blog', 'blog', {
  pattern: '*.md',
  sort: 'date',
  order: 'desc',
});
LoadContent Options
| Option | Type | Description |
|---|---|---|
| pattern | string | Glob pattern for files |
| sort | 'position' \| 'date' \| 'title' | Sort method |
| order | 'asc' \| 'desc' | Sort direction |
| includeDrafts | boolean | Include draft content |
| requirePositionPrefix | boolean | Enforce XX_ prefix (docs) |
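As a rough sketch of how sort, order, and includeDrafts might interact, the helper below filters drafts and then sorts by the chosen key. The option names come from the table; the defaults ('position', 'asc', drafts excluded), the applyOptions name, and the reduced Entry shape are assumptions, not the loader's documented behavior.

```typescript
// Option names follow the table above; defaults and behavior are assumptions.
interface LoadContentOptions {
  pattern?: string;
  sort?: "position" | "date" | "title";
  order?: "asc" | "desc";
  includeDrafts?: boolean;
  requirePositionPrefix?: boolean;
}

interface Entry {
  data: { title: string; sidebar_position?: number; date?: string; draft?: boolean };
}

// Hypothetical helper: drops drafts unless requested, then sorts by the chosen key.
function applyOptions<T extends Entry>(entries: T[], options: LoadContentOptions = {}): T[] {
  const { sort = "position", order = "asc", includeDrafts = false } = options;
  const direction = order === "desc" ? -1 : 1;
  const visible = entries.filter((e) => includeDrafts || !e.data.draft);
  return visible.sort((a, b) => {
    let cmp: number;
    if (sort === "date") {
      // Assumes ISO 8601 date strings, which order correctly as text.
      cmp = (a.data.date ?? "").localeCompare(b.data.date ?? "");
    } else if (sort === "title") {
      cmp = a.data.title.localeCompare(b.data.title);
    } else {
      // Entries without a position sort last in ascending order.
      cmp = (a.data.sidebar_position ?? Number.MAX_VALUE) -
            (b.data.sidebar_position ?? Number.MAX_VALUE);
    }
    return direction * cmp;
  });
}
```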
Return Type
interface LoadedContent {
  id: string;           // Unique identifier
  slug: string;         // URL path
  content: string;      // Rendered HTML
  data: {
    title: string;
    description?: string;
    sidebar_position?: number;
    date?: string;
    tags?: string[];
    draft?: boolean;
  };
  filePath: string;
  relativePath: string;
  fileType: 'md' | 'mdx';
}
Caching
Content is cached during production builds:
- Development: No caching, files re-parsed on each request
- Production: Content cached after first parse
No manual cache invalidation is needed during development: because files are re-parsed on each request, changes show up automatically.
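The development/production split can be sketched as a small wrapper around a loader function. How the real loader detects the build mode is not shown in this document, so the isProduction flag, the createCachedLoader name, and the in-memory Map are all assumptions.

```typescript
// Hypothetical wrapper, not the project's actual caching code.
function createCachedLoader<T>(
  load: (key: string) => Promise<T>,
  isProduction: boolean,
): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key) => {
    // Development: always re-parse so edits show up immediately.
    if (!isProduction) return load(key);
    // Production: parse once, then reuse the stored result.
    let hit = cache.get(key);
    if (!hit) {
      hit = load(key);
      cache.set(key, hit);
    }
    return hit;
  };
}
```

Storing the promise rather than the resolved value means concurrent requests for the same key share a single parse.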