Scryber.Core Architecture Document
Table of Contents
- Overview
- System Architecture
- Project Structure
- Component Model
- PDF Generation Pipeline
- Subsystem Deep Dive
- Data Flow
- Design Patterns
- Extension Architecture
- Performance Considerations
Overview
Purpose
Scryber.Core is a .NET PDF generation engine that transforms HTML/XML templates with CSS styling into high-quality PDF documents. It bridges web technologies (HTML, CSS, JavaScript-like expressions) with PDF output, enabling developers to create complex documents using familiar web development patterns.
Key Design Goals
- Web-First: Use HTML and CSS as primary authoring format
- Data Binding: Support dynamic content through expression evaluation
- Extensibility: Allow custom components, styles, and behaviors
- Multi-Platform: Support .NET 6, 8 and 9, and .NET Standard 2.0
- WASM Compatible: Run in Blazor WebAssembly environments
- Performance: Efficient layout and rendering for large documents
- Standards Compliance: Follow CSS box model and HTML semantics
Technology Stack
- .NET Multi-targeting: net6.0, net8.0, net9.0, netstandard2.0
- HTML Parsing: HtmlAgilityPack for loose HTML, System.Xml for strict XHTML
- Image Processing: SixLabors.ImageSharp
- Font Support: Custom OpenType parser (Scryber.Core.OpenType)
- Expression Engine: Custom expression parser and evaluator
- PDF Generation: Direct PDF structure writing (no dependencies on external PDF libraries)
System Architecture
High-Level Architecture
┌───────────────────────────────────────────────────────────┐
│                     Public API Layer                      │
│  Document.ParseDocument() / Document.ParseHtmlDocument()  │
│  Document.SaveAsPDF() / Document.SaveAsPDFAsync()         │
└─────────────────────────────┬─────────────────────────────┘
                              │
┌─────────────────────────────┴─────────────────────────────┐
│                   Component Tree Layer                    │
│  HTMLDiv, HTMLSpan, Page, Section, Table, Image, etc.     │
└─────────────────────────────┬─────────────────────────────┘
                              │
       ┌──────────────────────┼──────────────────────┐
       │                      │                      │
       ▼                      ▼                      ▼
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ Styles       │       │ Binding      │       │ Layout       │
│ Subsystem    │       │ Subsystem    │       │ Subsystem    │
│              │       │              │       │              │
│ CSS Parser   │       │ Expression   │       │ Engine       │
│ Selectors    │       │ Evaluator    │       │ Managers     │
│ Cascading    │       │ Data Path    │       │ Measurers    │
└──────┬───────┘       └──────┬───────┘       └──────┬───────┘
       │                      │                      │
       └──────────────────────┼──────────────────────┘
                              │
                   ┌──────────┴──────────┐
                   │                     │
                   ▼                     ▼
            ┌──────────────┐      ┌──────────────┐
            │ Resources    │      │ PDF Writer   │
            │              │      │              │
            │ Fonts        │      │ Objects      │
            │ Images       │      │ Streams      │
            │ Shared       │      │ References   │
            └──────────────┘      └──────────────┘
Layered Architecture
Layer 1: Foundation (Scryber.Common)
- Core interfaces and contracts
- PDF primitive types and structures
- Resource management abstractions
- Configuration and logging
Layer 2: Specialized Services
- Drawing (Scryber.Drawing): Graphics primitives, fonts, colors, SVG
- Expressions (Scryber.Expressive): Expression parsing and evaluation
- Styles (Scryber.Styles): CSS parsing and style management
- Imaging (Scryber.Imaging): Image loading and format conversion
Stem Layer: Object Graph Generation (Scryber.Generation)
- HTML to XHTML Conversion
- XML Namespace to Assembly Namespace mapping
- Reflective parser for Object Type lookup
- Graph construction and property assignment
Layer 3: Integration and Component Definition (Scryber.Components)
- Component implementations
- HTML element mapping
- Layout engine
- PDF generation orchestration
Layer 4: Framework Integration (Scryber.Components.Mvc)
- ASP.NET MVC extensions
- HTTP response integration
Significant Referenced Libraries
NuGet Packages
- Scryber.Core.OpenType: Handles reading OpenType font files (.otf, .ttf, .otc), evaluating font properties and measuring strings.
- SixLabors.ImageSharp: Handles evaluation and conversion of all supported image types to binary data (where needed).
- HtmlAgilityPack: Converts “loose” HTML to valid XHTML that the reader can process.
- Newtonsoft.Json: Extracts values from JSON content that the library has decoded from a JSON string or stream.
Open source frameworks
- bijington/expressive: The source has been significantly modified to work with the library, but it forms the base of the expression parser and function list.
Project Structure
Dependency Graph
Scryber.Common
│
├── Scryber.Drawing ───────────┐
│   (fonts, graphics)          │
│                              │
├── Scryber.Expressive ────────┤
│   (expressions)              │
│                              ▼
├── Scryber.Styles ──► Scryber.Generation
│   (CSS)              (parsing, binding)
│                              │
├── Scryber.Imaging ───────────┤
│   (images)                   │
│                              │
└── Scryber.Components ────────┘
    (main engine)
          │
          ▼
Scryber.Components.Mvc
(ASP.NET)
Project Responsibilities
Scryber.Common
Purpose: Foundation layer with core abstractions
Key Namespaces:
- Scryber: Core interfaces (IComponent, IDocument, IBindableComponent)
- Scryber.PDF: PDF primitive types (PDFString, PDFNumber, PDFDictionary)
- Scryber.PDF.Native: Low-level PDF reading and writing
- Scryber.PDF.Resources: Resource management (ISharedResource)
- Scryber.Html: HTML entity definitions
- Scryber.Logging: Trace and performance logging
Key Types:
- IComponent: Base interface for all components with lifecycle methods
- IPDFComponent: Components that can render to PDF
- IResourceContainer: Manages document-level resources
- PDFObjectRef: Indirect object references in PDF structure
Scryber.Drawing
Purpose: Graphics primitives and typography
Key Namespaces:
- Scryber.Drawing: Core types (PDFColor, PDFUnit, PDFPoint, PDFRect)
- Scryber.Drawing.Fonts: Font management and metrics
- Scryber.Drawing.Svg: SVG path parsing and rendering
- Scryber.PDF.Resources: Font resource generation
Key Types:
- FontFactory: Creates and caches font instances
- PDFFontResource: Manages font resources in PDF output
- PDFSolidBrush, PDFSolidPen: Drawing styles
- SVGPath: SVG path data parsing and rendering
Design Notes:
- Embeds standard PDF fonts as resources (Helvetica, Times, Courier, etc.)
- Uses Scryber.Core.OpenType for TrueType/OpenType parsing
- Font metrics used for text measurement during layout
Scryber.Expressive
Purpose: Expression parsing and evaluation engine, based on bijington/expressive (https://github.com/bijington/expressive)
Key Namespaces:
- Scryber.Expressive: Core expression types and parser
- Scryber.Expressive.Expressions: Expression tree nodes
- Scryber.Expressive.Functions: Built-in functions
- Scryber.Expressive.Operators: Mathematical and logical operators
Key Types:
- ExpressionParser: Tokenizes and parses expression strings
- IExpression: Base interface for expression tree nodes
- Context: Evaluation context with variables and functions
- BinaryExpressionBase: Base for operator expressions
- FunctionExpression: Function call expressions
- VariableExpression: Variable lookup expressions
Expression Syntax:
Variables: {{model.name}}
Properties: {{model.user.firstName}}
Indexing: {{model.items[0]}}
Math: {{price * 1.2}}
Functions: {{concat(firstName, ' ', lastName)}}
Conditionals: {{age >= 18 ? 'Adult' : 'Minor'}}
Built-in Functions:
- concat(...): String concatenation
- if(condition, true, false): Conditional evaluation
- index(): Current iteration index in templates
- length(array): Array/collection length
- And more…
Scryber.Styles
Purpose: CSS parsing, selector matching, and style cascading
Key Namespaces:
- Scryber.Styles: Style classes and definitions
- Scryber.Styles.Parsing: CSS parser infrastructure
- Scryber.Styles.Parsing.Typed: Individual CSS property parsers
- Scryber.Styles.Selectors: Selector matching and specificity
Key Types:
- CSSStyleParser: Main CSS parsing entry point
- StylesDocument: Container for style collections (can be loaded remotely)
- CSSStyleItemReader: Tokenizes CSS content
- Individual parsers: CSSBackgroundParser, CSSFontParser, CSSBorderParser, etc.
- StyleMatcher: Matches selectors to components
- StyleStack: Manages style inheritance and cascading
CSS Feature Support:
- Selectors: element, class, ID, attribute, pseudo-elements (:before, :after)
- Properties: Most CSS 2.1 properties plus common CSS3 features
- Variables: var(--custom-property)
- Calc: calc(100% - 20px) (partial support)
- Counters: counter-reset, counter-increment, counter()
- Content: content property for generated content
Design Pattern: Each CSS property, or family of related properties, has a dedicated typed parser class
- Example: CSSFontParser handles font, font-family, font-size, etc.
- Parsers convert CSS text values to typed style objects
- Allows clean separation and easy extension
Scryber.Generation
Purpose: Reflective XML parsing and data binding infrastructure
Key Namespaces:
- Scryber.Generation: Parser infrastructure and component creation
- Scryber.Binding: Data binding and expression evaluation
- Scryber.Binding.Expressions: Data path navigation
Key Types:
- ParserDefintionFactory: Creates parser definitions for component types
- ParserControllerDefinition: Defines controller attachments
- BindingCalcExpressionFactory: Creates binding expressions from templates
- BindingCalcParser: Integrates the Expressive engine with template binding
- ParserItemExpression: XPath-like data navigation
Binding Architecture:
Template: <div>{{model.user.name}}</div>
    │
    ▼
BindingCalcParser.Parse("model.user.name")
    │
    ▼
Creates Expression Tree
    │
    ▼
DataBind phase evaluates with Context
    │
    ▼
Result written to component property
Scryber.Imaging
Purpose: Image loading, decoding, and PDF formatting
Key Namespaces:
- Scryber.Imaging: Factory infrastructure
- Scryber.Imaging.Formatted: PDF image data formatters
Key Types:
- ImageFactoryList: Manages registered image factories
- ImageFactoryJpeg, ImageFactoryPng, etc.: Format-specific factories
- PDFImageData: Abstract base for image data
- PDFImageJpegData: JPEG passthrough (no re-encoding)
- PDFImageSharpRGB24Data, PDFImageSharpRGBA32Data: Color format converters
Design Notes:
- Uses SixLabors.ImageSharp for decoding
- JPEG images passed through without re-encoding
- Other formats converted to RGB24 or RGBA32 for PDF
- Supports data URLs: data:image/png;base64,...
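The data-URL support above can be sketched as follows: the header before the comma carries the MIME type that selects a format-specific factory, and the payload after it is base64. This is a minimal illustration that assumes base64 payloads only; ParseDataUrl and the MIME-to-factory table are hypothetical and do not reflect how ImageFactoryList actually matches images.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical MIME-type-to-factory table; the factory names come from the list above,
// but this lookup is illustrative and is not how ImageFactoryList is implemented.
var factories = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["image/jpeg"] = "ImageFactoryJpeg",
    ["image/png"] = "ImageFactoryPng",
};

// Splits "data:<mime>;base64,<payload>" into its MIME type and decoded bytes.
(string MimeType, byte[] Data) ParseDataUrl(string url)
{
    if (!url.StartsWith("data:", StringComparison.Ordinal))
        throw new ArgumentException("Not a data URL", nameof(url));

    int comma = url.IndexOf(',');
    if (comma < 0)
        throw new FormatException("Missing ',' before the payload");

    string[] header = url.Substring(5, comma - 5).Split(';');   // e.g. ["image/png", "base64"]
    if (Array.IndexOf(header, "base64") < 0)
        throw new NotSupportedException("Only base64 payloads are handled in this sketch");

    return (header[0], Convert.FromBase64String(url.Substring(comma + 1)));
}

// "iVBORw0KGgo=" is the 8-byte PNG file signature, base64-encoded.
var (mime, bytes) = ParseDataUrl("data:image/png;base64,iVBORw0KGgo=");
string factory = factories.TryGetValue(mime, out var found) ? found : "unsupported";
Console.WriteLine($"{mime} -> {factory} ({bytes.Length} bytes)");
```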
Scryber.Components
Purpose: Main PDF generation engine - orchestrates all subsystems
Key Namespaces:
- Scryber.Components: Core component types
- Scryber.Html.Components: 80+ HTML element implementations
- Scryber.Html.Parsing: HTML parsing and component factory
- Scryber.PDF.Layout: Layout engine and layout items
- Scryber.PDF.Native: PDF generation and writing
- Scryber.Components.Lists: List and list item components
- Scryber.Components.Tables: Table, row, and cell components
Key Types:
- Document: Root component and main public API
- Page, PageBase, Section: Page-level components
- HTMLParser: Parses HTML using HtmlAgilityPack
- PDFLayoutDocument: Manages layout state
- LayoutEngineDocument, LayoutEnginePage, etc.: Layout engines
- PDFLayoutPage, PDFLayoutBlock, PDFLayoutLine: Layout items
- PDFWriter: Low-level PDF structure writing
Component Hierarchy:
Component (abstract base)
│
├── VisualComponent
│   │
│   ├── ContainerComponent
│   │   │
│   │   ├── PageBase → Page, Section
│   │   ├── Panel → Div, Span, Canvas
│   │   ├── ListItem
│   │   └── TableCell
│   │
│   ├── Image
│   ├── LineBreak
│   └── Shape (line, rectangle, etc.)
│
├── TextLiteral (pure text)
├── StylesDocument (external CSS)
└── Template (repeating content)
HTML Element Mapping (examples):
- <div> → HTMLDiv (extends Div, which extends Panel)
- <span> → HTMLSpan (extends Span, which extends Panel)
- <table> → HTMLTable (extends TableGrid)
- <img> → HTMLImage (extends Image, which extends ImageBase)
- <p> → HTMLParagraph (extends Div)
Component Model
Component Lifecycle
All components implement IComponent with these lifecycle phases:
1. Construction
Component created via factory or constructor
2. Init(InitContext)
- Register with document by ID
- Initialize child components
- Set up component relationships
3. Load(LoadContext)
- Load external resources (async)
- Images, fonts, CSS files
- Process remote references
4. DataBind(DataContext)
- Evaluate {{...}} expressions
- Populate templates
- Apply dynamic data
5. Layout (implicit during render)
- Build applied styles
- Measure component dimensions
- Calculate positions
- Handle page breaks
- Create layout items
6. Render (implicit during SaveAsPDF)
- Generate PDF structure
- Write to output stream
7. Dispose()
- Clean up resources
- Release cached data
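A minimal sketch of the phase ordering, assuming (as the pipeline stages below describe) that each phase runs over the whole tree before the next one starts. The tree, RunPhase and the log are hypothetical stand-ins, not Scryber's Component types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree: component name -> child names (not Scryber's types).
var children = new Dictionary<string, string[]>
{
    ["Document"] = new[] { "Page" },
    ["Page"] = new[] { "Div", "Image" },
    ["Div"] = new[] { "Span" },
};

var log = new List<string>();

// Runs one lifecycle phase depth-first: the parent first, then each child in order.
void RunPhase(string phase, string component)
{
    log.Add($"{phase}:{component}");
    if (children.TryGetValue(component, out var kids))
        foreach (var kid in kids)
            RunPhase(phase, kid);
}

// Each phase completes over the whole tree before the next begins, so every
// component is registered (Init) before any resource is loaded (Load), and every
// resource is available before expressions are evaluated (DataBind).
foreach (var phase in new[] { "Init", "Load", "DataBind" })
    RunPhase(phase, "Document");

Console.WriteLine(string.Join(", ", log));
```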
Component Responsibilities
Base Component:
- Lifecycle management
- Parent/child relationships
- ID registration
- Style association
Visual Component (extends Component):
- Position and size
- Margins, padding, borders
- Background and fill
- Visibility
Container Component (extends Visual):
- Child management
- Layout strategy
- Content flow
- Page breaking
Context Objects
Context objects thread through lifecycle phases without being stored in components:
InitContext:
- Document reference
- Trace logging
- Performance tracking
LoadContext:
- Async loading support
- Resource cache
- Base URL for relative paths
DataContext:
- Data stack (scoped variables)
- Expression evaluation
- Template iteration
LayoutContext:
- Current page
- Available space
- Font resources
- Graphics state
RenderContext:
- PDF writer
- Resource registration
- Current stream
PDF Generation Pipeline
Stage 1: Parsing
Input: HTML/XML string or file path
Output: Component tree
Two Parser Paths:
- XML Parser (strict XHTML):
  - Uses System.Xml.XmlReader
  - Requires well-formed XML
  - Namespace-aware
  - Fast and memory-efficient
- HTML Parser (loose HTML):
  - Uses HtmlAgilityPack
  - Tolerates malformed HTML
  - Auto-closes tags
  - Slightly slower
  - Ultimately calls back to the XHTML parser with sanitised content
Process:
HTML String
    │
    ▼
HtmlDocument.Load() [HtmlAgilityPack]
    │
    ▼
HTMLParser.Parse(HtmlNode)
    │
    ├── HTMLParserComponentFactory.Create(tag name)
    │       │
    │       └── Creates component instance
    │
    └── Recursively parse children
            │
            ▼
Complete Component Tree
Component Creation:
- Factory maintains a dictionary: tag name → component type
- Example: "div" → typeof(HTMLDiv)
- Unknown tags create generic containers or are ignored
- Attributes are parsed and applied to component properties
Stage 2: Initialization
Input: Component tree
Output: Registered and initialized components
Process:
Document.Init(InitContext)
│
├── Register component IDs
│     (enables ID-based lookups)
│
├── Initialize child components
│     (recursive)
│
└── Set up resource containers
      (fonts, images)
Key Activities:
- Components register themselves by ID with document
- Parent-child relationships established
- Style classes validated
- Font families resolved to font definitions
Stage 3: Loading
Input: Initialized component tree
Output: Tree with loaded external resources
Process:
Document.Load(LoadContext)
│
├── Load external CSS files
│   │
│   └── StylesDocument.Load()
│       │
│       └── HTTP GET (async)
│
├── Load external images
│   │
│   └── Image.Load()
│       │
│       ├── HTTP GET (async)
│       └── ImageFactory.Load()
│           │
│           └── Decode to PDFImageData
│
└── Load external fonts
    │
    └── FontFactory.GetFont()
        │
        ├── HTTP GET (async) for web fonts
        └── Parse TrueType/OpenType
WASM Considerations:
- All HTTP requests are async
- No blocking I/O allowed
- Can use DocumentTimerExecution to yield periodically
- Resources cached to avoid redundant downloads
Stage 4: Data Binding
Input: Loaded component tree + data model
Output: Tree with evaluated expressions
Process:
Document.DataBind(DataContext)
│
└── For each component:
    │
    ├── Evaluate {{...}} in attributes
    │   │
    │   └── BindingCalcParser.Parse()
    │       │
    │       └── ExpressionParser.Parse()
    │           │
    │           └── Expression tree
    │
    ├── Evaluate {{...}} in text content
    │
    ├── Process templates (<template data-bind="...">)
    │   │
    │   └── For each item in bound collection:
    │       │
    │       ├── Clone template content
    │       ├── Push item to data stack
    │       └── DataBind clone
    │
    └── Recursively bind children
Data Context Stack:
- Context maintains a stack of data scopes
- model.name resolves from the current scope
- .name (dot prefix) refers to the current item
- Template iterations push a new scope
Example:
<div>{{model.title}}</div>
<ul>
<template data-bind="{{model.items}}">
<li>{{.name}}</li> <!-- . refers to current item -->
</template>
</ul>
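The scope rules and the template above can be sketched with a plain stack. This assumes a model made of dictionaries and implements only dotted paths; Resolve and the rendered list are hypothetical names, not Scryber's DataContext API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model matching the example above, with objects as dictionaries.
var model = new Dictionary<string, object>
{
    ["title"] = "Product List",
    ["items"] = new List<Dictionary<string, object>>
    {
        new Dictionary<string, object> { ["name"] = "Widget" },
        new Dictionary<string, object> { ["name"] = "Gadget" },
    },
};

// The data stack: the model sits at the bottom; each template iteration pushes its item.
var stack = new Stack<object>();
stack.Push(model);

// "model.a.b" starts from the model; ".a.b" starts from the current (top) item.
object Resolve(string path)
{
    bool fromItem = path.StartsWith(".");
    object current = fromItem ? stack.Peek() : stack.Last();     // Stack enumerates top-first
    IEnumerable<string> keys = fromItem
        ? path.Substring(1).Split('.')
        : path.Split('.').Skip(1);                                // drop the leading "model"
    foreach (var key in keys)
        current = ((IDictionary<string, object>)current)[key];
    return current;
}

// Template expansion: push each item, bind the template body, then pop.
var rendered = new List<string>();
foreach (var item in (IEnumerable<object>)Resolve("model.items"))
{
    stack.Push(item);
    rendered.Add($"<li>{Resolve(".name")}</li>");
    stack.Pop();
}

Console.WriteLine(Resolve("model.title"));
Console.WriteLine(string.Join("\n", rendered));
```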
Stage 5: Style Resolution
Input: Data-bound component tree + CSS
Output: Components with resolved styles
Process:
For each component:
│
├── Collect applicable styles:
│   │
│   ├── Inline styles (highest priority)
│   ├── ID selector styles
│   ├── Class selector styles
│   ├── Element selector styles
│   └── Inherited styles
│
├── Calculate specificity
│   │
│   └── Sort by: !important > inline > ID > class > element
│
├── Apply cascading
│   │
│   └── Merge in specificity order
│
├── Push 'Applied Style' onto the current Style Stack
│
├── Resolve computed values
│   │
│   ├── Inherit from parent where applicable
│   ├── Resolve relative units (%, em)
│   ├── Evaluate var() and calc()
│   └── Build the complete style from direct and inherited values
│
│   (continue to use the style and process children)
│
└── Pop 'Applied Style' from the stack once processed
Style Inheritance:
Inheritance is controlled by the individual StyleKeys and the Items they belong to.
- Font properties inherit by default
- Box model properties don't inherit
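A minimal sketch of how a style stack can build a complete style from direct and inherited values. The set of inheritable properties is an illustrative assumption (Scryber decides this per StyleKey), and BuildFullStyle is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Illustrative subset of inheritable properties; in Scryber this is decided per StyleKey.
var inherits = new HashSet<string> { "font-family", "font-size", "color" };

// Applied styles, outermost first; the last entry belongs to the component being processed.
var styleStack = new List<Dictionary<string, string>>();

// Direct values always apply; a missing value is taken from the nearest ancestor
// only if the property is inheritable.
Dictionary<string, string> BuildFullStyle()
{
    var full = new Dictionary<string, string>(styleStack[^1]);
    for (int i = styleStack.Count - 2; i >= 0; i--)
        foreach (var (key, value) in styleStack[i])
            if (inherits.Contains(key) && !full.ContainsKey(key))
                full[key] = value;
    return full;
}

styleStack.Add(new Dictionary<string, string> { ["font-size"] = "12pt", ["margin"] = "10pt" }); // body
styleStack.Add(new Dictionary<string, string> { ["color"] = "#336699" });                      // div.header
var headerStyle = BuildFullStyle();
styleStack.RemoveAt(styleStack.Count - 1);   // pop the applied style once the component is done

foreach (var (k, v) in headerStyle)
    Console.WriteLine($"{k}: {v}");
```

The header keeps its own color, inherits font-size from the body, and does not pick up the body's margin.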
Stage 6: Layout
Input: Styled component tree
Output: PDFLayoutDocument with positioned items
Core Concept: Two-stage rendering separates measurement from output
Layout Process:
Document.RenderToPDF(context)
│
└── LayoutEngineDocument.Layout()
    │
    ├── For each page component:
    │   │
    │   └── LayoutEnginePage.Layout()
    │       │
    │       ├── Measure header
    │       ├── Measure footer
    │       ├── Calculate content area
    │       │
    │       ├── Layout content:
    │       │   │
    │       │   └── LayoutEnginePanel.Layout()
    │       │       │
    │       │       └── For each child:
    │       │           │
    │       │           ├── Measure required space
    │       │           ├── Apply positioning
    │       │           └── Layout child content
    │       │
    │       └── Handle page breaks:
    │           │
    │           ├── If content overflows
    │           ├── Create continuation
    │           └── Split content across pages
    │
    └── Returns PDFLayoutDocument
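The overflow handling can be illustrated with a simple vertical pagination pass. This sketch assumes a fixed page content height and moves whole blocks to the next page instead of splitting them as a continuation would; Paginate is a hypothetical function.

```csharp
using System;
using System.Collections.Generic;

// Places block heights (pt) top to bottom on pages with a fixed content height.
// A block that does not fit in the remaining space moves to a new page. Blocks are
// never split here, so a block taller than a page simply overflows it.
List<(int Page, double Y)> Paginate(IEnumerable<double> heights, double pageHeight)
{
    var placements = new List<(int Page, double Y)>();
    int page = 1;
    double y = 0;
    foreach (double h in heights)
    {
        if (y > 0 && y + h > pageHeight)
        {
            page++;          // overflow: continue on a new page
            y = 0;
        }
        placements.Add((page, y));
        y += h;
    }
    return placements;
}

var placed = Paginate(new[] { 44.0, 34.0, 34.0, 400.0, 300.0 }, pageHeight: 700);
foreach (var (pg, top) in placed)
    Console.WriteLine($"page {pg}, y = {top}pt");
```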
Layout Engines (by component type):
- LayoutEngineDocument: Top-level coordinator
- LayoutEnginePage: Page layout with header/footer
- LayoutEnginePanel: Block and inline flow
- LayoutEngineTable: Table layout with colspan/rowspan
- LayoutEngineList: Numbered and bulleted lists
- LayoutEngineText: Text flow and line breaking
Text Layout:
LayoutEngineText.Layout(available width)
│
├── Split text into words
│
├── For each word:
│   │
│   ├── Measure word width with font metrics
│   │
│   ├── If it fits on the current line:
│   │   └── Add to line
│   │
│   └── If it doesn't fit:
│       ├── Apply hyphenation if enabled
│       ├── Break line
│       └── Continue on next line
│
└── Create PDFLayoutLine items
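The greedy line-breaking loop above, minus hyphenation, might look like the following sketch. It assumes a caller-supplied measuring function (a fixed 6pt per character here, instead of real font metrics) and words separated by spaces; BreakLines is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Greedy line breaking: keep adding words while the line still fits, otherwise
// start a new line. The first word of a line is always placed, even if too wide.
List<string> BreakLines(string text, double availableWidth, Func<string, double> measure)
{
    var lines = new List<string>();
    var current = new List<string>();
    foreach (var word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        var candidate = string.Join(" ", current.Append(word));
        if (current.Count == 0 || measure(candidate) <= availableWidth)
        {
            current.Add(word);                       // fits on the current line
        }
        else
        {
            lines.Add(string.Join(" ", current));    // break before this word
            current = new List<string> { word };
        }
    }
    if (current.Count > 0)
        lines.Add(string.Join(" ", current));
    return lines;
}

// 60pt at 6pt per character allows at most 10 characters per line.
var wrapped = BreakLines("the quick brown fox jumps over the lazy dog", 60, s => s.Length * 6.0);
foreach (var line in wrapped)
    Console.WriteLine(line);
```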
Layout Items (output structure):
PDFLayoutDocument
│
└── Pages: List<PDFLayoutPage>
    │
    ├── HeaderBlock: PDFLayoutBlock
    ├── FooterBlock: PDFLayoutBlock
    │
    └── ContentBlock: PDFLayoutBlock
        │
        └── Columns: List<PDFLayoutRegion>
            │
            └── Contents: List<PDFLayoutItem>
                │
                ├── PDFLayoutBlock (container)
                ├── PDFLayoutLine (text)
                ├── PDFLayoutRun (content)
                └── PDFLayoutImage, etc.
Box Model:
┌─────────────────────────────────────┐
│ Margin (transparent)                │
│  ┌───────────────────────────────┐  │
│  │ Border                        │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │ Padding                 │  │  │
│  │  │  ┌───────────────────┐  │  │  │
│  │  │  │ Content           │  │  │  │
│  │  │  │                   │  │  │  │
│  │  │  └───────────────────┘  │  │  │
│  │  │                         │  │  │
│  │  └─────────────────────────┘  │  │
│  │                               │  │
│  └───────────────────────────────┘  │
│                                     │
└─────────────────────────────────────┘
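The box model arithmetic, sketched for uniform edges. With the .item values from the data-flow example (80pt x 12pt of text, 5pt padding, 1pt border, 10pt margin) the border box comes out at 92pt x 24pt, the same size as the "10 706 92 24 re" rectangle in that example's render step. The sketch adds margins on both sides without collapsing, so its outer size is not directly comparable with the walkthrough's per-block totals; BorderBox and MarginBox are hypothetical helpers.

```csharp
using System;

// Sizes with the same padding, border and margin (pt) on all four sides.
// Margins are counted on both sides and never collapsed, so MarginBox is the full outer box.
(double Width, double Height) BorderBox(double contentW, double contentH, double padding, double border) =>
    (contentW + 2 * (padding + border), contentH + 2 * (padding + border));

(double Width, double Height) MarginBox(double contentW, double contentH,
                                        double padding, double border, double margin)
{
    var (w, h) = BorderBox(contentW, contentH, padding, border);
    return (w + 2 * margin, h + 2 * margin);
}

// The .item rule from the data-flow example.
var borderBox = BorderBox(80, 12, 5, 1);
var marginBox = MarginBox(80, 12, 5, 1, 10);
Console.WriteLine($"border box {borderBox.Width} x {borderBox.Height}pt, " +
                  $"margin box {marginBox.Width} x {marginBox.Height}pt");
```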
Positioning Modes:
- Block Flow (default for divs):
- Stacks vertically
- Takes full width
- Respects margins
- Inline Flow (default for spans):
- Flows horizontally
- Wraps at container edge
- Vertical alignment
- Relative Positioning:
- Offset from normal position
- Space still reserved in flow
- Absolute Positioning:
- Removed from flow
- Positioned relative to container
- No space reserved
- Float (left/right):
- Removed from flow
- Content wraps around
- Cleared with the clear property
Stage 7: Rendering
- Input: PDFLayoutDocument
- Output: PDF byte stream
- Coordinator: PDFWriter
- Helper: PDFGraphics
The PDFWriter handles the output of content to the underlying stream, using a stack of PDFIndirectObjects.
The PDFLayoutItems, and their associated PDFXObjects and PDFResources, begin “objects” and instruct the writer what to output.
Only when an indirect object is completed and closed are its contents written to the base stream, after which it is popped from the stack.
Any further writing then continues on the previous object in the stack.
The PDFGraphics instance acts as a higher-level stream writer that emits the instructions for drawing rectangles, setting up fonts and rendering characters in a form PDF readers understand.
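The stack discipline described above can be sketched as follows. Each open object buffers its own content and only reaches the output when closed, so an object opened inside another is written before its parent, and its byte offset is recorded for the cross-reference table. BeginObject, Write and EndObject are hypothetical names, not PDFWriter's API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var output = new StringBuilder("%PDF-1.4\n");
var offsets = new SortedDictionary<int, int>();              // object number -> byte offset
var open = new Stack<(int Number, StringBuilder Body)>();
int nextNumber = 1;

// Starts a new indirect object on top of the stack and returns its object number.
int BeginObject()
{
    open.Push((nextNumber, new StringBuilder()));
    return nextNumber++;
}

// Writes into whichever object is currently on top of the stack.
void Write(string content) => open.Peek().Body.Append(content);

// Closes the top object: only now are its bytes appended to the output.
void EndObject()
{
    var (number, body) = open.Pop();
    offsets[number] = output.Length;                         // ASCII only, so chars == bytes
    output.Append($"{number} 0 obj\n{body}\nendobj\n");
}

// A page object that opens its content stream object part-way through.
int page = BeginObject();
Write("<< /Type /Page");
int contents = BeginObject();
Write("<< /Length 0 >>\nstream\nendstream");
EndObject();                                                 // the content stream is written first
Write($" /Contents {contents} 0 R >>");
EndObject();                                                 // then the page

Console.Write(output);
```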
Process:
PDFLayoutDocument.OutputToPDF(PDFRenderContext, PDFWriter)
│
├── Write PDF header
│     (%PDF-1.4 or later)
│
├── Write document catalog
│   │
│   ├── Pages tree root
│   ├── Outlines (bookmarks)
│   ├── Named destinations
│   └── Metadata
│
├── For each layout page:
│   │
│   ├── Create page object
│   ├── Create content stream
│   │   │
│   │   └── Write drawing commands:
│   │       │
│   │       ├── Set graphics state
│   │       ├── Draw backgrounds
│   │       ├── Draw borders
│   │       ├── Draw text
│   │       ├── Draw images
│   │       └── Draw shapes
│   │
│   └── Register page resources
│
├── Write font resources
│   │
│   └── Font descriptors + font programs
│
├── Write image resources
│   │
│   └── Image XObjects with data
│
├── Write cross-reference table
│     (byte offsets of all objects)
│
└── Write trailer
      (points to catalog and xref)
PDF Structure (simplified):
%PDF-1.4
1 0 obj % Document Catalog
<< /Type /Catalog
/Pages 2 0 R >>
endobj
2 0 obj % Pages Tree
<< /Type /Pages
/Kids [3 0 R]
/Count 1 >>
endobj
3 0 obj % Page 1
<< /Type /Page
/Parent 2 0 R
/Contents 4 0 R
/Resources << /Font << /F1 5 0 R >>
/XObject << /Im1 6 0 R >> >> >>
endobj
4 0 obj % Page Content Stream
<< /Length 123 >>
stream
BT
/F1 12 Tf
100 700 Td
(Hello World) Tj
ET
endstream
endobj
5 0 obj % Font Resource
<< /Type /Font
/Subtype /TrueType
... >>
endobj
6 0 obj % Image Resource
<< /Type /XObject
/Subtype /Image
/Width 100
/Height 100
... >>
endobj
xref % Cross-reference table
0 7
0000000000 65535 f
0000000009 00000 n
...
trailer
<< /Size 7
/Root 1 0 R >>
startxref
1234
%%EOF
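The xref section in the structure above is a fixed-width table: per the PDF specification each entry is exactly 20 bytes (a 10-digit byte offset, a 5-digit generation number, "n" for in-use or "f" for free, and a two-character end of line). A sketch of building it from recorded offsets; the offsets used here are made-up placeholders and BuildXref is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Builds an xref section plus trailer from the byte offsets of objects 1..N.
string BuildXref(IReadOnlyList<long> offsets, int rootObject, long xrefOffset)
{
    var sb = new StringBuilder();
    sb.Append("xref\n");
    sb.Append($"0 {offsets.Count + 1}\n");                   // first object number, entry count
    sb.Append("0000000000 65535 f \n");                       // entry 0: head of the free list
    foreach (long offset in offsets)
        sb.Append($"{offset:D10} 00000 n \n");                // space + LF keeps each entry at 20 bytes
    sb.Append($"trailer\n<< /Size {offsets.Count + 1} /Root {rootObject} 0 R >>\n");
    sb.Append($"startxref\n{xrefOffset}\n%%EOF\n");
    return sb.ToString();
}

var xref = BuildXref(new long[] { 9, 58, 115 }, rootObject: 1, xrefOffset: 300);
Console.Write(xref);
```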
Subsystem Deep Dive
CSS Parser Architecture
Entry Point: CSSStyleParser.ParseCSS(string cssContent)
Parse Flow:
CSS String
    │
    ▼
CSSStyleItemReader (tokenizer)
    │
    ├── Remove /* comments */
    ├── Identify selectors
    ├── Extract property blocks
    │
    ▼
For each selector + properties:
    │
    ├── Create Style object
    │
    └── For each property:
        │
        ├── Route to typed parser:
        │   │
        │   ├── CSSFontParser for font-*
        │   ├── CSSColorParser for color, background-color
        │   ├── CSSBorderParser for border-*
        │   ├── CSSPaddingParser for padding-*
        │   └── etc.
        │
        └── Parse value and set on Style
Example CSS Property Parser:
// CSSFontParser.cs (simplified illustration)
public class CSSFontParser : CSSStyleAttributeParser<FontStyle>
{
    protected override bool DoSetStyleValue(Style style,
                                            CSSStyleItemReader reader,
                                            PDFContextBase context)
    {
        // Name of the property being parsed, e.g. "font-family". GetPropertyName is a
        // hypothetical helper: how the name is read from the reader is not shown here.
        string property = GetPropertyName(reader);
        string value = reader.CurrentTextValue;
        if (property == "font-family")
        {
            style.Font.FontFamily = ParseFontFamily(value);
        }
        else if (property == "font-size")
        {
            style.Font.FontSize = ParseUnit(value);
        }
        // ... more properties
        return true;
    }
}
CSS Specificity Calculation:
Inline styles: 1000 points
ID selectors: 100 points
Class selectors: 10 points
Element selectors: 1 point
Example:
div.header → 1 + 10 = 11
#main → 100
div#main.header → 1 + 100 + 10 = 111
style="..." → 1000
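A sketch of the point-based scoring above for simple selectors. It handles only element names, #id and .class parts, joined directly or by whitespace; attributes and pseudo-classes are out of scope, and Specificity is a hypothetical helper. Note that the CSS Selectors specification compares the (ID, class, element) counts as a tuple rather than summing them, so the point sum is only equivalent while each count stays below 10.

```csharp
using System;
using System.Text.RegularExpressions;

// Scores a selector with the weights above: ID = 100, class = 10, element = 1.
int Specificity(string selector)
{
    int score = 0;
    foreach (Match m in Regex.Matches(selector, @"([#.]?)([A-Za-z][\w-]*)"))
    {
        score += m.Groups[1].Value switch
        {
            "#" => 100,      // ID selector
            "." => 10,       // class selector
            _ => 1,          // bare name: element selector
        };
    }
    return score;
}

foreach (var s in new[] { "div.header", "#main", "div#main.header", "ul li.item" })
    Console.WriteLine($"{s} -> {Specificity(s)}");
```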
Expression Engine Architecture
Expression Grammar:
expression := term (('+' | '-') term)*
term := factor (('*' | '/') factor)*
factor := number | string | variable | function | '(' expression ')'
variable := identifier ('.' identifier | '[' expression ']')*
function := identifier '(' arguments ')'
arguments := expression (',' expression)*
Parse Example:
Input: "{{(model.price * 1.2)}}"
Tokenize (the {{ }} delimiters are removed by the binding layer first, so only the inner expression is tokenized):
OPEN_PAREN
IDENTIFIER("model")
DOT
IDENTIFIER("price")
MULTIPLY
NUMBER(1.2)
CLOSE_PAREN
Parse to Expression Tree:
BinaryExpression(*)
├── PropertyExpression
│   ├── VariableExpression("model")
│   └── Property("price")
└── ConstantExpression(1.2)
Evaluate with Context:
context["model"] = { price: 10.0 }
PropertyExpression.Evaluate()
  → model.price → 10.0
BinaryExpression.Evaluate()
  → 10.0 * 1.2 → 12.0
Result: 12.0
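The grammar above maps directly onto a recursive-descent parser with one function per rule. This sketch evaluates while parsing instead of building an expression tree, and covers only numbers, dotted variables, the four arithmetic operators and parentheses; the flat variable dictionary (keyed by the full dotted path) is a simplifying assumption, and Evaluate is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

double Evaluate(string source, IReadOnlyDictionary<string, double> variables)
{
    int pos = 0;

    void SkipSpaces()
    {
        while (pos < source.Length && char.IsWhiteSpace(source[pos])) pos++;
    }

    // Consumes the next non-space character if it is c.
    bool Accept(char c)
    {
        SkipSpaces();
        if (pos < source.Length && source[pos] == c) { pos++; return true; }
        return false;
    }

    // expression := term (('+' | '-') term)*
    double Expression()
    {
        double value = Term();
        while (true)
        {
            if (Accept('+')) value += Term();
            else if (Accept('-')) value -= Term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    double Term()
    {
        double value = Factor();
        while (true)
        {
            if (Accept('*')) value *= Factor();
            else if (Accept('/')) value /= Factor();
            else return value;
        }
    }

    // factor := number | variable | '(' expression ')'
    double Factor()
    {
        if (Accept('('))
        {
            double inner = Expression();
            if (!Accept(')')) throw new FormatException("Expected ')'");
            return inner;
        }
        SkipSpaces();
        int start = pos;
        bool isNumber = pos < source.Length && char.IsDigit(source[pos]);
        while (pos < source.Length &&
               (char.IsLetterOrDigit(source[pos]) || source[pos] == '.' || source[pos] == '_'))
            pos++;
        if (pos == start) throw new FormatException($"Unexpected input at position {pos}");
        string token = source.Substring(start, pos - start);
        return isNumber
            ? double.Parse(token, CultureInfo.InvariantCulture)
            : variables[token];          // variable := identifier ('.' identifier)*
    }

    double result = Expression();
    SkipSpaces();
    if (pos != source.Length) throw new FormatException($"Unexpected input at position {pos}");
    return result;
}

var vars = new Dictionary<string, double> { ["model.price"] = 10.0 };
Console.WriteLine(Evaluate("(model.price * 1.2)", vars));
```

Because Term is called from Expression, multiplication binds tighter than addition, and the while loops make both levels left-associative (10 - 4 - 3 is 3).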
Function Evaluation:
// Built-in function: concat
using System.Linq;   // for Select
public class ConcatFunction : IFunction
{
    public string Name => "concat";
    public object Evaluate(IExpression[] arguments, Context context)
    {
        // Evaluate each argument expression, then join the results as text
        var values = arguments.Select(a => a.Evaluate(context));
        return string.Concat(values);
    }
}
// Usage in template:
{{concat(user.firstName, ' ', user.lastName)}}
Layout Engine Architecture
Layout Engine Selection:
// Panel.cs - implements IPDFViewPortComponent, so it has a GetEngine() method (simplified)
public IPDFLayoutEngine GetEngine(Component component)
{
    // Checks run in order, so any type that derives from a later one must be tested first
    if (component is PageBase)
        return new LayoutEnginePage();
    else if (component is TableGrid)
        return new LayoutEngineTable();
    else if (component is ListOrdered || component is ListUnordered)
        return new LayoutEngineList();
    else if (component is Panel)
        return new LayoutEnginePanel();
    else if (component is Label || component is TextLiteral)
        return new LayoutEngineText();
    // ... more types
    // No engine for this component type
    return null;
}
Layout Algorithm (simplified):
NOTE: Most layout engines inherit from Scryber.PDF.Layout.LayoutEngineBase (or the higher-level LayoutEnginePanel) and override only the methods whose behaviour they need to change. Implementing a complete layout engine is an arduous task.
// LayoutEnginePanel.cs (simplified illustration)
public override void DoLayoutComponent()
{
    Panel panel = (Panel)this.Component;
    PDFLayoutContext context = this.Context;
    PDFRect availableSpace = context.Space;
    // Create the block for this panel at the start of the available space
    PDFLayoutBlock block = new PDFLayoutBlock(this.FullStyle);
    block.Position = availableSpace.Location;
    // If the current region cannot hold the block, move to a new region (column or page)
    var container = context.CurrentPage.LastAvailableBlock();
    if (container.AvailableSpace < block.RequiredSize)
    {
        if (!this.BeginNewRegion())
            return; // no further space can be created
    }
    // Vertical offset of the next child, relative to the top of the available space
    PDFUnit currentY = 0;
    foreach (var child in panel.Children)
    {
        // Get the layout engine for the child
        var childEngine = GetEngine(child);
        // Space available to the child, inset by its horizontal margins
        PDFRect childSpace = new PDFRect(
            x: availableSpace.X + child.Margins.Left,
            y: availableSpace.Y + currentY,
            width: availableSpace.Width - child.Margins.Horizontal,
            height: availableSpace.Height - currentY);
        // Lay out the child within that space
        context.Space = childSpace;
        childEngine.Layout(context, child);
        // The child's engine adds its block to the current page
        PDFLayoutBlock childBlock = context.DocumentLayout.CurrentPage.LastBlock;
        // Move the Y position down past the child and its bottom margin
        currentY += childBlock.Height + child.Margins.Bottom;
        // Overflow: start a new page and restart from the top
        if (currentY > availableSpace.Height)
        {
            context.DocumentLayout.AddPage();
            currentY = 0;
        }
        // Add the child block to this panel's block
        block.Add(childBlock);
    }
    block.Height = currentY;
    context.DocumentLayout.CurrentPage.Add(block);
}
Text Line Breaking:
// LayoutEngineText.cs (simplified)
public void LayoutTextLine(string text, PDFUnit availableWidth,
                           PDFFont font, PDFUnit fontSize)
{
    List<string> words = SplitIntoWords(text);
    PDFLayoutLine currentLine = new PDFLayoutLine();
    PDFUnit currentWidth = 0;
    foreach (string word in words)
    {
        PDFUnit wordWidth = MeasureWord(word, font, fontSize);
        if (currentWidth + wordWidth <= availableWidth)
        {
            // Word fits on the current line
            currentLine.Add(new PDFTextRun(word));
            currentWidth += wordWidth;
        }
        else
        {
            // Word doesn't fit: check hyphenation
            if (EnableHyphenation && wordWidth > availableWidth * 0.5)
            {
                var parts = HyphenateWord(word);
                currentLine.Add(new PDFTextRun(parts.First + "-"));
                FinishLine(currentLine);
                // Continue with the remaining part on a new line
                currentLine = new PDFLayoutLine();
                currentLine.Add(new PDFTextRun(parts.Second));
                currentWidth = MeasureWord(parts.Second, font, fontSize);
            }
            else
            {
                // Break the line and continue with this word
                FinishLine(currentLine);
                currentLine = new PDFLayoutLine();
                currentLine.Add(new PDFTextRun(word));
                currentWidth = wordWidth;
            }
        }
    }
    FinishLine(currentLine);
}
Resource Management
Resource Lifecycle:
1. Request Resource
   Component needs a font/image during Load or Layout
2. Check Cache
   IResourceContainer.TryGetResource(key)
3. If Not Cached:
   ├── Load resource (file, HTTP, embedded)
   ├── Create resource object (PDFFontResource, PDFImageXObject)
   ├── Register with document: IResourceContainer.AddResource(key, resource)
   └── Return resource
4. If Cached:
   └── Return cached resource
5. During Render:
   ├── Resource registers itself in the page resources dictionary
   └── PDF writer outputs the resource object once
6. References:
   Multiple components reference the same resource by name
   (e.g., /F1 for font, /Im1 for image)
Example: Font Resource:
Component A needs Helvetica 12pt
│
└── FontFactory.GetFont("Helvetica", 12)
    │
    └── Check cache: "Helvetica-12"
        │  Not found
        │
        ├── Load Helvetica font definition
        ├── Create PDFFontResource
        ├── Cache: fonts["Helvetica-12"] = resource
        └── Return resource
Component B needs Helvetica 12pt
│
└── FontFactory.GetFont("Helvetica", 12)
    │
    └── Check cache: "Helvetica-12"
        Found → Return cached resource
During Render:
Page 1 renders Component A
└── References font as /F1
Page 2 renders Component B
└── References same font as /F1
Font written to PDF once (Helvetica is one of the standard Type 1 fonts):
5 0 obj
<< /Type /Font
/Subtype /Type1
/BaseFont /Helvetica
... >>
endobj
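The caching and naming steps above can be sketched with a dictionary keyed by the resource key. GetFontResource, the load counter and the /F numbering scheme are assumptions for illustration, not IResourceContainer's actual behaviour.

```csharp
using System;
using System.Collections.Generic;

// Document-level cache: the first request for a key "loads" the resource and gives
// it a name such as /F1; later requests, from any page, get the same name back,
// so the font object only needs to be written to the PDF once.
var resources = new Dictionary<string, string>();
int fontCount = 0;
int loads = 0;

string GetFontResource(string key)
{
    if (resources.TryGetValue(key, out var name))
        return name;                 // cached: reuse the existing resource
    loads++;                         // stands in for loading the font definition
    name = $"/F{++fontCount}";
    resources[key] = name;
    return name;
}

string a = GetFontResource("Helvetica-12");     // Component A on page 1
string b = GetFontResource("Helvetica-12");     // Component B on page 2
string c = GetFontResource("Times-Roman-12");   // a different font
Console.WriteLine($"{a} {b} {c}, loaded {loads} fonts");
```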
Data Flow
Complete Example: HTML to PDF
Input HTML:
<!DOCTYPE html>
<html>
<head>
<style>
.header {
font-size: 24pt;
color: #336699;
margin-bottom: 20pt;
}
.item {
margin: 10pt;
padding: 5pt;
border: 1pt solid black;
}
</style>
</head>
<body>
<div class="header">{{model.title}}</div>
<div>
<template data-bind="{{model.items}}">
<div class="item">{{.name}}: ${{.price}}</div>
</template>
</div>
</body>
</html>
Data Model:
var model = new {
    title = "Product List",
    items = new[] {
        // decimal literals keep both decimal places when converted to text ("10.00")
        new { name = "Widget", price = 10.00m },
        new { name = "Gadget", price = 25.50m }
    }
};
Data Flow:
1. Parse (HTML → Components):
HTMLParser.Parse(html)
│
└── HTMLBody
    │
    ├── HTMLDiv (class="header")
    │   └── TextLiteral("{{model.title}}")
    │
    └── HTMLDiv
        └── HTMLTemplate (data-bind="{{model.items}}")
            └── HTMLDiv (class="item")
                └── TextLiteral("{{.name}}: ${{.price}}")
2. Init (Register components):
Document.Init(context)
└── Each component initializes
    (No visual change, just setup)
3. Load (External resources):
Document.Load(context)
│
└── StylesDocument.Load()
    │
    ├── CSSStyleParser.Parse(<style> content)
    │   │
    │   ├── Parse .header { font-size: 24pt; ... }
    │   │   └── Create Style with specificity 10
    │   │
    │   └── Parse .item { margin: 10pt; ... }
    │       └── Create Style with specificity 10
    │
    └── Register styles with document
4. DataBind (Evaluate expressions):
Document.DataBind(context)
│
├── HTMLDiv (class="header")
│   │
│   └── TextLiteral
│       │
│       ├── Parse "{{model.title}}"
│       ├── Evaluate with context (model.title = "Product List")
│       └── Set text = "Product List"
│
└── HTMLTemplate (data-bind="{{model.items}}")
    │
    ├── Parse "{{model.items}}"
    ├── Evaluate → returns array with 2 items
    │
    └── For each item:
        │
        ├── Clone template content (HTMLDiv with TextLiteral)
        ├── Push item to data stack
        │
        ├── DataBind clone:
        │   │
        │   ├── Parse "{{.name}}: ${{.price}}"
        │   │
        │   ├── .name evaluates to "Widget" (1st) / "Gadget" (2nd)
        │   ├── .price evaluates to 10.00 / 25.50
        │   └── Result: "Widget: $10.00" / "Gadget: $25.50"
        │
        └── Add clone to parent
Result Component Tree:
HTMLBody
├── HTMLDiv (class="header")
│   └── TextLiteral("Product List")
│
└── HTMLDiv
    ├── HTMLDiv (class="item")
    │   └── TextLiteral("Widget: $10.00")
    │
    └── HTMLDiv (class="item")
        └── TextLiteral("Gadget: $25.50")
5. Style Resolution:
For HTMLDiv (class="header"):
│
├── Match selectors:
│   └── .header (specificity: 10)
│
├── Apply styles:
│   ├── font-size: 24pt
│   ├── color: #336699
│   └── margin-bottom: 20pt
│
└── Store computed style
For each HTMLDiv (class="item"):
│
├── Match selectors:
│   └── .item (specificity: 10)
│
├── Apply styles:
│   ├── margin: 10pt
│   ├── padding: 5pt
│   └── border: 1pt solid black
│
└── Store computed style
6. Layout:
LayoutEngineDocument.Layout()
│
└── LayoutEnginePage.Layout()
    │
    └── LayoutEnginePanel.Layout(HTMLBody)
        │
        ├── Layout HTMLDiv.header:
        │   │
        │   ├── LayoutEngineText("Product List", 24pt)
        │   │   │
        │   │   ├── Measure: width=150pt, height=24pt
        │   │   └── Create PDFLayoutLine
        │   │
        │   └── Add margin-bottom: 20pt
        │       Total height: 44pt
        │
        ├── Layout HTMLDiv (container):
        │   │
        │   ├── Layout HTMLDiv.item (1):
        │   │   │
        │   │   ├── Add margin: 10pt
        │   │   ├── Add padding: 5pt
        │   │   ├── LayoutEngineText("Widget: $10.00")
        │   │   │   └── Measure: 80pt x 12pt
        │   │   ├── Add border: 1pt
        │   │   └── Total: 112pt x 44pt (92pt x 24pt border box + 10pt margin per side)
        │   │
        │   └── Layout HTMLDiv.item (2):
        │       └── (same process)
        │               Total: 112pt x 44pt
        │
        └── Create PDFLayoutDocument:
            │
            └── Page 1:
                ├── Block (y=0, h=44pt): "Product List"
                ├── Block (y=44pt, h=44pt): "Widget: $10.00"
                └── Block (y=88pt, h=44pt): "Gadget: $25.50"
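The item totals in step 6 follow from the CSS box model, in which margin, border and padding each apply to both sides of an axis. This is a minimal sketch assuming uniform values on every side; `BoxSize` and `BoxModel` are illustrative names, and margin collapsing between adjacent blocks is not modelled.

```csharp
using System;

// Width/height pair in points.
public readonly struct BoxSize
{
    public double Width { get; }
    public double Height { get; }
    public BoxSize(double width, double height) { Width = width; Height = height; }
    public override string ToString() => $"{Width}pt x {Height}pt";
}

// CSS box-model arithmetic with the same value on all four sides.
public static class BoxModel
{
    // Content + padding + border, counted on both sides of each axis.
    public static BoxSize BorderBox(BoxSize content, double padding, double border) =>
        new BoxSize(content.Width + 2 * (padding + border),
                    content.Height + 2 * (padding + border));

    // Border box + margin on both sides: the space the block takes in its parent.
    public static BoxSize MarginBox(BoxSize content, double padding, double border, double margin)
    {
        BoxSize b = BorderBox(content, padding, border);
        return new BoxSize(b.Width + 2 * margin, b.Height + 2 * margin);
    }
}
```

For an item, `BorderBox(new BoxSize(80, 12), 5, 1)` gives 92pt x 24pt, the rectangle stroked with `re` in step 7, and `MarginBox(new BoxSize(80, 12), 5, 1, 10)` gives 112pt x 44pt.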
7. Render (Layout → PDF):
PDFLayoutDocument.OutputToPDF(writer)
│
├── Write page object
│
├── Write content stream:
│   │
│   ├── Block 1 (header):
│   │   │
│   │   ├── Set color: 0.2 0.4 0.6 rg
│   │   ├── Begin text object: BT
│   │   ├── Set font: /F1 24 Tf
│   │   ├── Position: 0 750 Td
│   │   ├── Draw text: (Product List) Tj
│   │   └── End text object: ET
│   │
│   ├── Block 2 (item 1):
│   │   │
│   │   ├── Draw border:
│   │   │     10 706 92 24 re
│   │   │     S
│   │   │
│   │   ├── Begin text object: BT
│   │   ├── Set font: /F1 12 Tf
│   │   ├── Position: 15 711 Td
│   │   ├── Draw text: (Widget: $10.00) Tj
│   │   └── End text object: ET
│   │
│   └── Block 3 (item 2):
│       └── (similar)
│
└── Write font resources:
      /Font << /F1 5 0 R >>
Page with formatted content
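Step 7 writes raw content-stream operators. The sketch below emits the same operators with invariant-culture number formatting so that a locale using "," as the decimal separator cannot corrupt the stream. `ContentStream` and its methods are hypothetical helpers, not Scryber's writer API. Per the PDF specification, `Td` and `Tj` are only valid inside a `BT` ... `ET` text object, so the text helper emits that wrapper.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helpers that build PDF content-stream operator text.
public static class ContentStream
{
    // "#336699" -> "0.2 0.4 0.6 rg" (each channel scaled from 0-255 to 0-1).
    public static string FillRgb(string hex)
    {
        int rgb = int.Parse(hex.TrimStart('#'), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return $"{Channel(rgb >> 16)} {Channel(rgb >> 8)} {Channel(rgb)} rg";
    }

    private static string Channel(int value) =>
        Math.Round((value & 0xFF) / 255.0, 3).ToString("0.###", CultureInfo.InvariantCulture);

    // PDF literal strings must escape '\', '(' and ')'.
    public static string Literal(string text) =>
        "(" + text.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)") + ")";

    // Font selection, positioning and text drawing wrapped in a text object.
    public static string Text(string font, double size, double x, double y, string text)
    {
        var inv = CultureInfo.InvariantCulture;
        var sb = new StringBuilder();
        sb.Append("BT\n");
        sb.Append(string.Format(inv, "/{0} {1} Tf\n", font, size));
        sb.Append(string.Format(inv, "{0} {1} Td\n", x, y));
        sb.Append(Literal(text)).Append(" Tj\n");
        sb.Append("ET\n");
        return sb.ToString();
    }

    // Rectangle path followed by a stroke, as used for the item border.
    public static string StrokeRect(double x, double y, double w, double h) =>
        string.Format(CultureInfo.InvariantCulture, "{0} {1} {2} {3} re\nS\n", x, y, w, h);
}
```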
Design Patterns
1. Composite Pattern (Component Tree)
Purpose: Uniform treatment of individual and composite components
Structure:
- IComponent: Common interface
- Leaf components: Label, Image, Shape
- Composite components: Panel, Page, Table
Benefits:
- Recursive operations (Init, Load, DataBind)
- Uniform lifecycle management
- Easy to add new component types
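A minimal sketch of the composite structure, using hypothetical `INode`, `LeafNode` and `CompositeNode` types rather than Scryber's `IComponent` hierarchy: the caller invokes `Init` once on the root, and the recursion reaches every leaf without the caller knowing which nodes are containers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; Scryber's own interfaces differ.
public interface INode
{
    string Name { get; }
    // Recursive operation applied uniformly to leaves and containers.
    void Init(List<string> log);
}

public class LeafNode : INode
{
    public string Name { get; }
    public LeafNode(string name) => Name = name;
    public void Init(List<string> log) => log.Add(Name);
}

public class CompositeNode : INode
{
    public string Name { get; }
    public List<INode> Contents { get; } = new List<INode>();
    public CompositeNode(string name) => Name = name;

    public void Init(List<string> log)
    {
        log.Add(Name);             // the container itself first...
        foreach (INode child in Contents)
            child.Init(log);       // ...then each child, whatever its kind
    }
}
```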
2. Factory Pattern
Purpose: Decouple component creation from usage
Examples:
- HTMLParserComponentFactory: HTML tag → Component
- ImageFactoryList: Image format → Image handler
- FontFactory: Font family → Font resource
Benefits:
- Centralized creation logic
- Easy to extend with new types
- Configuration-driven instantiation
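At its core the factory idea is a registry from a key (here, an HTML tag name) to a creation function. `TagFactory`, `DivComponent` and `BannerComponent` are hypothetical stand-ins, not the actual API of `HTMLParserComponentFactory`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tag-to-type registry illustrating the factory idea.
public class TagFactory
{
    // Tag names are matched case-insensitively, as HTML tags are.
    private readonly Dictionary<string, Func<object>> _creators =
        new Dictionary<string, Func<object>>(StringComparer.OrdinalIgnoreCase);

    // Registering a new tag requires no change to the code that calls Create.
    public void Register<T>(string tag) where T : new() => _creators[tag] = () => new T();

    public object Create(string tag) =>
        _creators.TryGetValue(tag, out var create)
            ? create()
            : throw new KeyNotFoundException($"No component registered for <{tag}>");
}

public class DivComponent { }
public class BannerComponent { }
```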
3. Strategy Pattern
Purpose: Select algorithm at runtime
Examples:
- IPDFLayoutEngine: Different layout strategies per component type
- CSSStyleAttributeParser<T>: Different parsing strategies per CSS property
- ImageFactoryBase: Different decoding strategies per format
Benefits:
- Algorithm encapsulation
- Easy to add new strategies
- Runtime selection based on component type
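4. Context Object Pattern (Visitor-like)
Purpose: Carry per-pass state through the component tree alongside each operation
Examples:
- InitContext, LoadContext, DataContext passed through the tree
- Operations (Init, Load, DataBind) implemented as methods on the components
- Each context accumulates pass-specific state (for example, the data stack during DataBind) instead of storing it on components
Benefits:
- Pass-specific state stays out of component fields
- New per-pass information can be added to a context without changing every method signature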
- Clear separation of concerns
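A sketch of a context object threaded through a recursive pass. `PassContext` and `TreeItem` are hypothetical; the depth counter and trace list stand in for whatever state a real pass carries (such as the data stack during DataBind).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pass context: state that exists only for the duration of one pass.
public sealed class PassContext
{
    public int Depth { get; set; }
    public List<string> Trace { get; } = new List<string>();
}

public sealed class TreeItem
{
    public string Name { get; }
    public List<TreeItem> Children { get; } = new List<TreeItem>();
    public TreeItem(string name) => Name = name;

    // The operation lives on the component; the per-pass state lives in the context.
    public void Load(PassContext context)
    {
        context.Trace.Add(new string(' ', context.Depth * 2) + Name);
        context.Depth++;
        try
        {
            foreach (TreeItem child in Children) child.Load(context);
        }
        finally
        {
            context.Depth--;   // restore state before returning to siblings
        }
    }
}
```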
5. Template Method Pattern
Purpose: Define algorithm skeleton, defer steps to subclasses
Examples:
- LayoutEngineBase.Layout(): Common setup, subclasses implement specifics
- CSSStyleAttributeParser.DoSetStyleValue(): Called by the framework, implemented by subclasses
Benefits:
- Code reuse through inheritance
- Enforces consistent workflow
- Extension points for customization
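A minimal illustration with hypothetical types: the base class owns a non-virtual `Layout()` skeleton and calls one abstract step that each subclass supplies, so subclasses cannot reorder the surrounding steps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: Layout() fixes the order of steps.
public abstract class LayoutStepsBase
{
    public List<string> Steps { get; } = new List<string>();

    // Not virtual: the skeleton cannot be overridden by subclasses.
    public void Layout()
    {
        Steps.Add("begin");
        DoLayout();                // subclass-specific step
        Steps.Add("end");
    }

    protected abstract void DoLayout();
}

public sealed class TextLayoutSteps : LayoutStepsBase
{
    protected override void DoLayout() => Steps.Add("measure text");
}
```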
6. Flyweight Pattern
Purpose: Share common data to reduce memory
Examples:
- ISharedResource: Fonts and images cached and shared
- Style objects marked immutable after calculation
- Font metrics shared across all uses
Benefits:
- Reduced memory footprint
- Smaller PDF file size
- Faster resource lookup
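A sketch of the sharing idea, assuming a simple dictionary cache keyed by font family. `FontMetrics` and `FontCache` are hypothetical, and the constructor call stands in for real font parsing; every caller asking for the same family receives the same instance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shared font data (intrinsic state only: no size, no position).
public sealed class FontMetrics
{
    public string Family { get; }
    public FontMetrics(string family) => Family = family;
}

public sealed class FontCache
{
    private readonly Dictionary<string, FontMetrics> _fonts =
        new Dictionary<string, FontMetrics>(StringComparer.OrdinalIgnoreCase);

    // Number of times a font was actually created.
    public int LoadCount { get; private set; }

    // Returns the shared instance, creating it only on first request.
    public FontMetrics Get(string family)
    {
        if (!_fonts.TryGetValue(family, out FontMetrics metrics))
        {
            metrics = new FontMetrics(family);   // stand-in for parsing the font file
            LoadCount++;
            _fonts[family] = metrics;
        }
        return metrics;
    }
}
```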
7. Builder Pattern
Purpose: Construct complex objects step by step
Examples:
- PDFWriter: Builds the PDF structure incrementally
- Document construction through parsing
- Layout item construction during layout phase
Benefits:
- Stepwise construction
- Immutable result
- Clear construction process
Extension Architecture
These guides need updating and splitting into separate documents. Read them for orientation, but do not rely on the examples and descriptions.
The Configuration Guide contains other, more complete examples.
Adding Custom Components
1. Define Component Class:
public class CustomBanner : Panel
{
    public string BannerText { get; set; }
    public PDFColor BannerColor { get; set; } = PDFColors.Blue;

    protected override void DoDataBind(DataContext context)
    {
        base.DoDataBind(context);

        // Add custom data binding logic
        if(!string.IsNullOrEmpty(BannerText))
        {
            var label = new Label();
            label.Text = BannerText;
            label.ForeColor = BannerColor;
            this.Contents.Add(label);
        }
    }
}
2. Register with Factory (for HTML parsing):
public class CustomComponentFactory : HTMLParserComponentFactory
{
    public CustomComponentFactory()
    {
        // Register custom tag
        this.RegisterTag("banner", typeof(CustomBanner));
    }
}
// Use custom factory
var parser = new HTMLParser(new CustomComponentFactory());
var doc = parser.Parse(html);
3. Use in HTML:
<banner banner-text="Welcome!" banner-color="#FF0000" />
Adding Custom CSS Properties
1. Define Style Property:
public class CustomStyle : StyleBase
{
    public string CustomProperty { get; set; }
}
2. Create Parser:
public class CSSCustomParser : CSSStyleAttributeParser<CustomStyle>
{
    public CSSCustomParser()
    {
        // Register CSS property name
        this.RegisterProperty("custom-property");
    }

    protected override bool DoSetStyleValue(Style style,
                                            CSSStyleItemReader reader,
                                            PDFContextBase context)
    {
        string value = reader.CurrentTextValue;
        if(reader.CurrentAttribute == "custom-property")
        {
            style.Custom.CustomProperty = value;
            return true;
        }
        return false;
    }
}
3. Register Parser:
CSSStyleParser.RegisterParser(new CSSCustomParser());
4. Use in CSS:
.my-class {
custom-property: "my value";
}
Adding Custom Expression Functions
1. Implement Function:
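public class UpperFunction : IFunction
{
    public string Name => "upper";

    public object Evaluate(IExpression[] arguments, Context context)
    {
        if(arguments.Length != 1)
            throw new ArgumentException("upper() requires 1 argument");

        var value = arguments[0].Evaluate(context);
        // ToUpperInvariant avoids culture-specific casing (e.g. the Turkish dotted i);
        // the second ?. guards against a ToString() override that returns null.
        return value?.ToString()?.ToUpperInvariant();
    }
}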
2. Register Function:
var context = new Context();
context.RegisterFunction(new UpperFunction());
3. Use in Template:
<div>{{upper(model.name)}}</div>
Adding Custom Layout Engine
1. Implement Engine:
public class CustomLayoutEngine : LayoutEngineBase
{
    public override void Layout(PDFLayoutContext context,
                                Component component)
    {
        CustomComponent custom = (CustomComponent)component;

        // Measure component
        // (MeasureComponent and LayoutChild are helper methods not shown in this guide)
        PDFSize size = MeasureComponent(custom, context);

        // Create layout block
        PDFLayoutBlock block = context.DocumentLayout
                                      .CurrentPage
                                      .CreateBlock();
        block.Position = context.Space.Location;
        block.Size = size;

        // Lay out children, if any
        foreach(var child in custom.Contents)
        {
            LayoutChild(child, context);
        }

        // Register block
        context.DocumentLayout.CurrentPage.Add(block);
    }
}
2. Register Engine:
public class CustomComponent : VisualComponent
{
    public override IPDFLayoutEngine GetEngine()
    {
        return new CustomLayoutEngine();
    }
}
Performance Considerations
Memory Management
1. Resource Sharing:
- Fonts and images cached at document level
- Multiple references to same resource
- Reduces memory footprint and PDF size
2. Lazy Loading:
- External resources loaded only when needed
- Images not decoded until layout phase
- CSS parsed on demand
3. Layout Item Pooling:
- Consider pooling for high-volume scenarios
- Current implementation creates new objects
- Potential optimization for large documents
Layout Performance
1. Two-Stage Rendering:
- Layout phase separate from render phase
- Allows optimization and pre-calculation
- Can abort before rendering if layout fails
2. Incremental Layout:
- Components laid out as encountered
- Page breaks handled during layout
- No need to lay out the entire document at once
3. Font Metrics Caching:
- Character widths cached per font
- Reduces measurement overhead
- Critical for text-heavy documents
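A per-font width cache can be sketched as below. `WidthCache` is hypothetical; glyph widths are assumed to be in 1/1000 em units, as in PDF glyph space, and the injected lookup function stands in for reading the font's tables.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-font width cache: the expensive lookup runs once per
// character, and later measurements reuse the stored advance width.
public sealed class WidthCache
{
    private readonly Func<char, double> _lookupWidth;   // e.g. read from the font's tables
    private readonly Dictionary<char, double> _widths = new Dictionary<char, double>();

    // Number of uncached lookups performed so far.
    public int Lookups { get; private set; }

    public WidthCache(Func<char, double> lookupWidth) => _lookupWidth = lookupWidth;

    // Width of a run in points: sum of 1/1000-em widths scaled by the font size.
    public double Measure(string text, double fontSize)
    {
        double total = 0;
        foreach (char c in text)
        {
            if (!_widths.TryGetValue(c, out double w))
            {
                w = _lookupWidth(c);
                _widths[c] = w;
                Lookups++;
            }
            total += w;
        }
        return total * fontSize / 1000.0;
    }
}
```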
Async Operations
1. WASM Compatibility:
- All I/O operations async
- No blocking calls
- Uses async/await throughout
2. Parallel Resource Loading:
- Images and fonts can load in parallel
- HttpClient for remote resources
- Reduces total load time
3. Timer Execution:
- DocumentTimerExecution yields periodically
- Keeps UI responsive during generation
- Essential for large documents in WASM
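Starting every request before awaiting any of them lets the downloads overlap, so total time approaches that of the slowest download rather than the sum. A sketch with a hypothetical `ResourceLoader`; the fetch function is injected so it can be backed by `HttpClient.GetByteArrayAsync` in production or by an in-memory stub in tests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical loader: all fetches are started first, then awaited together.
public static class ResourceLoader
{
    public static async Task<IReadOnlyDictionary<string, byte[]>> LoadAllAsync(
        IEnumerable<string> urls, Func<string, Task<byte[]>> fetch)
    {
        List<string> list = urls.Distinct().ToList();          // skip duplicate URLs
        byte[][] results = await Task.WhenAll(list.Select(fetch));
        var map = new Dictionary<string, byte[]>();
        for (int i = 0; i < list.Count; i++) map[list[i]] = results[i];
        return map;
    }

    // Production fetcher built on HttpClient.
    public static Func<string, Task<byte[]>> HttpFetcher(HttpClient client) =>
        url => client.GetByteArrayAsync(url);
}
```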
PDF Output Optimization
1. Stream Compression:
- Content streams can be compressed
- Uses FlateDecode filter
- Reduces file size by 50-70%
2. Resource Deduplication:
- Identical resources referenced once
- Image hash checking
- Font embedding optimized
3. Incremental Writing:
- Objects written as created
- No need to buffer entire PDF
- Supports streaming to output
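FlateDecode data is zlib-wrapped deflate (RFC 1950), which `System.IO.Compression.ZLibStream` produces; `DeflateStream` alone would omit the zlib header and checksum. A sketch with a hypothetical `FlateEncoder` class. `ZLibStream` requires .NET 6 or later, so a netstandard2.0 build would need a different route.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for compressing PDF content streams with FlateDecode.
public static class FlateEncoder
{
    public static byte[] Compress(byte[] data)
    {
        using var output = new MemoryStream();
        using (var z = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            z.Write(data, 0, data.Length);
        }   // disposing flushes the final block and the Adler-32 checksum
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] data)
    {
        using var input = new ZLibStream(new MemoryStream(data), CompressionMode.Decompress);
        using var output = new MemoryStream();
        input.CopyTo(output);
        return output.ToArray();
    }

    // Stream dictionary for a compressed content stream.
    public static string StreamHeader(int compressedLength) =>
        $"<< /Length {compressedLength} /Filter /FlateDecode >>";
}
```

The `/Length` entry must hold the compressed byte count, not the original one.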
Profiling and Diagnostics
1. Trace Logging:
- Performance logging available
- Track time per pipeline stage
- Identify bottlenecks
2. Memory Profiling:
- Monitor resource cache size
- Track layout item count
- Detect memory leaks
3. Layout Diagnostics:
- Can dump layout tree
- Visualize box model
- Debug positioning issues
Conclusion
Scryber.Core’s architecture enables sophisticated PDF generation through:
- Clean Separation: Specialized projects with clear responsibilities
- Extensibility: Multiple extension points for customization
- Performance: Optimized resource management and lazy loading
- Standards Compliance: CSS box model and HTML semantics
- Modern .NET: Multi-targeting, async/await, WASM compatible
The pipeline architecture (Parse → Init → Load → DataBind → Style → Layout → Render) provides natural breakpoints for debugging and extension, while the component model ensures consistent behavior across all element types.
Understanding this architecture enables effective development, debugging, and extension of the Scryber.Core PDF generation engine.