Technology — Image Processing Software
Where pixels become decisions
From deterministic rule-based measurement to deep learning classification, we deploy the right combination of software techniques to turn raw image data into reliable, real-time quality decisions on your production line.

- 7 processing techniques
- < 5 ms rule-based cycle time
- 1/50 px sub-pixel accuracy
- 360° rotation invariance
Traditional Rule-Based Vision
Deterministic precision through hard-coded logic
Rule-based vision relies on explicitly programmed geometric and mathematical rules — measure pixel distances, count features, threshold intensities. It is highly deterministic and repeatable, but requires controlled conditions and identical parts.
How it works
An engineer manually defines strict processing steps: find edges using gradient operators, measure the pixel distance between Edge A and Edge B, compare the result to a tolerance band. Every decision is an explicit mathematical operation with no ambiguity. The system either passes or fails a part based on exact numerical thresholds.
Think of it this way: Like using a rigid metal stencil to check a manufactured part. If the part fits perfectly inside the stencil, it passes. If the part is slightly warped or the lighting shifts a shadow, it fails immediately because it doesn't match the exact mathematical rules.
Best for
- Ultra-precise dimensional measurement and gauging
- Alignment and positioning with known geometries
- Go/no-go checks on identical parts
- Applications with tightly controlled lighting and presentation
- Regulatory environments requiring deterministic, auditable logic
Key specifications
- Sub-pixel measurement accuracy (1/10th to 1/50th pixel)
- Deterministic, repeatable results — no inference variability
- Processing times typically < 5 ms per frame
- No training data required — rules defined by engineers
- Full auditability of every pass/fail decision
Deep Learning (Neural Networks)
Learned intelligence from example images
Instead of writing explicit rules, deep learning systems are trained on hundreds or thousands of labeled example images. The algorithm autonomously learns the abstract features that distinguish a defect from an acceptable variation — handling complexity that no hand-written rule can capture.
How it works
A neural network is fed labeled training images ("good" and "bad" examples). Through iterative optimization, the network adjusts millions of internal parameters to learn statistical patterns that correlate with each class. Once trained, it classifies new, never-before-seen images by recognizing the same learned patterns — generalizing from examples rather than following explicit instructions.
Think of it this way: Like training a new employee on an assembly line by handing them a stack of flashcards. You show them 50 pictures of bruised apples and 50 pictures of healthy apples. Eventually they develop an intuitive "gut feeling" for what a bruise looks like, even if the next bruise is a completely different shape.
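The training loop described above can be sketched with PyTorch. This is a deliberately tiny toy: the "images" are flattened 8×8 patches separated by brightness, and the network is a small dense classifier rather than the deep CNN a real inspection system would use.

```python
import torch
from torch import nn

# Toy labelled data (assumed): dark patches are "good", bright are "bad".
torch.manual_seed(0)
good = torch.rand(64, 64) * 0.3
bad = torch.rand(64, 64) * 0.3 + 0.6
x = torch.cat([good, bad])
y = torch.cat([torch.zeros(64, dtype=torch.long),
               torch.ones(64, dtype=torch.long)])

model = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):                  # iterative optimisation
    opt.zero_grad()
    loss = loss_fn(model(x), y)       # how wrong are the current predictions?
    loss.backward()                   # gradients w.r.t. every parameter
    opt.step()                        # nudge parameters to reduce the loss

# Classify a never-before-seen patch using the learned patterns.
probe = torch.rand(1, 64) * 0.3 + 0.6   # resembles the "bad" examples
pred = model(probe).argmax(dim=1).item()
```

The network is never told a rule like "bright means bad"; it infers that statistical pattern from the labels alone, which is exactly what lets it generalize to defects no engineer anticipated.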
Best for
- Complex, unpredictable, or organic defect detection
- Scratches on textured metal or fabric weave defects
- Classification tasks with high visual variability
- Applications where writing mathematical rules is impractical
- Anomaly detection with "good-only" training data
Key specifications
- Training on 50–5,000+ labeled images per class (application-dependent)
- Inference times from 5 ms to 100 ms per image (GPU-dependent)
- Confidence scoring for every classification decision
- Transfer learning from pre-trained models reduces data needs
- Continuous retraining as new defect types emerge
Convolutional Neural Networks (CNNs)
Localized feature extraction at scale
CNNs are a specific deep learning architecture purpose-built for image data. They slide small mathematical filters across the image to detect local features — edges, corners, textures — and progressively combine them in deeper layers to recognize complex objects and defects.
How it works
A series of convolutional layers apply small, learned filter kernels (typically 3×3 or 5×5 pixels) that scan across the entire image. Early layers detect primitive features like edges and gradients. Deeper layers combine these into increasingly abstract representations — corners become shapes, shapes become objects. Pooling layers reduce spatial resolution between convolutions, making the network efficient and translation-invariant.
Think of it this way: Imagine inspecting a massive wall painting by holding a small magnifying glass and scanning it inch by inch, top to bottom. You note down local details — a brush stroke here, a color gradient there — and piece them together later to figure out what the entire painting depicts.
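The layer stack described above can be written out directly in PyTorch. The channel counts and the 64×64 input size here are arbitrary choices for illustration, not a recommended architecture.

```python
import torch
from torch import nn

# Minimal CNN sketch: small learned kernels scan the image; pooling
# halves the resolution between convolutions, making the features
# efficient and translation-invariant.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),    # early layer: edges, gradients
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 64x64 -> 32x32
    nn.Conv2d(8, 16, kernel_size=3, padding=1),   # deeper layer: shapes, textures
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 2),                   # good / defect scores
)

x = torch.randn(1, 1, 64, 64)   # one grayscale 64x64 frame
scores = cnn(x)                 # shape: (1, 2)
```

Note how the spatial resolution shrinks while the channel count grows: each deeper layer sees a wider patch of the original image through a progressively more abstract representation.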
Best for
- Fast inference on standard image resolutions
- Detecting localized defects (pinholes, cracks, blobs)
- Real-time classification on production lines
- Object detection and instance segmentation
- Applications requiring embedded/edge deployment
Key specifications
- Architectures: ResNet, EfficientNet, YOLO, U-Net
- Inference < 10 ms on modern GPUs
- Scales from embedded edge devices to multi-GPU servers
- Supports classification, detection, and segmentation tasks
- Well-suited for fixed-resolution, consistent image inputs
Vision Transformers (ViTs)
Global context through self-attention
Adapted from natural language processing, Vision Transformers divide an image into discrete patches and use self-attention to compare every patch to every other patch simultaneously. This gives them an immediate understanding of global context that CNNs build only gradually through depth.
How it works
The image is split into a grid of fixed-size patches (e.g., 16×16 pixels). Each patch is flattened into a vector and passed through a transformer encoder. The self-attention mechanism computes a weighted relationship between every pair of patches, allowing the network to understand how a feature in one corner of the image relates to a feature in the opposite corner — in a single layer, without sequential scanning.
Think of it this way: Imagine cutting a photograph into a jigsaw puzzle and spreading all the pieces on a table. Instead of examining one piece at a time, you look at every piece simultaneously and instantly understand how a dark piece in the bottom-left corner relates to a bright piece in the top-right corner.
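The patch-and-attend front end described above can be sketched in PyTorch. The image size, patch size, and embedding width are placeholder values; a real ViT also adds positional embeddings and stacks many such layers.

```python
import torch
from torch import nn

# Split a 64x64 image into 16x16 patches: 4x4 = 16 patches of 256 pixels.
img = torch.randn(1, 1, 64, 64)
patch = 16
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.reshape(1, -1, patch * patch)           # (1, 16, 256)

embed = nn.Linear(patch * patch, 64)                      # patch -> token vector
tokens = embed(patches)                                   # (1, 16, 64)

# Self-attention relates every patch to every other patch in one step.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
out, weights = attn(tokens, tokens, tokens)
# weights[0, i, j]: how strongly patch i attends to patch j,
# including pairs in opposite corners of the image.
```

The attention weight matrix is the "jigsaw table" from the analogy: a full pairwise relationship map computed in a single layer, with no sequential scanning.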
Best for
- Complex scenes requiring global spatial understanding
- Finding anomalies hidden in cluttered or textured backgrounds
- Large image analysis where distant features are correlated
- Multi-object relationship reasoning
- State-of-the-art accuracy on challenging classification benchmarks
Key specifications
- Architectures: ViT, DeiT, Swin Transformer
- Self-attention over all image patches simultaneously
- Typically requires larger training datasets than CNNs
- Higher compute requirements — best on GPU/accelerator hardware
- Excels when global context matters more than local features
Sub-Pixel Edge Detection
Measurement beyond the physical pixel grid
Sub-pixel algorithms use mathematical interpolation to estimate the true position of an edge to 1/10th or even 1/50th of a physical pixel. This breaks through the resolution limit of the camera sensor, delivering micrometer-level accuracy without needing excessively expensive, ultra-high-resolution hardware.
How it works
When an edge falls between physical pixels, the camera records a gradual brightness transition across several pixels rather than a sharp step. Sub-pixel algorithms model this blur profile mathematically — fitting a curve (typically Gaussian or error function) to the intensity gradient — and calculate the precise fractional pixel position where the true edge lies.
Think of it this way: Imagine trying to measure the exact edge of a fuzzy, blurry shadow using a school ruler. Instead of just picking the grayest part, you use an advanced math formula to calculate exactly where the light ends and the dark begins — giving you a measurement far more precise than the lines on your ruler.
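The curve-fitting step described above can be sketched with a parabolic fit, one of the simplest of the interpolation methods listed below (Gaussian and moment-based fits follow the same idea). The function name is illustrative.

```python
import numpy as np

def subpixel_edge(profile):
    """Estimate an edge position to a fraction of a pixel by fitting a
    parabola to the gradient peak of a 1-D intensity profile."""
    grad = np.gradient(profile.astype(float))
    k = int(np.argmax(np.abs(grad)))          # coarse peak: whole-pixel edge
    if k == 0 or k == len(grad) - 1:
        return float(k)
    y0, y1, y2 = np.abs(grad[k - 1:k + 2])
    # Vertex of the parabola through the three samples around the peak.
    denom = y0 - 2 * y1 + y2
    offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    return k + offset                         # fractional pixel position
```

Fed a gradual brightness transition whose true edge sits between two pixels, the fit recovers the fractional position far more precisely than the pixel grid alone allows.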
Best for
- High-precision industrial metrology
- Semiconductor pin and lead measurement
- Gear tooth and micro-feature gauging
- Achieving micrometer accuracy with standard camera hardware
- Telecentric lens pairings for calibrated measurement
Key specifications
- Edge position accuracy to 1/10th–1/50th of a pixel
- Gaussian, parabolic, and moment-based interpolation methods
- Works with both area scan and line scan image data
- Processing time typically < 1 ms per edge measurement
- Pairs with calibrated telecentric optics for traceable metrology
Geometric Pattern Matching
Find, locate, and align with invariance
Pattern matching searches a live camera image for a known reference template, locating it regardless of rotation, scale, partial occlusion, or lighting variation. It returns the precise position and orientation of the found object — the foundation of robot guidance and part alignment.
How it works
A reference template (or "model") of the target is created from a training image. The algorithm encodes the geometric features of this template — edge gradients, contour shapes, spatial relationships — into a compact descriptor. At runtime, it searches the full image for regions that best match this descriptor, scoring candidates by geometric similarity rather than raw pixel intensity.
Think of it this way: The machine vision equivalent of playing "Where's Waldo?" — you know exactly what Waldo looks like, and you scan a chaotic, crowded scene to find him, even if he's standing upside down or partially hidden behind a building.
Best for
- Robotic pick-and-place guidance
- Part localization on moving conveyors
- Alignment and registration for multi-step inspection
- Locating features of interest before detailed measurement
- Identifying parts in cluttered or overlapping scenes
Key specifications
- Rotation invariant (0°–360°)
- Scale invariant across defined ranges
- Robust to partial occlusion (up to 40–60%)
- Sub-pixel position and angle accuracy
- Search times typically < 5 ms for single-model matching
Visual Programming (No-Code Interfaces)
Drag-and-drop vision system configuration
Visual programming environments replace complex code with drag-and-drop flowchart interfaces. Operators build inspection sequences by connecting algorithm blocks — Find Edge, Measure Distance, Classify Defect — without writing a single line of C++ or Python.
How it works
The software provides a library of pre-built processing blocks, each representing a specific algorithm or I/O operation. Users drag blocks onto a visual canvas and connect them in sequence — defining the data flow from image acquisition through processing to pass/fail output. Parameters are configured through graphical dialogs rather than code. The underlying engine compiles and optimizes the flowchart for real-time execution.
Think of it this way: Like building a complex machine out of digital LEGO blocks, or drawing a flowchart on a whiteboard. You snap the processing steps together in the order you want them to happen — no programming degree required.
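Under the hood, the engine executes something like the following. This is a hypothetical miniature, with made-up block names, of how connected blocks pass data from acquisition to pass/fail output; real platforms compile the flowchart to optimized native code.

```python
# Each "block" is a reusable step; the flowchart is just the ordered
# chain an operator snaps together on the canvas.
def acquire(ctx):
    ctx["gap_px"] = 121.0      # stand-in for a camera grab + edge measurement
    return ctx

def check_tolerance(ctx):
    ctx["pass"] = 118.0 <= ctx["gap_px"] <= 122.0
    return ctx

def report(ctx):
    ctx["result"] = "PASS" if ctx["pass"] else "FAIL"
    return ctx

def run(flowchart):
    """Execute the connected blocks in sequence, like the visual canvas."""
    ctx = {}
    for block in flowchart:
        ctx = block(ctx)       # each block reads and extends the shared data
    return ctx

outcome = run([acquire, check_tolerance, report])   # drag, drop, connect
```

Swapping a block, reordering the chain, or changing a tolerance is a canvas edit rather than a code change, which is why recipe changes need no programmer.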
Best for
- Enabling line operators to configure and adjust inspections
- Rapid prototyping and proof-of-concept development
- Applications requiring frequent recipe changes
- Training and maintenance by non-programming staff
- Reducing dependence on specialized software engineers
Key specifications
- Drag-and-drop flowchart-based interfaces
- Pre-built algorithm libraries (edge detection, measurement, OCR, barcode)
- Integrated deep learning training tools in modern platforms
- Real-time execution with hardware-optimized backends
- Platforms: HALCON, VisionPro, MERLIC, NeuroCheck, In-Sight
Choosing the right approach
The best vision software combines multiple techniques — rule-based measurement alongside deep learning classification, all within a visual programming environment. Here are the factors that guide the mix.
- Defect variability: If defects are consistent and well-defined, rule-based methods offer speed and auditability. If defects are variable or organic, deep learning adapts where rules cannot.
- Training data: Deep learning requires labeled examples. If you have few samples, rule-based or anomaly detection approaches may be more practical initially.
- Accuracy requirements: Sub-pixel edge detection and calibrated measurement pipelines deliver micrometer-level precision. Classification tasks may tolerate lower spatial accuracy.
- Cycle time: Rule-based algorithms and CNNs run in single-digit milliseconds. Vision Transformers and complex deep learning may need GPU acceleration to meet cycle time.
- Auditability: Regulated industries may require deterministic, fully traceable decision logic — favoring rule-based methods over black-box neural networks.
- Team skills: Visual programming platforms empower operators directly. Custom deep learning pipelines require data science expertise. Match the tool to your team.
Related technology
Software is only as good as the image it receives. Explore the hardware that makes it possible.
Ready to solve your vision challenge?
Tell us about your application. Our engineers will evaluate your requirements and recommend the right approach — no obligation.