Technology — Image Processing Software
Where pixels become decisions
From deterministic rule-based measurement to deep learning classification, we deploy the right combination of software techniques to turn raw image data into reliable, real-time quality decisions on your production line.

- 7 processing techniques
- < 5 ms rule-based cycle time
- 1/50 px sub-pixel accuracy
- 360° rotation invariance
Traditional Rule-Based Vision
Deterministic precision through hard-coded logic
Rule-based vision relies on explicitly programmed geometric and mathematical rules — measure pixel distances, count features, threshold intensities. It is highly deterministic and repeatable, but requires controlled conditions and identical parts.
How it works
An engineer manually defines strict processing steps: find edges using gradient operators, measure the pixel distance between Edge A and Edge B, compare the result to a tolerance band. Every decision is an explicit mathematical operation with no ambiguity. The system either passes or fails a part based on exact numerical thresholds.
Think of it this way: Like using a rigid metal stencil to check a manufactured part. If the part fits perfectly inside the stencil, it passes. If the part is slightly warped or the lighting shifts a shadow, it fails immediately because it doesn't match the exact mathematical rules.
Best for
- Ultra-precise dimensional measurement and gauging
- Alignment and positioning with known geometries
- Go/no-go checks on identical parts
- Applications with tightly controlled lighting and presentation
- Regulatory environments requiring deterministic, auditable logic
Key specifications
- Sub-pixel measurement accuracy (1/10th to 1/50th pixel)
- Deterministic, repeatable results — no inference variability
- Processing times typically < 5 ms per frame
- No training data required — rules defined by engineers
- Full auditability of every pass/fail decision
Deep Learning (Neural Networks)
Learned intelligence from example images
Instead of writing explicit rules, deep learning systems are trained on hundreds or thousands of labeled example images. The algorithm autonomously learns the abstract features that distinguish a defect from an acceptable variation — handling complexity that no hand-written rule can capture.
How it works
A neural network is fed labeled training images ("good" and "bad" examples). Through iterative optimization, the network adjusts millions of internal parameters to learn statistical patterns that correlate with each class. Once trained, it classifies new, never-before-seen images by recognizing the same learned patterns — generalizing from examples rather than following explicit instructions.
Think of it this way: Like training a new employee on an assembly line by handing them a stack of flashcards. You show them 50 pictures of bruised apples and 50 pictures of healthy apples. Eventually they develop an intuitive "gut feeling" for what a bruise looks like, even if the next bruise is a completely different shape.
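The training loop described above can be sketched with PyTorch. This is a deliberately tiny toy: the "images" are flattened 8×8 patches separated by brightness, and the network is a small dense classifier rather than the deep CNN a real inspection system would use.

```python
import torch
from torch import nn

# Toy labelled data (assumed): dark patches are "good", bright are "bad".
torch.manual_seed(0)
good = torch.rand(64, 64) * 0.3
bad = torch.rand(64, 64) * 0.3 + 0.6
x = torch.cat([good, bad])
y = torch.cat([torch.zeros(64, dtype=torch.long),
               torch.ones(64, dtype=torch.long)])

model = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):                  # iterative optimisation
    opt.zero_grad()
    loss = loss_fn(model(x), y)       # how wrong are the current predictions?
    loss.backward()                   # gradients w.r.t. every parameter
    opt.step()                        # nudge parameters to reduce the loss

# Classify a never-before-seen patch using the learned patterns.
probe = torch.rand(1, 64) * 0.3 + 0.6   # resembles the "bad" examples
pred = model(probe).argmax(dim=1).item()
```

The network is never told a rule like "bright means bad"; it infers that statistical pattern from the labels alone, which is exactly what lets it generalize to defects no engineer anticipated.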
Best for
- Complex, unpredictable, or organic defect detection
- Scratches on textured metal or fabric weave defects
- Classification tasks with high visual variability
- Applications where writing mathematical rules is impractical
- Anomaly detection with "good-only" training data
Key specifications
- Training on 50–5,000+ labeled images per class (application-dependent)
- Inference times from 5 ms to 100 ms per image (GPU-dependent)
- Confidence scoring for every classification decision
- Transfer learning from pre-trained models reduces data needs
- Continuous retraining as new defect types emerge
Convolutional Neural Networks (CNNs)
Localized feature extraction at scale
CNNs are a specific deep learning architecture purpose-built for image data. They slide small mathematical filters across the image to detect local features — edges, corners, textures — and progressively combine them in deeper layers to recognize complex objects and defects.
How it works
A series of convolutional layers apply small, learned filter kernels (typically 3×3 or 5×5 pixels) that scan across the entire image. Early layers detect primitive features like edges and gradients. Deeper layers combine these into increasingly abstract representations — corners become shapes, shapes become objects. Pooling layers reduce spatial resolution between convolutions, making the network efficient and translation-invariant.
Think of it this way: Imagine inspecting a massive wall painting by holding a small magnifying glass and scanning it inch by inch, top to bottom. You note down local details — a brush stroke here, a color gradient there — and piece them together later to figure out what the entire painting depicts.
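The layer stack described above can be written out directly in PyTorch. The channel counts and the 64×64 input size here are arbitrary choices for illustration, not a recommended architecture.

```python
import torch
from torch import nn

# Minimal CNN sketch: small learned kernels scan the image; pooling
# halves the resolution between convolutions, making the features
# efficient and translation-invariant.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),    # early layer: edges, gradients
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 64x64 -> 32x32
    nn.Conv2d(8, 16, kernel_size=3, padding=1),   # deeper layer: shapes, textures
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 2),                   # good / defect scores
)

x = torch.randn(1, 1, 64, 64)   # one grayscale 64x64 frame
scores = cnn(x)                 # shape: (1, 2)
```

Note how the spatial resolution shrinks while the channel count grows: each deeper layer sees a wider patch of the original image through a progressively more abstract representation.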
Best for
- Fast inference on standard image resolutions
- Detecting localized defects (pinholes, cracks, blobs)
- Real-time classification on production lines
- Object detection and instance segmentation
- Applications requiring embedded/edge deployment
Key specifications
- Architectures: ResNet, EfficientNet, YOLO, U-Net
- Inference < 10 ms on modern GPUs
- Scales from embedded edge devices to multi-GPU servers
- Supports classification, detection, and segmentation tasks
- Well-suited for fixed-resolution, consistent image inputs
Vision Transformers (ViTs)
Global context through self-attention
Adapted from natural language processing, Vision Transformers divide an image into discrete patches and use self-attention to compare every patch to every other patch simultaneously. This gives them an immediate understanding of global context that CNNs build only gradually through depth.
How it works
The image is split into a grid of fixed-size patches (e.g., 16×16 pixels). Each patch is flattened into a vector and passed through a transformer encoder. The self-attention mechanism computes a weighted relationship between every pair of patches, allowing the network to understand how a feature in one corner of the image relates to a feature in the opposite corner — in a single layer, without sequential scanning.
Think of it this way: Imagine cutting a photograph into a jigsaw puzzle and spreading all the pieces on a table. Instead of examining one piece at a time, you look at every piece simultaneously and instantly understand how a dark piece in the bottom-left corner relates to a bright piece in the top-right corner.
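The patch-and-attend front end described above can be sketched in PyTorch. The image size, patch size, and embedding width are placeholder values; a real ViT also adds positional embeddings and stacks many such layers.

```python
import torch
from torch import nn

# Split a 64x64 image into 16x16 patches: 4x4 = 16 patches of 256 pixels.
img = torch.randn(1, 1, 64, 64)
patch = 16
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.reshape(1, -1, patch * patch)           # (1, 16, 256)

embed = nn.Linear(patch * patch, 64)                      # patch -> token vector
tokens = embed(patches)                                   # (1, 16, 64)

# Self-attention relates every patch to every other patch in one step.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
out, weights = attn(tokens, tokens, tokens)
# weights[0, i, j]: how strongly patch i attends to patch j,
# including pairs in opposite corners of the image.
```

The attention weight matrix is the "jigsaw table" from the analogy: a full pairwise relationship map computed in a single layer, with no sequential scanning.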
Best for
- Complex scenes requiring global spatial understanding
- Finding anomalies hidden in cluttered or textured backgrounds
- Large image analysis where distant features are correlated
- Multi-object relationship reasoning
- State-of-the-art accuracy on challenging classification benchmarks
Key specifications
- Architectures: ViT, DeiT, Swin Transformer
- Self-attention over all image patches simultaneously
- Typically requires larger training datasets than CNNs
- Higher compute requirements — best on GPU/accelerator hardware
- Excels when global context matters more than local features
Sub-Pixel Edge Detection
Measurement beyond the physical pixel grid
Sub-pixel algorithms use mathematical interpolation to estimate the true position of an edge to 1/10th or even 1/50th of a physical pixel. This breaks through the resolution limit of the camera sensor, delivering micrometer-level accuracy without needing excessively expensive, ultra-high-resolution hardware.
How it works
When an edge falls between physical pixels, the camera records a gradual brightness transition across several pixels rather than a sharp step. Sub-pixel algorithms model this blur profile mathematically — fitting a curve (typically Gaussian or error function) to the intensity gradient — and calculate the precise fractional pixel position where the true edge lies.
Think of it this way: Imagine trying to measure the exact edge of a fuzzy, blurry shadow using a school ruler. Instead of just picking the grayest part, you use an advanced math formula to calculate exactly where the light ends and the dark begins — giving you a measurement far more precise than the lines on your ruler.
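The curve-fitting step described above can be sketched with a parabolic fit, one of the simplest of the interpolation methods listed below (Gaussian and moment-based fits follow the same idea). The function name is illustrative.

```python
import numpy as np

def subpixel_edge(profile):
    """Estimate an edge position to a fraction of a pixel by fitting a
    parabola to the gradient peak of a 1-D intensity profile."""
    grad = np.gradient(profile.astype(float))
    k = int(np.argmax(np.abs(grad)))          # coarse peak: whole-pixel edge
    if k == 0 or k == len(grad) - 1:
        return float(k)
    y0, y1, y2 = np.abs(grad[k - 1:k + 2])
    # Vertex of the parabola through the three samples around the peak.
    denom = y0 - 2 * y1 + y2
    offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    return k + offset                         # fractional pixel position
```

Fed a gradual brightness transition whose true edge sits between two pixels, the fit recovers the fractional position far more precisely than the pixel grid alone allows.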
Best for
- High-precision industrial metrology
- Semiconductor pin and lead measurement
- Gear tooth and micro-feature gauging
- Achieving micrometer accuracy with standard camera hardware
- Telecentric lens pairings for calibrated measurement
Key specifications
- Edge position accuracy to 1/10th–1/50th of a pixel
- Gaussian, parabolic, and moment-based interpolation methods
- Works with both area scan and line scan image data
- Processing time typically < 1 ms per edge measurement
- Pairs with calibrated telecentric optics for traceable metrology
Geometric Pattern Matching
Find, locate, and align with invariance
Pattern matching searches a live camera image for a known reference template, locating it regardless of rotation, scale, partial occlusion, or lighting variation. It returns the precise position and orientation of the found object — the foundation of robot guidance and part alignment.
How it works
A reference template (or "model") of the target is created from a training image. The algorithm encodes the geometric features of this template — edge gradients, contour shapes, spatial relationships — into a compact descriptor. At runtime, it searches the full image for regions that best match this descriptor, scoring candidates by geometric similarity rather than raw pixel intensity.
Think of it this way: The machine vision equivalent of playing "Where's Waldo?" — you know exactly what Waldo looks like, and you scan a chaotic, crowded scene to find him, even if he's standing upside down or partially hidden behind a building.
Best for
- Robotic pick-and-place guidance
- Part localization on moving conveyors
- Alignment and registration for multi-step inspection
- Locating features of interest before detailed measurement
- Identifying parts in cluttered or overlapping scenes
Key specifications
- Rotation invariant (0°–360°)
- Scale invariant across defined ranges
- Robust to partial occlusion (up to 40–60%)
- Sub-pixel position and angle accuracy
- Search times typically < 5 ms for single-model matching
Visual Programming (No-Code Interfaces)
Drag-and-drop vision system configuration
Visual programming environments replace complex code with drag-and-drop flowchart interfaces. Operators build inspection sequences by connecting algorithm blocks — Find Edge, Measure Distance, Classify Defect — without writing a single line of C++ or Python.
How it works
The software provides a library of pre-built processing blocks, each representing a specific algorithm or I/O operation. Users drag blocks onto a visual canvas and connect them in sequence — defining the data flow from image acquisition through processing to pass/fail output. Parameters are configured through graphical dialogs rather than code. The underlying engine compiles and optimizes the flowchart for real-time execution.
Think of it this way: Like building a complex machine out of digital LEGO blocks, or drawing a flowchart on a whiteboard. You snap the processing steps together in the order you want them to happen — no programming degree required.
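Under the hood, the engine executes something like the following. This is a hypothetical miniature, with made-up block names, of how connected blocks pass data from acquisition to pass/fail output; real platforms compile the flowchart to optimized native code.

```python
# Each "block" is a reusable step; the flowchart is just the ordered
# chain an operator snaps together on the canvas.
def acquire(ctx):
    ctx["gap_px"] = 121.0      # stand-in for a camera grab + edge measurement
    return ctx

def check_tolerance(ctx):
    ctx["pass"] = 118.0 <= ctx["gap_px"] <= 122.0
    return ctx

def report(ctx):
    ctx["result"] = "PASS" if ctx["pass"] else "FAIL"
    return ctx

def run(flowchart):
    """Execute the connected blocks in sequence, like the visual canvas."""
    ctx = {}
    for block in flowchart:
        ctx = block(ctx)       # each block reads and extends the shared data
    return ctx

outcome = run([acquire, check_tolerance, report])   # drag, drop, connect
```

Swapping a block, reordering the chain, or changing a tolerance is a canvas edit rather than a code change, which is why recipe changes need no programmer.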
Best for
- Enabling line operators to configure and adjust inspections
- Rapid prototyping and proof-of-concept development
- Applications requiring frequent recipe changes
- Training and maintenance by non-programming staff
- Reducing dependence on specialized software engineers
Key specifications
- Drag-and-drop flowchart-based interfaces
- Pre-built algorithm libraries (edge detection, measurement, OCR, barcode)
- Integrated deep learning training tools in modern platforms
- Real-time execution with hardware-optimized backends
- Platforms: HALCON, VisionPro, MERLIC, NeuroCheck, In-Sight
Choosing the right approach
The best vision software combines multiple techniques — rule-based measurement alongside deep learning classification, all within a visual programming environment. Here are the factors that guide the mix.
- Defect variability: If defects are consistent and well-defined, rule-based methods offer speed and auditability. If defects are variable or organic, deep learning adapts where rules cannot.
- Training data: Deep learning requires labeled examples. If you have few samples, rule-based or anomaly detection approaches may be more practical initially.
- Accuracy requirements: Sub-pixel edge detection and calibrated measurement pipelines deliver micrometer-level precision. Classification tasks may tolerate lower spatial accuracy.
- Cycle time: Rule-based algorithms and CNNs run in single-digit milliseconds. Vision Transformers and complex deep learning may need GPU acceleration to meet cycle time.
- Auditability: Regulated industries may require deterministic, fully traceable decision logic — favoring rule-based methods over black-box neural networks.
- Team skills: Visual programming platforms empower operators directly. Custom deep learning pipelines require data science expertise. Match the tool to your team.
Related technology
Software is only as good as the image it receives. Explore the hardware that makes it possible.
Ready to solve your vision challenge?
Tell us about your application. Our engineers will evaluate your requirements and recommend the right approach — no obligation.