Semantic Segmentation
TL;DR: Semantic segmentation assigns a class label to every pixel in an image. It gives machines a detailed, pixel-level understanding of a scene.
Semantic segmentation is a task in computer vision. It classifies every pixel in an image into a predefined category — for example: road, car, pedestrian, sky, or building. Unlike object detection, which draws bounding boxes, semantic segmentation produces an exact pixel mask per class. It gives the model a fine-grained understanding of object shape and location. This precision is critical for autonomous driving and medical imaging.
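The per-pixel classification described above can be sketched as follows. This is a minimal illustration, not a real model: the class list is hypothetical, and random scores stand in for a network's output. It assumes the common convention that a model emits one score map per class, and that each pixel takes the class with the highest score.

```python
import numpy as np

# Hypothetical class list for illustration; real datasets define their own.
CLASSES = ["road", "car", "pedestrian", "sky", "building"]


def logits_to_mask(logits: np.ndarray) -> np.ndarray:
    """Convert per-class scores of shape (C, H, W) into a label mask of shape (H, W)."""
    return np.argmax(logits, axis=0)


# Toy 2x3 "image": random scores stand in for a real model's output.
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(CLASSES), 2, 3))
mask = logits_to_mask(logits)

# Every pixel now holds exactly one class index.
for row in mask:
    print([CLASSES[i] for i in row])
```

The output mask has the same height and width as the input, which is what separates it from a bounding box: shape and location are recorded pixel by pixel.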
Types of Segmentation
- Semantic Segmentation: All pixels of the same class share one label. Two cars are both labeled ‘car’ — no distinction between individual instances.
- Instance Segmentation: Distinguishes individual objects of the same class. Each car gets a unique ID and mask.
- Panoptic Segmentation: Combines both — all pixels labeled by class, with unique instance IDs for countable objects like cars and people.
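The difference between the three types can be made concrete with a toy scene containing two cars. The class IDs, the `to_panoptic` helper, and the `class * 1000 + instance` encoding are assumptions chosen for this sketch; datasets define their own encodings.

```python
import numpy as np

# Hypothetical class IDs; "car" is countable, road and sky are not.
ROAD, CAR, SKY = 0, 1, 2
COUNTABLE = {CAR}

# Semantic map: both cars share the same label.
semantic = np.array([
    [SKY, SKY,  SKY,  SKY],
    [CAR, ROAD, ROAD, CAR],
    [CAR, ROAD, ROAD, CAR],
])

# Instance map: 0 means "no instance"; each car gets its own ID.
instance = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 2],
    [1, 0, 0, 2],
])


def to_panoptic(semantic, instance, countable, offset=1000):
    """Encode class and instance in one integer per pixel: class * offset + instance."""
    is_countable = np.isin(semantic, list(countable))
    return semantic * offset + np.where(is_countable, instance, 0)


panoptic = to_panoptic(semantic, instance, COUNTABLE)
print(panoptic)
```

In the panoptic map the two cars receive different values while every sky or road pixel keeps a single value for its class.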
Key Model Architectures
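- Fully Convolutional Network (FCN): The first widely adopted end-to-end model for semantic segmentation. Replaces the dense (fully connected) layers of a classification network with convolutional ones, so the output is a per-pixel score map rather than a single label.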
- U-Net: Encoder-decoder with skip connections. Standard architecture for medical image segmentation.
- DeepLab v3+: Uses atrous (dilated) convolutions and atrous spatial pyramid pooling (ASPP) to capture multi-scale context. Achieved state-of-the-art results on PASCAL VOC 2012 and Cityscapes at its 2018 release.
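- Segment Anything Model (SAM): Meta’s promptable foundation model for zero-shot segmentation. It produces masks for arbitrary objects from point or box prompts, but the masks are class-agnostic, so assigning semantic labels requires an additional step.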
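To illustrate the atrous convolutions mentioned for DeepLab, here is a minimal NumPy sketch. It is not DeepLab's implementation: `atrous_conv2d` is a hypothetical helper that applies a single-channel kernel without padding, with kernel taps spaced `rate` pixels apart. Running the same 3x3 kernel at several rates loosely mirrors the idea behind ASPP.

```python
import numpy as np


def atrous_conv2d(image, kernel, rate):
    """Unpadded 2D cross-correlation with kernel taps spaced `rate` pixels apart."""
    kh, kw = kernel.shape
    # The footprint grows with the rate; the number of weights does not.
    eff_h = (kh - 1) * rate + 1
    eff_w = (kw - 1) * rate + 1
    out_h = image.shape[0] - eff_h + 1
    out_w = image.shape[1] - eff_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            # Shifted view of the image for this kernel tap.
            out += kernel[i, j] * image[i * rate : i * rate + out_h,
                                        j * rate : j * rate + out_w]
    return out


image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3)) / 9.0  # simple averaging kernel

# Same nine weights, increasingly wide context.
for rate in (1, 2, 3):
    print(rate, atrous_conv2d(image, kernel, rate).shape)
```

At rate 1 this is an ordinary 3x3 convolution covering 3x3 pixels. At rate 3 the same nine weights span a 7x7 window, which is how larger context is captured without adding parameters.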
Applications
- Autonomous Vehicles: Segment road, lane markings, vehicles, and pedestrians in real time. Combined with LiDAR point clouds for 3D scene understanding.
- Medical Imaging: Segment tumors, organs, and tissue in MRI, CT, and pathology scans.
- Satellite Imagery: Map land use, detect deforestation, and monitor infrastructure from aerial images.
- Robotics: Segment workspace surfaces to guide robotic manipulation and safe navigation.
- Augmented Reality: Separate foreground from background for scene overlays and effects.
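As an example of how a mask is consumed downstream, the augmented reality case above reduces to a per-pixel selection between two images. The sketch below assumes a boolean foreground mask is already available from a segmentation model. The `composite` helper and the toy arrays are illustrative only.

```python
import numpy as np


def composite(frame, background, mask):
    """Keep `frame` where mask is True and use `background` elsewhere.

    frame, background: (H, W, 3) arrays; mask: (H, W) boolean foreground mask.
    """
    # Add a trailing axis so the mask broadcasts across the colour channels.
    return np.where(mask[..., None], frame, background)


# Toy 2x2 RGB data standing in for a camera frame and a virtual backdrop.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.zeros((2, 2, 3), dtype=np.uint8)
person_mask = np.array([[True, False],
                        [False, True]])

result = composite(frame, background, person_mask)
print(result[:, :, 0])
```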
Training Data for Segmentation Models
Semantic segmentation requires densely annotated images: every pixel must carry a label, which makes it among the most labor-intensive forms of data labeling. A single driving scene can take 90 minutes to annotate pixel-perfectly. Synthetic data from simulations sidesteps much of this cost, because the renderer already knows which object produced each pixel and can emit ground-truth masks without manual annotation. Bright Data’s datasets offer large-scale image collections for building segmentation training sets.
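The point about synthetic data can be shown with a deliberately tiny generator. This is a sketch under simple assumptions: a "scene" is one random coloured rectangle on a black background, and the class IDs and the `render_scene` helper are hypothetical. A real simulator would be far richer, but the principle is the same: the code that draws the image also writes the mask.

```python
import numpy as np

BACKGROUND, BOX = 0, 1  # hypothetical class IDs


def render_scene(height, width, rng):
    """Draw one random rectangle and return (image, mask) with matching pixels."""
    image = np.zeros((height, width, 3), dtype=np.uint8)
    mask = np.full((height, width), BACKGROUND, dtype=np.uint8)

    # Random rectangle position and size, at least 1x1 pixel.
    top = int(rng.integers(0, height - 1))
    left = int(rng.integers(0, width - 1))
    bottom = int(rng.integers(top + 1, height + 1))
    right = int(rng.integers(left + 1, width + 1))

    # Colour channels start at 50 so the object never matches the black background.
    color = rng.integers(50, 256, size=3, dtype=np.uint8)
    image[top:bottom, left:right] = color
    mask[top:bottom, left:right] = BOX  # the label comes from the same draw call
    return image, mask


rng = np.random.default_rng(0)
image, mask = render_scene(16, 16, rng)
print(int((mask == BOX).sum()), "labeled object pixels")
```

No human annotator is involved: each generated image arrives with a mask that matches it exactly.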