	🔧 Insights
	===========

	1. Models architecture
	~~~~~~~~~~~~~~~~~~~~~~

	All segmentation models in SMP (the library's short name) consist of:

	- an encoder (feature extractor, a.k.a. backbone)
	- a decoder (feature fusion block that creates the segmentation mask)
	- a segmentation head (final head that reduces the number of decoder channels and upsamples the mask so that the output has the same spatial resolution as the input)
	- a classification head (optional head built on top of the deepest encoder features)
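To make the division of labour concrete, here is a minimal sketch in plain PyTorch of how an encoder, a decoder, and a segmentation head can be chained. The classes ``TinyEncoder``, ``TinyDecoder``, and ``TinySegmentationModel`` are hypothetical names invented for this illustration and are much smaller than any real SMP model; only the overall data flow is meant to carry over.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical two-stage backbone; returns features from shallow to deep."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # 1/2 of the input resolution
        f2 = self.stage2(f1)  # 1/4 of the input resolution
        return [x, f1, f2]


class TinyDecoder(nn.Module):
    """Hypothetical decoder: upsample the deepest feature and fuse it with a skip."""

    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(16 + 8, 8, 3, padding=1), nn.ReLU())

    def forward(self, features):
        _, f1, f2 = features
        up = F.interpolate(f2, size=f1.shape[-2:], mode="nearest")
        return self.fuse(torch.cat([up, f1], dim=1))  # still at 1/2 resolution


class TinySegmentationModel(nn.Module):
    def __init__(self, classes=2):
        super().__init__()
        self.encoder = TinyEncoder()
        self.decoder = TinyDecoder()
        # Segmentation head: reduce channels to `classes`; upsampling happens in forward.
        self.segmentation_head = nn.Conv2d(8, classes, kernel_size=3, padding=1)

    def forward(self, x):
        features = self.encoder(x)
        decoded = self.decoder(features)
        logits = self.segmentation_head(decoded)
        # Restore the input resolution so the mask aligns with the image.
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear", align_corners=False)


model = TinySegmentationModel(classes=2)
mask = model(torch.rand(1, 3, 64, 64))
print(mask.shape)  # torch.Size([1, 2, 64, 64])
```

The optional classification head is left out here; it would read only the last element of ``features``.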


	2. Creating your own encoder
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	An encoder is a "classification model" that extracts features from an image and passes them to the decoder.
	Each encoder should have the following attributes and methods and inherit from ``segmentation_models_pytorch.encoders._base.EncoderMixin``:

	.. code-block:: python

	    from typing import List

	    import torch
	    from segmentation_models_pytorch.encoders._base import EncoderMixin


	    class MyEncoder(torch.nn.Module, EncoderMixin):

	        def __init__(self, **kwargs):
	            super().__init__()

	            # Number of channels for each encoder feature tensor, list of integers
	            self._out_channels: List[int] = [3, 16, 64, 128, 256, 512]

	            # Number of stages (in other words, the number of downsampling operations), integer;
	            # used in the forward pass to reduce the number of returned features
	            self._depth: int = 5

	            # Default number of input channels in the first Conv2d layer of the encoder (usually 3)
	            self._in_channels: int = 3

	            # Define encoder modules below
	            ...

	        def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
	            """Produce a list of features of different spatial resolutions. Each feature is a 4D
	            torch.Tensor of shape NCHW (features should be sorted in descending order of spatial
	            resolution, starting with the same resolution as the input `x` tensor).

	            Input: `x` with shape (1, 3, 64, 64)
	            Output: [f0, f1, f2, f3, f4, f5] - features with corresponding shapes
	                    [(1, 3, 64, 64), (1, 64, 32, 32), (1, 128, 16, 16), (1, 256, 8, 8),
	                    (1, 512, 4, 4), (1, 1024, 2, 2)] (C - dim may differ)

	            The number of returned features should also follow the specified depth, e.g. if depth = 5,
	            number of feature tensors = 6 (one with the same resolution as the input and 5 downsampled);
	            depth = 3 -> number of feature tensors = 4 (one with the same resolution as the input and 3 downsampled).
	            """

	            return [feat1, feat2, feat3, feat4, feat5, feat6]
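The relationship between depth, channel list, and feature shapes described in the docstring can be checked with a little arithmetic. The helper below, ``expected_feature_shapes``, is a hypothetical name and not part of SMP; it assumes that every stage halves the height and width, which matches the example shapes above but may not hold for every backbone.

```python
from typing import List, Tuple


def expected_feature_shapes(
    input_shape: Tuple[int, int, int, int],
    out_channels: List[int],
    depth: int,
) -> List[Tuple[int, int, int, int]]:
    """Return the NCHW shapes of the depth + 1 features an encoder should emit."""
    n, _, h, w = input_shape
    if not 0 <= depth < len(out_channels):
        raise ValueError("depth must index into out_channels")
    shapes = []
    for stage, channels in enumerate(out_channels[: depth + 1]):
        scale = 2 ** stage  # assumption: every stage halves height and width
        shapes.append((n, channels, h // scale, w // scale))
    return shapes


channels = [3, 16, 64, 128, 256, 512]
full = expected_feature_shapes((1, 3, 64, 64), channels, depth=5)
shallow = expected_feature_shapes((1, 3, 64, 64), channels, depth=3)
print(len(full), full[-1])        # 6 (1, 512, 2, 2)
print(len(shallow), shallow[-1])  # 4 (1, 128, 8, 8)
```

Comparing these shapes against the output of a custom ``forward`` is a quick sanity check before plugging the encoder into a decoder.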

	After writing your own encoder class, register its build parameters:

	.. code-block:: python

	    import segmentation_models_pytorch as smp

	    smp.encoders.encoders["my_awesome_encoder"] = {
	        "encoder": MyEncoder,  # encoder class here
	        "pretrained_settings": {
	            "imagenet": {
	                "mean": [0.485, 0.456, 0.406],
	                "std": [0.229, 0.224, 0.225],
	                "url": "https://some-url.com/my-model-weights",
	                "input_space": "RGB",
	                "input_range": [0, 1],
	            },
	        },
	        "params": {
	            # init params for encoder if any
	        },
	    }  # no trailing comma here: it would turn the entry into a tuple

	Now you can use your encoder:

	.. code-block:: python

	    model = smp.Unet(encoder_name="my_awesome_encoder")
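To illustrate why a registry entry carries an ``"encoder"`` class and a ``"params"`` dictionary, here is a toy, self-contained sketch of a name-based lookup. Everything in it (``DummyEncoder``, ``registry``, ``build_encoder``) is hypothetical and written only for this example; SMP's actual lookup code may differ in details such as weight loading and argument names.

```python
# Hypothetical stand-ins for illustration; none of these are SMP objects.
class DummyEncoder:
    def __init__(self, width=16, depth=5):
        self.width = width
        self.depth = depth


registry = {
    "dummy": {
        "encoder": DummyEncoder,
        "pretrained_settings": {
            "imagenet": {"url": "https://some-url.com/my-model-weights"},
        },
        "params": {"width": 32},
    },
}


def build_encoder(name, depth=5):
    """Look up an entry by name and instantiate its class with the stored params."""
    try:
        entry = registry[name]
    except KeyError:
        raise KeyError(f"Unknown encoder {name!r}; available: {sorted(registry)}") from None
    params = dict(entry["params"])  # copy so the registry entry is not mutated
    params.update(depth=depth)
    return entry["encoder"](**params)


encoder = build_encoder("dummy", depth=3)
print(type(encoder).__name__, encoder.width, encoder.depth)  # DummyEncoder 32 3
```

Keeping the constructor arguments in the entry means a model can be built from the encoder name alone.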

	For a better understanding, see more encoder examples in the ``smp.encoders`` module.

	.. note::

	    If it works fine, don't forget to contribute your work and make a PR to SMP 😉

	3. Aux classification output
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	All models support the ``aux_params`` parameter, which defaults to ``None``.
	If ``aux_params`` is ``None``, the auxiliary classification output is not created; otherwise
	the model produces not only a ``mask`` but also a ``label`` output with shape ``(N, C)``.

	The classification head consists of the following layers:

	1. GlobalPooling
	2. Dropout (optional)
	3. Linear
	4. Activation (optional)

	Example:

	.. code-block:: python

	    import torch
	    import segmentation_models_pytorch as smp

	    aux_params = dict(
	        pooling='avg',         # one of 'avg', 'max'
	        dropout=0.5,           # dropout ratio, default is None
	        activation='sigmoid',  # activation function, default is None
	        classes=4,             # number of output labels
	    )

	    model = smp.Unet('resnet34', classes=4, aux_params=aux_params)

	    x = torch.rand(2, 3, 64, 64)  # example input batch of shape (N, C, H, W)
	    mask, label = model(x)

	    mask.shape, label.shape
	    # (N, 4, H, W), (N, 4)
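The four layers listed for the classification head can be sketched in plain PyTorch as follows. This is an illustrative reconstruction under stated assumptions, not SMP's own implementation: the function name ``build_classification_head`` is hypothetical, only the ``'sigmoid'`` activation is handled, and the input is assumed to be the deepest encoder feature with 512 channels.

```python
import torch
from torch import nn


def build_classification_head(in_channels, classes, pooling="avg", dropout=None, activation=None):
    """Assemble pooling -> (dropout) -> linear -> (activation) as one module."""
    if pooling not in ("avg", "max"):
        raise ValueError(f"pooling must be 'avg' or 'max', got {pooling!r}")
    pool = nn.AdaptiveAvgPool2d(1) if pooling == "avg" else nn.AdaptiveMaxPool2d(1)
    layers = [pool, nn.Flatten()]  # (N, C, H, W) -> (N, C, 1, 1) -> (N, C)
    if dropout is not None:
        layers.append(nn.Dropout(p=dropout))
    layers.append(nn.Linear(in_channels, classes))
    if activation == "sigmoid":  # only this activation is sketched here
        layers.append(nn.Sigmoid())
    elif activation is not None:
        raise ValueError(f"unsupported activation in this sketch: {activation!r}")
    return nn.Sequential(*layers)


head = build_classification_head(512, classes=4, dropout=0.5, activation="sigmoid")
deepest = torch.rand(2, 512, 4, 4)  # stand-in for the deepest encoder feature
label = head(deepest)
print(label.shape)  # torch.Size([2, 4])
```

Global pooling removes the spatial dimensions, which is why the ``label`` output has shape ``(N, C)`` regardless of the input image size.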