
SMF is the Solaris/OpenSolaris/Illumos Service Management Facility.
It starts, monitors and restarts your services. It is the Solaris counterpart to init, upstart and systemd, but very clearly focused on starting processes, keeping them up and stopping them. It comes with quite a lot of small tools that you need at least a basic understanding of. Because it took me a long time to build my own start scripts, I felt it was my duty to explain the basic concepts.

The basics

Let's start with the bad news: your service definitions are XML. Phew, that might be unfortunate. But bear with me, it's worth learning.
In the service definition you specify:

  • what your service depends on - other services, files, system initialization milestones (e.g. multi-user, network)
  • the way of starting and stopping your service (as a daemon, in foreground, as a one-off script)
  • optionally your service configuration

Services are identified by their unique FMRI, the fault management resource identifier. In the example below the FMRI is svc:/application/foo/my_service. The svc:/ prefix is implicit in the manifest's service name and identifies the resource as a service, but there are other FMRI-described entities as well, mainly your hardware. On a compatible server, your CPUs, RAM, disks and even complete mainboards are hot-swappable and managed by a Solaris service.

<?xml version="1.0"?>  
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">  
<service_bundle type="manifest" name="my_service">

    <service name="application/foo/my_service" type="service" version="1">

        <create_default_instance enabled="false"/>
        <single_instance/>

        <dependency name="network" grouping="require_all" restart_on="error" type="service">
            <service_fmri value="svc:/milestone/network:default"/>
        </dependency>

        <dependency name="filesystem" grouping="require_all" restart_on="error" type="service">
            <service_fmri value="svc:/system/filesystem/local"/>
        </dependency>

        <method_context>
        </method_context>

        <exec_method type="method" name="start" exec="/opt/local/bin/gunicorn -w 4 -b 127.0.0.1:4000 app:app -D" timeout_seconds="60">
            <method_context working_directory='/opt/my_service/'></method_context>
        </exec_method>

        <exec_method type="method" name="stop" exec=":kill" timeout_seconds="60"/>
        <exec_method type="method" name="refresh" exec=":kill -HUP" timeout_seconds="60" />
        <property_group name="startd" type="framework">
            <propval name="duration" type="astring" value="contract"/>
        </property_group>

        <property_group name="application" type="application"></property_group>


        <stability value="Evolving"/>

        <template>
            <common_name>
                <loctext xml:lang="C">
                   My awesome service
                </loctext>
            </common_name>
        </template>
    </service>

</service_bundle>  

This is an example configuration for a gunicorn-powered Python application. Because the default instance is created with enabled="false", it won't start until you enable it via svcadm enable my_service. From then on it will be started, restarted and shut down automatically. As you can see, we also set one parameter for the start method: its working directory.

We declared two dependencies, the network and the local filesystem, but you could also list other services here, identified by their FMRI.
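
As a rough sketch of what further dependencies could look like, the fragment below adds one on another service and one on a file. Both the database FMRI and the config file path are made-up placeholders and not part of the manifest above; the fragment would sit next to the existing dependency elements.

```xml
<!-- Hypothetical: wait for another service. The FMRI is a placeholder. -->
<dependency name="database" grouping="require_all" restart_on="none" type="service">
    <service_fmri value="svc:/application/database/example_db:default"/>
</dependency>

<!-- Hypothetical: require a file to exist. Path dependencies use a file:// FMRI. -->
<dependency name="config_file" grouping="require_all" restart_on="none" type="path">
    <service_fmri value="file://localhost/opt/my_service/config.ini"/>
</dependency>
```

With restart_on set to "none", the service is not restarted when the dependency changes state; the dependency only has to be satisfied before the start.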

As gunicorn supports reloading via SIGHUP, we map the refresh method to do just that. The stability value is just for the administrator's reference, as is the common_name.
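
The empty "application" property group in the manifest is where the optional service configuration mentioned at the beginning would go. A minimal sketch, assuming two invented properties (listen_address and workers are illustrative names, not something gunicorn or SMF defines):

```xml
<property_group name="application" type="application">
    <!-- Hypothetical properties; names and values are placeholders. -->
    <propval name="listen_address" type="astring" value="127.0.0.1:4000"/>
    <propval name="workers" type="integer" value="4"/>
</property_group>
```

A start script could then read such a value with something like svcprop -p application/workers svc:/application/foo/my_service instead of hard-coding it in the exec string.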

We start the application under "startd" with the duration set to "contract". The option "child" seems much more useful when you first read the docs, but that is misleading. "child" does start the process as a child (ok), but it treats all startup errors as nonexistent. It just keeps restarting the process, which our monitoring can't pick up.
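
For comparison, here is a sketch of the same property group for a one-off script, assuming the start method simply runs to completion and leaves no process behind. It would replace the "contract" variant in the manifest, not sit next to it.

```xml
<property_group name="startd" type="framework">
    <!-- "transient": the start method runs once and no processes are tracked.
         Other documented values are "contract" (the default) and "child". -->
    <propval name="duration" type="astring" value="transient"/>
</property_group>
```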

We rely on the output of svcs -xv to alert us about service problems, and the "contract" type works for that. The process has to daemonize (which almost all servers are capable of doing) and its PID is then tracked automatically by the contract system. You can observe it working by running ptree -c.

This is what you need to build a basic service definition and run it. Advanced topics like monitoring and starting multiple child processes of the same process are covered in the extensive documentation provided by Oracle.

Tim Buchwaldt

Infrastructure Architect. Optimizing rare edge cases since 1990.

https://twitter.com/timbuchwaldt



The Engine Room

thoughts, stories and ideas | from the grandcentrix team
