UPnP Crash Course

UPnP is a set of protocols used to provide services across a network, designed so that no manual configuration is necessary, because the clients can discover the servers they need.

A UPnP server is called a device. Devices are primarily identified by a unique identifier (UDN/UUID). They also have a user-friendly name used for display to humans, which is not necessarily unique.

Devices have types (Media Server, Media Renderer), but these are not very useful. Actually, devices are just a way to group a number of services under a given UDN, and list them through an XML document named the device description.

UPnP services are the really interesting entities. The set of provided services is what really makes a device what it is (UPnP is duck-typed?). For example, there are two families of services used for rendering music: UPnP-AV and OpenHome. UPnP-AV renderer devices are supposedly typed as 'MediaRenderer' and OpenHome ones as 'Source'. This will not prevent a Control Point which sees a 'MediaRenderer' also implementing the set of OpenHome services from using it as an OpenHome device.

So, a device is just a UUID and a description. The association between a device and a TCP endpoint is very loose: it can change with a restart, and multiple devices can share one endpoint.

A UPnP service is defined by an XML document which describes the set of 'State Variables' and 'Actions' which it provides. An Action is a remote callable which either performs changes on the device or returns state values. Additionally, Events, which are asynchronous callbacks from the service to the client, provide information about autonomous State Variable changes.

All Actions and Events are carried over HTTP connections (SOAP), which means that both the client and the server run an HTTP server. On the device side, the port used by the server is communicated to the client during the initial discovery phase (the client broadcasts "who is there", all devices respond, and the client chooses which ones it wants to talk to). The client port to use for event connections from the Device is sent to the Device through a 'Subscription Request'.
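As an illustration of the discovery phase, here is a minimal sketch (plain Python, not upmpdcli code) which multicasts an SSDP search and prints the responses; each response carries a LOCATION header giving the URL of the device description:

import socket

# Minimal SSDP discovery sketch: multicast an M-SEARCH and print replies.
# 239.255.255.250:1900 and the header names are defined by the SSDP protocol.
MSEARCH = ("M-SEARCH * HTTP/1.1\r\n"
           "HOST: 239.255.255.250:1900\r\n"
           "MAN: \"ssdp:discover\"\r\n"
           "MX: 2\r\n"
           "ST: upnp:rootdevice\r\n\r\n")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(3.0)
sock.sendto(MSEARCH.encode("ascii"), ("239.255.255.250", 1900))
try:
    while True:
        data, addr = sock.recvfrom(65507)
        # The LOCATION header in each reply points to the device description
        # XML, and thus gives the device's current HTTP endpoint.
        print(addr, data.decode("ascii", "replace").splitlines()[:5])
except socket.timeout:
    pass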

UPnP and media services

There are two families of UPnP services implementing multimedia function: UPnP-AV and OpenHome. UPnP-AV is part of the standard. OpenHome is a later addition from the Linn company.

The overall structure of both frameworks is similar. There are three roles:

  • The Media Server device presents a catalog of media objects and their attributes.

  • The Media Renderer device actually plays or displays the media.

  • The Control Point has the user interface. It retrieves data from the Content Directory, accepts user instructions and relays them to the Media Renderer for action.

Once it has been told what to play, the Media Renderer fetches the needed data directly from where it is stored. The Control Point is then only responsible for status display (and for switching to the next track in the playlist in the case of UPnP-AV). This is true in all cases, but somewhat muddied by the fact that some Control Point implementations may coexist with a hidden Media Server implementation, e.g. for playing local data without needing to set up a separate Media Server.

Media Server services

A Media Server device presents audio data to Control Points. Its main service is the Content Directory service which manages the catalog (it also has a mandatory Connection Manager service, which is not of much use).

Most Media Server implementations also have the capability to serve the media data (through HTTP or something else). This is not mandatory in any way: the end result of catalog traversal is a URL which can point anywhere. It is just the case that, in practice, it often points to an HTTP server inside the Media Server implementation.

The Content Directory service

The Content Directory service implements a mandatory action: Browse, and an optional one: Search. The exact functionality of the Search action can be retrieved through another action: GetSearchCapabilities.

The contents of the directory are presented much like a traditional file system, with Containers (directories) and Items (leaf objects, files). All objects are primarily identified by Object IDs, not names/titles, which are just there for presentation to the user. The root of the content hierarchy has a well-known ID (the "0" string). The form of all other IDs is up to the Content Directory implementation: Object IDs just need to be unique, and they may or may not have a hierarchical structure matching the tree. For example, a Minimserver Object ID may look something like 0$folders$f1982$f2153$f2155$*i12024 while one from MiniDLNA could just be a number.

Each object in the hierarchy also has a ParentID, which allows reverse traversal.

Note
The lack of enforced consistency between the Object IDs and the visible structure of the tree (name hierarchy), and the absence of any guarantee of stability across rebuilds, mean that it’s difficult to build client-side playlists of UPnP items.

The Browse action can either list the contents of a container, or return the attributes of a given object, depending on the value of a flag. In practice, only the "list" operation is commonly used.
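To make the mechanics concrete, here is a hedged sketch (not upmpdcli code) of a raw SOAP Browse call listing the children of the root container (BrowseMetadata would return the attributes of the object instead); the control URL is a made-up example, the real one is found in the device description:

import urllib.request

# Hypothetical control URL: the real one comes from the device description.
CONTROL_URL = "http://192.168.4.4:9790/ctl/ContentDirectory"

BODY = """<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
 <s:Body>
  <u:Browse xmlns:u="urn:schemas-upnp-org:service:ContentDirectory:1">
   <ObjectID>0</ObjectID>
   <BrowseFlag>BrowseDirectChildren</BrowseFlag>
   <Filter>*</Filter>
   <StartingIndex>0</StartingIndex>
   <RequestedCount>50</RequestedCount>
   <SortCriteria></SortCriteria>
  </u:Browse>
 </s:Body>
</s:Envelope>"""

req = urllib.request.Request(
    CONTROL_URL, data=BODY.encode("utf-8"),
    headers={"Content-Type": 'text/xml; charset="utf-8"',
             "SOAPACTION":
             '"urn:schemas-upnp-org:service:ContentDirectory:1#Browse"'})
# The Result element of the SOAP response contains escaped DIDL-Lite XML
# describing the children of object "0".
print(urllib.request.urlopen(req).read()[:500])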

The Search action returns the results in the form of a container listing.

Containers contain Items (leaf objects) and other containers. Both are represented by XML fragments with a schema named DIDL-Lite. Example element names are "dc:title" or "upnp:author".

Most object properties are described as XML elements.

Items which actually describe a media object contain one or several resource (<res>) elements. The data in a resource element is the URL from which the media can be retrieved. Element attributes are used to describe the characteristics of the content: format, bitrate, etc.

Having several resource elements (with different formats/bitrates) allows the Control Point to choose the right one for the Renderer device.

Both 'Browse' and 'Search' return XML data describing one or several items and containers.

Example of an item element, with only one resource in this case. Some values are elided for compactness

<item id="0$folders$f57..." parentID="0$folders..." restricted="1">
  <dc:title>Rolling In The Deep</dc:title>
  <upnp:genre>Pop</upnp:genre>
  <dc:date>2011-01-01</dc:date>
  <upnp:album>21</upnp:album>
  <upnp:artist role="AlbumArtist">Adele</upnp:artist>
  <upnp:artist>Adele</upnp:artist>
  <dc:creator>Adele</dc:creator>
  <upnp:originalTrackNumber>1</upnp:originalTrackNumber>
  <upnp:albumArtURI dlna:profileID="PNG_MED">
  http://192.168.4.4:9790/minimserver/.../picture-...png</upnp:albumArtURI>
  <res duration="0:03:49.331" size="9515008" bitrate="40000"
       sampleFrequency="44100" nrAudioChannels="2"
       protocolInfo="http-get:*:audio/mpeg:DLNA.ORG_PN...">
  http://192.168.4.4:9790/minimserver/...Deep.mp3</res>
  <upnp:class>object.item.audioItem.musicTrack</upnp:class>
</item>
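For illustration, a minimal Python sketch (not part of upmpdcli) of extracting titles and resource URLs from the DIDL-Lite document returned by Browse or Search, which wraps item elements like the one above:

import xml.etree.ElementTree as ET

# Namespaces used by DIDL-Lite documents.
NS = {"didl": "urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/",
      "dc": "http://purl.org/dc/elements/1.1/",
      "upnp": "urn:schemas-upnp-org:metadata-1-0/upnp/"}

def list_tracks(didl_text):
    """Yield (title, url) pairs for each item in a DIDL-Lite document."""
    root = ET.fromstring(didl_text)
    for item in root.findall("didl:item", NS):
        title = item.findtext("dc:title", default="", namespaces=NS)
        # Take the first resource; a real Control Point would pick the one
        # whose protocolInfo best matches the Renderer's capabilities.
        res = item.find("didl:res", NS)
        url = res.text.strip() if res is not None and res.text else ""
        yield title, url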

See Section "2.8 Theory of operation" in the following document for illustrative examples: http://upnp.org/specs/av/UPnP-av-ContentDirectory-v1-Service.pdf

Upmpdcli Media Server implementation

Upmpdcli implements both Media Renderer and Media Server UPnP devices.

Until recently the venerable libupnp library on which it is based (through the libupnpp C++ layer) could only support one UPnP device per instance (process).

As a consequence, upmpdcli mostly implements the Media Server and the Media Renderer in separate processes (the main process, which reads the configuration, forks the Media Server process if it is needed).

Note
UPnP also has the concept of an embedded device (which libupnp supports), and upmpdcli can also run the Media Server as an embedded device inside the root Media Renderer, then needing only one process. This has a tendency to confuse Control Points, so it is not done by default.

Recent libupnp code supports multiple root devices, so it is also now possible to run a single process with two root devices, but this is still not the default.

The Upmpdcli Media Server

A Media Server instance is created by the main upmpdcli process if a call to ContentDirectory::mediaServerNeeded() returns true, which occurs if the root directory of the Content Directory would not be empty (see further for how this is determined).

Depending on the command line options, the Media Server can run as a separate device/process (the default, which also allows running a Media Server only, with no Renderer device), or as an embedded device of the main Media Renderer device. Very few Control Points can handle the latter approach, and the code was kept mostly because we can.

A recent version of libupnp, newly supporting multiple root devices, has enabled the implementation of a new option for running the Media Renderer and Media Server in the same process. This is still not the default.

The process setup is done in the src/main.cxx file.

ContentDirectory.cxx

This file contains the libupnpp callbacks, and the root directory creation code, together with plugin activation and dispatch code.

Root directory creation

It was originally thought that ContentDirectory.cxx would see several different provider modules, so it seemed natural for it to build the root directory (with top entries for each of the modules).

What finally happened is that there is only one provider module, plgwithslave.cxx, so the root directory creation code, which has knowledge of its internals, should probably be moved to plgwithslave.cxx: if other modules were added, we’d need an interface so that each of them can provide its entries for the root directory. For now, things can stay this way, but ContentDirectory.cxx knows a bit too much about plgwithslave internals.

ContentDirectory::makerootdir(), which should be in plgwithslave, uses the plgwithslave plugin names to create root entries.

The plugin names are currently used in quite a few places. For a module named 'plgname', 'plgname' is used as:

  • The name of a subdirectory under cdplugins, used to enumerate the plugins on startup.

  • The name of the main plugin Python module, named cdplugins/plgname/plgname-app.py

  • An element of the object IDs belonging to this module (all beginning with 0$plgname$)

  • An element of the resource URIs generated by the module (all beginning with /plgname).

On startup makerootdir() walks the cdplugins directory (skipping pycommon). For each subdirectory, it decides that the module is configured if there is a variable named 'plgnameuser' in the upmpdcli configuration file (so even plugins without users should have such a variable).

If the plugin is configured, makerootdir() creates an entry named Plgname (capitalized), with objid 0$plgname$.
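The actual code is C++ in ContentDirectory.cxx; the following Python sketch merely restates the logic described above (the dict fields are illustrative, not the real data structures):

import os

def list_configured_plugins(cdplugins_dir, config):
    """Sketch of the makerootdir() logic: one root entry per configured
    plugin subdirectory (the real implementation is C++)."""
    entries = []
    for name in sorted(os.listdir(cdplugins_dir)):
        if name == "pycommon" or not os.path.isdir(
                os.path.join(cdplugins_dir, name)):
            continue
        # A plugin is "configured" if the config defines <plgname>user,
        # even for services which have no real notion of a user account.
        if (name + "user") in config:
            entries.append({"id": "0$%s$" % name,
                            "parentID": "0",
                            "title": name.capitalize()})
    return entries

# e.g. list_configured_plugins("cdplugins", {"qobuzuser": "me@example.com"})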

Plugin activation and dispatch

When the browse method is called for an object ID beginning with 0$plgname$, the code looks up (Internal::pluginforapp()), and possibly creates (Internal::pluginfactory()), the plugin. Currently, this is always a PlgWithSlave object, created with 2 parameters: the service name (which it will use to find and exec the Python module), and a service interface which it can use to request information from the ContentDirectory.

The knowledge that all of a plugin’s object IDs must begin with 0$plgname$ is spread in multiple places in the C++ and Python code.

ContentDirectory service interface

ContentDirectory provides a number of service methods to its providers. They are defined in cdplugins/cdplugin.hxx:CDPluginServices, an instance of which is passed to the plugin constructor. See the file for more details.

Notes

At the moment, the interface between ContentDirectory and the plgwithslave plugin is a bit of a mess:

  • There is a lack of interfaces which would be needed if modules other than plgwithslave were added.

  • There is too much knowledge of the plgwithslave internals in ContentDirectory.

  • Both problems are linked.

  • Probably plgwithslave should provide the root entries, and ContentDirectory would just keep a mapping of object id to plugin.

But things are OK, even if a bit difficult to understand (hence this doc), as long as all providers are modules under plgwithslave.

libupnp address and port: these are retrieved in the ContentDirectory init, and the plugins get them through service interface calls (getupnpaddr/port()). The host part is then communicated to the subprocesses through the environment (UPMPD_HTTPHOSTPORT), but the port part is not used: the microhttpd port is used instead. The port part would have been used if we had been using the libupnp miniserver, which we don’t, because the miniserver can’t do redirects.
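For illustration, this is roughly how a subprocess could pick up that value (a sketch; the exact parsing in the real plugin code may differ):

import os

# UPMPD_HTTPHOSTPORT is set by the parent process (see above). Only the
# host part is meaningful to the streaming plugins: the port they use in
# generated URLs is the microhttpd one.
hostport = os.environ.get("UPMPD_HTTPHOSTPORT", "")
host = hostport.split(":")[0]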

The uprcl local media server plugin does not use UPMPD_HTTPHOSTPORT at all, because it uses its own HTTP server (started on uprclhostport).

Audio data access

Access to the actual media data can be obtained in three different ways:

  • Qobuz, Tidal and Deezer all work the same way. The directory data contains numeric track identifiers (trackids). These trackids can be translated to a short-lived URL using a specific service call which notably checks the authorizations. The URLs allow normal HTTP access to unencrypted data, but they expire after a relatively short time, so they can’t be stored in, e.g., an OpenHome playlist. Instead of using them, upmpdcli generates permanent trackid-based URLs pointing to its own microhttpd server. The translation to a real media URL is performed when the Renderer tries to fetch the data. It is then redirected to the short-lived URL, and upmpdcli is not involved for the rest of the data transfer.

  • Spotify uses permanent URLs which point to encrypted data. Instead of redirecting, upmpdcli implements a proxy, reading the encrypted data and decrypting it before sending it to the renderer. The HTTP interaction is handled by the embedded microhttpd server.

  • Uprcl, the local Media Server, uses a Python HTTP server to send the data; the C++ code is not used for the transfer.

HTTP server

The resource URLs are initially generated by the plugins, and possibly transformed when they are actually requested.

There are 3 possibilities for the actual HTTP server:

  • The libupnp miniserver (unused at this time).

  • An internal microhttpd instance.

  • An external server.

In all current cases, the miniserver port value which the plgwithslave constructor receives and stores in upnpport is not used, and neither is the miniserver itself.

Tidal, Qobuz, Deezer

The Tidal, Qobuz, and Deezer modules currently work in the same way:

  • They use the port supplied from plgwithslave which is the one on which the microhttpd instance is listening. The URIs are like http://host:port/someplugin/xxx.mpr?trackId=tid

  • The microhttpd request method only does redirects. When it gets a connection, it parses the above URL and passes it to the appropriate plugin (as per the path), which provides the actual streaming service URL, to which the client is redirected (see the sketch after this list). A hypothetical client which would not handle redirects would be out of luck.

  • No configuration data is required (the microhttpd port can optionally be configured with plgmicrohttpport, default 49149).
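Here is a hedged sketch of that redirect behaviour, using Python's http.server in place of the real embedded microhttpd code; get_service_url() is a hypothetical stand-in for the plugin call which trades a trackId for a short-lived streaming URL:

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def get_service_url(plugin, trackid):
    # Hypothetical stand-in: in upmpdcli this is a call into the plugin,
    # which performs the authorization-checked service API request.
    return "https://streaming.example.com/%s/%s" % (plugin, trackid)

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        plugin = parsed.path.lstrip("/").split("/")[0]           # e.g. "qobuz"
        trackid = parse_qs(parsed.query).get("trackId", [""])[0]
        # 302 redirect: the Renderer then fetches the audio directly from
        # the streaming service, without further upmpdcli involvement.
        self.send_response(302)
        self.send_header("Location", get_service_url(plugin, trackid))
        self.end_headers()

# HTTPServer(("0.0.0.0", 49149), RedirectingHandler).serve_forever()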

Spotify

The Spotify plugin is similar to the three above, but the microhttpd instance actually proxies the data instead of doing a redirect, because Spotify data is encrypted and needs to be decrypted before being sent to the Renderer.

Uprcl (local media server)

The uprcl plugin uses a separate HTTP service, which began as a completely external server (e.g. an Apache server), but is now more conveniently the embedded Python server. The URLs which are generated in the directory resource records point directly to the HTTP server host/port; there is no redirection through the microhttpd server (this is the big difference with the streaming service plugins).

The paths are in three parts:

  • The plugin name as prefix.

  • A mapping between Recoll (actual) paths and paths as seen by the HTTP server (document_root equivalent).

  • The real tail.

The translation routine is now part of the Python request handler, but would need to be restored if we went back to using an external server.

The following configuration data is used:

  • The host and port for the HTTP server. This is used in generated URLs and as a parameter to start the Python server:

    uprclhostport = 192.168.4.4:8080

  • Paths translation between the Recoll file paths (as used for topdirs) and the paths relative to the HTTP server document root, e.g.:

    uprclpaths = /y/av/mp3:/mp3

The path map was really useful with Apache, much less so with the internal server, but it has been kept as a way to define the file system area allowed for access: any path not in the map will be rejected.

In the first implementation, the external HTTP server had to be separately configured to serve the media directories. The internal server uses the config data.
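A sketch of the path mapping, assuming a dict built from uprclpaths (illustrative, not the actual routine in the Python request handler):

def path_to_url_path(fspath, pathmap):
    """Map a real file system path to the path used in generated URLs.
    pathmap is e.g. {"/y/av/mp3": "/mp3"} (from uprclpaths)."""
    for real, served in pathmap.items():
        real = real.rstrip("/")
        if fspath == real or fspath.startswith(real + "/"):
            return served.rstrip("/") + fspath[len(real):]
    # Not under any configured prefix: access is refused.
    return None

# path_to_url_path("/y/av/mp3/Adele/21/01.mp3", {"/y/av/mp3": "/mp3"})
#   -> "/mp3/Adele/21/01.mp3"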

The complicated travels of a metadata bundle

All streaming services present a Web API where HTTP POST operations send selection and authorization parameters and retrieve metadata in JSON form.

Happily enough, the different services copied a lot from each other (or from the same original), so the JSON data is similar in structure, even if the details differ. There is of course no standard.

In all cases, the native data is translated into the Python 'models' defined in 'mediaserver/cdplugins/pycommon/upmplgmodels.py'. This module was initially imported from the Thomas Amland implementation of the Tidal API.

All the top-level plugin modules ('modulename'-app.py) were derived from the Kodi Tidal add-on by the same author. There is a bit of strangeness in them, due to their Kodi lineage. For example, the Models thing was kept to preserve the original code from the Kodi module, but it is really an unnecessary step in the current pipeline.

The service interface modules themselves (the pieces which actually implement the POST operations) were initially imported from diverse implementations (see the comments).

For all modules, the data originates from the native interfaces, and is then massaged into the Python objects. For Qobuz, Spotify, Deezer, this is done in a module named 'plgname'/session.py. For Tidal, we use the tidalapi package practically unmodified, so this is implemented under the 'tidal/api' subdirectory.

The Python code communicates with the parent process using a primitive RPC protocol which originated in Recoll. This is a very simple thing where the parent sends a bunch of attributes, and receives another bunch in response. This is implemented in the pycommon/cmdtalk.py module, with a small additional layer in cmdtalkplugin.py implementing dispatching from the parent parameter bunch to a Python method call.
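In outline, a plugin's 'plgname'-app.py module hooks into this layer roughly as follows (a simplified sketch, assuming the cmdtalkplugin names and the 'browse'/'entries' conventions used by the existing plugin modules; most details are elided):

import json
import cmdtalkplugin

# Maps message names coming from the parent to Python functions.
dispatcher = cmdtalkplugin.Dispatch()
# Reads parameter bunches from the parent and writes the replies.
msgproc = cmdtalkplugin.Processor(dispatcher)

@dispatcher.record('browse')
def browse(a):
    # 'a' is the bunch of attributes sent by the parent (object id, flag...).
    entries = []   # list of dicts with mostly DIDL-like keys
    return {"entries": json.dumps(entries)}

msgproc.mainloop()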

All modules return directory data to the server as Python dicts encoded into JSON for serialization. The Models to dict translation is done inside 'pycommon/upmplgutils.py' (see 'trackentries', 'direntry'). The dictionary keys are mostly DIDL tags, so the data is quite close in structure to UPnP data.

Once inside the parent process C++ code, the data is translated to the common upmpdcli media metadata object representation (UpSong), inside 'plgwithslave.cpp', then finally into UPnP DIDL data in 'mediaserver.cpp' before being sent to UPnP.

Service (JSON) → Service API Proxy (Model) → Common Python (dict) → cmdtalk (JSON) → plgwithslave.cpp (UpSong) → mediaserver.cpp (DIDL) → UPnP

All in all the data is translated 5 times…​