Skeletonization

The skeleton service lets you generate and retrieve skeletons from the server. This spares you from running skeletonization routines on your local machine and from having to understand many of the internal details of skeletonization. The server can also generate multiple skeletons in parallel. Finally, the server caches every skeleton it generates, so each one only needs to be generated once; subsequent requests for the same skeleton return the cached copy, with a notably faster turnaround.

Method

The skeletons provided by the CAVEclient.SkeletonClient are generated by pcg_skel, which itself relies upon MeshParty for the underlying skeletonization process. Please see the associated documentation for more information.

Initializing the client

The simplest way to initialize the CAVEclient is by merely providing the datastack of interest:

import caveclient as cc

client = cc.CAVEclient(<datastack_name>)

With a CAVEclient built, you can now investigate the current build version of the SkeletonClient:

client.skeleton.get_version()

And you can see a list of available skeleton versions. In most cases you will want to use the highest (most recent) version provided.

client.skeleton.get_versions()
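
As a minimal sketch of "use the highest version", the helper below picks the largest entry from a list of version numbers. It assumes the call above returns a plain list of integers; `pick_latest_version` and the sample list are hypothetical and only for illustration.

```python
def pick_latest_version(versions):
    """Return the highest version number from an iterable of integers."""
    versions = list(versions)
    if not versions:
        raise ValueError("no skeleton versions were reported")
    return max(versions)


# Made-up sample; in practice the list would come from client.skeleton.get_versions()
sample_versions = [1, 2, 3, 4]
latest = pick_latest_version(sample_versions)
print(latest)
```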

Skeleton output formats and versions

Skeletons are available in a combination of output formats and versions. Output formats refer to the various ways in which skeleton data can be packaged and presented. The two primary formats are:

'dict' (default if unspecified)
'swc' (a pandas DataFrame)

Skeletons are also available in a variety of numerical versions:

1. The first skeleton version, probably no longer needed.
2. An extension of v1 to include radii and compartments, and which is compatible with Neuroglancer.
3. An alteration of v2 that stores compartments more efficiently as uint8 instead of float32, but which is therefore incompatible with Neuroglancer.
4. An extension of v3 to include level-2 ids. Note that the SWC output format does not support or include level-2 ids, so a v4 skeleton is identical to a v3 skeleton when the SWC output format is requested.
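
The version descriptions can be summarized as a small lookup table, which may help when choosing a version for a particular need. This is only a sketch transcribed from the text: `VERSION_FEATURES` and `versions_with` are hypothetical names, not part of CAVEclient, and `None` marks a property the descriptions do not state.

```python
# Features of each skeleton version, transcribed from the descriptions in the text.
# None means the text does not say either way.
VERSION_FEATURES = {
    1: {"radii_and_compartments": False, "neuroglancer": None, "level2_ids": False},
    2: {"radii_and_compartments": True, "neuroglancer": True, "level2_ids": False},
    3: {"radii_and_compartments": True, "neuroglancer": False, "level2_ids": False},
    4: {"radii_and_compartments": True, "neuroglancer": None, "level2_ids": True},
}


def versions_with(**required):
    """Return the versions whose documented features match every requirement."""
    return [
        version
        for version, features in sorted(VERSION_FEATURES.items())
        if all(features.get(name) == value for name, value in required.items())
    ]


print(versions_with(neuroglancer=True))  # versions documented as Neuroglancer-compatible
print(versions_with(level2_ids=True))    # versions that include level-2 ids
```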

Retrieving a skeleton

Retrieve a skeleton using get_skeleton(). The simplest usage is:

sk = client.skeleton.get_skeleton(
    <root_id>,
    output_format=<output_format>,
)

where the available output formats are described above. If the skeleton doesn't exist in the server cache, it may take 20-60 seconds to generate before it is returned, and this function will block during that time. Any subsequent retrieval of the same skeleton should be much faster.
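
To observe the difference between a first (uncached) request and a later (cached) one, a call can be wrapped in a small timer. The sketch below uses a hypothetical stand-in, `fake_get_skeleton`, that sleeps briefly on a cache miss; with a real client you would pass `client.skeleton.get_skeleton` to `timed_call` instead. Both helper names are illustrative, not part of CAVEclient.

```python
import time


def timed_call(fetch, *args, **kwargs):
    """Call fetch and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fetch(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical stand-in for a caching server: slow on the first request
# for an id, fast on every later request for the same id.
_fake_cache = {}


def fake_get_skeleton(root_id):
    if root_id not in _fake_cache:
        time.sleep(0.05)  # stands in for the 20-60 second generation step
        _fake_cache[root_id] = {"root_id": root_id}
    return _fake_cache[root_id]


_, first = timed_call(fake_get_skeleton, 123)
_, second = timed_call(fake_get_skeleton, 123)
print(f"first: {first:.3f}s, second: {second:.3f}s")
```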

To specify a nondefault skeleton version:

sk = client.skeleton.get_skeleton(
    <root_id>,
    skeleton_version=<sk_version>,
    output_format=<output_format>,
)

To target a specific datastack:

sk = client.skeleton.get_skeleton(
    <root_id>,
    datastack_name=<datastack_name>,
    skeleton_version=<sk_version>,
    output_format=<output_format>,
)

The root id refusal list

Some root ids are consistently problematic and should not be subject to repeated skeletonization attempts. Examples include enormous objects (like volume-spanning blood vessel networks). To prevent such root ids from continually blocking resources that could be better applied to skeletonizing other root ids, the skeleton service utilizes a refusal list. Any root id added to, and subsequently found in, the refusal list will not be processed. Since a user might find it helpful to understand that certain root ids will not work for this reason, the service provides a function that lets one see the entire refusal list:

client.skeleton.get_refusal_list(
    datastack_name=<datastack_name>,
)

This function returns a pandas DataFrame listing, one per row, the root ids for the requested datastack that currently reside in the refusal list. If you believe a root id ought to be skeletonizable and has been added to the list in error, contact the support team and we will remove it from the refusal list so it can be reattempted.
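
One possible use of the refusal list is to screen a batch of root ids before requesting skeletons. The sketch below assumes the refused ids have already been pulled out of the returned DataFrame into a plain collection; the column name to use for that step is not given here, so it is left to the reader. `split_by_refusal` and the sample ids are hypothetical.

```python
def split_by_refusal(root_ids, refused_ids):
    """Split root_ids into (allowed, blocked) according to a refusal collection."""
    refused = set(refused_ids)
    allowed = [rid for rid in root_ids if rid not in refused]
    blocked = [rid for rid in root_ids if rid in refused]
    return allowed, blocked


# Made-up ids for illustration only
wanted = [101, 102, 103, 104]
refused = [102, 999]
allowed, blocked = split_by_refusal(wanted, refused)
print("request these:", allowed)
print("skip these:", blocked)
```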

Peering into the contents of the cache

Most end-users shouldn't need to use the following function very much, but to see the contents of the cache for a given root id, set of root ids, root id prefix, or set of prefixes:

client.skeleton.get_cache_contents(
    root_id_prefixes=<root_id>,
)

You can also add additional parameters as needed:

client.skeleton.get_cache_contents(
    datastack_name=<datastack_name>,
    skeleton_version=<sk_version>,
    root_id_prefixes=<root_id>,
)

The primary parameter, root_id_prefixes, can be a list of root ids:

client.skeleton.get_cache_contents(
    root_id_prefixes=[<root_id>, <root_id>, ...],
)

The primary parameter can also be a root id prefix, which will match any associated root ids. Since this could potentially return a large list of results, there is an optional limit parameter so you don't overwhelm the memory of your processing environment, e.g., a Jupyter notebook or some other Python script running on your local machine:

client.skeleton.get_cache_contents(
    root_id_prefixes=<root_id_prefix>,
    limit=<limit>,
)

Note that the limit only constrains the size of the return value. Internally, the function still receives the full list of matches when it passes the prefix on to CloudFiles. Consequently, calling this function with a short prefix may block for a long time.

And of course you can also pass in a list of prefixes (or a mixture of full ids and partial prefixes):

client.skeleton.get_cache_contents(
    root_id_prefixes=[<root_id_prefix>, <root_id_prefix>, ...],
    limit=<limit>,
)
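
To make the prefix semantics concrete, here is a purely local illustration of how a prefix, a full id, or a mixed list might select ids, and why a limit only trims the result after all matches have been gathered. It is a sketch of the matching idea, not the server's implementation; `match_prefixes` and the sample ids are hypothetical.

```python
def match_prefixes(cached_ids, prefixes, limit=None):
    """Return cached ids that start with any of the given prefixes."""
    if isinstance(prefixes, (int, str)):
        prefixes = [prefixes]  # accept a single id or prefix as well as a list
    wanted = tuple(str(prefix) for prefix in prefixes)
    # Every match is collected first; the limit is applied only afterwards
    matches = [rid for rid in cached_ids if str(rid).startswith(wanted)]
    return matches if limit is None else matches[:limit]


# Made-up ids for illustration only
cached_ids = [864691135000000001, 864691135000000002, 864691136000000003]
print(match_prefixes(cached_ids, 864691135))
print(match_prefixes(cached_ids, "8646", limit=1))
```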

Querying the presence of a skeleton in the cache

The function shown above isn't necessarily the most direct way to simply inquire whether a skeleton exists in the cache for a given root id. For that purpose, the following function is better suited:

client.skeleton.skeletons_exist(
    root_ids=<root_id>,
)

Or:

client.skeleton.skeletons_exist(
    root_ids=[<root_id>, <root_id>, ...],
)

Note that, unlike get_cache_contents(), this function doesn't accept prefixes; only full root ids are supported. When querying with a single root id, the return value is a boolean. When querying with a list of ids, the return value is a Python dictionary mapping each id to a boolean.
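
Because the return type depends on whether one id or a list was passed, calling code may find it convenient to normalize both cases to a dictionary. A minimal sketch, assuming the dictionary is keyed by the same ids that were passed in; `normalize_exists` is a hypothetical helper, not part of CAVEclient.

```python
def normalize_exists(root_ids, result):
    """Return a {root_id: bool} dict for single-id and list queries alike."""
    if isinstance(result, bool):
        # Single-id query: wrap the bare boolean in a one-entry dict
        single = root_ids[0] if isinstance(root_ids, (list, tuple)) else root_ids
        return {single: result}
    return dict(result)


# Made-up ids and results for illustration only
print(normalize_exists(101, True))
print(normalize_exists([101, 102], {101: True, 102: False}))
```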

This function also takes the same optional parameters described above:

client.skeleton.skeletons_exist(
    datastack_name=<datastack_name>,
    skeleton_version=<sk_version>,
    root_ids=<root_id>,  # Or [<root_id>, <root_id>, ...],
)

Retrieving multiple skeletons

You can retrieve a large set of skeletons in a single function call:

client.skeleton.get_bulk_skeletons(
    root_ids=[<root_id>, <root_id>, ...],
)

If any skeletons are not generated yet, the default behavior is to skip those root ids and only return skeletons that are already available. But you can override this default behavior:

client.skeleton.get_bulk_skeletons(
    root_ids=[<root_id>, <root_id>, ...],
    generate_missing_skeletons=[False|True],
)

Any root ids that require skeletonization will be generated one at a time, at a cost of 20-60 seconds each. Consequently, there is a hard-coded limit of 10 generated skeletons per call; missing skeletons beyond that limit will not be returned.
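
Since a bulk request can silently omit some root ids, it can be useful to check which of the requested ids are absent from the result. The sketch below assumes the result is a dictionary keyed by root id and compares keys as strings, because whether the keys are integers or strings is not stated here. `missing_root_ids` and the sample data are hypothetical.

```python
def missing_root_ids(requested, skeletons):
    """Return the requested root ids that have no entry in the result dict."""
    returned = {str(key) for key in skeletons}
    return [rid for rid in requested if str(rid) not in returned]


# Made-up ids and a stand-in result for illustration only
requested = [101, 102, 103]
bulk_result = {"101": {"vertices": []}, "103": {"vertices": []}}
print(missing_root_ids(requested, bulk_result))
```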

By default, skeletons are returned in JSON format. SWC is also supported:

client.skeleton.get_bulk_skeletons(
    root_ids=[<root_id>, <root_id>, ...],
    output_format=<"json"|"swc">
)

And the usual defaults can be overridden again:

client.skeleton.get_bulk_skeletons(
    root_ids=[<root_id>, <root_id>, ...],
    datastack_name=<datastack_name>,
    skeleton_version=<sk_version>,
)

Generating multiple skeletons in parallel

get_bulk_skeletons() is not an effective way to produce a large number of skeletons since it operates synchronously, generating one skeleton at a time. In order to generate a large number of skeletons it is better to do so in parallel. The following function dispatches numerous root ids for skeletonization without returning anything immediately. The root ids are then distributed on the server for parallel skeletonization and eventual caching. Once they are in the cache, you can retrieve them. Of course, it can be tricky to know when they are available. That is addressed further below. Here's how to dispatch asynchronous bulk skeletonization:

client.skeleton.generate_bulk_skeletons_async(
    root_ids=[<root_id>, <root_id>, ...],
)

And with the usual overrides:

client.skeleton.generate_bulk_skeletons_async(
    root_ids=[<root_id>, <root_id>, ...],
    datastack_name=<datastack_name>,
    skeleton_version=<sk_version>,
)

Retrieving asynchronously generated skeletons

In order to retrieve asynchronously generated skeletons, it is necessary to poll the cache for the availability of the skeletons and then eventually retrieve them. Here's an example of such a workflow:

from time import sleep

# Dispatch multiple asynchronous, parallel skeletonization and caching processes
client.skeleton.generate_bulk_skeletons_async(root_ids)

# Repeatedly query the cache for the existence of the skeletons until they are all available
while True:
    skeletons_that_exist = client.skeleton.skeletons_exist(root_ids=root_ids)
    num_skeletons_found = sum(skeletons_that_exist.values())
    if num_skeletons_found == len(root_ids):
        break
    sleep(10)  # Pause for ten seconds and check again

# Retrieve the skeletons (remember, SWC is also offered)
skeletons_json = client.skeleton.get_bulk_skeletons(root_ids)
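
The polling loop above runs until every skeleton exists, which could wait indefinitely if some root id never completes (for example, one on the refusal list). A variant with a timeout is sketched below. It takes the existence check as a callable so it can be exercised without a server; with a real client you might pass `lambda ids: client.skeleton.skeletons_exist(root_ids=ids)`. `wait_for_skeletons` is a hypothetical helper, and it assumes the returned dictionary is keyed by the same ids that were passed in.

```python
import time


def wait_for_skeletons(exists_fn, root_ids, poll_seconds=10.0,
                       timeout_seconds=600.0, sleep_fn=time.sleep):
    """Poll until all skeletons exist or the timeout passes.

    Returns the list of root ids still pending (empty on full success).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = exists_fn(root_ids)
        pending = [rid for rid in root_ids if not status.get(rid, False)]
        if not pending:
            return []
        if time.monotonic() >= deadline:
            return pending
        sleep_fn(poll_seconds)


# Stand-in existence check that reports success on the third poll
calls = {"count": 0}


def fake_exists(root_ids):
    calls["count"] += 1
    ready = calls["count"] >= 3
    return {rid: ready for rid in root_ids}


pending = wait_for_skeletons(fake_exists, [1, 2, 3], poll_seconds=0,
                             sleep_fn=lambda seconds: None)
print("still pending:", pending)
```

Returning the pending ids, rather than raising, lets the caller decide whether to retry, check the refusal list, or proceed with the skeletons that are available.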