Squonk2 switchover

tdudgeon · 20 July 2023 11:36

The Squonk Computational Notebook is being replaced with the Squonk Data Manager

June 2023

TL;DR

The Squonk Computational Notebook is being phased out and replaced by the Squonk Data Manager. If you want to move across, the process should be quite simple. The Squonk Computational Notebook will be shut down at the end of July 2023. If you choose not to migrate, your account will be removed.

For more details read on.

What is the Squonk Data Manager?

This is the key component of the Squonk2 set of applications that let you perform computational workflows. It comprises back end REST APIs and a front end browser based web application. These services all run on a scalable Kubernetes architecture.

Why not continue with the Squonk Computational Notebook?

The technology used by the Squonk Computational Notebook was becoming increasingly difficult to support, and we had identified a design flaw that made it difficult to re-use data from one notebook in another, and to effectively share results with other users.
As a result we chose to create a new set of applications that were based on best of breed technologies and designed from the ground up to address the use cases that Squonk users demanded.

What is the business model?

Like the Squonk Computational Notebook, the Squonk Data Manager is based on Open Source tools, and is itself Open Source. This means that we can deliver a highly cost effective solution. You don’t pay high licence fees just for the privilege of using the software. In fact, there is an evaluation tier that is free to use, and will always remain free to use. Other tiers are subscription based and provide different levels of service. You pay for what you use in terms of CPU, memory and storage, which should prove very cost effective.

In addition, organisations wanting their own dedicated, private instance can have this at agreed costs.

How are things organised?

The Squonk Data Manager environment is organised in a hierarchical manner. The top level is an organisation, below that are units, and below that are projects, where you perform your work. Access is controlled at each level, allowing you to share information with only the people you want. Projects can be public or private. If public (all projects in the evaluation tier are public), then any Squonk user can view the data in that project. If private, you can restrict access to the data in that project to only those people you want. That access can be as a viewer or as an editor.
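The hierarchy and access rules described above can be pictured with a small Python sketch. This is only an illustration of the concepts, not the Data Manager's actual data model or API: the class names, fields and the `can_view`/`can_edit` helpers are all hypothetical, and the rule that only editors may modify a project (even a public one) is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Role(Enum):
    """Access levels a user can hold on a project (hypothetical names)."""
    VIEWER = "viewer"
    EDITOR = "editor"


@dataclass
class Project:
    name: str
    public: bool = True  # evaluation-tier projects are always public
    members: Dict[str, Role] = field(default_factory=dict)

    def can_view(self, user: str) -> bool:
        # Public projects are visible to any user; private ones only to members.
        return self.public or user in self.members

    def can_edit(self, user: str) -> bool:
        # Assumption: only users explicitly granted the editor role may edit.
        return self.members.get(user) is Role.EDITOR


@dataclass
class Unit:
    name: str
    projects: List[Project] = field(default_factory=list)


@dataclass
class Organisation:
    name: str
    units: List[Unit] = field(default_factory=list)


# Organisation -> unit -> project, with one private project shared with two users.
private = Project(
    "docking-study",
    public=False,
    members={"alice": Role.EDITOR, "bob": Role.VIEWER},
)
org = Organisation("example-org", units=[Unit("chemistry", projects=[private])])
```

In this sketch `bob` can view but not edit `docking-study`, `alice` can do both, and a user who is not a member can do neither.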

Each project has a bit of disk space where your project’s data resides. This is just like a Linux file system with user and group based access control. Importantly, when you run applications or jobs (see below) in your project, those applications or jobs have access to your project’s disk space, allowing you to easily work on your data in a series of operations. This page describes how to manage your environment, and this one provides more information on working with projects.

Different tiers

The evaluation tier mentioned above provides free access, but all data is public. This allows you to evaluate the system and to perform work where keeping data private is not necessary. The bronze, silver and gold subscription tiers allow projects to be made private and provide larger amounts of resource and higher service levels.

Currently we have not implemented a mechanism for charging for these subscription tiers, but that will come in future.

Job and application execution

The main way to work on your data is to run applications or jobs. Applications are potentially long-running processes that let you work on your data. The only current example is Jupyter notebooks. You can fire up a Jupyter notebook and use it to work on the files in your project. This provides a high degree of flexibility. See here for how to run a Jupyter Notebook.

Jobs on the other hand are processes that run to perform a specific task. For instance you can run a job that calculates molecular properties for molecules in an input SD-file or file containing SMILES strings. A more complex job would be to execute a workflow that performs virtual screening using docking, in a way that is parallelised and scales out to utilise the available CPU capacity of the cluster. See here for more information on running jobs.

Applications and jobs, whether simple or complex, execute as pods in the Kubernetes cluster, providing a high level of isolation and scalability.

Key future features

We have two major features that are in the planning stage. The first is a workflow builder, much like the Squonk Computational Notebook, that allows you to build complex workflows out of the available jobs. The second is “Squonk Viz”, an application that allows you to perform interactive visualisation of your results.

How do I access it?

If you are already a Squonk Computational Notebook user then you already have access to Squonk Data Manager as an evaluation user. Access it here and log in using the icon in the top right corner.

If you are not already a Squonk user then you can register as an evaluation user on the login page. If you do, please ensure you use a username that contains only alphanumeric characters.

In either case, if you want to access the subscription tiers then please get in touch with us because, as mentioned above, we have not yet implemented the mechanism to charge for them.

Programmatic access through the REST APIs is also possible, allowing you to use Squonk Data Manager as a job execution engine from your own applications. There is also a Python client to make this easier.

And if you need your own dedicated Squonk Data Manager instance for your organisation’s exclusive use then we can set that up for you.

Feedback and contributing

As the Squonk Data Manager is a new product we very much welcome feedback and suggestions. It is also designed as an open system that people can contribute to and improve. The most notable way to do this is by providing additional jobs, or even just by suggesting new ones. For instance, we aim to significantly enhance the availability of predictive models and virtual screening tools over the coming months.

How can I get further information and get help?

For support use our Discourse support forum.

A recording of the launch webinar can be found here.

Key starting points

Access Squonk Data Manager
Support forum