How to Develop Plugins Using the Delphix Virtualization SDK
Delphix recently released the Virtualization SDK, a software development kit designed to make it easy to integrate new data sources into Delphix. As part of this announcement, here's a technical deep dive on a development strategy that can help developers build powerful data management plugins for relational, NoSQL, PaaS, and containerized databases.
But first, here are a couple of general notes before we dive into the strategy.
Shell Scripts: Currently, only shell scripts (Bash on Unix and PowerShell on Windows) can be executed on a remote host. A network connection cannot be made from the Python code executing on the Delphix Engine to the database, so shell scripts are needed to interact with the database. In each of the steps below, your life will be made much easier by writing the shell scripts needed to implement each operation before writing any Python code. The shell scripts can be developed in isolation, but think about what they output. The Python code will get back stdout and stderr and will need to make sense of it.
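To make the point about output handling concrete, here is a minimal sketch of the kind of parsing the Python side ends up doing. The SDK's remote execution call returns the script's exit code, stdout, and stderr; the `RunResult` stand-in and the `key=value` output convention below are assumptions for illustration, so the logic can be exercised without a Delphix Engine.

```python
# Hypothetical helper: interpret the result of a remote status script.
# The SDK hands back exit code, stdout, and stderr; we model that with a
# namedtuple here so the parsing logic is unit-testable in isolation.
from collections import namedtuple

RunResult = namedtuple('RunResult', ['exit_code', 'stdout', 'stderr'])

def parse_status_output(result):
    """Turn 'key=value' lines from a status script into a dict, or raise."""
    if result.exit_code != 0:
        raise RuntimeError('Status script failed: {}'.format(result.stderr.strip()))
    status = {}
    for line in result.stdout.splitlines():
        line = line.strip()
        if line and '=' in line:
            key, _, value = line.partition('=')
            status[key.strip()] = value.strip()
    return status

# A script that echoes "running=true" and "port=5432" parses cleanly:
print(parse_status_output(RunResult(0, 'running=true\nport=5432\n', '')))
```

Designing your scripts to emit simple, machine-readable output like this (rather than free-form text) is what makes the Python side manageable.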
Testing: The Python code can be tested with any vanilla Python unit testing framework. We use pytest internally. You will need to mock out any calls to dlpx.virtualization.libs. In order to test remote execution or to run end-to-end tests, you'll have to upload the plugin to the Delphix Engine and test it by executing the Delphix end user workflows.
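Here is a sketch of what mocking out a remote call can look like. In a real plugin you would patch the call into dlpx.virtualization.libs; to keep this example self-contained, the (hypothetical) operation under test takes the run function as a parameter instead.

```python
# Pytest-style unit test with the remote call mocked out. get_db_port and
# the port.conf path are made up for this example; the pattern of stubbing
# the remote result (exit code, stdout, stderr) is the point.
from unittest import mock

def get_db_port(run_bash, connection):
    # Hypothetical operation under test: asks the host for the DB port.
    result = run_bash(connection, 'cat /etc/mydb/port.conf')
    return int(result.stdout.strip())

def test_get_db_port():
    fake_result = mock.Mock(exit_code=0, stdout='5432\n', stderr='')
    fake_run_bash = mock.Mock(return_value=fake_result)
    assert get_db_port(fake_run_bash, connection=None) == 5432
    fake_run_bash.assert_called_once()

test_get_db_port()
```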
Diving into the Strategy
There are three general categories of plugin operations: discovery, ingestion, and provisioning. Start with hard coding discovery since it can be tricky, then implement ingestion, followed by provisioning. Finally, circle back and implement auto-discovery if desired.
It'll likely be easiest to start with empty schemas and add to them as needed. As you implement the operations, it will become clear what information is needed. Keep in mind that changing a schema requires deleting any objects created from it before the new version of the plugin can be uploaded. This does not apply to repositories or auto-discovered source configs. For example, if you add a property to the linked source definition, all sources linked with your plugin will need to be deleted, along with all VDBs provisioned from those linked sources.
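As a concrete illustration, a schema change of that kind might look like the fragment below: adding a hypothetical backupPath property to the linked source definition in the plugin's schema file. The property name is made up for this example; after such a change, existing linked sources (and their VDBs) would need to be deleted before the new plugin version can be uploaded.

```json
{
    "linkedSourceDefinition": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
            "name": {"type": "string"},
            "backupPath": {"type": "string"}
        }
    }
}
```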
Discovery consists of repository discovery and source config discovery. With traditional relational databases, a repository typically represents an RDBMS whereas a source config typically represents the database itself. Repository discovery is always automated and implemented by the plugin. Source config discovery can be done either by the plugin or by the end user manually through the UI.
To start, it’s best to return the RepositoryDefinition and SourceConfigDefinition objects with the metadata for your testing environment instead of running any remote operations. Discovery, particularly repository discovery, can quickly get tricky since most RDBMSs don’t offer clean APIs to discover their run status or location.
Here’s an example of what these operations might look like:
```python
@plugin.discovery.repository()
def repository_discovery(source_connection):
    return [RepositoryDefinition(name='PostgreSQL', port=5432, user='postgres')]

@plugin.discovery.source_config()
def source_config_discovery(source_connection, repository):
    return [SourceConfigDefinition(name='TestDB')]
```
Data ingestion is done by two operations: linked.pre_snapshot and linked.post_snapshot. Both need to be implemented to ingest data. All data ingestion must be done in linked.pre_snapshot and the snapshot object is returned from linked.post_snapshot. If you are writing a plugin with the STAGED ingestion strategy, linked.mount_specification also needs to be implemented.
This is where it helps to have the shell commands needed to ingest data already written out. They can be added to the plugin’s source directory. For more details, here’s a guide on how best to manage remote scripts.
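One pattern that keeps ingestion testable is building the remote command line in Python and handing the finished string to the SDK's remote execution call. The sketch below assumes a PostgreSQL-style source backed up with pg_basebackup; the function name, flags, and hosts are illustrative, not a prescribed implementation.

```python
# Hypothetical command builder for a STAGED ingestion script. Keeping the
# command construction in Python means it can be unit tested; the resulting
# string would be executed on the staging host during linked.pre_snapshot.
def build_backup_command(host, port, user, mount_path):
    return ('pg_basebackup -h {host} -p {port} -U {user} '
            '-D {mount_path} -X stream').format(
        host=host, port=port, user=user, mount_path=mount_path)

print(build_backup_command('source-db.example.com', 5432, 'delphix', '/mnt/staging'))
```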
Once data has been ingested, it can be provisioned. While it's highly dependent on your data source, provisioning is usually more straightforward than discovery and ingestion. Only virtual.configure and virtual.mount_specification need to be implemented to get provisioning working. Using the mounted data from a snapshot, configure needs to stand up a new, running database and return a source config object describing it. mount_specification returns a mount specification object.
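A common piece of the configure step is rewriting a handful of settings so a new instance can start from the mounted files. The helper below is a sketch under PostgreSQL-flavored assumptions (the setting names and paths are illustrative); the rendered text would be written to the virtual database's config file via a shell script.

```python
# Hypothetical config rendering for virtual.configure: give the new virtual
# database its own port and point it at the mounted data directory.
def render_db_config(port, data_dir):
    lines = [
        "port = {}".format(port),
        "data_directory = '{}'".format(data_dir),
        "listen_addresses = '*'",
    ]
    return '\n'.join(lines) + '\n'

print(render_db_config(5433, '/mnt/provision/vdb1/data'))
```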
Source config discovery (again)
If your plugin will only support manual source config discovery, skip this part. Otherwise, implement a more automated version of source config discovery. Most RDBMSs have a query to list the databases they manage. This is likely the best way to implement it.
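For example, PostgreSQL can list its databases with a query like SELECT datname FROM pg_database, run through psql with flags that produce one name per line. A small parser then turns that output into one entry per database, from which SourceConfigDefinition objects can be built. The query and flags here are one possible approach, not the only one.

```python
# Hypothetical parser for automated source config discovery: turn the
# one-name-per-line output of a database-listing query into a clean list.
def parse_database_list(stdout):
    return [line.strip() for line in stdout.splitlines() if line.strip()]

print(parse_database_list('postgres\ntestdb\n\n'))  # ['postgres', 'testdb']
```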
Repository discovery (again)
Repository discovery is often the most brittle operation. As mentioned above, there is rarely a supported, documented way to discover the repositories on a given host. The approach will be highly dependent on the data source your plugin supports. You usually have to resort to looking for binaries, processes, ports, and/or config files.
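As one illustration of the process-scanning approach, the sketch below recovers data directories from ps-style output by looking for the RDBMS binary and its -D flag. The sample output and binary name are assumptions for a PostgreSQL-like source; a real plugin would fetch the process list over the remote connection and handle missing or multiple instances.

```python
# Heuristic repository discovery: scan process listings for the database
# binary and pull the data directory from its command line.
def find_data_dirs(ps_output, binary='postgres'):
    data_dirs = []
    for line in ps_output.splitlines():
        if binary in line and '-D' in line:
            tokens = line.split()
            idx = tokens.index('-D')
            if idx + 1 < len(tokens):
                data_dirs.append(tokens[idx + 1])
    return data_dirs

sample = ('/usr/lib/postgresql/11/bin/postgres -D /var/lib/postgresql/11/main\n'
          'bash -c sleep 100\n')
print(find_data_dirs(sample))  # ['/var/lib/postgresql/11/main']
```

Because heuristics like this break across OSes and database versions, it pays to keep them in small, pure functions that are easy to unit test against captured output from each supported platform.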
While these are the core Delphix operations, there are more user workflows that will not work without implementing additional operations. You can find the workflows and corresponding plugin operations here. You can also continue to extend the plugin to support more configurations and features of the target data source.
Now that you have the first version of your plugin written, here are some questions you might want to think about:
- Which operating systems do you support? Which versions?
- Which data source versions do you support?
- Does your plugin need access to the source database? If so, what permissions does it need?
- How does your plugin ingest data? Does it use replication? Does it ingest from backups? Something else?
- If it ingests from backups, does your plugin care which backup vendor was used?
We are always looking for feedback, suggestions, and interesting use cases. If you have any, or just need some questions answered, don't hesitate to post in the Delphix community.