The business case
You launched a software application and installed it at two customer sites. You support it mostly by logging in to the customer's server over SSH and running Bash commands or, slightly better, Bash scripts.
The product is a hit in the market. You hired 20 support specialists for a customer service department. The dream client came through: an enterprise giving you a fleet of 100 servers to deploy your application on.
More staff, more business, more installations, more incidents, but the same old command-driven steps. The problems:
- Non-standard support procedures. Everyone takes notes, and everyone’s notes are slightly different.
- Information sharing among team members is ad hoc and stays at a high level.
- Post-mortem discussion is driven by memory and command fragments instead of end-to-end evidence.
If that looks like your organization, chances are you also suffer from some secondary damage over the long term, such as:
- Downtime resolution relies on the knowledgeable few.
- Documentation helps, but it never catches up with the latest version of the application.
- Commands run during support are not audited.
I propose an automation scheme for existing support and deployment practice. It combines a suite of common technologies: Bash, Python, Ansible, OpenSSH and Jenkins. The automation allows the department to operationalize the steps in support and deployment, either fully or partially, and eventually shift towards agile practice.
Bash, Python and Ansible
Bash scripting is built on shell commands, which makes it perfect for running critical system tasks such as volume management. As a script grows, however, it can become cumbersome, especially when it has to handle complex data structures. Python, as a tool for system administration, is a good complement.
Python 2 comes with most Linux distributions and is also a dependency of built-in tools such as YUM. Python 3 can be installed easily from the default YUM repositories. Python 2 and Python 3 can coexist on the same operating system, although new module development has now shifted to Python 3. Python’s syntax is simple, and the language supports object-oriented programming. Moreover, there is an entire open-source community behind Python that offers modules for nearly every aspect of IT (for example, DataStax has a driver module for connecting to Cassandra). Those modules are installed with the pip tool (pip3 for Python 3).
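To illustrate the point about data structures, here is a minimal sketch of a task that is awkward in Bash but short in Python: turning the text printed by `df -P` into a list of dictionaries and picking out the nearly full file systems. The function names, the 90% threshold and the sample text are assumptions made for this example only.

```python
def parse_df(output):
    """Parse text in the `df -P` layout into a list of dictionaries."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        # Split into at most 6 fields so a mount point with spaces stays whole.
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue
        device, _blocks, used, available, capacity, mount = fields
        rows.append({
            "device": device,
            "used_blocks": int(used),
            "available_blocks": int(available),
            "percent_used": int(capacity.rstrip("%")),
            "mount": mount,
        })
    return rows


def nearly_full(rows, threshold=90):
    """Return the mount points whose usage is at or above `threshold` percent."""
    return [row["mount"] for row in rows if row["percent_used"] >= threshold]


# Hypothetical sample. On a real host the text could come from
# subprocess.run(["df", "-P"], capture_output=True, text=True).stdout
SAMPLE = """\
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         41152736 38912000   2240736      95% /
/dev/sdb1        103081248 20616248  82465000      20% /data
"""

if __name__ == "__main__":
    print(nearly_full(parse_df(SAMPLE)))
```

Once the output is a list of dictionaries, filtering, sorting or exporting it as JSON takes one line each, which is where Python starts to pay off over a growing Bash script.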
Both Bash and Python scripts execute on the local machine. To run them on remote servers over SSH, you need a list of target hosts and a way to specify which ones to execute the script against. This is where Ansible comes in handy. Ansible stands out in the following aspects:
- Free and open-source, with a commercial alternative (Ansible Tower);
- Inventory management (inventory);
- Desired-state engine (roles).
Ansible is built on Python and is agentless. It connects to remote hosts via secure shell, so it can take advantage of existing SSH configurations. Jobs on the target machine are executed through Python. With Python you can also develop custom Ansible modules. For some use cases of Ansible in customer support, refer to my two previous postings about Ansible at scale.
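As a rough illustration of inventory management, the sketch below renders an INI-style Ansible inventory from a Python dictionary that maps each customer site to its hosts. The site and host names are made up for the example, and a real inventory would usually also carry host and group variables.

```python
def render_inventory(sites):
    """Render an INI-style Ansible inventory from {group: [host, ...]}."""
    lines = []
    for group in sorted(sites):
        lines.append("[{}]".format(group))  # group header, e.g. [customer1]
        lines.extend(sites[group])          # one host (or SSH alias) per line
        lines.append("")                    # blank line between groups
    return "\n".join(lines)


# Hypothetical sites and hosts, for illustration only.
SITES = {
    "customer1": ["customer1-app-0", "customer1-app-1"],
    "customer2": ["customer2-app-0"],
}

if __name__ == "__main__":
    print(render_inventory(SITES))
```

With the output saved as, say, inventory.ini, an ad-hoc command such as `ansible customer1 -i inventory.ini -m ping` would target only the hosts of that one site.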
Jenkins
The tools above form a package for automation. The issue is that all of them are command-line based. Any task that involves Ansible requires the IT professional to craft a long command, whether to run a playbook, execute a role, or issue an ad-hoc command. This is inconvenient when the task needs to be done during an incident. Such tasks also require a trained professional with the relevant skills.
These tasks can be stored in, or initiated by, Jenkins. Although Jenkins is best known for build automation in continuous integration, it is an automation engine for any command-line based IT task. The button that starts a task in the Jenkins UI is called “Build”, a misnomer that underplays Jenkins’ versatility: building an application from source code is just one of the many IT tasks that involve multiple long-running commands. In this article and the next, we introduce Jenkins as an engine for deployment automation.
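To make the idea of "storing a long command" concrete, here is a minimal sketch of how a handful of job parameters could be turned into the full `ansible-playbook` command line. The function, the playbook name and the variable names are hypothetical; only the command-line options (`-i`, `--limit`, `--extra-vars`, `--check`) are standard ones.

```python
import json
import shlex


def build_playbook_command(playbook, inventory, limit=None,
                           extra_vars=None, check=False):
    """Build the argument list for an ansible-playbook run."""
    argv = ["ansible-playbook", "-i", inventory, playbook]
    if limit:
        # Restrict the run to one site or host pattern.
        argv += ["--limit", limit]
    if extra_vars:
        # --extra-vars accepts a JSON document as its value.
        argv += ["--extra-vars", json.dumps(extra_vars, sort_keys=True)]
    if check:
        # Dry run: report what would change without changing it.
        argv.append("--check")
    return argv


if __name__ == "__main__":
    # Hypothetical parameters, such as a parameterized job might supply.
    command = build_playbook_command(
        "restart_app.yml", "inventory.ini",
        limit="customer1", extra_vars={"app_version": "2.4.1"}, check=True,
    )
    # Quote each argument so the line can be pasted into a shell as-is.
    print(" ".join(shlex.quote(arg) for arg in command))
```

The support specialist then only picks a site and a version from a form; the exact command that ran remains visible in the job's console output.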
The infrastructure architecture is diagrammed above. Because the connections cross the Internet, the target hosts must be properly hardened in the following aspects:
- Connectivity to the remote host is via SSH chaining, through an SSH proxy;
- Root login must be disabled for remote sessions, or at least disallowed with a password;
- The service user may be shared, but each person must authenticate with an individual RSA key pair;
- A service user connected remotely escalates privilege with su when needed.
I want to make a theoretical distinction between our topic here and continuous deployment. Here we focus purely on the technical side of deployment automation, which essentially means automating a few Bash scripts. A continuous deployment process, on the other hand, is an extension of an existing continuous integration pipeline, with the vision of streamlining the process end-to-end from code commit to production rollout. Implementing CI/CD pipelines should be approached as an organizational program rather than an individual technical initiative. Here is a good technical overview on CI/CD pipeline with Jenkins and Ansible.
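Part of this checklist can be verified automatically. The sketch below reads the text of an sshd_config file and reports two findings: root login that is not explicitly restricted, and public key authentication that is switched off. It is an illustration under simplifying assumptions: it ignores Match blocks and included files, and it deliberately treats a missing PermitRootLogin line as a finding rather than relying on the built-in default. The function names are hypothetical.

```python
def effective_settings(sshd_config_text):
    """Return {keyword: value}, keeping the first value seen per keyword.

    sshd uses the first value it obtains for each keyword, and keywords
    are case-insensitive, so they are lower-cased here.
    """
    settings = {}
    for raw in sshd_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        settings.setdefault(keyword, value)
    return settings


def audit(sshd_config_text):
    """Return a list of human-readable findings (empty means no finding)."""
    settings = effective_settings(sshd_config_text)
    findings = []
    if settings.get("permitrootlogin") not in ("no", "prohibit-password"):
        findings.append("PermitRootLogin should be 'no' or 'prohibit-password'")
    if settings.get("pubkeyauthentication", "yes") != "yes":
        findings.append("PubkeyAuthentication should be 'yes'")
    return findings


if __name__ == "__main__":
    # Hypothetical file content; the second PermitRootLogin line is ignored.
    sample = "# hardening test\nPermitRootLogin yes\nPermitRootLogin no\n"
    for finding in audit(sample):
        print(finding)
```

A check like this could itself run as a scheduled job against every site, so that drift from the hardening rules is noticed before an incident.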
Security
The security mechanism of this system is based on OpenSSH, because connectivity between servers is through SSH chaining. RSA key authentication must be used so that logins are password-less; SSH encrypts the traffic in either case. A connection to an SSH host can go through a proxy server. Below is an example SSH configuration:
Include customer1.config
Include customer2.config

Host *
    IdentityFile ~/.ssh/id_rsa
    ServerAliveInterval 60
    ServerAliveCountMax 3
    Compression yes
    ControlPersist 3h
    ControlPath ~/.ssh/sockets/%r@%h-%p

Host gateway
    Hostname support.digihunch.com
    User jdoe
    Port 2223

Host customer-server-0
    Hostname 192.168.201.12
    User support
    ProxyCommand ssh -W %h:%p gateway
The OpenSSH client configuration file (~/.ssh/config) needs to be set up with meaningful host names and aliases. To keep the file from growing too long, the Include directive can reference other configuration files (available in OpenSSH 7.3p1 and later).
The host names (as well as aliases) listed in the SSH configuration can be referenced directly in the Ansible inventory, allowing Ansible (and Jenkins) to refer to a site by alias and connect to the target host through the proxy.
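One way to keep the inventory in step with the SSH configuration is to extract the aliases programmatically. The following is a minimal sketch that collects the concrete `Host` aliases from SSH configuration text and skips wildcard or negated patterns such as `Host *`. The function name is hypothetical, and the sketch does not follow Include directives.

```python
def host_aliases(ssh_config_text):
    """Return the concrete Host aliases found in SSH client config text."""
    aliases = []
    for raw in ssh_config_text.splitlines():
        parts = raw.strip().split()
        # Only "Host" lines define aliases; "Hostname" lines do not match.
        if len(parts) < 2 or parts[0].lower() != "host":
            continue
        for pattern in parts[1:]:
            # Skip wildcard and negated patterns; they are not usable aliases.
            if any(ch in pattern for ch in "*?!"):
                continue
            if pattern not in aliases:
                aliases.append(pattern)
    return aliases


# Abridged from the example configuration in this article.
SAMPLE = """\
Host *
    ServerAliveInterval 60
Host gateway
    Hostname support.digihunch.com
Host customer-server-0
    Hostname 192.168.201.12
    ProxyCommand ssh -W %h:%p gateway
"""

if __name__ == "__main__":
    print(host_aliases(SAMPLE))
```

Each alias returned this way can be written into the inventory as a host name; when Ansible connects with that name, OpenSSH applies the matching Hostname, User and ProxyCommand settings.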
Plugins
Jenkins has a community that develops a wide variety of plugins, which is a large part of what makes it such a versatile automation platform. Here are some examples of useful plugins:
- Audit Trail: outputs job execution history to a file or Elasticsearch;
- Credentials: stores credentials in Jenkins;
- Pipeline: builds declarative (new) or scripted (old) pipelines for Jenkins jobs;
- Simple Theme: just a theme, but it allows the console output to be dark (using CSS);
- Job Configuration History: audits job configuration changes;
- Mask Passwords: masks variables (including passwords) in the console output;
- Ansible: invokes ad-hoc commands and playbooks;
- SSH Agent, SSH Pipeline Steps, SSH Credentials: features related to SSH in Jenkins pipelines;
- Purge Job History: purges all build history, or purges by time and number of old builds;
- Parameterized Scheduler: schedules a job to run with the parameters provided;
- Workspace Cleanup: cleans up the workspace when invoked.
In the next article, we will go over some common job configurations.