Amazon Cloud Training

Fundamentals: Amazon Web Services for Science & Engineering

Training Agenda & Course Information

Intended Audience

Researchers, IT management, systems administrators & software developers looking for a practical & solid understanding of current AWS capabilities and how best to leverage them for research, scientific or engineering uses. The course covers a broad range of topics, and our sessions generally draw a diverse mix of scientists, IT operations staff and software developers.

Level of Interaction

Materials are presented in a dynamic lecture format with frequent instructor-led demonstrations, discussions and highlighted examples. Recorded screen-casts may be used for topics that are difficult to orchestrate live. Attendees will have dedicated access to cloud-resident training systems along with example code, scripts and self-paced activity worksheets that can be used for individual exploration and experimentation at any time during the course. We have made a conscious decision to favor more content & topics over interactive lab exercises, which can consume significant amounts of class time. Other courses currently being prepared are biased towards hands-on, highly interactive training labs and exercises.

Level of Difficulty

Familiarity with Linux and shell scripting is expected. Attendees should generally be comfortable using SSH clients and operating from the Linux command line. No programming experience is required, but an understanding of software development practices will help when deployment & architecture strategies are discussed. Basic familiarity with Amazon Web Services is expected (see Prerequisites below) to minimize the amount of introductory material that needs to be covered. As a general rule we will use AWS-aware utilities, wrappers, command-line tools and GUIs to show and orchestrate AWS actions rather than directly manipulating the Amazon APIs. Any questions or concerns should be addressed to the course instructors.

Schedule & Enrollment

Public classes are offered several times per year in partnership with Cambridge Healthtech Institute.

BioTeam also offers private training delivered onsite at client facilities with content customized to meet interests and requirements.


Instructors

The class is generally taught by two dedicated instructors.

  • Chris Dagdigian
  • Adam Kraut


Prerequisites

Attendees should have wireless-capable laptops with SSH clients. The Mozilla Firefox web browser is recommended for attendees interested in exploring the various AWS-aware browser plugins & extensions. Attendees will be provided with remote access to Linux systems containing the AWS software, utilities, libraries & resources needed for the course.

Attendees with Mac OS X or Linux systems should have SSH & Java 1.5 (JDK or SDK) installed and available if they want to locally install the AWS command-line utilities on their machines. This is not required. Attendees may also wish to have personal accounts set up with Amazon Web Services. This is not required for training, as shared credentials belonging to BioTeam will be used for exercises, labs and demonstrations.

Attendees new to Amazon Web Services are encouraged to follow the Amazon self-paced “Getting Started With EC2” tutorial online prior to attending the class. The tutorial can be completed in a short time and presents an excellent introduction to the core EC2 service. Comments or questions can be addressed directly to Chris Dagdigian.

Day 1 Agenda

Objective: Progress iteratively through the topics essential for building out larger or more production-focused workstreams on the AWS platform. Day One will focus on the basic foundations and will use the Maq assembler as an example use case for building out a more traditional (or ‘legacy’) workflow on AWS.

I.    Intro & Logistics

II.    AWS Overview
Goal: AWS has a large number of service and product offerings. We’ll cover those of most interest to people involved in informatics and high performance computing.

III.    Mapping Informatics to the Cloud
Goal: Cover the major environmental, performance and architecture differences between HPC, grid and cluster environments and the AWS cloud environment.

IV.    AWS: Billing & Credential Management
Goal: Briefly cover the logistics and mechanisms behind organizational billing and credential management with focus on the newly announced AWS ‘consolidated billing’ offering.

V.    AWS: EC2 Overview
Goal: Light introduction to Amazon EC2 to cover definitions & capabilities before we start making heavy use of EC2 instances in live demos and recorded screen-casts.

VI.    AWS: Configuration Management
Goal: Configuration management of EC2 AMIs is a major component of deploying cloud applications in a reliable, repeatable and easy-to-manage way. For this topic, we will use Chef Server to demonstrate configuration management of cloud-based server AMIs.

VII.    AWS: Identity Management
Goal: There are some cases where individual access via SSH keys may not be sufficient (such as with web applications). This topic will be covered with a demonstration of either LDAP server integration or OpenID integration.

VIII.    AWS: Monitoring & Reporting
Goal: Discuss and demonstrate a number of different monitoring & reporting options, with specific focus on Amazon CloudWatch (AWS product offering), Server Density (commercial solution), Hyperic HQ Open Source Edition (open source solution) and syslog-ng for logfile consolidation.

IX.    Putting it all together: Maq Assembler
Goal: Using the Maq assembler algorithm as our demonstration use-case we will discuss and show several different legacy deployment methods utilizing Amazon Web Services. The “legacy” methods are for supporting existing applications and workstreams that may have been built for HPC clusters and compute farms. Day One will showcase the “legacy” methods while Day Two will showcase a more traditional cloud architecture using current AWS best practices.

X.    Wrap-up & Discussion
Goal: Discuss and review the topics of the day with particular focus on identifying attendee interest in areas that were not covered or were not covered enough. Time is being left open in the “Day Two” schedule to handle inclusion of additional topics or demonstrations.

Day 2 Agenda

Objective: Continue progressing iteratively through the topics essential for building out larger or more production-focused workstreams on the AWS platform. Day Two will focus on continued use of the Maq assembler as our example use case. The focus today will be on architecting solutions using current AWS products and best practices. Due to estimated session size, this training will be lecture, discussion and live demo driven.

I.    Intro & Logistics

II.    S3 Object Storage Overview
Goal: Coverage of the object-based AWS storage service.

III.    EBS Block Storage Overview
Goal: Coverage of the block-based EBS storage service.

IV.    Data Movement
Goal: Data movement in and out of “the cloud” is problematic for data-heavy fields like life science informatics. Cover known issues, alternatives such as the Amazon physical ingest/outgest services, and where to “draw the line” between network transfer and shipping physical media.
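
As a rough illustration of where that line might fall, the sketch below estimates how long a network transfer would take for a given dataset size and link speed. The function name, the dataset sizes, the link speeds and the 80% efficiency figure are all assumptions chosen for illustration; they are not measurements and not part of the course material.

```python
def transfer_hours(dataset_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move dataset_gb gigabytes over a network link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    if dataset_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive; efficiency must be in (0, 1]")
    bits = dataset_gb * 8e9                       # decimal gigabytes -> bits
    bits_per_second = link_mbps * 1e6 * efficiency
    return bits / bits_per_second / 3600.0


if __name__ == "__main__":
    # Hypothetical dataset sizes (GB) and link speeds (Mbit/s).
    for size_gb in (100, 1000, 10000):
        for mbps in (100, 1000):
            hours = transfer_hours(size_gb, mbps)
            print(f"{size_gb:>6} GB over {mbps:>4} Mbit/s: about {hours:.1f} hours")
```

When the estimate runs into days or weeks, shipping physical media is usually worth comparing against the network transfer.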

V.    SQS Overview
Goal: Review and demonstrate the AWS SQS service, often a central component of cloud-resident workstreams.
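
The queue-centered pattern this topic refers to can be sketched locally. The example below uses Python's standard `queue` and `threading` modules as a stand-in for a hosted queue; it does not call SQS or any AWS API, and the `run_workers` helper and the task values are hypothetical names used only for illustration.

```python
import queue
import threading


def run_workers(tasks, handler, num_workers=3):
    """Distribute tasks to worker threads through a shared queue.

    A local stand-in for the pattern only: a producer fills a queue and
    independent workers pull items from it until the queue is empty.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return                      # nothing left to process
            outcome = handler(task)
            with lock:                      # protect the shared results list
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Hypothetical work items: each "task" is just a number to square.
    print(sorted(run_workers(range(10), lambda n: n * n)))
```

The point of the design is that producers and workers only share the queue, so workers can be added or removed without changing the code that submits work.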

VI.    Additional Topics
Goal: Placeholder topic for areas identified during Day One as needing more depth, discussion or demonstrations.

VII.    Putting it all together: Maq assembly revisited
Goal: Review the “legacy” Maq solutions shown in Day One and discuss the pros and cons of those approaches. Continue with a discussion of current-day AWS best practices, culminating in a revised/revisited Maq demonstration using more traditional cloud workflow methods.

VIII.    Wrap-up & Discussion