UNIT IV
PROGRAMMING MODEL
7.4.1 Open Source Grid Middleware Packages
7.4.1.1 Grid Standards and APIs
7.4.1.2 Software Support and Middleware
7.4.2 The Globus Toolkit Architecture (GT4)
7.4.2.1 The GT4 Library
7.4.2.2 Globus Job Workflow
7.4.2.3 Client-Globus Interactions
Globus Toolkit4 Architecture
Globus Toolkit Programming Model
7.4.1 Open Source Grid Middleware Packages
§ Many software, middleware, and programming environments have been developed for grid computing over the past 15 years.
§ Popular grid middleware packages:
o BOINC – Berkeley Open Infrastructure for Network Computing.
o UNICORE – Middleware developed by the German grid computing community.
o Globus (GT4) – A middleware library jointly developed by Argonne National Laboratory, the University of Chicago, and the USC Information Sciences Institute, funded by DARPA, NSF, and NIH.
o CGSP – ChinaGrid Support Platform, a middleware library developed by 20 top universities in China.
o Condor-G – Condor was originally developed at the University of Wisconsin for general distributed computing, and was later extended to Condor-G for grid job management.
o Sun Grid Engine (SGE) – Developed by Sun Microsystems for business grid applications. Applied to private grids and local clusters within enterprises or campuses.
7.4.1.1 Grid Standards and APIs
§ Grid standards have been developed over the years.
§ Important organizations involved:
o The Open Grid Forum (formerly Global Grid Forum) and
o Object Management Group
§ Important standards:
o OGSA (Open Grid Services Architecture),
o GLUE for resource representation,
o SAGA (Simple API for Grid Applications),
o GSI (Grid Security Infrastructure),
o OGSI (Open Grid Services Infrastructure), and
o WSRF (Web Services Resource Framework).
§ The grid standards have guided the development of several middleware libraries and API tools for grid computing.
§ They are applied in both research grids and production grids today.
o Research grids tested include the EGEE, France Grilles, D-Grid (Germany), CNGrid (China), TeraGrid (USA), etc.
o Production grids built with the standards include the EGEE, INFN grid (Italy), NorduGrid, Sun Grid, Techila, and Xgrid.
7.4.1.2 Software Support and Middleware
§ Grid middleware is specifically designed as a layer between the hardware and the software.
§ The middleware products enable
o the sharing of heterogeneous resources and
o the management of virtual organizations created around the grid.
§ Middleware glues the allocated resources to specific user applications.
§ Popular grid middleware tools include the Globus Toolkit (USA), gLite, UNICORE (Germany), BOINC (Berkeley), CGSP (China), Condor-G, and Sun Grid Engine.
7.4.2 The Globus Toolkit Architecture (GT4)
§ The Globus Toolkit, started in 1995 with funding from DARPA, is an open middleware library for the grid computing communities.
§ These open source software libraries support many operational grids and their applications on an international basis.
§ The toolkit addresses common problems and issues related to grid resource discovery, management, communication, security, fault detection, and portability.
§ The software itself provides a variety of components and capabilities.
§ The library includes a rich set of service implementations.
§ The implemented software
o supports grid infrastructure management,
o provides tools for building new web services in Java, C, and Python,
o builds a powerful standard-based security infrastructure and client APIs (in different languages), and
o offers comprehensive command-line programs for accessing various grid services.
§ The Globus Toolkit was initially motivated by a desire to remove obstacles that prevent seamless collaboration, and thus sharing of resources and services, in scientific and engineering applications.
§ The shared resources can be computers, storage, data, services, networks, science instruments (e.g., sensors), and so on.
§ The Globus library version GT4 is shown conceptually in Figure 7.18.
7.4.2.1 The GT4 Library
§ GT4 offers the middle-level core services in grid applications.
§ The high-level services and tools (such as MPI, Condor-G, and Nimrod/G) are developed by third parties for general-purpose distributed computing applications.
§ The local services (such as LSF, TCP, Linux, and Condor) are at the bottom level and are fundamental tools supplied by other developers.
§ As a de facto standard in grid middleware, GT4 is based on industry-standard web service technologies.
§ Table 7.7 summarizes GT4’s core grid services by module name.
§ Essentially, these functional modules help users to discover available resources, move data between sites, manage user credentials, and so on.
§ HTTP-based GRAM – Globus Resource Allocation Manager – used to locate, submit, monitor, and cancel jobs on grid computing resources. It provides reliable operation, stateful monitoring, credential management, and file staging.
§ MDS – Monitoring and Discovery Service – distributed access to structure and state information.
§ Nexus – used for collective communications (unicast and multicast).
§ HBM – Heartbeat Monitoring – monitors the system components of resource nodes.
§ GridFTP – fast internode file transfers.
§ GASS – Global Access to Secondary Storage – provides a uniform name space (via URLs) and access mechanisms for files accessed via different protocols and stored in diverse storage system types (HTTP, FTP, HPSS, DPSS, etc.).
§ GSI – Grid Security Infrastructure – specification for secure communication between software in a grid computing environment.
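The uniform name space idea behind GASS can be illustrated with a small Python sketch: one entry point takes a URL and routes it to a protocol-specific back end chosen by the URL scheme. This is only an illustration of the concept, not the GASS API; the function names, the back ends, and the example host names are all hypothetical, and the back ends merely report what they would read.

```python
from urllib.parse import urlparse

# Hypothetical protocol back ends. Real ones would open network or
# storage connections; these only report what they would fetch.
def _read_http(url):
    return f"http read {url.netloc}{url.path}"

def _read_ftp(url):
    return f"ftp read {url.netloc}{url.path}"

def _read_local(url):
    return f"local read {url.path}"

# One table maps URL schemes to back ends, so callers never branch
# on the protocol themselves.
BACKENDS = {"http": _read_http, "ftp": _read_ftp, "file": _read_local}

def read_uniform(url_text):
    """Route a URL to the back end registered for its scheme."""
    url = urlparse(url_text)
    backend = BACKENDS.get(url.scheme)
    if backend is None:
        raise ValueError(f"no back end registered for scheme {url.scheme!r}")
    return backend(url)

if __name__ == "__main__":
    for text in ("http://sitea.example.org/data/run1.dat",
                 "ftp://siteb.example.org/data/run2.dat",
                 "file:///scratch/run3.dat"):
        print(read_uniform(text))
```

Supporting another storage system in this sketch means registering one more entry in the table; client code that calls read_uniform does not change.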
7.4.2.2 Globus Job Workflow
§ Figure 7.19 shows the typical job workflow when using the Globus tools.
§ A typical job execution sequence proceeds as follows:
1. The user delegates their credentials to a delegation service.
2. The user submits a job request to GRAM with the delegation identifier as a parameter.
3. GRAM parses the request, retrieves the user proxy certificate from the delegation service, and then acts on behalf of the user.
4. GRAM sends a transfer request to the RFT (Reliable File Transfer), which applies GridFTP to bring in the necessary files.
5. GRAM invokes a local scheduler via a GRAM adapter.
6. The SEG (Scheduler Event Generator) initiates a set of user jobs.
7. The local scheduler reports the job state to the SEG.
8. Once the job is complete, GRAM uses RFT and GridFTP to stage out the resultant files.
9. The grid monitors the progress of these operations and sends the user a notification when they succeed, fail, or are delayed.
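The ordering of the nine steps can be traced with a toy Python simulation. It is a sketch under stated assumptions, not Globus code: the class DelegationService, the function run_job, the job dictionary layout, and the event strings are all hypothetical, and no real credentials, transfers, or schedulers are involved.

```python
import itertools

class DelegationService:
    """Hypothetical stand-in for the delegation service."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._proxies = {}

    def delegate(self, proxy_certificate):
        # Step 1: store the credential and hand back an identifier.
        delegation_id = f"deleg-{next(self._counter)}"
        self._proxies[delegation_id] = proxy_certificate
        return delegation_id

    def retrieve(self, delegation_id):
        return self._proxies[delegation_id]


def run_job(delegation_service, delegation_id, job):
    """Steps 2 to 9 for one job request; returns the ordered event log."""
    events = []
    # Step 3: fetch the proxy so the service can act for the user.
    proxy = delegation_service.retrieve(delegation_id)
    events.append(f"GRAM acting on behalf of {proxy}")
    # Step 4: stage in the input files.
    for name in job["stage_in"]:
        events.append(f"RFT/GridFTP stage-in: {name}")
    # Steps 5 and 6: hand the job to the local scheduler.
    events.append(f"local scheduler started: {job['executable']}")
    # Step 7: the scheduler reports the job state.
    events.append("SEG received job state: done")
    # Step 8: stage out the results.
    for name in job["stage_out"]:
        events.append(f"RFT/GridFTP stage-out: {name}")
    # Step 9: notify the user.
    events.append("user notified: succeeded")
    return events


if __name__ == "__main__":
    service = DelegationService()
    ident = service.delegate("proxy-cert-of-alice")
    job = {"executable": "simulate",
           "stage_in": ["input.dat"],
           "stage_out": ["result.dat"]}
    for line in run_job(service, ident, job):
        print(line)
```

The point of the sketch is the delegation identifier: the job request carries only that identifier, and the credential itself is fetched from the delegation service when the job is processed.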
7.4.2.3 Client-Globus Interactions
§ GT4 service programs are designed to support user applications as illustrated in Figure 7.20.
§ There are strong interactions between provider programs and user code.
§ GT4 makes heavy use of industry-standard web service protocols and mechanisms in service description, discovery, access, authentication, authorization, and the like.
§ User code for GT4 is written mainly in Java, C, and Python.
§ Web service mechanisms define specific interfaces for grid computing.
§ Web services provide flexible, extensible, and widely adopted XML-based interfaces.
§ GT4 provides a set of infrastructure services for accessing, monitoring, managing, and controlling access to infrastructure elements.
§ The server code in the vertical boxes in Figure 7.22 corresponds to 15 grid services that are in heavy use in the GT4 library.
§ These services demand computational, communication, data, and storage resources and a range of end-user tools that provide the higher-level capabilities needed in specific user applications.
§ Wherever possible, GT4 implements standards to facilitate construction of operable and reusable user code.
§ Developers can use these services and libraries to build simple and complex systems quickly.
§ A high-security subsystem addresses message protection, authentication, delegation, and authorization.
§ GT4 has a set of service implementations and associated client libraries, and it supports both web service (WS) and non-WS applications.
§ The horizontal boxes in the client domain denote custom applications and/or third-party tools that access GT4 services.
§ The toolkit programs provide a set of useful infrastructure services.
§ Three containers are used to host user-developed services written in Java, Python, and C, respectively.
§ These containers provide implementations of security, management, discovery, state management, and other mechanisms frequently required when building services.
§ They extend open source service hosting environments with support for a range of useful web service specifications, including WSRF, WS-Notification, and WS-Security.
§ A set of client libraries allows client programs in Java, C, and Python to invoke operations on both GT4 and user-developed services.
§ In many cases, multiple interfaces provide different levels of control.
§ For example, GridFTP provides
o a simple command-line client (globus-url-copy),
o control and data channel libraries for use in programs, and
o the XIO library for the integration of alternative transports.
§ The use of uniform abstractions and mechanisms means clients can interact with different services in similar ways, which facilitates construction of complex, interoperable systems and encourages code reuse.
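As an illustration of the simplest of these interface levels, the Python sketch below assembles a command line for the GridFTP client, which the Globus Toolkit ships as globus-url-copy. The wrapper functions build_copy_command and copy are hypothetical helpers written for this example, and the host name and paths are placeholders. The sketch assumes the usual invocation form of source URL followed by destination URL, with -p selecting the number of parallel data streams; check the help output of the installed client before relying on any option.

```python
import shutil
import subprocess

def build_copy_command(source_url, dest_url, parallel_streams=None):
    """Return the argv list for one globus-url-copy transfer."""
    command = ["globus-url-copy"]
    if parallel_streams is not None:
        # -p requests that many parallel data streams.
        command += ["-p", str(parallel_streams)]
    command += [source_url, dest_url]
    return command

def copy(source_url, dest_url, parallel_streams=None):
    """Run the transfer if the client is installed; raise otherwise."""
    if shutil.which("globus-url-copy") is None:
        raise FileNotFoundError("globus-url-copy is not on PATH")
    subprocess.run(
        build_copy_command(source_url, dest_url, parallel_streams),
        check=True,
    )

if __name__ == "__main__":
    # Placeholder host and paths; gsiftp:// is the GridFTP URL scheme.
    print(build_copy_command(
        "gsiftp://gridftp.example.org/data/input.dat",
        "file:///tmp/input.dat",
        parallel_streams=4,
    ))
```

Programs that need finer control than a subprocess call would use the control and data channel libraries instead, which is the second level in the list above.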
Globus Toolkit4 Architecture:
The Globus Toolkit4 is a collection of many software components, which are divided into the following five categories.
1. Security – connections are secured using the Grid Security Infrastructure (GSI).
2. Information services – also called Monitoring and Discovery Services (MDS); they comprise a collection of components to discover and monitor resources in a virtual organization.
3. Execution management – deals with the initiation, monitoring, coordination, and management of executable programs in a grid.
4. Data management – these components allow users to manage large sets of data in a virtual organization.
5. Common runtime – the common runtime components offer a set of fundamental libraries and tools needed to build both web services and non-web services.
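The five categories can be summarized as a lookup table that places the components named earlier in these notes. The Python sketch below is a study aid only. The assignments of GRAM to execution management, of GridFTP and RFT to data management, and of the three hosting containers to the common runtime are this example's reading of the notes rather than an official component list, and the names CATEGORIES and category_of are hypothetical.

```python
# Assumed mapping of categories to components mentioned in these notes.
CATEGORIES = {
    "Security": ["GSI"],
    "Information services": ["MDS"],
    "Execution management": ["GRAM"],
    "Data management": ["GridFTP", "RFT"],
    "Common runtime": ["Java container", "C container", "Python container"],
}

def category_of(component):
    """Return the category that lists the given component."""
    for category, members in CATEGORIES.items():
        if component in members:
            return category
    raise KeyError(component)

if __name__ == "__main__":
    for name in ("GSI", "GRAM", "RFT"):
        print(name, "->", category_of(name))
```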