Database Management Systems
According to leading information technology research firms, Oracle Corporation's database is the leading database management system in terms of market share. IBM Corporation's DB2 is a close second, followed by Microsoft SQL Server and SAP-Sybase. All are relational databases, and each company also markets additional databases. In most cases, a commercial database has a free or low-cost FOSS (free and open source software) counterpart, such as Firebird, MySQL, Ingres, or PostgreSQL. International Data Corporation projects that by 2020 the volume of stored data will grow to more than 40 times the 2009 level.
Software companies differentiate commercial software products based on quality of code, efficient operation across multiple operating systems and web servers, integration with their other software products, availability of development platforms, documentation, and support services. Not all databases have identical capacity and feature sets, nor does every database work equally well with other commercial and open source software. The design and underlying architecture of a database make it well suited and efficient within a particular range of data volume, data management, transaction, and administrative requirements.
Oracle: MySQL (FOSS); WebLogic Server and Oracle Fusion
IBM: IMS Hierarchical Database; WebSphere Application Server and Rational
Microsoft: MS SQL Server; MS IIS, MS Azure, and MS Visual Studio .NET
Oracle Corporation has grown through acquisition, adding open source software to its flagship commercial database: the MySQL database, the Java language, the UNIX-Solaris operating system, and the GlassFish web server. Oracle's database strategy is to provide enhancements and value to its customers through Exadata and through product options that improve data compression and performance in both software and hardware. Oracle has two commercial web servers: Oracle Application Server and WebLogic Server. WebLogic Server (WLS) has been rebranded as Oracle WebLogic Server and is the preferred server in the Oracle Fusion software and development platform. The long-term plan is a unified database, web server, and development platform that drives an integrated software and hardware company.
IBM Corporation's DB2 is the leading database on mainframe hardware and system software; IMS, IBM's first-generation hierarchical database, and the Teradata data warehouse provide specialized niche functionality. DB2 competes directly with the Oracle database on other operating system platforms. Starting with DB2 9.7 for Linux, UNIX, and Windows, IBM supports applications written for Oracle and PL/SQL, including Oracle syntax, locking mechanisms, data types, SQL, procedural language, and client interfaces. IBM provides a minimal-cost trial and migration paths to DB2. DB2 can serve as a storage engine for the open source MySQL database. DB2 Express-C is available free of charge, without support, for unrestricted use in production environments. IBM also uses its established market presence with WebSphere Application Server (WAS) and its industry-leading WebSphere MQ middleware to drive sales of DB2 software.
Microsoft Corporation's growth strategy is based on the growth and integration of its Windows Azure cloud service, on upgrades from its MS Access workstation database software to MS SQL Server, and on the popularity of its proprietary programming languages and its .NET Framework and development platform. Microsoft support for FoxPro is being phased out. MS SQL Server is the recommended database for Microsoft's industry-leading MS SharePoint Server enterprise content management system. Windows Azure is Microsoft's PaaS (Platform as a Service) offering; it also lets customers manage individual virtual machines to customize and control the cloud instance infrastructure. Azure supports both Microsoft's own languages and third-party, open source languages.
Mobile Database Software
Each commercial software company has a mobile database software strategy which integrates with its enterprise database.
IBM DB2 Everyplace supports high availability, load balancing, and enterprise synchronization for managing data distribution to mobile devices. It integrates with JDBC-compliant RDBMS data sources from Microsoft, Oracle, and Sybase, as well as Apache Derby. IBM Mobile Database is an IBM DB2 data server component: a small-footprint relational database optimized for mobile use. Typically, it is an embedded database accessed through an application. IBM DB2 Everyplace applications can be migrated to IBM Mobile Database.
MS SQL Server Compact 4.0 has been optimized for use in ASP.NET web applications. It supports SQL Server syntax, the ADO.NET programming model, and .NET web applications. Application development support for SQL Server Compact 4.0 is provided in the Professional, Premium, and Ultimate editions of Visual Studio 2010 SP1 and later.
Oracle Database Lite is a small-footprint SQL database installed as integrated software that synchronizes with an Oracle Database server. Oracle Database Lite consists of the Lite Client and the Lite Mobile Server, a mobile server middleware component that supports scalable data synchronization and centralized management of mobile resources.
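The text does not say how Oracle Database Lite or DB2 Everyplace resolve conflicts when a device and the central server have both changed the same record. As a generic illustration only, the sketch below shows two-way synchronization between a mobile replica and a central server using a last-write-wins rule on per-record timestamps. The `Replica` class and `sync` function are hypothetical and are not either vendor's protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Each record is stored as (value, last_modified_timestamp).
Record = Tuple[str, int]


@dataclass
class Replica:
    """Hypothetical copy of a table, either on a device or on the central server."""
    name: str
    rows: Dict[str, Record] = field(default_factory=dict)

    def update(self, key: str, value: str, ts: int) -> None:
        self.rows[key] = (value, ts)


def sync(mobile: Replica, server: Replica) -> None:
    """Two-way sync: for every key, both replicas keep the most recently modified record."""
    for key in set(mobile.rows) | set(server.rows):
        candidates = [r for r in (mobile.rows.get(key), server.rows.get(key)) if r is not None]
        # Last write wins; on a timestamp tie, max() keeps the first candidate (the mobile copy).
        winner = max(candidates, key=lambda r: r[1])
        mobile.rows[key] = winner
        server.rows[key] = winner


server = Replica("central")
phone = Replica("phone")
server.update("cust:1", "Acme Ltd", ts=10)
phone.update("cust:1", "Acme Limited", ts=12)  # edited offline after the server copy
phone.update("cust:2", "Globex", ts=11)        # created offline
sync(phone, server)
print(server.rows)
```

Last-write-wins is easy to reason about but silently discards the older edit, so a real deployment may need a different conflict rule.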
Mobile database technology is being used in a variety of applications and device environments, ranging from music databases within MP3 players to mobile CRM software on smartphones.
A flexible, comprehensive mobile database will support multiple index types: B-trees, R-trees for mapping (spatial) data, Patricia tries, k-d trees for k-dimensional data, and hash tables. The mobile database management system must also meet the application's user requirements. The central database must be selected so that it integrates with the mobile database, allowing both to coexist in a single system while sharing data.
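As an example of one index type from the list above, a k-d tree splits points on alternating coordinates so that a nearest-neighbor lookup can skip whole regions of the data. The sketch below assumes 2-D points such as map coordinates; it is plain Python written for illustration and is not taken from any particular mobile database product.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[KDNode]" = None,
                 right: "Optional[KDNode]" = None) -> None:
        self.point = point
        self.axis = axis      # 0 = split on x, 1 = split on y
        self.left = left
        self.right = right


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced 2-d tree by splitting on the median, alternating axes."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node: Optional[KDNode], target: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to target, pruning subtrees that cannot win."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting line is closer than the best match so far.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))
```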
There are also a number of alternative commercial and open source mobile databases.
Extending a Database Management System
The criteria that enterprises use to assess database software have evolved into five major areas: 1- usability, 2- performance and scalability, 3- security, 4- back-end operations, and 5- return on investment or cost center accounting. When writing an application, a decision has to be made about which relational database management system to use. Once that choice has been made, it largely becomes an investment in information technology infrastructure. Switching to open source software or to a different commercial database vendor to take advantage of lower prices, improved technology, or a better partnership is typically undermined by legacy code that requires extensive rewriting before it can be reused with another relational database system. The ability to design, model, implement, and administer a database provides a comparative advantage relative to a competitor's cost structure and services.
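A small, concrete case of the legacy-code problem is limiting a result to its first n rows, which the major products spell differently. The hypothetical helper `top_n_query` below returns the form each dialect expects; the dialect labels are illustrative, and a real migration involves many more differences than this one, such as procedural code, data types, and locking behavior.

```python
def top_n_query(dialect: str, table: str, order_by: str, n: int) -> str:
    """Return a 'first n rows' query written for the given SQL dialect.

    Identifiers are assumed to be trusted constants; never interpolate untrusted input.
    """
    if dialect in ("mysql", "postgresql"):
        return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table} ORDER BY {order_by}"
    if dialect in ("db2", "oracle12c"):
        return f"SELECT * FROM {table} ORDER BY {order_by} FETCH FIRST {n} ROWS ONLY"
    if dialect == "oracle11g":
        # Older Oracle releases lack FETCH FIRST; ROWNUM must be applied after sorting.
        return (f"SELECT * FROM (SELECT * FROM {table} ORDER BY {order_by}) "
                f"WHERE ROWNUM <= {n}")
    raise ValueError(f"unsupported dialect: {dialect}")


for d in ("mysql", "sqlserver", "db2", "oracle11g"):
    print(f"{d:10} {top_n_query(d, 'orders', 'order_date DESC', 5)}")
```

Every statement like this embedded in application code has to be found and rewritten when the database changes, which is why switching vendors is costly.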
The operational requirements that apply to database management systems are far more extensive and multivariate than those of either the first-generation hierarchical databases of the 1970s or the relational databases of the mid-1980s. Databases are used in hybrid combinations with commercial and free open source software specific to the organization's information processing operations. In addition to the standard issues, decisions have to be made about development platforms and scripting for new applications, integration of open source software, and developing a capability for mobile computing devices. Alternatively, information technology operations may be outsourced to cloud computing service providers.
Cloud databases are fully automated multitenant services that provide database capability. Cloud solutions allow application development with a low start-up cost and minimal database administration. On-demand scalability mitigates capacity management issues. MongoDB and CouchDB have emerged as niche cloud computing databases. They compete with the cloud databases offered by Oracle, IBM, and Microsoft.
The leading commercial software companies are developing cloud computing strategies that incorporate and leverage their database management technologies. They are competing with Amazon Web Services, Google Corporation, and a number of new entrants.
Big Data Technology
IDC forecasts that the worldwide Big Data technology and services market will grow at a compound annual growth rate of approximately 30%, with revenues projected to exceed $23.8 billion by 2017. The Big Data market is expanding rapidly as large IT companies and startups compete for customers and market share by providing more opportunities to use Big Data technology to improve operational efficiency and drive innovation. Major IT vendors are increasingly offering enterprise database configurations that support Big Data.
Hadoop is an Apache Software Foundation project developed to meet operational requirements like those of leading cloud-based companies such as Google, Yahoo, and Facebook: daily access to huge datasets across distributed servers. Open source Hadoop enables distributed data processing for big data applications across a large number of servers. The principle of operation is that distributed, parallel processing provides redundancy, which helps prevent outages, and more efficient performance across clouds. Its increased implementation is being driven by: 1- the growing number of company applications that use very large datasets, and 2- the availability of clouds containing hundreds or thousands of distributed processors with huge storage capacity.
Hadoop is a generic processing framework designed to execute queries and other batch read operations against extremely large datasets that can be tens or hundreds of terabytes, or even petabytes, in size. Hadoop operates on massive datasets by scaling the processing horizontally across very large numbers of servers through MapReduce. Vertical scaling, by contrast, means executing on the single most powerful server available; this is expensive and ultimately limited. MapReduce splits up a problem, sends the sub-problems to different servers, and lets each server solve its sub-problem in parallel. It then merges the sub-problem solutions and writes the result to files that can be used as inputs to additional MapReduce steps. Hadoop has been useful in environments where massive server farms are used to collect the data; it processes parallel queries as large background batch jobs on the same server farm.
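The split, parallel solve, and merge steps described above can be sketched in a single process using the classic word-count example. This is a minimal illustration, not Hadoop's Java API: threads from Python's `concurrent.futures` stand in for cluster servers, the input splits are hypothetical strings, and the merged result is returned as a dictionary instead of being written to files.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key so each key reaches one reducer."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item: Tuple[str, List[int]]) -> Tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    key, values = item
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Worker threads stand in for the separate servers of a Hadoop cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))            # sub-problems in parallel
        reduced = pool.map(reduce_phase, shuffle(mapped).items())
        return dict(reduced)                                   # merged solution


splits = ["big data big servers", "data moves to servers", "big clusters"]
print(word_count(splits))
```

The map and reduce functions never share state, which is what lets a real framework run them on different machines.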
Open source database products generally do not support operations such as parallel query through horizontal scaling. The leading commercial database products offer capabilities that Hadoop does not provide: performance optimizations, analytic functions, and declarative features. These support complex analysis by non-programmers, along with enterprise-class features for security, auditing, maximum availability, and disaster recovery.
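To illustrate what a declarative feature means in practice: the query below states the result wanted (totals per region, largest first) and leaves the scan, grouping, and sorting strategy to the engine, with no hand-written map or reduce code. SQLite, through Python's built-in `sqlite3` module, is used only as a convenient stand-in for a commercial database; the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 55.0)])

# One declarative statement: the engine decides how to scan, group, and sort.
rows = conn.execute("""
    SELECT region, SUM(amount) AS total, COUNT(*) AS orders
    FROM sales
    GROUP BY region
    ORDER BY total DESC
""").fetchall()

for region, total, orders in rows:
    print(region, total, orders)
conn.close()
```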
The Oracle database can coexist with and complement Hadoop. The inexpensive processing cycles of server farms running Hadoop can be used to transform masses of unstructured data with low information density into smaller amounts of information-dense structured data, which is then loaded into Oracle Exadata. Oracle Data Integrator is based on Hive and provides native Hadoop integration, with a user interface for creating programs that load data to and from files or relational data stores. Oracle Data Integrator provides: 1- high-performance data integration between Hadoop and an Oracle database; 2- simpler Java MapReduce development, improving developer productivity; 3- high-performance loading into the Oracle database using ODI with Oracle Loader for Hadoop; and 4- E-LT (Extract, Load, and Transform) in Oracle Data Integrator Enterprise Edition, which improves performance and reduces data integration costs.
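The E-LT pattern named in item 4 differs from classic ETL in where the transformation runs: raw data is loaded into the target database first and then transformed there with set-based SQL. The sketch below shows that ordering, and the reduction of many low-density rows into a few dense ones, under stated assumptions: the log format and table names are hypothetical, and SQLite stands in for the target database. It does not use Oracle Data Integrator or Oracle Loader for Hadoop.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical raw web-server log lines: many rows, little information in each.
RAW_LOG = """\
2013-05-01T10:00:01 GET /index.html 200
2013-05-01T10:00:02 GET /products 200
2013-05-01T10:00:05 GET /index.html 404
2013-05-01T10:00:09 GET /index.html 200
"""


def extract(text: str) -> List[Tuple[str, str, str, int]]:
    """Extract: split each line into fields, without cleaning or aggregating."""
    rows = []
    for line in text.splitlines():
        ts, method, path, status = line.split()
        rows.append((ts, method, path, int(status)))
    return rows


conn = sqlite3.connect(":memory:")  # stand-in for the target warehouse
conn.execute("CREATE TABLE stage_log (ts TEXT, method TEXT, path TEXT, status INTEGER)")
conn.execute("CREATE TABLE page_stats (path TEXT PRIMARY KEY, hits INTEGER, errors INTEGER)")

# Load: bulk-insert the raw rows into a staging table inside the target database.
conn.executemany("INSERT INTO stage_log VALUES (?, ?, ?, ?)", extract(RAW_LOG))

# Transform: set-based SQL in the target turns raw rows into dense summary rows.
conn.execute("""
    INSERT INTO page_stats (path, hits, errors)
    SELECT path, COUNT(*), SUM(CASE WHEN status >= 400 THEN 1 ELSE 0 END)
    FROM stage_log
    GROUP BY path
""")
print(conn.execute("SELECT * FROM page_stats ORDER BY path").fetchall())
```

Running the transformation inside the target lets the database engine do the heavy lifting instead of a separate transformation server.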
IBM InfoSphere BigInsights Basic Edition is IBM's distribution of Hadoop. It incorporates open source projects and IBM add-ons: a text analysis engine, development tools, data exploration, enterprise software integration, platform administration, and runtime performance improvements. There is also a BigInsights Enterprise Edition, which includes a text processing engine and a library of annotators for querying and identifying items of interest in documents and messages. It employs IBM-specific software to further enhance administration and performance.
The Hadoop team's development and documentation effort centered on the Linux platform. Microsoft Win32 is supported as a development platform; however, distributed operation has not been thoroughly tested on Win32 as a production platform. There is currently minimal Hadoop support for pseudo-distributed and fully distributed modes on the Microsoft Windows operating system platform.
A NoSQL database is a non-relational database. NoSQL databases under development include Apache Cassandra, MongoDB, Voldemort, Apache HBase, SimpleDB, and BigTable. Projects are underway to develop SQL interfaces to NoSQL databases.
NoSQL databases operate on massive datasets by scaling the processing horizontally across very large numbers of servers. This sharding technique is essentially the same one that has been deployed to support high-volume systems built on conventional relational databases. Sharding requires that a separate database run on each server and that the data be physically partitioned so that each database has its own subset of the data stored on its own local disks. Sharding involves tradeoffs: important relational database capabilities, such as joins, transactions, and schema changes, are lost, and the ACID (atomicity, consistency, isolation, and durability) principles can no longer be applied uniformly. To join data across shards, queries must be distributed across potentially very large numbers of separate databases. Distributed queries are more complex than queries that join data within a single database, which adds overhead and can degrade performance. As a result, NoSQL databases typically do not support joins.
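The routing and join behavior described above can be seen in a toy model. In the hypothetical `ShardedStore` below, each shard is an in-memory dictionary standing in for a separate database on its own server, and a stable hash of the key picks the shard. A lookup by key touches one shard, but matching orders to a customer by a non-key field has to query every shard and combine the results in application code: the scatter-gather work that a single-database join would avoid.

```python
import hashlib
from typing import Callable, Dict, List


class ShardedStore:
    """Hypothetical key-value store that hash-partitions rows across N shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate database on its own server and disks.
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # A stable hash (unlike Python's per-process randomized hash()) picks the shard.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key: str, row: dict) -> None:
        self.shards[self.shard_for(key)][key] = row

    def get(self, key: str) -> dict:
        # A lookup by shard key touches exactly one shard.
        return self.shards[self.shard_for(key)][key]

    def scan(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Anything not addressed by the shard key must be sent to every shard.
        return [row for shard in self.shards for row in shard.values() if predicate(row)]


customers = ShardedStore(num_shards=4)
orders = ShardedStore(num_shards=4)
customers.put("c1", {"id": "c1", "name": "Acme"})
customers.put("c2", {"id": "c2", "name": "Globex"})
orders.put("o1", {"id": "o1", "customer": "c1", "total": 250})
orders.put("o2", {"id": "o2", "customer": "c1", "total": 75})
orders.put("o3", {"id": "o3", "customer": "c2", "total": 40})

# A hand-made "join": one routed lookup, then a query to all order shards.
acme = customers.get("c1")
acme_orders = orders.scan(lambda r: r["customer"] == acme["id"])
print(acme["name"], sorted(o["total"] for o in acme_orders))
```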
In a sharded environment, transactions that allow updates to multiple rows to be committed together, or rolled back together should a failure occur, require distributed transactions implemented with a two-phase commit protocol. This adds complexity and consumes more resources than if all the data were contained within a single database. Two-phase commit can be slow, and it can compromise availability: failures can leave in-doubt transactions that keep data locked and inaccessible until the failure is repaired. NoSQL databases typically do not support transactions that involve updates to data in multiple tables or multiple rows within a table.
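The two-phase commit protocol can be outlined with a coordinator function and in-memory participants. This is a simplified sketch: the `Participant` class is hypothetical, nothing is written to a durable log, and a coordinator failure is simulated by simply never sending the phase-2 message. The last case shows how that failure leaves participants prepared and holding their pending writes, unable to decide the outcome on their own.

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"    # voted yes; pending writes held; outcome not yet known
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """Hypothetical shard that holds tentative writes until the coordinator decides."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.data: Dict[str, int] = {}
        self.pending: Dict[str, int] = {}  # stands in for locked, uncommitted rows
        self.state = State.ACTIVE

    def write(self, key: str, value: int) -> None:
        self.pending[key] = value

    def prepare(self) -> bool:
        # Phase 1: vote. A real participant would force a prepare record to disk first.
        if not self.can_commit:
            self.abort()
            return False
        self.state = State.PREPARED
        return True

    def commit(self) -> None:
        self.data.update(self.pending)
        self.pending.clear()
        self.state = State.COMMITTED

    def abort(self) -> None:
        self.pending.clear()
        self.state = State.ABORTED


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]   # Phase 1: collect every vote.
    decision = all(votes)
    for p in participants:                        # Phase 2: send one global decision.
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


# Both shards vote yes, so the transfer commits on both.
a, b = Participant("shard-a"), Participant("shard-b")
a.write("acct:1", -100)
b.write("acct:2", 100)
print(two_phase_commit([a, b]), a.state, b.state)

# One shard votes no, so both roll back.
c, d = Participant("shard-c"), Participant("shard-d", can_commit=False)
c.write("acct:1", -50)
d.write("acct:2", 50)
print(two_phase_commit([c, d]), c.state, d.state)

# The coordinator "fails" after phase 1: participants stay prepared and in doubt.
e, f = Participant("shard-e"), Participant("shard-f")
e.write("acct:1", -10)
f.write("acct:2", 10)
e.prepare()
f.prepare()
print(e.state, f.state, e.pending)
```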