FUNDAÇÃO EDSON QUEIROZ
UNIVERSIDADE DE FORTALEZA-UNIFOR
Eriko Werbet de Oliveira Araújo
AJAX: An Adaptive Join Algorithm for Extreme
Restrictions
Fortaleza
2008
FUNDAÇÃO EDSON QUEIROZ
UNIVERSIDADE DE FORTALEZA-UNIFOR
Eriko Werbet de Oliveira Araújo
AJAX: An Adaptive Join Algorithm for Extreme
Restrictions
Advisor: Angelo Roncalli Alencar Brayner, Dr-Ing.
Fortaleza
2008
A DISSERTATION SUBMITTED TO THE SCIENCE AND TECHNOLOGY CENTER OF THE UNIVERSITY OF FORTALEZA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED INFORMATICS.
AJAX: Adaptive Join Algorithm for Extreme
Restrictions
Examination Board
Angelo Roncalli de Alencar Brayner, Dr-Ing. (Advisor)
Pedro Porfirio Muniz Farias, D.Sc.
Javam de Castro Machado, Docteur.
ARAÚJO, ERIKO WERBET DE OLIVEIRA
AJAX: Adaptive Join Algorithm for Extreme Restrictions
[Fortaleza] 2008
xix, 50 p., 29,7cm (INFORMÁTICA/UNIFOR, Bancos de Dados, 2008)
Dissertação - Universidade de Fortaleza, Informática
1. Bancos de Dados
2. Computação Móvel
3. Processamento Adaptativo de Consultas
4. Operadores Adaptativos de Consulta
5. Algoritmos e Estruturas de Dados
I. INFORMÁTICA/UNIFOR II. TÍTULO (série)
Every man is the architect of his own fortune.
Sallust (86 BC - 34 BC)
ACKNOWLEDGEMENTS
I would like to thank my advisor for his commitment and patience in the writing
of this work, and I would like to salute my family and friends who have
consistently helped me surpass my dark hours.
ABSTRACT
In a mobile ad hoc network (MANET), the nodes represent mobile
computers in which database systems may reside. Such mobile computers
(nodes) are free to move arbitrarily. In such an environment, we may have a
collection of autonomous, distributed, heterogeneous and mobile databases
(denoted Mobile Database Community or MDBC). Thus, each database user (in
a mobile host) can access databases belonging to an MDBC through the
MANET. Traditional query processing techniques fail to support the access to
databases in an MDBC, since data delivery rate in such an environment
becomes unpredictable and mobile hosts may suffer from limited available main
memory to process some query operators (e.g., join). To react to those events,
this dissertation presents an adaptive join operator, called AJAX, for processing
join operations on mobile databases in an MDBC. AJAX ensures: (i)
incremental production of results as soon as the data become available; (ii)
progress of the query processing even when the delivery of data is blocked;
and (iii) reaction to situations of memory limitation during the execution of the operator.
Cost estimation and experimental results are presented to show that AJAX
is an effective solution for processing join operations on mobile databases.
RESUMO
Uma Comunidade de Bancos de Dados Móveis (MDBC) representa uma
coleção de bancos de dados móveis e autônomos. Em tal contexto, cada
usuário de um sistema de banco de dados participante da comunidade pode
acessar os outros bancos de dados através de uma infra-estrutura de
comunicação sem fio. A utilização de técnicas tradicionais para processamento
de consultas no acesso à banco de dados em uma MDBC tem se mostrado
ineficiente devido à imprevisibilidade na taxa de acesso a bancos de dados da
comunidade e a limitação de memória disponível nos dispositivos móveis para
a execução de certos operadores, como a junção, por exemplo. A
imprevisibilidade é provocada por quedas constantes na comunicação entre os
membros de uma MDBC (cenário comum em redes sem fio) e limitação de
processamento de certas unidades móveis, atrasando a entrega de tuplas.
Para reagir a estes eventos, é apresentado neste trabalho um operador de
junção para processamento de consultas em bancos de dados móveis,
denominado AJAX. O operador proposto garante as seguintes propriedades: (i)
produção incremental de resultados à medida que os dados são
disponibilizados; (ii) continuidade no processamento da consulta mesmo que a
entrega dos dados esteja bloqueada, e; (iii) reação a situações de limitação de
memória durante a execução do operador.
Table of Contents
1. Introduction
1.1. Motivation
1.2. Goals and Scope
1.3. Structure
2. Database Systems and Mobility
2.1. Introduction
2.2. The Mobile Computing Environment
2.3. Ad hoc networks
2.4. The Introduction of Mobility on Databases
2.5. Query Processing on Mobile Databases
3. Adaptive Join Operators
3.1. Introduction
3.2. XJoin
3.3. Ripple Join
3.4. Double Pipeline Hash Join (Tukwila Implementation)
3.5. MobiJoin
3.6. Adaptive Symmetric Hash Join
3.7. Hash-Merge Join
4. AJAX: Adaptive Join Algorithm for Extreme Restrictions
4.1. Introduction
4.2. First Phase
4.3. Probing
4.3.1. Reacting to Memory Overflow
4.4. Second Phase
4.5. Cost Estimation
5. Experimental Results and Analysis
5.1. Introduction
5.2. Standalone Tests
5.3. Comparative Tests
5.4. Analysis of the Related Work
6. Conclusions
References
1. Introduction
1.1. Motivation
Advances in portable computing devices and wireless network technology have
led to the development of a new computing paradigm, called mobile computing.
This new paradigm enables users carrying portable devices to access services
provided through a wireless communication infrastructure, regardless of their
physical location or movement patterns.
The database technology has been impacted by the mobile computing
paradigm. For example, database systems may reside on mobile computers
which are nodes of a mobile ad hoc network (MANET) [48]. Since a MANET
has a topology which may change randomly and rapidly at unpredictable times, a
dynamic collection of autonomous, heterogeneous and mobile databases
(denoted Mobile Database Community or MDBC) can be opportunistically
formed [2]. An MDBC models the notion of multidatabases whose members
(local databases) can move across different locations.
Sharing information among multiple autonomous, distributed and mobile
databases in MDBCs has emerged as a strategic requirement for several
applications. For example, in battlefields, we may have several units of a
coalition force (from different countries) moving around different geographical
regions, but having a common goal. The commander of each unit has access to
local database systems (residing in a mobile computer installed in a military
vehicle) containing information about his/her own unit and some information
about the enemy forces. Thus, the databases of the multiple units of the
coalition force are distributed, autonomous and mobile. Furthermore, those
databases might be heterogeneous as well. In order to make tactical decisions,
each commander may have to access the databases of other units of the
coalition to get information about them (their strength with respect to the type and
number of weapons, as well as the number of personnel, supplies and resources).
Besides addressing the problem of integrating heterogeneous databases,
sharing mobile databases in an MDBC also requires addressing the problem of
processing queries over a variable number of mobile databases. This is
because query engines operate in an unpredictable and changeable context when
processing queries on mobile databases. In other words, traditional distributed
query processing techniques and operators are inadequate to process queries
on mobile databases in MDBCs due to the following factors:
(i) Unpredictable data-arrival rates. Query processing in MDBCs may
suffer from unpredictable data delivery rates, which can be caused by
initial delays and bursty data arrivals. In turn, initial delays and bursty
data arrivals are provoked by transient disconnections from the
network. Thus, even if the query engine has chosen the most efficient
execution plan, low data delivery rates make that plan inefficient
during the query execution;
(ii) Limited memory capacity. Mobile hosts may present limited available
main memory to process operations, such as joins between two
relations (recall that mobile hosts may have severe hardware
restrictions due to their size). Such a limitation can cause memory-overflow
events, which may lead to a high disk-access rate and make the cost of
executing those operators prohibitive;
(iii) Use of stop-and-go operators. Traditional query processing
techniques implement relational operators in a stop-and-go way, since
they are implemented with more than one phase, executed
sequentially (for example, the build and probe phases of the
conventional hash join). Hence, a delay in the initial stages (which may
frequently occur in MDBCs) implies performance degradation in the
execution of such operators;
(iv) Absence of statistics. In MDBCs, statistics about mobile databases
are highly likely to be unreliable or nonexistent, since mobile databases
are autonomous.
For that reason, new query operators have been introduced for processing
queries in MDBCs. Those operators are adaptive in the sense that they should
be able to react to events which may considerably increase response time when
accessing mobile databases.
1.2. Goals and Scope
This work proposes an adaptive join algorithm called AJAX. The algorithm
implements the relational algebra join operator, using hashing and pipelining
techniques to reduce the overall response time of a query involving join
operations [8, 83]. AJAX is an acronym for Adaptive Join Algorithm for eXtreme
restrictions. We claim that AJAX is an adaptive query operator [9], since it is
able to react to events which may considerably increase response time when
executing queries over mobile databases. AJAX ensures the following
properties:
(i) Incremental production of results as soon as the data become
available. Conventional join operators only produce results after at
least one of the involved relations is completely evaluated. Joins executed
over distributed data may therefore take a long time until users begin to receive
the result of the join operation. AJAX uses a pipeline technique in
order to incrementally provide results (to users or to another query
operator) as soon as they become available. By doing this, users
observing the progress of the query processing may, for instance,
decide to stop the execution on the fly, once the results already
obtained are enough to make a decision;
(ii) Progress of the query processing even when the delivery of data is
blocked. The idea is to overlap delays by executing other tasks when
tuple delivery (from the tables involved in the join operation) is blocked;
(iii) Reaction to memory-overflow events. When a memory overflow
occurs, AJAX triggers a memory-management policy which consists of
flushing part of the data from main memory to disk;
(iv) Prevention of unnecessary disk access. Since it might be necessary to
flush data to disk during AJAX's execution, incomplete or duplicated
results could be produced; to avoid them, data flushed to disk may need
to be read more than once. AJAX is designed to reduce such disk accesses.
The algorithm has been implemented in the Java language in a
simulation environment designed to impose the same restrictions as the target
platforms, in order to validate its behavior and main properties. Standalone tests
and comparative tests against other state-of-the-art algorithms have also been run,
measuring its performance and efficiency; the experimental results can be
found in chapter 5.
1.3. Structure
This dissertation is structured as follows: chapter 2 discusses fundamental
concepts of the mobility paradigm; chapter 3 discusses classical and adaptive
join algorithms; chapter 4 presents the formal specification of the
AJAX algorithm, while experimental results and performance analysis are
presented in chapter 5; chapter 6 presents the conclusions and future work.
2. Database Systems and Mobility
2.1. Introduction
Mobility is already part of the modern computing environment; in order to
illustrate such an assertion, one can easily observe that laptops, cell phones
and many other mobile devices can be seen everywhere.
DBMSs (Database Management Systems) are complex systems, and
adapting them to run on a mobile device is not a trivial task, since mobility impacts
database technology at many points. Query processing and transaction
management are examples of database topics impacted by
the introduction of mobility in such systems. In fact, new models and new
techniques must be proposed in order to have databases sharing their data on
mobile devices. The fact that mobile devices have restricted computing
resources, like main memory, processing power, storage and network
bandwidth, directly contributes to the inability of running conventional DBMSs on
those devices. In this chapter we discuss and analyze the impact of mobility on
DBMSs, emphasizing query processing, which is the focus of this work.
This chapter is structured as follows:
Section 2.2. Definition of the mobile computing environment, with its
components and communication infrastructure.
Section 2.3. Ad hoc networks and their characteristics.
Section 2.4. Physical and logical mobility are discussed. Subsections
present software agents and mobility classification from the
database technology perspective.
Section 2.5. Enumeration of the key challenges for the proposal and
construction of mobile databases.
2.2. The Mobile Computing Environment
Mobile computing is a generic term describing the ability to use computing
technology that is not physically connected (by wires, for example),
or that operates in remote or mobile (non-static) environments [5, 19, 48]. Users equipped
with such devices can change their physical location arbitrarily, and their
devices still remain connected and operational.
Those devices contain many computing resources, like microprocessors,
memory, storage media and communication interfaces. In fact, they can perform
almost any task that a desktop computer could perform, anywhere, at any time.
A mobile computing environment is formed by mobile devices and mobility
support stations (MSS) [3, 5]. The mobile devices are grouped in geographical
areas covered by the wireless communication infrastructure; those areas are
called “cells”. The wireless communication infrastructure can be a wireless local
area network (WLAN), an ad hoc network (a dynamically configurable network), or
even a cell phone network [48]. Communication among cells is guaranteed by the
support stations, which are entry points to the wired network that interconnects the MSSs.
From the perspective of database technology, two types of database
systems may coexist in a mobile computing environment:
(i) Database systems residing in stationary hosts. In that option, database
systems cannot physically migrate between hosts, i.e., their network
address is known a priori and does not change over time.
(ii) Database systems residing in mobile hosts. Those database systems
can change their location in a random way. Picture 1 shows a cell in
which there are three mobile devices: a laptop, a PDA (Personal
Digital Assistant) and a smart phone. A query named "Q0" is sent by
the PDA to the mobile database hosted by the laptop, which is located
in "X" during the time interval "T1". While the query is
processed, the host that contains the database in question moves to
"Y" during the time interval "T2", and also receives a new query "Q1"
sent by the cell phone. After that, it moves to location "Z" during the time
interval "T3" and returns the result of the first query to the PDA.
Database systems like that are called mobile databases [2, 3, 4, 5].
PICTURE 1: MOBILE DATABASE.
It is also worth mentioning that when a cell represents a WLAN or an ad
hoc network, mobile devices inside the cell may communicate with each
other directly. However, when a cell represents an area covered by a cell phone
network, the devices need the intermediation of the MSSs in order to
communicate.
Picture 2 shows a typical mobile computing environment; in the depicted
environment one can see a cell of a cellular phone network, an ad hoc network with mobile
databases and a wired network interconnecting everything. Thus, cell phones
could send queries to the databases in the ad hoc network or to the ones
belonging to the wired network; queries sent from the wired network to the
mobile databases in the ad hoc network could be handled as well.
PICTURE 2: MOBILE COMPUTING ENVIRONMENT
2.3. Ad hoc networks
Wireless networks use radio waves, infrared and satellite communications in
order to transport data signals [46, 48].
Despite the volatile nature of radio waves, wireless networks are still static
structures, i.e., their topology, once formed, cannot be altered without
restructuring the network, and in order to achieve that, the number of elements and
their types must be known a priori. However, it would be far more interesting to change
the network structure according to the elements available at a given time.
Behaving like that would create new possibilities, like the formation of a network
in which the nodes are airplanes of the same squadron, giving them the
opportunity to share information (when close to each other) about enemy
positions, without the need for updates coming from a centralized source, like
a ground station (far away) or an air station mounted on a spy airplane. This
type of network is a dynamically configurable network [48]. Its topology is
dynamic, changing according to the spatial migration of its elements and to
transient connections. A network element can be integrated into it or removed
from it at any time. That is the key concept behind ad hoc networks.
However, like all wireless networks, ad hoc networks suffer from the
following restrictions: low bandwidth (compared to wired networks) and high
transmission error rates. Those restrictions limit their applicability, mainly with regard
to the number of feasible applications, and turn them into a great challenge for the
developers and network protocols that try to make them viable for the development
of mobile software depending on this type of infrastructure.
2.4. The Introduction of Mobility on Databases
In order to have a database system supporting mobility, several of its
components and concepts must be reevaluated, and probably redesigned to
make such a support possible.
That support must guarantee data access everywhere, at anytime, and
also consider that several database systems may be interconnected by an ad
hoc network, for instance. A query can be sent from any mobile node to any
mobile database contained in the network.
A mobile database community (MDBC) is a collection of mobile computers
interconnected by a wireless communication infrastructure, in which each host
may be a database client and server. The databases that are part of the
community are autonomous, distributed and depending on the environment,
heterogeneous.
Query processing, concurrency control and recovery are just a few
technologies that need to be reconstructed, in order to enable a database
system to be mobile.
Many issues must be addressed before having an operational mobile
database, like disconnections while processing a simple query (a host may be
randomly shut down). In case of a distributed transaction, one must guarantee
the atomicity of the commit operation, i.e., a new concurrency control protocol
must be implemented on the mobile databases.
2.5. Query Processing on Mobile Databases
The scope of this work is to solve part of the issues related to query processing
on mobile databases, specifically those related to data access and the
execution of queries based on join operations.
Query processing on mobile databases faces several restrictions, imposed by
the computing environment: unknown and heterogeneous data sources (the Internet),
huge data sources, and devices with serious limitations of computing resources
(memory, processing power and network bandwidth). Some ideas have evolved to
minimize the impact of such issues, like the following:
(i) Decrease of the response time: the main concern of traditional
query processing was to decrease the total execution time of the
query [9, 143]. The new efforts have the goal of decreasing the
response time to the user, i.e., the delay until the first tuple arrives
must be minimized, even if that implies an increase in the total
execution time of the query.
(ii) Adaptability to the processing capabilities of distinct data
sources: the idea here is to guarantee that a query will not be
blocked upon the inclusion of a data source with low tuple delivery
rates. That situation may occur at any time in a remote query, because
a host has disconnected from the network, because the host is
running out of bandwidth, etc. Non-blocking query operators and
better-fitted execution plans have been proposed [35, 60, 143]. Query
operators are algorithms which implement a query operation, like the
join operation of the relational algebra. Non-blocking operators are
called symmetric, since they handle each data source in an
independent and parallel way. The new execution plans favor the
usage of tuple pipelining. A pipelined operator channels partial result
tuples from lower levels to the higher operator (in the graph of
operators), i.e., one can decrease the query processing time and its
response time by pipelining partial results to other operators involved
in the query. This is important in mobile environments due to the
restricted resources, like low available memory, slower processors
and narrow bandwidth.
(iii) Query optimization during execution: traditional query optimization
is done in a static way; the best plan is computed before the query
execution starts [35, 60]. However, the data sources could change
substantially while the query is running, especially on mobile
databases; a sudden cardinality increase in one relation, for instance.
Such a change may render the optimization inefficient. In order to
minimize that, one could optimize the execution plan step by step:
for each new data source accessed, the original execution plan
is re-optimized in order to adapt it to the current context. This
technique is called adaptive query processing [9, 18, 87, 96, 104,
143].
(iv) Partial results supply: as soon as result tuples are validated and
duplicates are discarded, those partial tuples are immediately sent
to the user. This technique is known as online or continuous querying [7,
26, 43, 49, 85, 132, 134].
3. Adaptive Join Operators
3.1. Introduction
Adaptive query processing has a fundamental role in the context of mobile
database access, since conventional query processing techniques are not
designed for databases running on mobile devices, which do not have the plentiful
computing resources of desktop computers. Moreover, those techniques take
into consideration neither the unpredictability of the tuple delivery rate of the
sources nor randomly disconnecting hosts, which are common problems in
queries involving mobile databases [2, 5].
Many strategies have been proposed to make query processing adaptive
[90, 96, 104, 108]. One of those strategies uses the notion of adaptive operators. In fact,
the join operator of the relational algebra is usually the one to be implemented,
since it is the most time- and space-intensive operator that can be
involved in database queries, i.e., the worst case of a database query is when
the query is formed solely by join operations (due to the amount of operations
and data that need to be handled).
An adaptive join operator adapts itself to the tuple delivery
capabilities of the data sources [120]. This is very handy when one of the data
sources has ceased its tuple delivery for an unknown reason at a random time
and the requested join operation must continue. The mobile host could have
been disconnected from the network, the network may be suffering from
packet collisions (local network) or router congestion (Internet), or the
wireless network may simply be running out of its already narrow bandwidth,
for instance. With a conventional join operator, query processing would block
due to the starvation of tuples needed to continue the query. An adaptive
operator is designed to keep the query running even if the data sources stop
sending tuples. It can swap the operands of the join operation, i.e., exchange
the inner relation (consumed faster) with the outer one (consumed slower). In
that way, query processing continues and the operator does not block. When
the problematic data source resumes its delivery, the operands (relations) are
switched back to their original positions and the operation keeps running
normally. The operator
described in this paragraph is called symmetric, because it is able to adapt to
the situation depicted above by handling the sources independently.
Some adaptive join operators have been proposed, like XJoin [120],
Ripple Join [47, 92] and DPHJ (Double Pipeline Hash Join) [145].
This chapter is structured as follows:
Section 3.2. Some relevant topics about the classical XJoin algorithm are
analyzed.
Section 3.3. The Ripple Join algorithm is discussed.
Section 3.4. Double Pipeline Hash Join is presented.
Section 3.5. MobiJoin is discussed and some of its techniques are
presented.
Section 3.6. The Adaptive Symmetric Hash Join (ASHJ) is discussed and
compared to other algorithms.
Section 3.7. Hash-Merge Join, a novel merge-based algorithm designed to
decrease the time to deliver relevant results, is presented.
3.2. XJoin
The XJoin algorithm is an extension of SHJ (Symmetric Hash Join), an
algorithm designed to run with a high degree of pipelining in parallel database
systems [120]. However, SHJ requires that all hash structures be kept
in main memory for the whole query processing time. Such a characteristic is
undesirable, since the system may run out of memory if joining the data
sources requires more memory than is currently available. In order to eliminate
such undesired behavior, XJoin partitions those structures and transfers them
to secondary memory (to be loaded at another time). This technique is quite
similar to the one implemented in the Hybrid Hash Join algorithm to avoid the
memory overflow generated by the standard Hash Join algorithm.
Another XJoin property worth mentioning is its ability to handle
data sources with low tuple delivery rates, i.e., suffering from a systematic
delay. While one of the data sources is stalled, the other one is processed.
There must be tuples available from the blocked data source, though, either
allocated in main memory or flushed to disk. This also applies to the situation
where both data sources are blocked but there still are tuples from them
available in the local system. Moreover, other segments of the query can run
while one or more operations are stalled, since XJoin is pipelined.
The algorithm runs in three phases:
(i). Probing of the tuples allocated in main memory.
(ii). Probing of the tuples allocated in secondary memory.
(iii). Elimination of the duplicated tuples.
Phases (i) and (ii) run in an interleaved manner: phase (ii) is
activated when (i) is stalled due to the absence of incoming tuples. However,
phase (ii) yields its execution to (i) when the data sources become operational
again. Phase (iii) runs when (i) and (ii) have already stopped running, in order
to clear the intermediate result of duplicates.
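To make the behavior XJoin inherits from SHJ concrete, the fragment below sketches, in Java, the in-memory step shared by both algorithms: every arriving tuple is inserted into its own hash table and immediately probed against the opposite one, so results are emitted without waiting for either input to finish. The fragment is merely illustrative (class and method names such as SymmetricHashJoinSketch and insertAndProbe are ours, not part of the XJoin implementation); the real XJoin additionally partitions and spills these structures to disk.

import java.util.*;

// Minimal sketch of the symmetric (pipelined) hash-join step that XJoin extends.
public class SymmetricHashJoinSketch {
    private final Map<Object, List<Object[]>> leftTable = new HashMap<>();
    private final Map<Object, List<Object[]>> rightTable = new HashMap<>();

    // Called whenever a tuple arrives from either source; fromLeft tells which one.
    public List<Object[]> insertAndProbe(Object joinKey, Object[] tuple, boolean fromLeft) {
        Map<Object, List<Object[]>> own   = fromLeft ? leftTable  : rightTable;
        Map<Object, List<Object[]>> other = fromLeft ? rightTable : leftTable;

        // 1. Insert the tuple into its own hash table (build and probe sides are symmetric).
        own.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(tuple);

        // 2. Probe the opposite hash table and emit matches immediately (pipelined output).
        List<Object[]> results = new ArrayList<>();
        for (Object[] match : other.getOrDefault(joinKey, List.of())) {
            results.add(fromLeft ? concat(tuple, match) : concat(match, tuple));
        }
        return results;
    }

    private static Object[] concat(Object[] a, Object[] b) {
        Object[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}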
3.3. Ripple Join
Before discussing the Ripple Join algorithm [47], it is interesting to discuss the
algorithm that spawned it: Nested-Loop Join.
Nested-Loop Join is the simplest algorithm implementing the relational
algebra join operator. It consists of two nested loops, i.e., its time complexity is
quadratic, O(n^2). The algorithm compares all tuples from the external relation
with all tuples from the inner loop (internal operand). Tuples from the inner
operand are "consumed" faster than the ones from the external one, since for every
external tuple there is a comparison with all internal tuples.
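For reference, the structure just described can be sketched in Java as follows; the two nested loops make the quadratic number of comparisons and the asymmetric roles of the external and internal operands explicit (the fragment is illustrative and not taken from any of the cited works).

import java.util.*;
import java.util.function.Function;

// Illustrative nested-loop join: the inner relation is rescanned for every outer tuple.
public class NestedLoopJoinSketch {
    public static List<Object[]> join(List<Object[]> outer, List<Object[]> inner,
                                      Function<Object[], Object> outerKey,
                                      Function<Object[], Object> innerKey) {
        List<Object[]> result = new ArrayList<>();
        for (Object[] o : outer) {          // external (outer) relation: consumed slowly
            for (Object[] i : inner) {      // internal (inner) relation: one full pass per outer tuple
                if (Objects.equals(outerKey.apply(o), innerKey.apply(i))) {
                    Object[] joined = Arrays.copyOf(o, o.length + i.length);
                    System.arraycopy(i, 0, joined, o.length, i.length);
                    result.add(joined);     // emit the concatenated result-tuple
                }
            }
        }
        return result;
    }
}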
The Ripple Join decreases the overall response time in queries in
which tuple delivery is eminently intermittent [47, 92]. Its symmetry is a key
feature for achieving such a purpose: when a given data source does not meet
the requirements ruled by a statistical factor, it exchanges the inner and external
operands, and the delaying source becomes the external one, which implies
that its tuples will be consumed more slowly. This change depends on the
computation of a statistical factor, which measures the quality of the incoming
host's result; this is done because Ripple Join has been designed for decision
support systems with online queries, in which the quality or relevance of the
incoming result is paramount. However, there is a factor under the control of the
user, called "sampling rate". That factor drives a random tuple-sampling process,
and each relation participates in each sampling step loop, ranging from 1 to n or
(n - 1), where n is input by the user. In a simplistic way, Ripple Join is a kind of
Nested-Loop Join in which the operands' positions change according to a
statistical factor computed from the accessed data sources.
3.4. Double Pipeline Hash Join (Tukwila Implementation)
DPHJ is a join operator in which the construction of the hash structures is done
in an independent way [145]. A comparison is made between the current tuple
and the hash table of the other data source, and then the tuple is added to the
hash table of the current source. Notwithstanding, this mechanism requires a lot
of available memory, since each hash table must be kept in the
available memory to proceed.
In case of memory overflow, the algorithm adapts itself according to the
result-delivery latency chosen by the user. If the given latency is
relatively large (fewer I/O operations), then the algorithm is most likely to run like
the Hybrid Hash Join (called Incremental Left Flush in the Tukwila engine).
However, if the user chooses a small enough latency, the algorithm becomes the
Incremental Symmetric Flush, which writes both hash structures to disk while
continuously processing new tuples in parallel [47]. The aforementioned
procedure is nothing less than its implementation of symmetry, in which the
quintessential factor for adaptation is the latency (in case of overflow) chosen
by the user, and not a computed statistical factor like the one present in Ripple
Join.
Symmetry in DPHJ guarantees that the query keeps running even if the
data sources become unavailable.
3.5. MobiJoin
This algorithm [1] is newer than the previous ones and has been designed with
adaptation techniques in mind; like the XJoin algorithm, it runs in three
phases:
(i). Probing of the tuples allocated in main memory.
(ii). Probing of the tuples allocated in secondary memory.
(iii). Another probing phase to avoid incomplete results.
The first two phases are interleaved at runtime (symmetry is
implemented as well), much like what is done by XJoin; but unlike XJoin,
MobiJoin does not use timestamps to control which tuples have already been compared.
Instead, the algorithm keeps control tables (a bit table per bucket pair), which
are auxiliary data structures containing information about already processed
tuples. Such control also avoids the generation of duplicated tuples during
probing. Furthermore, the first phase is pipelined, which implies that the
algorithm produces result-tuples while probing.
It also implements a mechanism to handle memory overflow at the bucket
level. In reality, it is a memory-management policy which optimizes the memory
usage of the allocated buckets. However, the policy cannot act over buckets
that have already been flushed to disk, i.e., buckets whose content has
already been partially flushed to disk cannot take any benefit from the memory-management
policy.
In fact, the first two phases cannot guarantee a complete and correct
result at the end of the operation; hence the third phase needs to be triggered.
The first phase can only join tuples that are simultaneously resident in memory.
As a matter of fact, the first phase is not able to join tuples belonging to
corresponding buckets when at least one of them has already been flushed to
disk. What is more, the second phase cannot join disk-resident tuples with
tuples that have not yet been allocated in the corresponding bucket. So, it is up to
the third phase to join the remaining tuples; hence this phase is only triggered
after all tuples from both data sources have been received.
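The control tables can be pictured with the illustrative fragment below (our own sketch of the idea, not MobiJoin's code); a bit set per bucket pair records which tuple pairs have already been probed, so repeated probing and duplicated results are avoided. For simplicity, the sketch assumes a known upper bound on the size of the second bucket.

import java.util.BitSet;

// Illustrative control table for one bucket pair: bit (i, j) set means that
// tuple i of bucket A has already been probed with tuple j of bucket B.
public class BucketPairControlTable {
    private final BitSet probed = new BitSet();
    private final int widthB;   // assumed upper bound on the size of bucket B

    public BucketPairControlTable(int widthB) {
        this.widthB = widthB;
    }

    // Returns true if the pair still needs probing, and marks it as probed.
    public boolean markIfUnprobed(int indexA, int indexB) {
        int bit = indexA * widthB + indexB;   // row-major encoding of the pair
        if (probed.get(bit)) {
            return false;                     // already compared: skip to avoid duplicates
        }
        probed.set(bit);
        return true;
    }
}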
3.6. Adaptive Symmetric Hash Join
This algorithm has been designed over the Symmetric Hash Join algorithm,
which is an extension of the original Hash Join algorithm itself, with the
additional symmetry implemented. Symmetry is the ability to independently deal
with the data sources, i.e., in a non-blocking way; the processing of the data
sources does not block the join operation.
The ASHJ is the local join operator of the Parallel Hash Ripple Join
algorithm in [143], i.e., it has been designed to run under an online aggregation
system. Since it is the lowest-level algorithm running on such a system, it is
the one responsible for dealing with memory overflow. Moreover, the adaptive
extension of the algorithm is mostly focused on the handling of memory-overflow
events. Indeed, it deals with memory overflow, since it keeps the operation
going even when multiple overflow events occur. However, when a tuple
arrives that was supposed to be allocated in a bucket that has already suffered
overflow, it is immediately flushed to disk and cannot be allocated in memory anymore.
Such behavior greatly decreases the operator's result rate, since flushed-to-disk
tuples are probed only after all tuples have arrived from their corresponding
sources. In the meantime, those disk-resident tuples could be used to produce
more result tuples if probed with memory-resident ones. The algorithm
compares only memory-resident tuples before all tuples have arrived.
For the sake of clarity, let us consider the worst case, which is when the
algorithm cannot allocate even a single bucket pair, and all other buckets (with
different selectivity factors) have been flushed to disk due to the severe lack of
available memory. In that particular case, the algorithm would be continuously
flushing tuples to secondary storage (granted that it has the required
resource) until all involved tuples have arrived. After that, it starts probing all
disk-resident tuples and dispatching the results. As one can see, the algorithm
would be unable to produce any result tuples while still receiving new ones,
greatly reducing its output.
3.7. Hash-Merge Join
The Hash-Merge Join algorithm (HMJ, for short) [78] is an adaptive join
algorithm that handles unpredictable and slow network traffic; its main
application is queries in which the user only needs the very first few results.
The HMJ algorithm is designed with two goals in mind: (i) minimizing the time to
produce the first tuple set, and (ii) producing join results even if the sources are
blocked. The algorithm also has an informal third phase, which is quite similar to
the one implemented in MobiJoin: actually, it is another run of the merging
phase.
The algorithm is built on two phases: the hashing and merging phases,
respectively. The hashing phase produces join results as soon
as data arrive. Once the algorithm runs out of memory, certain buckets are
flushed to disk to make room for the incoming tuples. The hashing phase is
able to produce join results from the unblocked source in the event that one data
source becomes blocked. Furthermore, if both sources become blocked, the
HMJ triggers its merging phase. In that phase, flushed buckets are loaded and
joined by applying a sort-merge routine over the loaded-from-disk tuples. So, the
HMJ algorithm can still produce results when both sources are blocked. As
soon as any of the blocked sources resumes, the algorithm switches from the
merging phase back to the hashing phase. As a matter of fact, it switches back
and forth between the phases until all tuples are received. After that, the
informal third phase takes place: the entire memory content is flushed to disk
and the merging phase produces the final part of the join result.
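The alternation between the two phases can be pictured with the schematic control loop below (an illustrative sketch under names of our own choosing; the actual flushing and sort-merge details are described in [78]).

// Schematic control loop for Hash-Merge Join phase switching (illustrative only).
public class HashMergeJoinDriver {

    interface Source { boolean blocked(); boolean exhausted(); }

    private final Source left, right;

    public HashMergeJoinDriver(Source left, Source right) {
        this.left = left;
        this.right = right;
    }

    public void run() {
        while (!(left.exhausted() && right.exhausted())) {
            if (left.blocked() && right.blocked()) {
                mergingPhase();   // join buckets previously flushed to disk (sort-merge)
            } else {
                hashingPhase();   // consume arriving tuples, flush buckets on memory pressure
            }
        }
        flushAllToDisk();         // "informal third phase": one final merging pass
        mergingPhase();
    }

    private void hashingPhase()   { /* in-memory symmetric hashing, omitted */ }
    private void mergingPhase()   { /* sort-merge over disk-resident buckets, omitted */ }
    private void flushAllToDisk() { /* write the remaining in-memory buckets, omitted */ }
}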
4. AJAX: Adaptive Join Algorithm for Extreme Restrictions
4.1. Introduction
In the context of mobile databases, adaptive query processing plays a key role
[106, 108, 133]. In turn, adaptive query operators represent a critical feature for
adaptive query processing. An adaptive query operator should be able to react
to some events, which are related to increasing response time on mobile
databases access. In order to explain the idea of adaptive operator, consider
that a query Q consisting of a join operation has been submitted, and at
undetermined running time one of the data sources (which is an operand of the
join operation) has suddenly block its tuple delivery for some unknown reason.
In other words, the join operation is blocked, and the query’s response time is
increased until the blocked data source resume to delivery tuples again to the
join operation. This scenario is quite common in mobile databases and
conventional operators are not prepared for such a situation.
This work proposes an adaptive join algorithm called AJAX [5, 25]. The
algorithm implements the relational algebra join operator, using hashing and
pipelining techniques to reduce the overall response time of a query
involving join operations. AJAX is an acronym for Adaptive Join Algorithm for
eXtreme restrictions. We claim that AJAX is an adaptive query operator [9, 90],
since it is able to react to events which may considerably increase response
time when executing queries over mobile databases. AJAX ensures the
following properties:
(i) Incremental production of results as soon as the data become available.
Conventional join operators (like nested-loop join and hash join) only
produce results after at least one involved relation is completely
evaluated. Joins executed over distributed data may therefore take a long time until
users begin to receive the result of the join operation. AJAX uses a
pipeline technique in order to incrementally provide results (to users or to
another query operator) as soon as they become available. By doing this,
users observing the progress of the query processing may, for instance,
decide to stop the execution on the fly, once the results already
obtained are enough to make a decision;
(ii) Progress of the query processing even when the delivery of data is
blocked. The idea is to overlap delays by executing other tasks when tuple
delivery (from the tables involved in the join operation) is blocked;
(iii) Reaction to memory-overflow events. When a memory overflow occurs,
AJAX reacts by flushing part of the data from main memory to disk;
(iv) Prevention of unnecessary disk access. AJAX's predecessors (like XJoin) usually need
to read data previously flushed to disk in another execution phase,
in order to avoid incomplete results and/or duplicates; AJAX is
designed to reduce such disk accesses by avoiding duplicate results at
probing time (there is no need for another phase involving disk access).
The AJAX algorithm keeps running (and so does the whole query)
even when both data sources are blocked. The idea is to overlap delays by
executing other tasks. That is achieved because AJAX is a symmetric operator
and implements intra-operator and inter-operator pipelining, as we show in the
next sections.
This chapter is structured as follows:
Section 4.2. Describes the first execution phase of the algorithm.
Section 4.3. The probing procedure is presented, explaining the mechanism
that avoids the generation of duplicates. The overflow-prevention
routine is also explained in a subsection.
Section 4.4. The second execution phase is discussed, and the
algorithm's pseudocode is depicted at the end of the section.
Section 4.5. The cost estimation model for the algorithm is discussed.
4.2. First Phase
The key goal of the first phase is to join the largest amount of memory-resident
tuples, using the available memory. This phase starts to run as soon as tuples
begin to arrive and continues executing as long as tuples of at least one of the
tables are arriving. When all tuples of both tables involved in the join operation
have arrived, the first phase stops its execution (and the second phase is
triggered). If the arrival of tuples from both tables is interrupted (blocked), the
execution of the first phase is suspended (and the second phase is started),
provided that there are no new tuples to be processed. When new tuples restart
to arrive, the execution of the first phase is resumed. For that reason, AJAX
is a non-blocking join operator.
When the first phase starts to run, two hash tables are created (one for
each table of the join operation). Thereafter, the hash tables can be populated
(as soon as tuples from the tables involved in the join operation arrive) by
applying a hash function h over the join attributes. Each hash table is composed
of buckets. AJAX's hash tables are double-ended queues (deques) [83] of
buckets, where each bucket is a linked list of tuples with no limit on its
size, which characterizes a dynamic data structure. The dynamic bucket is a
way to implement a fairer memory-allocation policy, in contrast to
conventional operators, which have buckets of limited size (incurring
bucket overflow). Such a technique guarantees that all available memory will be
used to allocate as many tuples as possible. Buckets with limited size do not
offer the best allocation, because a great quantity of buckets could be empty or
nearly empty while still holding memory that could be used by tuples
with a greater selectivity factor; i.e., memory is wasted if a given bucket can
hold 100 tuples but only 5 have been allocated in it. Besides that, using
fixed-size buckets implies that memory overflow occurs more frequently, since
the overflow event occurs per bucket and there are several buckets in
each hash table. Such a feature may represent a significant increase in query
response time. In AJAX, a bucket that "deserves" to be larger (in size) will in
fact be larger.
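The data structure described above can be sketched in Java as follows (an illustrative fragment; class and method names are ours, not AJAX's actual implementation). Each bucket is an unbounded linked list, so memory is consumed only by buckets that actually receive tuples.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

// Sketch of an AJAX-style hash table: a deque of dynamically sized buckets,
// where each bucket is an unbounded linked list of tuples.
public class DequeHashTable<T> {
    private final List<LinkedList<T>> buckets = new ArrayList<>();
    private final Deque<LinkedList<T>> scanOrder = new ArrayDeque<>();

    public DequeHashTable(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            LinkedList<T> bucket = new LinkedList<>();
            buckets.add(bucket);
            scanOrder.addLast(bucket);   // the probing loop traverses buckets from head to tail
        }
    }

    // Applies the hash function over the join attribute and appends the tuple to the
    // corresponding bucket; buckets grow freely, so there is no bucket-level overflow.
    public void insert(Object joinAttribute, T tuple) {
        int address = Math.floorMod(joinAttribute.hashCode(), buckets.size());
        buckets.get(address).addLast(tuple);
    }

    public Deque<LinkedList<T>> buckets() {
        return scanOrder;
    }
}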
For each set of tuples received by AJAX, the memory size necessary to
store them is calculated and compared with a minimal memory limit (several
values have been tested in the experimental results chapter), which is a memory
size large enough to keep the algorithm running and to allocate disk pages during
the second phase. This procedure is done to avoid memory overflow in both
phases; AJAX has been designed to run in very resource-constrained
environments (extreme restrictions). After checking that there is enough room
in memory, the tuples are allocated in their corresponding hash tables by
applying a hash function on the join attribute. The result of the hash function is
the address of the bucket which will receive that particular tuple. It is important
to observe that the overflow granularity in AJAX has been moved from the
bucket level to the system (memory) level. As already mentioned, such a
strategy guarantees an increase in AJAX's overall performance, because a
system overflow occurs less frequently than a bucket-overflow event.
The algorithm has been designed to do as many probings as possible in a
specific time interval; to do that, AJAX uses inter- and intra-operator pipelining
[10, 115]. Some additional information is required in order to avoid duplicates
and unnecessary probing during pipelined probing. Tuples that have been allocated
in their corresponding buckets have an index that marks their relative position in the
bucket. In fact, this index represents their insertion order in the bucket. Another
piece of useful information added to tuples is the last probe remembrance, which is
an index indicating the last tuple of the opposite bucket that has been probed with
this particular tuple. This information is used to guarantee the correctness and
completeness of the progressive probing (described in the next section).
4.3. Probing
Probing in AJAX is progressive; this progressiveness avoids duplication and
unnecessary probing. The idea is to run the probing phase as a loop which
compares all pairs of buckets (from both relations) with the same hash address,
comparing tuples that have arrived during the current iteration i with tuples from
iteration (i - 1), i.e., tuples that arrived in the last iteration but have not been
probed yet. To illustrate this idea, observe the execution scenario depicted in
Picture 1 below. Consider that the join relations are Alpha and Beta, and that
there exist two buckets (x and y) in each hash table.
The probing is done by comparing tuples from buckets with the same hash
address; tuples belonging to the bucket with hash address "x" of Alpha are
compared with tuples of the bucket "x" of Beta's hash table, producing
result-tuples incrementally. This procedure is executed for every bucket pair in
the hash tables until the last pair is reached. After that, the probing continues
again from the beginning. For that reason, a deque data structure has been chosen:
the algorithm's loop traverses the bucket list from its beginning to its end in
each iteration. Beginning from the list's head guarantees that tuples received
during a given iteration "i" can be probed in the next iteration (i + 1), since some
tuples may be inserted in bucket "x" of Alpha while the probing is in progress in
the bucket pair (from both hash tables) with hash address "y". Therefore, the
probing is progressive, since tuples which have arrived in iteration "i" may or
may not be probed during this iteration. However, AJAX guarantees that, in such
a scenario, those tuples will be probed in the next iteration (i + 1).
PICTURE 1: RUNNING SCENARIO OF THE PROGRESSIVE PROBING
In order to avoid duplicates in the result of the join operation and
unnecessary probing, AJAX implements the following strategy. Whenever a
tuple is received by AJAX, it adds two columns to the tuple. One column, called
position index, represents the receive order of the tuple in a given bucket. The
other column, denoted last probe remembrance (LPR for short), has the
following functionality; the LPR of a tuple “t”, which belongs to a bucket with
hash address ‘x’, stores the value of the position-index column of the last tuple
of the bucket with hash address ‘x’ of the other hash table which has already
been compared (probed) with “t”.
Thus, for each tuple in a bucket with hash address 'x' of a hash table
Alpha, AJAX compares its LPR with the position index of the tuple in bucket 'x'
of the other hash table, say Beta. Two situations are possible:
(i) The value of the LPR of Alpha's tuple is less than the value of the position
index of Beta's tuple. In that case, the tuples have not been probed
yet. For that reason, their join attributes are compared to check whether
the tuples should be inserted in the result. After that, the LPR of
Alpha's tuple receives the position-index value of Beta's tuple.
This guarantees that those tuples will not be probed again.
(ii) The value of the LPR of Alpha's tuple is greater than or equal to the value of
the position index of Beta's tuple, which means that the tuples have
already been compared and will not be probed again.
Figure 2 illustrates how AJAX uses the LPR and position-index values.
The last attribute of every tuple in Figure 2 represents the LPR column. On the
other hand, the first column represents the position-index value. Observe that
the first tuple belonging to bucket ‘x’ of Alpha’s hash table has position-index
equal to 0 and an LPR equal to 2, which means that it has already been probed
with the tuples of Beta's bucket whose position index is greater than -1 and less
than 3. Tuples with LPR equal to -1 are new tuples (recently allocated),
which have not been compared with any other tuple.
FIGURE 2: LAST PROBE REMEMBRANCE.
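The probing rule based on the position index and the LPR can be sketched in Java as follows (an illustrative fragment assuming that each bucket keeps its tuples in insertion order; class and field names are ours, not AJAX's actual code).

import java.util.List;

// Illustrative probing step between two buckets with the same hash address.
public class ProgressiveProbingSketch {

    // Minimal tuple wrapper carrying the two extra columns AJAX adds on arrival.
    public static class AjaxTuple {
        final Object joinKey;
        final int positionIndex;   // insertion order inside its bucket
        int lpr = -1;              // position index of the last opposite tuple probed with it

        AjaxTuple(Object joinKey, int positionIndex) {
            this.joinKey = joinKey;
            this.positionIndex = positionIndex;
        }
    }

    // Probes bucket alphaX against bucket betaX, emitting the join keys that match.
    public static void probe(List<AjaxTuple> alphaX, List<AjaxTuple> betaX, List<Object> results) {
        for (AjaxTuple a : alphaX) {
            for (AjaxTuple b : betaX) {
                if (a.lpr >= b.positionIndex) {
                    continue;                   // already probed in an earlier iteration
                }
                if (a.joinKey.equals(b.joinKey)) {
                    results.add(a.joinKey);     // result-tuple construction omitted
                }
                a.lpr = b.positionIndex;        // remember the last opposite tuple probed with 'a'
            }
        }
    }
}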
4.3.1. Reacting to Memory Overflow
When a memory-overflow event occurs during the join, an adaptive operator
should be capable of reacting to such an event, avoiding stopping its execution. In
fact, AJAX anticipates its reaction to the occurrence of memory-overflow events.
Such a property is possible because AJAX checks the amount of available
memory at tuple-arrival time. It then calculates the room required to allocate the
tuples. If the tuples can be allocated in the available memory, they
are inserted into their corresponding buckets. Otherwise, the algorithm makes
room to allocate the incoming tuples. In order to free memory space, something
has to be deallocated. Many deallocation policies can be implemented in
AJAX. The default policy chooses the largest bucket pair (the pair with the
largest quantity of tuples). After that, the algorithm flushes those buckets to disk
and deallocates all tuples stored in them. This is a recursive process,
which is executed until there is enough memory for allocating the incoming
tuples.
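The default deallocation policy can be sketched as follows (an illustrative Java fragment; the Bucket interface and the method names are assumptions of this sketch, and serialization details are omitted).

import java.io.IOException;
import java.util.List;

// Sketch of AJAX's default memory-management policy: flush the largest
// corresponding bucket pair to disk until the incoming tuples fit in memory.
public class OverflowPolicySketch {

    interface Bucket {
        int tupleCount();
        long bytesInMemory();
        void flushToDisk() throws IOException;   // written in disk-page-sized partitions
        void deallocate();                       // frees the in-memory tuples after flushing
    }

    static void makeRoom(List<? extends Bucket> alphaBuckets, List<? extends Bucket> betaBuckets,
                         long requiredBytes, long availableBytes) throws IOException {
        while (availableBytes < requiredBytes) {
            int largest = -1;
            int largestCount = 0;
            for (int i = 0; i < alphaBuckets.size(); i++) {   // pick the pair holding the most tuples
                int count = alphaBuckets.get(i).tupleCount() + betaBuckets.get(i).tupleCount();
                if (count > largestCount) {
                    largestCount = count;
                    largest = i;
                }
            }
            if (largest < 0) {
                break;                                        // nothing left in memory to flush
            }
            Bucket a = alphaBuckets.get(largest);
            Bucket b = betaBuckets.get(largest);
            availableBytes += a.bytesInMemory() + b.bytesInMemory();
            a.flushToDisk();                                  // spill both corresponding buckets...
            b.flushToDisk();
            a.deallocate();                                   // ...and release their memory
            b.deallocate();
        }
    }
}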
When both data sources are blocked or EOF is received from them, the
second phase of AJAX is triggered.
4.4. Second Phase
The second phase uses buckets previously flushed to disk to produce result-tuples.
The execution of the second phase is triggered in reaction to different events. First,
when both data sources are blocked (due to external events, like low bandwidth
or an abrupt shutdown), AJAX executes its second phase in order to overlap
delays. Second, when all the tuples of both tables have been received, the second
phase is triggered for the last time in order to ensure the completeness of the join
result produced by AJAX.
The execution of the first and second phases alternates every time the
data sources become blocked. In order to continue generating result-tuples, the
second phase uses the buckets flushed to disk to feed the progressive probing
loop.
Buckets are flushed to disk to avoid memory overflow. However, those
disk-resident buckets could produce another overflow when they are loaded to
be probed with other buckets in memory. This "second overflow" is prevented
using the minimal memory limit. Before flushing the chosen buckets to disk, the
overflow-prevention algorithm divides each bucket into smaller partitions
with the size of a disk page (4 KB, for example); these pages are then
flushed to disk. The loading is done using part (how much is used is a
parameter) of the minimal-memory-limit area; only a few pages are loaded at a
time and probed with other tuples in memory; after the pages are probed,
another set is loaded into memory and the previously loaded pages are
deallocated. This process continues until all pages have been loaded and probed,
or until one of the data sources becomes operational again.
To avoid probing the same set of tuples again, the LPR concept is also used at
the disk-page level: every page has its own LPR indicating the last page that has
already been probed with it. This minimizes disk I/O, increasing AJAX's
performance and avoiding duplicated result-tuples, which would lead to incorrect
results.
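The page-level organization can be sketched as follows (an illustrative Java fragment; the 4 KB page size follows the example above, and all names are assumptions of this sketch).

import java.util.ArrayList;
import java.util.List;

// Sketch of second-phase page handling: buckets are split into disk-page-sized
// partitions, and each page keeps its own last-probe-remembrance (LPR).
public class DiskPageSketch {
    static final int PAGE_SIZE_BYTES = 4 * 1024;   // example page size used in the text

    static class DiskPage {
        final int pageIndex;        // position of this page within its flushed bucket
        int lpr = -1;               // index of the last opposite page already probed with it
        final List<byte[]> tuples = new ArrayList<>();

        DiskPage(int pageIndex) { this.pageIndex = pageIndex; }
    }

    // Splits a serialized bucket into page-sized partitions before flushing it to disk.
    static List<DiskPage> partition(List<byte[]> serializedTuples) {
        List<DiskPage> pages = new ArrayList<>();
        DiskPage current = new DiskPage(0);
        int used = 0;
        for (byte[] t : serializedTuples) {
            if (used + t.length > PAGE_SIZE_BYTES && !current.tuples.isEmpty()) {
                pages.add(current);                 // page is full: start a new one
                current = new DiskPage(pages.size());
                used = 0;
            }
            current.tuples.add(t);
            used += t.length;
        }
        pages.add(current);
        return pages;
    }
}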
Figure 3 shows simplified pseudocode of the algorithm; many details
have been omitted in order to preserve a didactic approach. It is important to
note that the process of receiving tuples from the data sources and the
overflow-prevention process are executed in different threads, which is not
represented in Figure 3.
Procedure TupleArrival(tuple t, sources (A, B))
Begin
1. If allocating t would exceed the minimal memory limit
   (a) Choose two buckets Ax and Bx.
   (b) Probe buckets Ax and Bx.
   (c) Flush buckets Ax and Bx to disk.
   (d) Deallocate Ax and Bx.
2. Calculate the hash of t.
3. Insert t in the correct bucket.
End

Procedure ProgressiveProbing(buckets (Ai, Bi), pages (PAi, PBi))
Begin
1. If bucket Ai or Bi got a new tuple t
   (a) Probe t with all tuples of the opposite bucket whose position index is greater than t's LPR.
   (b) Update the LPR of all probed tuples with the position index of t.
   (c) Update the result stream.
1.1 If both sources are blocked
   (a) Probe bucket Ai with all disk pages whose index is greater than its LPR.
   (b) Probe bucket Bi with all disk pages whose index is greater than its LPR.
   (c) Probe page PAi with all disk pages whose index is greater than its LPR.
   (d) Probe page PBi with all disk pages whose index is greater than its LPR.
   (e) Update the LPR of the involved tuples and pages.
   (f) Update the result stream.
End
FIGURE 3: AJAX SIMPLIFIED.
4.5. Cost Estimation
In order to estimate the cost of executing the AJAX operator, consider the
scenario depicted in Figure 4. M represents a mobile database community
(MDBC). DB1, DB2 and DB3 represent the mobile databases belonging to M.
Those mobile databases reside in mobile hosts C1, C2 and C3, respectively. The
mobile hosts C1, C2 and C3 are interconnected by means of an ad hoc network.
Now suppose that a user submits a query Q in C1. The query Q represents in
fact a join operation (of the relational algebra) between two unsorted tables A
and B, which are stored in DB2 and DB3, respectively. The tuples of relations A
and B should be delivered to the mobile host C1, where the join operation has to
be executed. Thereafter, the result of Q should be delivered to the user who
has submitted Q. We are assuming that the database systems in C1, C2 and C3
are interoperable.
FIGURE 4: RUNNING EXAMPLE.
For the sake of simplicity, let us consider that the tables A and B have the
same cardinality c, the same tuple size and there is an even distribution of
tuples between the two hash tables. The available main-memory area in C1 for
executing AJAX has the capacity of storing s tuples. Without loss of
generality, the measure used for estimating the cost is the number of tuple transfers
to/from disk, instead of block (page) transfers. The cost of AJAX can be
estimated according to the following conditions:
(i)
cs
2
. In this case, there is no overflow. Thus, no tuple is
flushed to disk and the third phase is not triggered, which gives a
cost of 2c (for reading tables A and B);
(ii)
sc
s
<
2
. In this case, there is an overflow of c tuples, since we
have assumed an even distribution of tuples across the hash
tables. The cost is given by
)
22
( )
22
(2
cccc
c ++++
, i.e., 4c.
38
Note that, one term
)
22
(
cc
+
represents the cost of flushing
tuples to disk and the other represents the cost of reading the
tuples from disk for executing the third phase.
(iii)
cs <
. In this situation, we have an overflow of
)
2( sc
tuples,
i.e., an overflow of
)
2
(
s
c
tuples for each table. Thus, the
estimated cost is
)2()2(2 scscc ++
, giving the final
cost of
)
3(2 sc
.
5. Experimental Results and Analysis
5.1. Introduction
In order to investigate the performance of the AJAX operator, we have performed several experiments. In the experiments, a query Q is submitted to a mobile computer C1, where Q represents a join operation between two unsorted tables A and B, each residing in a different mobile computer, C2 and C3, respectively. The tuples of those relations should be delivered to mobile host C1, where the join operation has to be executed.
The experiments were divided into two classes. In the first class of experiments, we have simulated the execution of AJAX in devices with severe memory restrictions interconnected by a wireless network, which may cause a bursty arrival of tuples in C1. In the second class of experiments, we have run the XJoin [120] and Hash-Merge Join [78] (HMJ, for short) operators (using the same testbed) in order to compare AJAX with them. The test environment was composed of 3 Pentium-4 machines with 3.0 GHz processors, connected by a wireless network. The three operators (AJAX, XJoin and HMJ) were implemented in Java. The mobile-like restrictions have been simulated by the Java Virtual Machine.
This chapter is structured as follows:
Section 5.2. Standalone tests simulating memory availability changes are
presented, in order to analyze the algorithm’s viability.
Section 5.3. In this section AJAX’s performance is measured and
compared to other algorithms.
Section 5.4. An analytical discussion of the related work is presented.
5.2. Standalone Tests
In this class of experiments, the goal was to evaluate performance issues of AJAX. The cardinalities of the tables A and B (operands of the join operation) are 10^4 and 6x10^4, respectively. The cardinality of the result (of the join operation) is 6x10^3, and each tuple has an approximate size of 56 bytes. To execute such a join operation without memory overflow, a minimum of 5.7 MB of free memory is necessary in C1 (the device that has requested the query). The bursty arrival of tuples (due to the low bandwidth of the wireless network) was simulated by introducing a delay of 5 seconds for every 10^4 tuples delivered to the algorithm (running in C1).
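One plausible way to implement such a delivery delay, shown here only as an illustrative assumption about the test harness (the original harness code is not reproduced in this dissertation), is to pause the sender thread after every 10^4 tuples:

import java.util.List;
import java.util.function.Consumer;

public class BurstyDelivery {
    static <T> void deliverWithBursts(List<T> source, Consumer<T> operator)
            throws InterruptedException {
        int sent = 0;
        for (T tuple : source) {
            operator.accept(tuple);       // hand the tuple to the join operator
            if (++sent % 10_000 == 0) {
                Thread.sleep(5_000);      // simulated 5-second network stall
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> tiny = List.of("t1", "t2", "t3");
        deliverWithBursts(tiny, t -> System.out.println("delivered " + t));
    }
}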
First, we have performed experiments to investigate AJAX's behavior with different memory sizes and bursty arrival of tuples (in C1). AJAX was used to execute a join operation in a computer C1 with three memory sizes: 64MB, 4MB and 1MB. The results are shown in Figure 5.
FIGURE 5: MEMORY OVERFLOW PREVENTED.
Looking more closely at Figure 5, one can observe that the algorithm has a linear behavior regardless of the available memory size, considering the cardinalities used in the tests. The algorithm also presents an aggressive behavior in the early generation of result-tuples; this can be seen by comparing the 4 MB and 1 MB curves: the time difference between their points is small until 4x10^4 result-tuples are generated and increases only after that point. This is because AJAX requires more disk I/O in the 1 MB scenario when executing the second phase for the last time (when all tuples have arrived). In other words, the algorithm minimizes the impact of low memory on the query's response time. Running in a scenario with 1/6 of the memory required to allocate both relations shows that the algorithm has reacted to the memory limitation. Recall that this situation was simulated with a blocking of 5 seconds for both data sources for each 10^4 received tuples, which implies that the second phase was activated by the algorithm.
In order to verify the importance of the second phase to the overall decrease in the query's response time, another experiment using 1 MB of available memory was performed. For that set of experiments, we have first run the algorithm with the second phase switched off and then with it switched on. Figure 6 shows the results of that experiment. Observe that AJAX has an aggressive strategy to produce results as soon as possible when the second phase is executed.
FIGURE 6: SECOND PHASE EFFICIENCY.
In Figure 6, one can observe that, for a given time interval, the algorithm
has generated a greater quantity of result-tuples when the second phase was
enabled than when the second phase was switched off. Such a property
decreases the query's response time. This result is achieved because the second phase uses tuples on disk to continue the probing when both data sources are blocked; otherwise, the probing mechanism would be blocked as well (the probing "starves" in the absence of tuples).
5.3. Comparative Tests
We have also performed experiments to compare AJAX with XJoin and HMJ, regarding memory restriction (which induces memory overflow) and the occurrence of data delivery interruptions (which induces the execution of the second phase). For the experiments, the tables A and B (operands of the join operation) have cardinalities of 10^4 and 6x10^4 tuples, respectively. We have considered tuples of 20 bytes, and the cardinality of the result is 10^5, which implies that around 2 MB of available memory is required to produce the join result without the occurrence of memory overflow.
Initially, we have compared AJAX with XJoin [120], which is a well-known adaptive join algorithm. Since disk access is the major bottleneck for join algorithms, it has been chosen as the quantity to be measured in the experiments. The results are illustrated in Figures 7 and 8. Observe that both algorithms share a similar behavior until 6x10^4 result-tuples are generated, which is the point at which both algorithms have already flushed a great number of tuples to disk and more disk I/O is required to continue the operation. Looking at the AJAX series, one can see that the algorithm's curve is quite linear even after 6x10^4 result-tuples have been generated, which is not the case for XJoin (see Figure 7). XJoin presents an exponential growth after 6x10^4 result-tuples have been generated, which represents much more disk I/O than AJAX.
As the available memory decreases, the performance similarity between XJoin and AJAX ceases earlier, which can be observed in Figure 8. In Figure 8, one can also observe that, to produce 4x10^4 result-tuples, XJoin increases its disk I/O exponentially, while AJAX increases its own disk I/O in a quasi-linear way (for the used cardinalities).
Figure 7: 1024KB of available memory. Figure 8: 600KB of available memory.
Besides the fact that disk I/O in XJoin grows faster as the available memory decreases, XJoin is machine-precision dependent. Since XJoin applies a timestamp to recently arrived tuples in order to avoid duplicates, it might still generate duplicates, because the timestamp can be duplicated if the machine on which the algorithm runs does not have the required precision to count time. Considering that XJoin runs on a device capable of counting time up to milliseconds, it cannot guarantee that two tuples arriving almost at the same time (in the same millisecond tick) will receive different timestamps. For the sake of clarity, let tuple ti be received at the real time of 12:33:0001 and tuple tk at 12:33:0003; since the given device can only count up to milliseconds, both tuples will receive the same timestamp of 12:33:000. Such machine-precision dependent behavior of XJoin is even worse when it is executing on mobile devices, since such devices may run processors with low time precision and no floating-point support.
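The precision problem can be observed with a few lines of Java: two consecutive calls to System.currentTimeMillis() on current hardware almost always fall within the same millisecond tick, so a millisecond timestamp alone cannot distinguish two tuples that arrive together (an illustrative demonstration, not XJoin's code):

public class TimestampCollision {
    public static void main(String[] args) {
        long t1 = System.currentTimeMillis();  // timestamp of the first "tuple"
        long t2 = System.currentTimeMillis();  // second "tuple", nanoseconds later
        System.out.println("t1 = " + t1 + ", t2 = " + t2
                + ", collision = " + (t1 == t2)); // usually true
    }
}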
Another set of experiments has been performed, but this time the Hash-Merge Join (HMJ) algorithm has been chosen to be compared with AJAX. First, we have simulated devices with 1024KB, 800KB and 600KB of free memory. For devices with 1024KB and 800KB, HMJ produced slightly fewer disk accesses than AJAX. However, such behavior changed when the available memory was shifted to 600KB: HMJ's performance slightly degraded.
Figure 9: AJAX x HMJ with 1024KB of available memory. Figure 10: AJAX x HMJ with 800KB of available memory.
From 800KB to 600KB, HMJ accessed the disk more often than AJAX at higher amounts of available memory. In order to analyze such behavior, the memory has been dropped to 512KB, 256KB and 128KB, and the same behavior has been observed. Based on the data gathered from those tests and closer observation of the algorithms, one can conclude that such behavior is generated by the minimal memory limit imposed by the AJAX algorithm; that limit guarantees that no "second overflow" will occur when the algorithm starts loading disk buckets (pages) into main memory. This "second overflow" is not handled by HMJ, which always uses all available memory while running, which leads to another overflow while trying to load disk pages to compare with tuples still allocated in memory. From 600KB to 128KB, AJAX's performance advantage over HMJ continued to increase (see Figures 11 and 12), even using a minimal memory limit as low as 16KB, which can allocate 2 disk pages (8KB) and handle the algorithm's overhead (around 8KB).
Figure 11: HMJ performance degrades (with 600KB of available memory). Figure 12: HMJ performance degrades (with 128KB of available memory).
It is important to note that HMJ requires the execution of an in-memory sort algorithm in both of its phases (see Section 4). Of course, such a sort procedure has a computational cost, which impacts HMJ's overall performance. That cost is estimated and discussed in the next section.
5.4. Analysis of the Related Work
In this subsection, we analyze some of the published proposals of adaptive join operators. The properties used to analyze the proposals are (i) result production rate and (ii) progress of the query processing even when the delivery of data is blocked. The first property makes it possible to identify the number of tuples (belonging to the result) produced by the operator within a given time interval. That property gives an idea of the pipelining level supported by the operator. The second property identifies whether or not the operator interrupts its execution when tuples are not arriving. If the second property is supported by a join operator, the time interval required to process the join operation can be strongly reduced.
The result production rate is a function of the strategy used to process tuples allocated in buckets for which memory overflow has occurred at least once. For the operator Adaptive Symmetric Hash Join (ASHJ) proposed in [143], tuples arriving for a bucket which has suffered at least one overflow are directly flushed to disk, i.e., they are not allocated in memory anymore. Such a strategy reduces the result production rate, since tuples allocated on disk will be processed only when all tuples of both relations have already arrived. Moreover, those tuples have to be read from disk again to be processed by the ASHJ. For that reason, ASHJ may present a worse performance than the conventional hash join operator, since a large number of tuples has to be read twice from disk, considering a scenario with severe memory restrictions. The Incremental Symmetric Flush Join, also proposed by Ives et al. [143], presents a strategy similar to that of the ASHJ to process tuples arriving at buckets which have suffered from the bucket-overflow phenomenon at least once. Thus, the Symmetric Flush Join suffers from the same restrictions as the ASHJ w.r.t. result production rate. In [143], another adaptive join operator is proposed, the Incremental Left Flush Join. According to the algorithm of that operator, on the occurrence of the first memory overflow, the Incremental Left Flush Join chooses one of the relations of the join operation, say A, and stops receiving tuples from A, only receiving (reading) tuples of the other relation, say B. From that point on, whenever a memory overflow occurs, in-memory buckets containing tuples of A are flushed to disk and the respective memory area can be used for allocating tuples of B which are arriving. Only after all tuples of relation B have arrived does the Incremental Left Flush Join operator resume receiving tuples of relation A. Note that if the arrival of tuples of relation B is interrupted (blocked), the operator stops its execution. Such a characteristic may represent a strong reduction of the result production rate. AJAX continuously processes tuples of both relations involved in a join operation independently of the occurrence or not of memory overflow. For that reason, AJAX presents a higher result production rate than the aforementioned operators.
Comparing the adaptive join operators with respect to how the operator reacts when the delivery of data is blocked, only AJAX [5, 25], XJoin [120] and Hash-Merge Join [78] continue their execution. The other operators stop their execution when the delivery of data is blocked. In order to react to the data-delivery blocking event, AJAX, XJoin and Hash-Merge Join use a similar strategy: they compare disk-resident tuples of both relations.
The Hash-Merge Join algorithm has two phases: the hashing phase, which joins incoming tuples stored in in-memory hash buckets based on their hash values; and the merging phase, which joins tuples that were previously flushed to disk using a merging strategy. Although Hash-Merge Join has the same hashing phase as XJoin and AJAX, the merging phase may introduce two critical problems into the HMJ operator. First, when it is necessary to flush an in-memory bucket to disk, the algorithm requires that all tuples in that bucket be sorted. Such a sort procedure has a computational cost, which impacts HMJ's performance (mainly in devices with restricted processing capabilities). To estimate the sort cost in HMJ, let k be the number of buckets of each hash table and n be the cardinality of each relation involved in the join operation. Considering that the tuples of both relations are uniformly distributed among the buckets of both hash tables, the cost (complexity) for sorting the tuples in a given bucket is (n/k) log (n/k); thus the cost for sorting all tuples during the execution of HMJ is 2n log (n/k). Observe that this cost depends on the cardinality of both relations involved in the join operation, which may dramatically increase the query's response time. It is worthwhile to emphasize that AJAX does not suffer from that problem.
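For completeness, the total sort cost quoted above follows from summing the per-bucket cost over the 2k buckets kept by the two hash tables; in LaTeX notation, under the stated uniform-distribution assumption:

\[
\underbrace{2k}_{\text{buckets in both hash tables}} \cdot
\underbrace{\frac{n}{k}\log\frac{n}{k}}_{\text{sort cost per bucket}}
\;=\; 2n\log\frac{n}{k}
\]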
The second critical problem generated by the merging phase of HMJ stems from the fact that the adaptive flushing policy adopted by HMJ may produce disk fragmentation. That is because each bucket flushed to disk represents at least one newly allocated block on disk, even when the bucket has just one tuple. It is important to note that disk partitions only start to be merged by Hash-Merge Join when both relations involved in the join processing are blocked. Thus, when that algorithm experiences long periods without processing the second phase, the disk fragmentation becomes even worse, considering the number of disk accesses required to merge all disk-resident buckets in order to join the tuples in those buckets. We argue that AJAX does not favor disk fragmentation, since each bucket has just one in-memory partition and, when overflow occurs, one in-disk portion. Thus, tuples sent to disk are just appended at the end of the final block allocated for their corresponding in-disk bucket. Furthermore, in case of memory overflows, AJAX chooses as victims buckets that have already been flushed to disk before (if they exist), which also reduces disk fragmentation.
It is important to note that XJoin may produce duplicate and/or incomplete results. To avoid such an undesirable phenomenon, XJoin implements a mechanism based on timestamps. To implement that mechanism, XJoin adds two timestamps to each tuple (recording when it was processed during the first and second phases [120]). Moreover, for each disk-resident bucket processed (read) during XJoin's second phase, a linked list is created. Each element in the linked list stores two timestamps. One timestamp registers when the last tuple of the bucket was flushed to disk. The other timestamp registers when the second phase read the disk-resident bucket. Regarding the strategy implemented by XJoin to avoid duplicate and/or incomplete results when executing a join operation, there are two critical problems. First, the proposed strategy fails when the XJoin operator receives, at the same time, a set of tuples belonging to the relations involved in the join operation, since, depending on the machine's clock precision, the timestamp set for those tuples will be the same. Second, XJoin requires too many control structures in order to avoid duplicate/incomplete results. AJAX avoids duplicates and unnecessary probing during the probing itself; it does not need another execution phase, which certainly may degrade performance.
6. Conclusions
In environments with support for mobile computing, data access becomes unpredictable (due to transient disconnections from the network) and mobile hosts (in which mobile databases reside) may suffer from limited available main memory to process query operators, such as the join. For that reason, traditional distributed query processing techniques and operators are inadequate for processing queries on mobile databases. Traditional operators like Nested-Loop Join, Hash Join and many others have been designed to run on desktop computers connected by wired networks. In fact, traditional algorithms have been designed for the sole purpose of executing the join operation, not to deal with network-related problems or resource restrictions.
Unlike traditional operators, adaptive operators are designed to solve problems beyond the join operation itself. Those operators may adapt the entire join operation to run under adverse circumstances, like tuple delivery delay or bursty delivery. In that class are algorithms like XJoin, Ripple Join and Double Pipeline Hash Join; XJoin can handle blocking data sources properly, for instance.
However, most of the adaptive algorithms have not been designed to run on mobile devices or other resource-restricted platforms, which lack the computing resources that desktop computers (the target of the classical adaptive algorithms) have in abundance, namely main memory and processing power. Although Hash-Merge Join has been designed for the Internet and desktop platforms, its properties and techniques allow it to properly execute join operations on mobile devices; still, a mobile device is not its optimal running scenario (as one can see in the experimental results chapter).
For that reason, a new adaptive join operator, denoted AJAX (Adaptive Join Algorithm for eXtreme Restrictions), has been proposed: an operator able to run properly on very resource-restricted platforms. The proposed operator ensures:
(i) incremental production of results while receiving new tuples;
(ii) query processing continuity even if both data sources are blocked; and
(iii) reaction to low-memory contexts, by handling memory overflows in order to keep the operator running and producing results.
The algorithm's modus operandi has been validated through standalone experiments in the experimental results chapter. In that chapter, one can see that the above-mentioned properties hold in ideal circumstances (namely abundance of memory and bandwidth) and in very resource-restricted scenarios. Although the mentioned standalone experiments validate the algorithm, comparative tests have also been run in order to measure AJAX's performance and its viability compared to already established algorithms, like XJoin and Hash-Merge Join. As one can see in Section 5.3, AJAX has proven to be more suited to very resource-restricted running scenarios than the other algorithms.
Moreover, an analytical comparison (Section 5.4) has been made with related work considering the following properties:
(i) result production rate; and
(ii) progress of the query processing even when the delivery of data is blocked.
In the analytical comparison section one can see that AJAX continuously processes tuples of both relations, independently of the occurrence of memory overflow, unlike Adaptive Symmetric Hash Join and Incremental Symmetric Flush Join, for example, whose strategies may decrease their result production rate. AJAX also does not increase its computational cost with a sort-merge-like routine to keep the join operation from blocking, as Hash-Merge Join does. For that reason, one can infer that the greater the cardinality of the relations, the greater the query response time of Hash-Merge Join operations; AJAX does not incur such a sort cost, which grows with the size of the relations. AJAX also does not favor disk fragmentation, since each bucket has just one in-memory partition and (when overflow occurs) one in-disk portion; newly flushed tuples are simply appended to that specific disk block, which prevents excessive disk fragmentation, unlike Hash-Merge Join. AJAX avoids duplicates and unnecessary probing during the probing itself; it does not need another execution phase, which certainly may degrade performance, unlike the additional phase in XJoin. That algorithm also fails at avoiding duplicates, because its duplicate-avoidance strategy is based on timestamps, which can have duplicate values for two tuples arriving simultaneously at the host machine, depending on its time-resolution capabilities.
For the reasons presented, AJAX can be called an adaptive query operator, reacting to events that could otherwise increase query processing time. Future work may consist of a new version of AJAX intended to join data streams in wireless sensor networks.
References
1. A. Brayner and E. Vasconcelos. MobiJoin: um operador de junção para bancos de dados móveis. In: Workshop
de Comunicação sem Fio e
Computação Móvel, 6., 2004, Fortaleza. Anais... Fortaleza, 2004. p. 153-
160.
2. A. Brayner and J. A. Filho. "Sharing Mobile Databases in Dynamically
Configuration Environments". Advanced Information Systems
Engineering. Lecture Notes in Computer Science, 2681. Springer-Verlag,
2003. pages 724-737.
3. A. Brayner and J. A. M. Filho. "AMDB: An Approach for Sharing Mobile
Databases in Dynamically Configurable Environments". Proceedings of
the XVII Brazilian Symposium on Databases (SBBD). October, 14 - 16.
Gramado, Rio Grande do Sul, Brazil. 2002.
4. A. Brayner and J. A. M. Filho. "Increasing Mobile Transaction
Concurrency in Mobile Database Communities". Proceedings of the V
Wireless Communication Workshop (WCSF03). October, 27 - 30. São
Lourenço, Minas Gerais, Brazil. 2003.
5. A. Brayner, J. A. M. Filho, M. Holanda and E. Werbet, n.d. “Ensuring
Mobile Databases Interoperability in Ad Hoc Configurable Environments:
A Plug-and-play Approach” in
New Trends in Database Systems.
Methods, Tools and Applications, Springer-
Verlag. Submitted for
publication.
6. A. Aboulnaga and S. Chaudhuri, “Self-tuning histograms: building
histograms without looking at data,” in SIGMOD ’99: Proceedings of the
1999 ACM SIGMOD international conference on Management of data,
(New York, NY, USA), pp. 181-192, ACM Press, 1999.
7. A. Arasu, S. Babu, and J. Widom, “The CQL continuous query language:
Semantic foundations and query execution,” The VLDB Journal, vol. 15,
no. 2, pp. 121-142, 2006.
8. A. Condon, A. Deshpande, L. Hellerstein, and N.
Wu, “Flow algorithms
for two pipelined filter ordering problems,” in PODS ’06: Proceedings of
the twenty-fifth ACM SIGMOD-SIGACT-
SIGART symposium on
Principles of database systems, (New York, NY, USA), pp. 193-202,
ACM Press, 2006.
9. A. Deshpande and J. M. Hellerstein, “Lifting the burden of history from
adaptive query processing,” in VLDB ’04: Proceedings of the 30th
International Conference on Very Large Data Bases,
Toronto, Canada,
August 29-September 3 2004.
10. A. Deshpande and L. Hellerstein, “Flow algorithms for parallel query
optimization,” Tech. Rep. CS-TR-4873, University of Maryland at College
Park, 2007.
11. A. Deshpande, “An initial study of overheads of eddies,” SIGMOD Rec.,
vol. 33, no. 1, pp. 44-49, 2004.
12. A. Deshpande, C. Guestrin, W. Hong, and S. Madden, “Exploiting
correlated attributes in acquisitional query processing,” in ICDE ’05:
Proceedings of the 21st International Conference on Data Engineering
(ICDE’05), (Washington, DC, USA), pp. 143-
154, IEEE Computer
Society, 2005.
13. A. Deshpande, M. Garofalakis, and R. Rastogi, “Independence is good:
dependency-based histogram synopses for high-dimensional data,” in
SIGMOD ’01: Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 199-210,
ACM Press, 2001.
14. A. Hulgeri and S. Sudarshan, “AniPQO: Almost non-intrusive parametric
query optimization for nonlinear cost functions.,” in
VLDB ’03:
Proceedings of 29th International Conference on Very Large Data Bases,
September 9-12, 2003, Berlin, Germany, pp. 766-777, 2003.
15. A. Hulgeri and S. Sudarshan, “Parametric query optimization for linear
and piecewise linear cost functions.,” in VLDB ’02: Proceedings of 28th
International Conference on Very Large Data Bases, August 20-23,
2002, Hong Kong, China, pp. 167-178, 2002.
16. A. N. Wilschut and P. M. G. Apers, “Dataflow query execution in a
parallel main-memory environment,” in PDIS ’91: Proceedings of the First
International Conference on Parallel and Distributed Information
Systems, Fontainebleu Hilton Resort, Miami Beach, FL, pp. 68-77, IEEE
Computer Society, 1991.
17. B. Babcock and S. Chaudhuri, “Towards a robust query optimizer: a
principled and practical approach,” in
SIGMOD ’05: Proceedings of the
2005 ACM SIGMOD international conference on Management of data,
(New York, NY, USA), pp. 119-130, ACM Press, 2005.
18. C. M. Chen and N. Roussopoulos, “Adaptive selectivity estimation using
query feedback,” in SIGMOD ’94: Proceedings of the 1994 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 161-172, ACM Press, 1994.
19.
D. B. Lange and M. Oshima. “Programming and Deploying Java Mobile
Agents with Aglets”. Addison-Wesley. Massachusetts, USA. 1998. ISBN
0201325829.
20. D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of
Experiments. Springer, October 1985.
21. D. A. Huffman, “A method for the construction of minimum redundancy
codes,” in Proc. Inst. Radio Eng., pp. 1098-1101, 1952.
22. D. Daniels, “Query compilation in a distributed database system,” Tech.
Rep., IBM, 1982. Research Report RJ 3423.
23. D. Kossmann, “The state of the art in distributed query processing,” ACM
Comput. Surv., vol. 32, no. 4, pp. 422-469, 2000.
24. D. Zhang, J. Li, K. Kimeli, and W. Wang, “Sliding window based multi-join
algorithms over distributed data streams,” in
ICDE ’06: Proceedings of
the 22nd International Conference on Data Engineering (ICDE’06),
(Washington, DC, USA), p. 139, IEEE Computer Society, 2006.
25. E. Werbet and A. Brayner. “AJAX - Adaptive Join Algorithm for Extreme
Restrictions”.
Proceedings of the XXII Brazilian Symposium on
Databases (SBBD). October, 15 -
19. Joao Pessoa, Paraiba, Brazil.
2007.
26. E. A. Rundensteiner, L. Ding, T. M. Sutherland, Y. Zhu, B. Pielech, and
N. Mehta, “CAPE: Continuous query engine with heterogeneous-grained
adaptivity,” in VLDB ’04: Proceedings of the Thirtieth International
Conference on Very Large Data Bases, Toronto, Canada, pp. 1353-
1356, 2004.
27. E. F. Codd, “A relational model of data for large shared data banks,”
Commun. ACM, vol. 26, no. 1, pp. 64-69, 1983.
28. E. Viglas and S.-J. F. Naughton,
Novel Query Optimization and
Evaluation Techniques
. PhD thesis, University of Wisconsin at Madison,
2003.
29. E. Wong and K. Youssefi, “Decomposition --
strategy for query
processing,” ACM Transactions on Database Systems, vol. 1, no. 3,
pp. 223-241, 1976.
30. F. Chu, J. Halpern, and J. Gehrke, “Least expected
cost query
optimization: what can we expect?,” in
PODS ’02: Proceedings of the
twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of
database systems, (New York, NY, USA), pp. 293-
302, ACM Press,
2002.
31. F. Chu, J. Y. Halpern, and P. Seshadri, “Least expected cost query
optimization: an exercise in utility,” in
PODS ’99: Proceedings of the
eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of
database systems, (New York, NY, USA), pp. 138-
147, ACM Press,
1999.
32. F. Tian and D. J. DeWitt, “Tuple routing strategies for distributed eddies,”
in VLDB ’03: Proceedings of 29th International Conference on Very
Large Data Bases, pp. 333-
344, Berlin, Germany: Morgan Kaufmann,
September 9-12 2003.
33. G. Z. Ives, D. Florescu, M. Friedman, A. Levy and D. S. Weld. “An Adaptive Query Execution System for Data Integration”. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. May 31 - Jun 03. Philadelphia, Pennsylvania, United States. 1999. pages 299-310.
34. G. Antoshenkov and M. Ziauddin, “Query processing and optimization in
Oracle Rdb,” The VLDB Journal, vol. 5, no. 4, pp. 229-237, 1996.
35. G. Graefe and K. Ward, “Dynamic query evaluation plans,” in SIGMOD
’89: Proceedings of the 1989 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 358-
366, ACM Press,
1989.
36. G. Graefe and W. J. McKenna, “The Volcano optimizer generator:
Extensibility and efficient search,” in ICDE ’93: Proceedings of the Ninth
International Conference on Data Engineering, Vienna, Austria, pp. 209-
218, IEEE Computer Society, April 19-23 1993.
37. G. Graefe, “Query evaluation techniques for large databases,” ACM
Comput. Surv., vol. 25, no. 2, pp. 73-169, 1993.
38. G. Graefe, “The Cascades framework for query optimization,” IEEE Data
Engineering Bulletin, vol. 18, no. 3, pp. 19-29, 1995.
39. G. Graefe, R. Bunker, and S. Cooper, “Hash joins and hash teams in
Microsoft SQL Server,” in VLDB ’98: Proceedings of 24th International
Conference on Very Large Data Bases, pp. 86-
97, Morgan Kaufman,
August 24-27 1998.
40. H. Guo, P.-Å. Larson, R. Ramakrishnan, and J. Goldstein, “Relaxed
currency and consistency: how to say ”good enough” in SQL,” in
SIGMOD ’04:
Proceedings of the 2004 ACM SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 815-826,
ACM Press, 2004.
41. H. Kaplan, E. Kushilevitz, and Y. Mansour, “Learn
ing with attribute
costs,” in STOC ’05: Proceedings of the thirty-
seventh annual ACM
symposium on Theory of computing, (New York, NY, USA), pp. 356-365,
ACM Press, 2005.
42. H. Simon and J. Kadane, “Optimal problem-solving search: All-or-none
solutions,” Artificial Intelligence, vol. 6, pp. 235-247, 1975.
43. H.-G. Li, S. Chen, J. Tatemura, D. Agrawal, K. S. Candan, and W.-P.
Hsiung, “Safety guarantee of continuous join queries over punctuated
data streams,” in
VLDB ’06: Proceedings of the 32nd international
conference on Very large data bases, pp. 19-
30, VLDB Endowment,
2006.
44. H.-T. Chou and D. J. DeWitt, “An evaluation of buffer management
strategies for relational database systems,” in VLDB ’85: Proceedings of
11th International Conference on Very Large Data Bases, pp. 127-141,
Stockholm, Sweden: Morgan Kaufmann, August 21-23 1985.
45. I. S. Mumick, S. J. Finkelstein, H. Pirahesh, and R. Ramakrishnan,
“Magic is relevant,” in
SIGMOD ’90: Proceedings of the 1990 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 247-258, ACM Press, 1990.
46. J. A. M. Filho. “AMDB - An approach for Sharing Mobile Databases”.
Master Dissertation. Universidade de Fortaleza (UNIFOR). 2003.
47.
J. P. Haas, G. Luo, C. J. Ellmann and J. F. Naughton. “A Scalable Hash
Ripple Join Algorithm”. Proceeding
s of the ACM SIGMOD International
Conference on Management of Data. June 4-
6, Madison, Wisconsin,
USA. 2002.
48.
J. P. Macker and M. S. Corson. “Mobile Ad Hoc Networks and the IETF”,
Internet Engineering Task Force, Mobile Ad Hoc Network (MANET)
Working Group.
49. J. Chen, D. J. DeWitt, F. Tian, and Y. Wang, “NiagaraCQ: a scalable
continuous query system for internet databases,” in SIGMOD ’00:
Proceedings of the 2000 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 379-390, ACM Press,
2000.
50. J. Claussen, A. Kemper, G. Moerkotte, K. Peithner, and M. Steinbrunn,
“Optimization and evaluation of disjunctive queries,”
IEEE Transactions
on Knowledge and Data Engineering, vol. 12, no. 2, pp. 238-260, 2000.
51. J. M. Hellerstein, “Optimization techniques for queries with expensive
methods,” ACM Transactions on Database Systems, vol. 23, no. 2,
pp. 113-157, 1998.
52. J. M. Hellerstein, R. Avnur, A. Chou, C. Hidber, C. Olston, V. Raman,
T. Roth, and P. J. Haas, “Interactive data analysis: The Control project,”
Computer, vol. 32, no. 8, pp. 51-59, 1999.
53. J. Naughton, D. DeWitt, D. Maier, A. Aboulnaga, J. Chen, L. Galanis,
J. Kang, R. Krishnamurthy, Q. Luo, N. Prakash, R. Ramamurthy,
J. Shanmugasundaram, F. Tian, K. Tufte, S. Viglas, Y. Wang, C. Zhang,
B. Jackson, A. Gupta, and R. Chen, “The niagara internet query system,”
IEEE Data Engineering Bulletin, June 2001.
54. J. Shanmugasundaram, K. Tufte, D. J. DeWitt, J. F. Naughton, and
D. Maier, “Architecting a network query engine for producing partial
results,” in ACM SIGMOD Workshop on the Web (WebDB) 2000, Dallas,
TX, pp. 17-22, 2000.
55. J.-H. Hwang, M. Balazinska, A. Rasin, U. Cetintemel, M. Stonebraker,
and S. Zdonik, “High-
availability algorithms for distributed stream
processing,” in
ICDE ’05: Proceedings of the 21st International
Conference on Data Engineering (ICDE’05)
, (Washington, DC, USA),
pp. 779-790, IEEE Computer Society, 2005.
56. K. Loudon. Mastering Algorithms in C. O’Reilly, New York. USA. 1999.
ISBN: 1-56592-453-3.
57. K. Gao, S. Harizopoulos, I. Pandis, V. Shkapenyuk, and A. Ailamaki,
“Simultaneous pipelining in QPipe: Exploiting work sharing opportunities
across queries,” in ICDE ’06: Proceedings of the 22nd International
Conference on Data Engineering (ICDE’06)
, (Washington, DC, USA),
p. 162, IEEE Computer Society, 2006.
58. K. Munagala, S. Babu, R. Motwani, and J.
Widom, “The pipelined set
cover problem.,” in
ICDT ’05: Proceedings of the 10th International
Conference, Edinburgh, UK, pp. 83-98, 2005.
59. K. Munagala, U. Srivastava, and J. Widom, “Optimization of continuous
queries with shared expensive filters,” in PODS ’07: Proceedings of the
twenty-sixth ACM SIGMOD-SIGACT-
SIGART symposium on Principles
of database systems, (New York, NY, USA), pp. 215-
224, ACM Press,
2007.
60. L. Amsaleg, M. J. Franklin, A. Tomasic, and T. Urhan, “Scrambling query
plans to cope with unexpected delays,” in Proceedings of the Fourth
International Conference on Parallel and Distributed Information
Systems, Miami Beach, FL, pp. 208-219, IEEE Computer Society,
December 18-20 1996.
61. L. Bellatreche, K. Karlapalem, M. K. Mohania, and M. Schneider, “What
can partitioning do for your data warehouses and data marts?,” in IDEAS
’00:
Proceedings of the 2000 International Symposium on Database
Engineering & Applications, (Washington, DC, USA), pp. 437-446, IEEE
Computer Society, 2000.
62. L. Ding and E.
A. Rundensteiner, “Evaluating window joins over
punctuated streams,” in
CIKM ’04: Proceedings of the thirteenth ACM
international conference on Information and knowledge management,
(New York, NY, USA), pp. 98-107, ACM Press, 2004.
63. L. F. Mackert and G.
M. Lohman, “R* optimizer validation and
performance evaluation for distributed queries,” in
VLDB ’86:
Proceedings of the 12th Internationa
l Conference on Very Large Data
Bases, (San Francisco, CA, USA), pp. 149-
159, Morgan Kaufmann
Publishers Inc., 1986.
64. L. Getoor, B. Taskar, and D. Koller, “Selectivity estimation using
probabilistic models,” in SIGMOD ’01: Proceedings of the 2001 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 461-472, ACM Press, 2001.
65. L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh, “Extensible
query processing in starburst,” in
SIGMOD ’89: Proceedings of the 1989
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 377-388, ACM Press, 1989.
66. L. Raschid and S. Y. W. Su, “A parallel processing strategy for evaluating
recursive queries,” in VLDB ’86: Proceedings of the 12th International
Conference on Very Large Data Bases, pp. 412-419, Morgan Kaufmann
Publishers Inc., 1986.
67. Large Data Bases, (San Francisco, CA, USA), pp. 128-137, Morgan
Kaufmann Publishers Inc., 1986.
68. M. A. Shah, J. M. Hellerstein, and E. Brewer, “Highly available, fault-
tolerant, parallel dataflows,” in SIGMOD ’04: Proceedings of the 2004
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 827-838, ACM Press, 2004.
69. M. A. Shayman and E. Fernandez-Gaucherand, “Risk-sensitive decision-
theoretic diagnosis,” IEEE Transactions on Automatic Control, vol. 46,
pp. 1166-1171, 2001.
70. M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker, “Fault-
tolerance in the Borealis distributed stream processing system,” in
SIGMOD ’05: Proceedings of the 2005 ACM
SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 13-24,
ACM Press, 2005.
71. M. Herbster and M. K. Warmuth, “Tracking the best expert,” Machine
Learning, vol. 32, pp. 151-178, August 1998.
72. M. Kearns and S. Singh, “Near-optimal reinforcement learning in
polynomial time,” Machine Learning, vol. 49, pp. 260-268, November
2002.
73. M. Kutsch, P. J. Haas, V. Markl, N. Megiddo, and T. M. Tran, “Integrating
a maximum-entropy cardinality estimator into DB2 UDB,” in EDBT ’06:
Proceedings of the 10th International Conference on Extending
Database Technology, 2006.
74. M. Stillger, G. Lohman, V. Markl, and M. Kandil, “LEO - DB2’s LEarning
Optimizer,” in
VLDB ’01: Proceedings of 27th International Conference
on Very Large Data Bases, Morgan Kaufmann, September 11-14 2001.
75. M. Stonebraker, E. Wong, P. Kreps, and G. Held, “The design and
implementation of Ingres,” ACM Transactions on Database Systems,
vol. 1, no. 3, pp. 189-222, 1976.
76. M. Templeton, H. Henley, E. Maros, and D. J.
V. Buer, “InterViso:
Dealing with the complexity of federated database access,” The VLDB
Journal, vol. 4, no. 2, 1995.
77. M.S. Chen, M. L. Lo, P. S. Yu, H. C. Young, Using segmented right-deep
trees for the execution of pipelined hashjoins.," in Proc 18th VLDB Conf,
Vancouver, Canada, August 23-27,1992.
78. Mokbel, Lu et al. “Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results”. Proceedings of the International Conference on Data Engineering, 2004.
79. N. Bruno and S.
Chaudhuri, “Exploiting statistics on query expressions
for optimization,” in
SIGMOD ’02: Proceedings of the 2002 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 263-274, ACM Press, 2002.
80. N. Bruno, S. Chaudhuri, and L. Gravano, “STHoles: a multidimensional
workload-aware histogram,” in
SIGMOD ’01: Proceedings of the 2001
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 211-222, ACM Press, 2001.
81. N. Kabra and D. J. DeWitt, “Efficient mid-query re-optimization of sub-
optimal query execution plans,” in SIGMOD ’98: Proceedings of the 1998
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 106-117, ACM Press, 1998.
82. N. Polyzotis, “Selectivity-based partitioning: A divide-and-union paradigm
for effective query optimization,” in CIKM ’05: Proceedings of the 14th
ACM International Conference on Information and knowledge
management, pp. 720-727, New York, NY: ACM Press, 2005.
83.
NIST. Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/.
84. O. Etzioni, S. Hanks, T. Jiang, R. M. Karp, O. Madani, and O. Waarts,
“Efficient information gathering on the internet,” in FOCS96:
Proceedings of the 37th Annual Symposium on Foundations of Computer
Science, pp. 234-243, IEEE Computer Society, 1996.
85.
P. J. Haas and J. M. Hellerstein. “Ripple Joins for Online Aggregation”.
Proceedings of the 1999 ACM SIGMOD International Conference on
Management of Data. May, 31 Jun, 03. Philadelphia, Pennsylvania,
United States.1999. pages 287-298.
86. P. A. Tucker and D. Maier, “Exploiting punctuation semantics in data
streams,” in
ICDE ’02: Proceedings of the 18th International Conference
on Data Engineering, (Washington, DC, USA), p.
279, IEEE Computer
Society, 2002.
87. P. Bizarro and D. DeWitt, “Adaptive and robust query processing with
SHARP,” Tech. Rep. 1562, University of Wisconsin - Madison, CS Dept.,
2006.
88. P. Bizarro, S. Babu, D. DeWitt, and J. Widom, “Content-based routing:
different plans for different data,” in VLDB ’05: Proceedings of the 31st
international conference on Very large data bases, pp. 757-768, VLDB
Endowment, 2005.
89. P. D. Bra and J.
Paredaens, “Horizontal decompositions and their impact
on query solving,” SIGMOD Rec., vol. 13, no. 1, pp. 46-50, 1982.
90. P. G. Bizarro, Adaptive Query Processing: Dealing with Incomplete and
Uncertain Statistics. PhD thesis, University of Wisconsin - Madison,
2006.
91. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G.
Price, “Access path selection in a relational database management
system,” in
SIGMOD ’79: Proceedings of the 1979 ACM SIGMOD
International Conference on Management of Data, 1979.
92. P. J. Haas and J. M. Hellerstein, “Ripple joins for online aggregation,” in
SIGMOD ’99: Proceedings of the 1999 ACM SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 287-298,
ACM Press, 1999.
93. P. Roy, S. Seshadri, S. Sudarshan, and S.
Bhobe, “Efficient and
extensible algorithms for multi query optimization,” in
SIGMOD ’00:
Proceedings of the
2000 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 249-
260, ACM Press,
2000.
94. P. Seshadri, H. Pirahesh, and T. Y. C. Leung, “Complex query
decorrelation,” in ICDE ’96: Proceedings of the Twelfth International
Conference on Data Engineering, New Orleans, LA, pp. 450-458,
February 26-March 1 1996.
95. P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. Y.
C. Leung,
R. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan,
“Cost-based optimization for magic: Algebra and implementation,” in
SIGMOD ’96: Proceedings of the 1996 ACM
SIGMOD International
Conference on Management of Data, pp. 435-446, ACM Press, 1996.
96. R. Avnur and J. M. Hellerstein, “Eddies: continuously adaptive query
processing,” in SIGMOD ’00: Proceedings of the 2000 ACM SIGMOD
international conference on Management of data, (New York, NY, USA),
pp. 261-272, ACM Press, 2000.
97. R. Goldman and J. Widom, “DataGuides: Enabling query formulation and
optimization in semistructured databases,” in VLDB ’97: Proceedings of
23rd International Conference on Very Large Data Bases, pp. 436-445,
Athens, Greece: Morgan Kaufman, August 25-29 1997.
98. R. Goldman and J.
Widom, “WSQ/DSQ: a practical approach for
combined querying of databases and the web,” in
SIGMOD ’00:
Proceedings of the 2000
ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 285-
296, ACM Press,
2000.
99. R. H. Arpaci-Dusseau, E. Anderson, N. Treuhaft, D. E. Culler, J. M.
Hellerstein, D. Patterson, and K. Yelick, “Cluster I/O with River: making
the fast case common,” in IOPADS ’99:
Proceedings of the sixth
workshop on I/O in parallel and distributed systems, (New York, NY,
USA), pp. 10-22, ACM Press, 1999.
100. R. Krishnamurthy, H. Boral, and C. Zaniolo, “Optimization of
nonrecursive queries,” in VLDB ’86: Proceedings of the 12th International
Conference on Very
101. R. L. Cole and G.
Graefe, “Optimization of dynamic query evaluation
plans,” in
SIGMOD ’94: Proceedings of the 1994 ACM SIGMOD
International Conference on Management of Data, pp. 150-160,
Minneapolis, MN: ACM Press, May 24-27 1994.
102. R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar,
G. Manku, C. Olston, J. Rosenstein, and R. Varma, “Query processing,
resource management, and approximation in a data stream management
system,” in
CIDR ’03: First Biennial Conference on Innovative Data
Systems Research, Asilomar, CA, 2003.
103. S. Babu and J. Widom, “Continuous queries over data streams,
SIGMOD Rec., vol. 30, no. 3, pp. 109-120, 2001.
104. S. Babu and J.
Widom, “StreaMon: an adaptive engine for stream
query processing,” in
SIGMOD ’04: Proceedings of the 2004 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 931-932, ACM Press, 2004.
105. S. Babu and P.
Bizarro, “Adaptive query processing in the looking
glass,” in CIDR ’05: Sec
ond Biennial Conference on Innovative Data
Systems Research, pp. 238-249, Asilomar, CA, 2005.
106. S. Babu, K. Munagala, J. Widom, and R. Motwani, “Adaptive caching
for continuous queries,” in
ICDE ’05: Proceedings of the 21st
International Conference on Data Engineering (ICDE’05), (Washington,
DC, USA), pp. 118-129, IEEE Computer Society, 2005.
107. S. Babu, P. Bizarro, and D. DeWitt, “Proactive re-optimization,” in
SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international
conference on Management of data, Baltimore, MD, pp. 107-118, New
York, NY: ACM Press, June 14-16 2005.
108. S. Babu, R. Motwani, K. Munagala, I. Nishizawa, and J. Widom,
“Adaptive ordering of pipelined stream filters,” in SIGMOD ’04:
Proceedings of the 2004 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 407-
418, ACM Press,
2004.
109. S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M.
Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss,
and M. A. Shah, “TelegraphCQ: Continuous dataflow processing for an
uncertain world,” in
CIDR ’03: First Biennial Conference on Innovative
Data Systems Research, Asilomar, CA, 2003.
110. S. Chaudhuri and K. Shim, “Including group-by in query optimization,
in
VLDB ’94: Proceedings of the 20th International Conference on Very
Large Data Bases, (San Francisco, CA, USA), pp. 354-
366, Morgan
Kaufmann Publishers Inc., 1994.
111. S.
Chaudhuri, “An overview of query optimization in relational
systems,” in PODS ’98: Proceedings of the seventeenth ACM SIGACT-
SIGMOD-SIGART symposium on Principles of database systems, (New
York, NY, USA), pp. 34-43, ACM Press, 1998.
112. S. Chaudhuri, U. Dayal, and T.
W. Yan, “Join queries with external
text sources: Execution and optimization techniques,” in SIGMOD ’95:
Proceedings
of the 1995 ACM SIGMOD International Conference on
Management of Data, pp. 410-422, ACM Press, May 26 1995.
113. S. Ewen, H. Kache, V. Markl, and V. Raman, “Progressive query
optimization for federated queries,” in EDBT ’06: Proceedings of the 10th
International Conference on Extending Database Technology, pp. 847-
864, 2006.
114. S. Ganguly, “Design and analysis of parametric query optimization
algorithms,” in VLDB ’98: Proce
edings of the 24th International
Conference on Very Large Data Bases, pp. 228-238, Morgan Kaufmann,
August 24-27 1998.
115. S. Ganguly, W. Hasan, and R. Krishnamurthy, “Query optimization for
parallel execution,” in
SIGMOD ’92: Proceedings of the 1992 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 9-18, ACM Press, 1992.
116. S. Madden, M. Shah, J. M. Hellerstein, and V. Raman, “Continuously
adaptive continuous queries over streams,” in SIGMOD ’02: Proceedings
of
the 2002 ACM SIGMOD International Conference on Management of
Data, pp. 49-60, ACM Press, 2002.
117. S. V. U. M. Rao, Parametric Query Optimization: A Non-Geometric
Approach. Master’s thesis, IIT Kanpur, 1999.
118. S. Viglas, J. F. Naughton, and J. Burger, “Maximizing the output rate
of multi-way join queries over streaming information sources,” in VLDB
’03:
Proceedings of the 29th International Conference on Very Large
Data Bases, Berlin, Germany: Morgan Kaufmann, September 9-12 2003.
119.
Soares, M. and Brayner, A. “MXQUERY: An XQuery like
Multidatabase Language”. Submitted for publication. 2003.
120. T. Urhan and M. J. Franklin. “XJoin: A Reactively-Scheduled Pipelined Join Operator”. IEEE Data Engineering Bulletin, Vol. 23, No. 2, June 2000, pages 27-33.
121. T. Urhan and M. J. Franklin. “XJoin: A Reactively-Scheduled
Pipelined Join Operator”. IEEE Data Engineering Bulletin. Vol. 23 No. 2.
June, 2000. pages 27-33.
122. T. Ibaraki and T. Kameda, “On the optimal nesting order for
computing N-relational joins,” ACM Transactions on Database Systems,
vol. 9, no. 3, pp. 482-502, 1984.
123. T. K. Sellis, “Multiple-query optimization,”
ACM Trans. Database
Syst., vol. 13, no. 1, pp. 23-52, 1988.
124. T. Legler, W. Lehner, and A.
Ross, “Data mining with the SAP
NetWeaver BI accelerator,” in
VLDB ’06: Proceedings of the 32nd
international conference on Very large data bases, pp. 1059-1068, VLDB
Endowment, 2006.
125. T. Urhan and M. J. Franklin, “XJoin: a reactively-scheduled pipelined
join operator,” IEEE Data Engineering Bulletin, vol. 23, no. 2, pp. 27-33,
2000.
126. T. Urhan, M. J. Franklin, and L. Amsaleg, “Cost based query
scrambling for initial delays,” in
SIGMOD ’98: Proceedings of the 1998
ACM SIGMOD International Conference on Management of Data,
pp. 130-141, Seattle, WA: ACM Press, June 2-4 1998.
127. U. Feige, L. Lovász, and P. Tetali, “Approximating min-
sum set
cover,” Algorithmica, vol. 40, no. 4, pp. 219-234, 2004.
128. U. Srivastava, K. Munagala, and J. Widom, “Operator placement for
in-network stream query processing,” in PODS ’05: Proceedings of the
Twenty-Fourth ACM SIGMOD-SIGACT-
SIGART Symposium on
Principles of Database Systems, pp. 250-258, 2005.
129. U. Srivastava, K. Munagala, J. Widom, and R.
Motwani, “Query
optimization over web services,” in
VLDB ’06: Proceedings of the 32nd
international conference on Very large data bases, pp. 355-366, VLDB
Endowment, 2006.
130. V. G. V. Prasad, Parametric Query Optimization: A Geometric
Approach. Master’s thesis, IIT Kanpur, 1999.
131. V. Markl, V. Raman, D. Simmen, G. Lohman, H.
Pirahesh, and
M. Cilimdzic, “Robust query
processing through progressive
optimization,” in SIGMOD ’04: Proceedings
of the 2004 ACM SIGMOD
international conference on Management of data, (New York, NY, USA),
pp. 659-670, ACM Press, 2004.
132. V. Raman and J.
M. Hellerstein, “Partial results for online query
processing,” in SIG
MOD ’02: Proceedings of the 2002 ACM SIGMOD
International Conference on Management of Data, pp. 275-
286, ACM
Press, 2002.
133. V. Raman, A. Deshpande, and J.
M. Hellerstein, “Using state
modules for adaptive query processing.,” in ICDE ’03: Proceedings of the
19th International Conference on Data Engineering, Bangalore, India,
pp. 353-364, 2003.
134. V. Raman, B. Raman, and J.
M. Hellerstein, “Online dynamic
reordering for interactive data processing,” in VLDB ’99: Proceedings of
the 25th International Conference on Very Large Data Bases, pp. 709-
720, Edinburgh, Scotland: Morgan Kaufmann, 1999.
135. W. Kim, “On optimizing an SQL-like nested query,” ACM Transactions
on Database Systems, vol. 7, no. 3, pp. 443-469, 1982.
136. W.-S. Li, V. S. Batra, V. Raman, W. Han, and I. Narang, “QoS-based
data access and placement for federated systems,” in
VLDB ’05:
Proceedings of the 31st internatio
nal conference on Very large data
bases, pp. 1358-1362, VLDB Endowment, 2005.
137. Y. E. Ioannidis and S.
Christodoulakis, “On the propagation of errors
in the size of join results,” in SIGMOD ’91: Proceedings of the 1991 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 268-277, ACM Press, 1991.
138. Y. E. Ioannidis, “Query optimization,” ACM Computing Surveys,
vol. 28, no. 1, pp. 121-123, 1996.
139. Y. E. Ioannidis, R. T. Ng, K. Shim, and T. K. Sellis, “Parametric query
optimization,” The VLDB Journal, vol. 6, no. 2, pp. 132-151, 1997.
140. Y. Zhu, E. A. Rundensteiner, and G. T. Heineman, “Dynamic plan
migration for continuous queries over data streams,” in SIGMOD ’04:
Proceedings of the 2004 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 431-442, ACM Press,
2004.
141. Z. G. Ives and N. E. Taylor, “Sideways information passing for push-
style query processing,” Tech. Rep. MS-CIS-07-
14, University of
Pennsylvania, 2007.
142. Z. G. Ives, A. Y. Halevy, and D.
S. Weld, “Adapting to source
properties in processing data integration queries,” in
SIGMOD ’04:
Proceedings of the 2004
ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 395-
406, ACM Press,
2004.
143. Z. G. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld, “An adaptive query execution system for data integration,” in SIGMOD ’99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data, (New York, NY, USA), pp. 299-310, ACM Press, 1999.
144. Z. G. Ives, Efficient Query Processing for Data Integration
. PhD
thesis, University of Washington, August 2002.
145. Z. Zabret, Dynamic Query Scheduling in Data Integration Systems,
Proceedings of the 16th International Conference on Data Engineering,
p.425, February 28-March 03, 2000.