FUNDAÇÃO EDSON QUEIROZ
UNIVERSIDADE DE FORTALEZA-UNIFOR
Eriko Werbet de Oliveira Araújo
AJAX: An Adaptive Join Algorithm for Extreme
Restrictions
Fortaleza
2008
FUNDAÇÃO EDSON QUEIROZ
UNIVERSIDADE DE FORTALEZA-UNIFOR
Eriko Werbet de Oliveira Araújo
AJAX: An Adaptive Join Algorithm for Extreme
Restrictions
Advisor: Angelo Roncalli Alencar Brayner, Dr-Ing.
Fortaleza
2008
A DISSERTATION SUBMITTED TO THE SCIENCE AND TECHNOLOGY CENTER OF THE UNIVERSITY OF FORTALEZA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED INFORMATICS.
AJAX: Adaptive Join Algorithm for Extreme
Restrictions
Examination Board
Angelo Roncalli de Alencar Brayner, Dr-Ing. (Advisor)
Pedro Porfirio Muniz Farias, D.Sc.
Javam de Castro Machado, Docteur.
ARAÚJO, ERIKO WERBET DE OLIVEIRA
AJAX: Adaptive Join Algorithm for Extreme Restrictions
[Fortaleza] 2008
xix, 50 p., 29,7cm (INFORMÁTICA/UNIFOR, Bancos de Dados, 2008)
Dissertação - Universidade de Fortaleza, Informática
1. Bancos de Dados
2. Computação Móvel
3. Processamento Adaptativo de Consultas
4. Operadores Adaptativos de Consulta
5. Algoritmos e Estruturas de Dados
I. INFORMÁTICA/UNIFOR II. TÍTULO (série)
Every man is the architect of his own fortune.
Sallust (86 BC - 34 BC)
ACKNOWLEDGEMENTS
I would like to thank my advisor for his commitment and patience in the writing
of this work, and I would like to salute my family and friends who have
consistently helped me surpass my dark hours.
ABSTRACT
In a mobile ad hoc network (MANET), the nodes represent mobile
computers in which database systems may reside. Such mobile computers
(nodes) are free to move arbitrarily. In such an environment, we may have a
collection of autonomous, distributed, heterogeneous and mobile databases
(denoted Mobile Database Community or MDBC). Thus, each database user (in
a mobile host) can access databases belonging to an MDBC through the
MANET. Traditional query processing techniques fail to support the access to
databases in an MDBC, since data delivery rate in such an environment
becomes unpredictable and mobile hosts may suffer from limited available main
memory to process some query operators (e.g., join). To react to those events,
this dissertation presents an adaptive join operator, called AJAX, for processing
join operations on mobile databases in an MDBC. AJAX ensures: (i)
incremental production of results as soon as the data become available; (ii)
progress of the query processing even when the delivery of data is blocked;
and (iii) reaction to situations of memory limitation during the execution of the operator.
Cost estimation and experimental results are presented to show that AJAX
is an effective solution for processing join operations on mobile databases.
RESUMO
Uma Comunidade de Bancos de Dados Móveis (MDBC) representa uma
coleção de bancos de dados móveis e autônomos. Em tal contexto, cada
usuário de um sistema de banco de dados participante da comunidade pode
acessar os outros bancos de dados através de uma infra-estrutura de
comunicação sem fio. A utilização de técnicas tradicionais para processamento
de consultas no acesso à banco de dados em uma MDBC tem se mostrado
ineficiente devido à imprevisibilidade na taxa de acesso a bancos de dados da
comunidade e a limitação de memória disponível nos dispositivos móveis para
a execução de certos operadores, como a junção, por exemplo. A
imprevisibilidade é provocada por quedas constantes na comunicação entre os
membros de uma MDBC (cenário comum em redes sem fio) e limitação de
processamento de certas unidades móveis, atrasando a entrega de tuplas.
Para reagir a estes eventos, é apresentado neste trabalho um operador de
junção para processamento de consultas em bancos de dados móveis,
denominado AJAX. O operador proposto garante as seguintes propriedades: (i)
produção incremental de resultados à medida que os dados são
disponibilizados; (ii) continuidade no processamento da consulta mesmo que a
entrega dos dados esteja bloqueada, e; (iii) reação a situações de limitação de
memória durante a execução do operador.
Table of Contents
1. Introduction
1.1. Motivation
1.2. Goals and Scope
1.3. Structure
2. Database Systems and Mobility
2.1. Introduction
2.2. The Mobile Computing Environment
2.3. Ad hoc networks
2.4. The Introduction of Mobility on Databases
2.5. Query Processing on Mobile Databases
3. Adaptive Join Operators
3.1. Introduction
3.2. XJoin
3.3. Ripple Join
3.4. Double Pipeline Hash Join (Tukwila Implementation)
3.5. MobiJoin
3.6. Adaptive Symmetric Hash Join
3.7. Hash-Merge Join
4. AJAX: Adaptive Join Algorithm for Extreme Restrictions
4.1. Introduction
4.2. First Phase
4.3. Probing
4.3.1. Reacting to Memory Overflow
4.4. Second Phase
4.5. Cost Estimation
5. Experimental Results and Analysis
5.1. Introduction
5.2. Standalone Tests
5.3. Comparative Tests
5.4. Analysis of the Related Work
6. Conclusions
References
1. Introduction
1.1. Motivation
Advances in portable computing devices and wireless network technology have
led to the development of a new computing paradigm, called mobile computing.
This new paradigm enables users carrying portable devices to access services
provided through a wireless communication infrastructure, regardless of their
physical location or movement patterns.
The database technology has been impacted by the mobile computing
paradigm. For example, database systems may reside on mobile computers
which are nodes of a mobile ad hoc network (MANET) [48]. Since a MANET
has a topology which may change randomly and rapidly at unpredictable times, a
dynamic collection of autonomous, heterogeneous and mobile databases
(denoted Mobile Database Community or MDBC) can be opportunistically
formed [2]. An MDBC models the notion of multidatabases whose members
(local databases) can move across different locations.
Sharing information among multiple autonomous, distributed and mobile
databases in MDBCs has emerged as a strategic requirement for several
applications. For example, in battlefields, we may have several units of a
coalition force (from different countries) moving around different geographical
regions, but having a common goal. The commander of each unit has access to
local database systems (residing in a mobile computer installed in a military
vehicle) containing information about his/her own unit and some information
about the enemy forces. Thus, the databases of the multiple units of the
coalition force are distributed, autonomous and mobile. Furthermore, those
databases might be heterogeneous as well. In order to make tactical decisions,
each commander may have to access the databases of other units of the
coalition to get information about them (their strength with respect to the type and
number of weapons, as well as the number of personnel, supplies and resources).
Besides addressing the problem of integrating heterogeneous databases,
sharing mobile databases in an MDBC also requires addressing the problem of
processing queries over a variable number of mobile databases. This is
because query engines operate in an unpredictable and changeable context when
processing queries on mobile databases. In other words, traditional distributed
query processing techniques and operators are inadequate to process queries
on mobile databases in MDBCs due to the following factors:
(i) Unpredictable data-arrival rates. Query processing in MDBCs may
suffer from unpredictable data delivery rates, which can be caused by
initial delays and bursty data arrivals. In turn, initial delays and bursty
data arrivals are provoked by transient disconnections from the
network. Thus, even if the query engine has chosen the most efficient
execution plan, low data delivery rates make that plan inefficient
during the query execution;
(ii) Limited memory capacity. Mobile hosts may present limited available
main memory to process operations, such as joins between two
relations (recall that mobile hosts may have severe hardware
restrictions due to their size). Such a limitation can cause memory-overflow
events, which may lead to a high disk-access rate and make the cost of
executing those operators prohibitive;
(iii) Use of stop-and-go operators. Traditional query processing
techniques implement relational operators in a stop-and-go way, since
they are implemented with more than one phase, executed
sequentially (for example, the build and probe phases of the
conventional hash join). Hence, a delay in the initial stages (which may
frequently occur in MDBCs) implies performance degradation in the
execution of such operators;
(iv) Absence of statistics. In MDBCs, statistics about mobile databases
are highly likely to be unreliable or nonexistent, since mobile databases
are autonomous.
For that reason, new query operators have been introduced for processing
queries in MDBCs. Those operators are adaptive in the sense that they should
be able to react to events which may considerably increase response time when
accessing mobile databases.
1.2. Goals and Scope
This work proposes an adaptive join algorithm called AJAX. The algorithm
implements the relational algebra join operator, using hashing and pipelining
techniques to reduce the overall response time of a query involving join
operations [8, 83]. AJAX is an acronym for Adaptive Join Algorithm for eXtreme
restrictions. We claim that AJAX is an adaptive query operator [9], since it is
able to react to events which may considerably increase response time when
executing queries over mobile databases. AJAX ensures the following
properties:
(i) Incremental production of results as soon as the data become
available. Conventional join operators only produce results after at
least one of the involved relations is completely evaluated. Joins executed
over distributed data may therefore take a long time until users begin to receive
the result of the join operation. AJAX uses a pipeline technique in
order to incrementally provide results (to users or to another query
operator) as soon as they become available. By doing this, users
observing the progress of the query processing may, for instance,
decide to stop the execution on the fly, once the results already
obtained are enough to make a decision;
(ii) Progress of the query processing even when the delivery of data is
blocked. The idea is to overlap delays by executing other tasks when
tuple delivery (from the tables involved in the join operation) is blocked;
(iii) Reaction to memory-overflow events. When a memory overflow
occurs, AJAX triggers a memory-management policy which consists of
flushing part of the data from main memory to disk;
(iv) Prevention of unnecessary disk access. Since it might be necessary to
flush data to disk during AJAX's execution, incomplete or duplicated
results could be produced; to avoid them, data flushed to disk may need
to be read more than once. AJAX is designed to reduce such disk accesses.
The algorithm has been implemented in the Java language in a
simulation environment designed to impose the same restrictions as the target
platforms, in order to validate its behavior and main properties. Standalone tests
and comparative tests against other state-of-the-art algorithms have also been run,
measuring its performance and efficiency; the experimental results can be
found in chapter 5.
1.3. Structure
This dissertation is structured as follows: chapter 2 discusses fundamental
concepts of the mobility paradigm; chapter 3 discusses classical and adaptive
join algorithms; chapter 4 presents the formal specification of the
AJAX algorithm, while experimental results and performance analysis are
presented in chapter 5; chapter 6 presents the conclusions and future work.
2. Database Systems and Mobility
2.1. Introduction
Mobility is already part of the modern computing environment; in order to
illustrate such an assertion, one can easily observe that laptops, cell phones
and many other mobile devices can be seen everywhere.
DBMSs (Database Management Systems) are complex systems, and
adapting them to run on a mobile device is not a trivial task, since mobility impacts
database technology at many points. Query processing and transaction
management are examples of database topics impacted by
the introduction of mobility in such systems. In fact, new models and new
techniques must be proposed in order to have databases sharing their data on
mobile devices. The fact that mobile devices have restricted computing
resources, like main memory, processing power, storage and network
bandwidth, directly contributes to the inability of running conventional DBMSs on
those devices. In this chapter we discuss and analyze the impact of mobility on
DBMSs, emphasizing query processing, which is the focus of this work.
This chapter is structured as follows:
Section 2.2. Definition of the mobile computing environment, with its
components and communication infrastructure.
Section 2.3. Ad hoc networks and their characteristics.
Section 2.4. Physical and logical mobility are discussed. Subsections
present software agents and mobility classification from the
database technology perspective.
Section 2.5. Enumeration of the key challenges for the proposal and
construction of mobile databases.
2.2. The Mobile Computing Environment
Mobile computing is a generic term describing the ability to use computing
technology that is not physically connected (by wires, for example),
or that operates in remote or mobile (non-static) environments [5, 19, 48]. Users equipped
with such devices can change their physical location arbitrarily, and their
devices still remain connected and operational.
Those devices contain many computing resources, like microprocessors,
memory, storage media and communication interfaces. In fact, they can perform
almost any task that a desktop computer could perform, anywhere, at any time.
A mobile computing environment is formed by mobile devices and mobility
support stations (MSS) [3, 5]. The mobile devices are grouped in geographical
areas covered by the wireless communication infrastructure; those areas are
called “cells”. The wireless communication infrastructure can be a wireless local
area network (WLAN), an ad hoc network (a dynamically configurable network), or
even a cell phone network [48]. Communication among cells is guaranteed by the
support stations, which are entry points to the wired network that interconnects the MSSs.
From the perspective of database technology, two types of database
systems may coexist in a mobile computing environment:
(i) Database systems residing in stationary hosts. In that option, database
systems cannot physically migrate between hosts, i.e., their network
address is known a priori and does not change over time.
(ii) Database systems residing in mobile hosts. Those database systems
can change their location in a random way. Picture 1 shows a cell in
which there are three mobile devices: a laptop, a PDA (Personal
Digital Assistant) and a smart phone. A query named "Q0" is sent by
the PDA to the mobile database hosted by the laptop, which is located
in "X" during the time interval "T1". While the query is
processed, the host that contains the database in question moves to
"Y" during the time interval "T2", and also receives a new query "Q1"
sent by the cell phone. After that, it moves to location "Z" during the time
interval "T3" and returns the result of the first query to the PDA.
Database systems like that are called mobile databases [2, 3, 4, 5].
PICTURE 1: MOBILE DATABASE.
It is also worth mentioning that when a cell represents a WLAN or an ad
hoc network, mobile devices inside the cell may communicate with each
other directly. However, when a cell represents an area covered by a cell phone
network, the devices need the intermediation of the MSSs in order to
communicate.
Picture 2 shows a typical mobile computing environment; in the depicted
environment one can see a cell of a cellular phone network, an ad hoc network with mobile
databases and a wired network interconnecting everything. Thus, cell phones
could send queries to the databases in the ad hoc network or to the ones
belonging to the wired network; queries sent from the wired network to the
mobile databases in the ad hoc network could be handled as well.
PICTURE 2: MOBILE COMPUTING ENVIRONMENT
2.3. Ad hoc networks
Wireless networks use radio waves, infrared and satellite communications in
order to transport data signals [46, 48].
Despite the volatile nature of radio waves, wireless networks are still static
structures, i.e., their topology, once formed, cannot be altered without
restructuring the network, and in order to achieve that, the number of elements and
their types must be known a priori. However, it would be far more interesting to change
the network structure according to the elements available at a given time.
Behaving like that would create new possibilities, like the formation of a network
in which the nodes are airplanes of the same squadron, giving them the
opportunity to share information (when close to each other) about enemy
positions, without the need for updates coming from a centralized source, like
a ground station (far away) or an air station mounted on a spy airplane. This
type of network is a dynamically configurable network [48]. Its topology is
dynamic, changing according to the spatial migration of its elements and to
transient connections. A network element can be integrated into it or removed
from it at any time. That is the key concept behind ad hoc networks.
However, like all wireless networks, ad hoc networks suffer from the
following restrictions: low bandwidth (compared to wired networks) and high
transmission error rates. Those restrictions limit their applicability, mainly with regard
to the number of feasible applications, and turn them into a great challenge for the
developers and network protocols that try to make them viable for the development
of mobile software depending on this type of infrastructure.
2.4. The Introduction of Mobility on Databases
In order to have a database system supporting mobility, several of its
components and concepts must be reevaluated, and probably redesigned to
make such a support possible.
That support must guarantee data access everywhere, at anytime, and
also consider that several database systems may be interconnected by an ad
hoc network, for instance. A query can be sent from any mobile node to any
mobile database contained in the network.
A mobile database community (MDBC) is a collection of mobile computers
interconnected by a wireless communication infrastructure, in which each host
may be a database client and server. The databases that are part of the
community are autonomous, distributed and depending on the environment,
heterogeneous.
Query processing, concurrency control and recovery are just a few
technologies that need to be reconstructed, in order to enable a database
system to be mobile.
Many issues must be addressed before having an operational mobile
database, like disconnections while processing a simple query (a host may be
randomly shut down). In case of a distributed transaction, one must guarantee
the atomicity of the commit operation, i.e., a new concurrency control protocol
must be implemented on the mobile databases.
2.5. Query Processing on Mobile Databases
The scope of this work is to solve part of the issues related to query processing
on mobile databases, specifically those related to data access and the
execution of queries based on join operations.
Query processing on mobile databases faces several restrictions, imposed by
the computing environment: unknown and heterogeneous data sources (the Internet),
huge data sources, and devices with serious limitations of computing resources
(memory, processing power and network bandwidth). Some ideas have evolved to
minimize the impact of such issues, like the following:
(i) Decrease of the response time: the main concern of traditional
query processing was to decrease the total execution time of the
query [9, 143]. The new efforts have the goal of decreasing the
response time to the user, i.e., the delay until the first tuple arrives
must be minimized, even if that implies an increase in the total
execution time of the query.
(ii) Adaptability to the processing capabilities of distinct data
sources: the idea here is to guarantee that a query will not be
blocked upon the inclusion of a data source with low tuple delivery
rates. That situation may occur at any time in a remote query, because
a host has disconnected from the network, because the host is
running out of bandwidth, etc. Non-blocking query operators and
better-fitted execution plans have been proposed [35, 60, 143]. Query
operators are algorithms which implement a query operation, like the
join operation of the relational algebra. Non-blocking operators are
called symmetric, since they handle each data source in an
independent and parallel way. The new execution plans favor the
usage of tuple pipelining. A pipelined operator channels partial result
tuples from lower levels to the higher operator (in the graph of
operators), i.e., one can decrease the query processing time and its
response time by pipelining partial results to other operators involved
in the query. This is important in mobile environments due to the
restricted resources, like low available memory, slower processors
and narrow bandwidth.
(iii) Query optimization during execution: traditional query optimization
is done in a static way; the best plan is computed before the query
execution starts [35, 60]. However, the data sources could change
substantially while the query is running, especially on mobile
databases; a sudden cardinality increase in one relation, for instance.
Such a change may render the optimization inefficient. In order to
minimize that, one could optimize the execution plan step by step:
for each new data source accessed, the original execution plan
is re-optimized in order to adapt it to the current context. This
technique is called adaptive query processing [9, 18, 87, 96, 104,
143].
(iv) Partial results supply: as soon as result tuples are validated and
duplicates are discarded, those partial tuples are immediately sent
to the user. This technique is known as online or continuous querying [7,
26, 43, 49, 85, 132, 134].
3. Adaptive Join Operators
3.1. Introduction
Adaptive query processing has a fundamental role in the context of mobile
database access, since conventional query processing techniques are not
designed for databases running on mobile devices, which do not have the plentiful
computing resources of desktop computers. Moreover, those techniques take
into consideration neither the unpredictability of the tuple delivery rate of the
sources nor randomly disconnecting hosts, which are common problems in
queries involving mobile databases [2, 5].
Many strategies have been proposed to make query processing adaptive
[90, 96, 104, 108]. One of those strategies uses the notion of adaptive operators. In fact,
the join operator of the relational algebra is usually the one to be implemented,
since it is the most time- and space-intensive operator that can be
involved in database queries, i.e., the worst case of a database query is when
the query is formed solely by join operations (due to the amount of operations
and data that need to be handled).
An adaptive join operator adapts itself to the tuple delivery
capabilities of the data sources [120]. This is very handy when one of the data
sources has ceased its tuple delivery for an unknown reason at a random time
and the requested join operation must continue. The mobile host could have
been disconnected from the network, the network may be suffering from
packet collisions (local network) or router congestion (Internet), or the
wireless network may simply be running out of its already narrow bandwidth,
for instance. With a conventional join operator, query processing would block
due to the starvation of tuples needed to continue the query. An adaptive
operator is designed to keep the query running even if the data sources stop
sending tuples. It can swap the operands of the join operation, i.e., exchange
the inner relation (consumed faster) with the outer one (consumed slower). In
that way, query processing continues and the operator does not block. When
the problematic data source resumes its delivery, the operands (relations) are
switched back to their original positions and the operation keeps running
normally. The operator
described in this paragraph is called symmetric, because it is able to adapt to
the situation depicted above by handling the sources independently.
Some adaptive join operators have been proposed, like XJoin [120],
Ripple Join [47, 92] and DPHJ (Double Pipeline Hash Join) [145].
This chapter is structured as follows:
Section 3.2. Some relevant topics about the classical XJoin algorithm are
analyzed.
Section 3.3. The Ripple Join algorithm is discussed.
Section 3.4. Double Pipeline Hash Join is presented.
Section 3.5. MobiJoin is discussed and some of its techniques are
presented.
Section 3.6. The Adaptive Symmetric Hash Join (ASHJ) is discussed and
compared to other algorithms.
Section 3.7. Hash-Merge Join, a novel merge-based algorithm designed to
decrease the time to deliver relevant results, is presented.
3.2. XJoin
The XJoin algorithm is an extension of SHJ (Symmetric Hash Join), an
algorithm designed to run with a high degree of pipelining in parallel database
systems [120]. However, SHJ requires that all hash structures be kept
in main memory for the whole query processing time. Such a characteristic is
undesirable, since the system may run out of memory if joining the data
sources requires more memory than is currently available. In order to eliminate
such undesired behavior, XJoin partitions those structures and transfers them
to secondary memory (to be loaded at another time). This technique is quite
similar to the one implemented in the Hybrid Hash Join algorithm to avoid the
memory overflow generated by the standard Hash Join algorithm.
Another XJoin property worth mentioning is its ability to handle
data sources with low tuple delivery rates, i.e., suffering from a systematic
delay. While one of the data sources is stalled, the other one is processed.
There must be tuples available from the blocked data source, though, either
allocated in main memory or flushed to disk. This also applies to the situation
where both data sources are blocked but there still are tuples from them
available in the local system. Moreover, other segments of the query can run
while one or more operations are stalled, since XJoin is pipelined.
The algorithm runs in three phases:
(i). Probing of the tuples allocated in main memory.
(ii). Probing of the tuples allocated in secondary memory.
(iii). Elimination of the duplicated tuples.
Phases (i) and (ii) run in an interleaved manner: phase (ii) is
activated when (i) is stalled due to the absence of incoming tuples. However,
phase (ii) yields its execution to (i) when the data sources become operational
again. Phase (iii) runs when (i) and (ii) have already stopped running, in order
to clear the intermediate result of duplicates.
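To make the behavior XJoin inherits from SHJ concrete, the fragment below sketches, in Java, the in-memory step shared by both algorithms: every arriving tuple is inserted into its own hash table and immediately probed against the opposite one, so results are emitted without waiting for either input to finish. The fragment is merely illustrative (class and method names such as SymmetricHashJoinSketch and insertAndProbe are ours, not part of the XJoin implementation); the real XJoin additionally partitions and spills these structures to disk.

import java.util.*;

// Minimal sketch of the symmetric (pipelined) hash-join step that XJoin extends.
public class SymmetricHashJoinSketch {
    private final Map<Object, List<Object[]>> leftTable = new HashMap<>();
    private final Map<Object, List<Object[]>> rightTable = new HashMap<>();

    // Called whenever a tuple arrives from either source; fromLeft tells which one.
    public List<Object[]> insertAndProbe(Object joinKey, Object[] tuple, boolean fromLeft) {
        Map<Object, List<Object[]>> own   = fromLeft ? leftTable  : rightTable;
        Map<Object, List<Object[]>> other = fromLeft ? rightTable : leftTable;

        // 1. Insert the tuple into its own hash table (build and probe sides are symmetric).
        own.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(tuple);

        // 2. Probe the opposite hash table and emit matches immediately (pipelined output).
        List<Object[]> results = new ArrayList<>();
        for (Object[] match : other.getOrDefault(joinKey, List.of())) {
            results.add(fromLeft ? concat(tuple, match) : concat(match, tuple));
        }
        return results;
    }

    private static Object[] concat(Object[] a, Object[] b) {
        Object[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}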
3.3. Ripple Join
Before discussing the Ripple Join algorithm [47], it is interesting to discuss the
algorithm that spawned it: Nested-Loop Join.
Nested-Loop Join is the simplest algorithm implementing the relational
algebra join operator. It consists of two nested loops, i.e., its time complexity is
quadratic, O(n^2). The algorithm compares all tuples from the external relation
with all tuples from the inner loop (internal operand). Tuples from the inner
operand are "consumed" faster than the ones from the external one, since for every
external tuple there is a comparison with all internal tuples.
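For reference, the structure just described can be sketched in Java as follows; the two nested loops make the quadratic number of comparisons and the asymmetric roles of the external and internal operands explicit (the fragment is illustrative and not taken from any of the cited works).

import java.util.*;
import java.util.function.Function;

// Illustrative nested-loop join: the inner relation is rescanned for every outer tuple.
public class NestedLoopJoinSketch {
    public static List<Object[]> join(List<Object[]> outer, List<Object[]> inner,
                                      Function<Object[], Object> outerKey,
                                      Function<Object[], Object> innerKey) {
        List<Object[]> result = new ArrayList<>();
        for (Object[] o : outer) {          // external (outer) relation: consumed slowly
            for (Object[] i : inner) {      // internal (inner) relation: one full pass per outer tuple
                if (Objects.equals(outerKey.apply(o), innerKey.apply(i))) {
                    Object[] joined = Arrays.copyOf(o, o.length + i.length);
                    System.arraycopy(i, 0, joined, o.length, i.length);
                    result.add(joined);     // emit the concatenated result-tuple
                }
            }
        }
        return result;
    }
}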
The Ripple Join decreases the overall response time in queries in
which tuple delivery is eminently intermittent [47, 92]. Its symmetry is a key
feature for achieving such a purpose: when a given data source does not meet
the requirements ruled by a statistical factor, it exchanges the inner and external
operands, and the delaying source becomes the external one, which implies
that its tuples will be consumed more slowly. This change depends on the
computation of a statistical factor, which measures the quality of the incoming
host's result; this is done because Ripple Join has been designed for decision
support systems with online queries, in which the quality or relevance of the
incoming result is paramount. However, there is a factor under the control of the
user, called "sampling rate". That factor drives a random tuple-sampling process,
and each relation participates in each sampling step loop, ranging from 1 to n or
(n - 1), where n is input by the user. In a simplistic way, Ripple Join is a kind of
Nested-Loop Join in which the operands' positions change according to a
statistical factor computed from the accessed data sources.
3.4. Double Pipeline Hash Join (Tukwila Implementation)
DPHJ is a join operator in which the construction of the hash structures is done
in an independent way [145]. A comparison is made between the current tuple
and the hash table of the other data source, and then the tuple is added to the
hash table of the current source. Notwithstanding, this mechanism requires a lot
of available memory, since each hash table must be kept in the
available memory to proceed.
In case of memory overflow, the algorithm adapts itself according to the
result-delivery latency chosen by the user. If the given latency is
relatively large (fewer I/O operations), then the algorithm is most likely to run like
the Hybrid Hash Join (called Incremental Left Flush in the Tukwila engine).
However, if the user chooses a small enough latency, the algorithm becomes the
Incremental Symmetric Flush, which writes both hash structures to disk while
continuously processing new tuples in parallel [47]. The aforementioned
procedure is nothing less than its implementation of symmetry, in which the
quintessential factor for adaptation is the latency (in case of overflow) chosen
by the user, and not a computed statistical factor like the one present in Ripple
Join.
Symmetry in DPHJ guarantees that the query keeps running even if the
data sources become unavailable.
3.5. MobiJoin
This algorithm [1] is newer than the previous ones and has been designed with
adaptation techniques in mind; like the XJoin algorithm, it runs in three
phases:
(i). Probing of the tuples allocated in main memory.
(ii). Probing of the tuples allocated in secondary memory.
(iii). Another probing phase to avoid incomplete results.
The first two phases are interleaved at runtime (symmetry is
implemented as well), much like what is done by XJoin; but unlike XJoin,
MobiJoin does not use timestamps to control which tuples have already been compared.
Instead, the algorithm keeps control tables (a bit table per bucket pair), which
are auxiliary data structures containing information about already processed
tuples. Such control also avoids the generation of duplicated tuples during
probing. Furthermore, the first phase is pipelined, which implies that the
algorithm produces result-tuples while probing.
It also implements a mechanism to handle memory overflow at the bucket
level. In reality, it is a memory-management policy which optimizes the memory
usage of the allocated buckets. However, the policy cannot act over buckets
that have already been flushed to disk, i.e., buckets whose content has
already been partially flushed to disk cannot take any benefit from the memory-management
policy.
In fact, the first two phases cannot guarantee a complete and correct
result at the end of the operation; hence the third phase needs to be triggered.
The first phase can only join tuples that are simultaneously resident in memory.
As a matter of fact, the first phase is not able to join tuples belonging to
corresponding buckets when at least one of them has already been flushed to
disk. What is more, the second phase cannot join disk-resident tuples with
tuples that have not yet been allocated in the corresponding bucket. So, it is up to
the third phase to join the remaining tuples; hence this phase is only triggered
after all tuples from both data sources have been received.
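The control tables can be pictured with the illustrative fragment below (our own sketch of the idea, not MobiJoin's code); a bit set per bucket pair records which tuple pairs have already been probed, so repeated probing and duplicated results are avoided. For simplicity, the sketch assumes a known upper bound on the size of the second bucket.

import java.util.BitSet;

// Illustrative control table for one bucket pair: bit (i, j) set means that
// tuple i of bucket A has already been probed with tuple j of bucket B.
public class BucketPairControlTable {
    private final BitSet probed = new BitSet();
    private final int widthB;   // assumed upper bound on the size of bucket B

    public BucketPairControlTable(int widthB) {
        this.widthB = widthB;
    }

    // Returns true if the pair still needs probing, and marks it as probed.
    public boolean markIfUnprobed(int indexA, int indexB) {
        int bit = indexA * widthB + indexB;   // row-major encoding of the pair
        if (probed.get(bit)) {
            return false;                     // already compared: skip to avoid duplicates
        }
        probed.set(bit);
        return true;
    }
}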
3.6. Adaptive Symmetric Hash Join
This algorithm has been designed over the Symmetric Hash Join algorithm,
which is an extension of the original Hash Join algorithm itself, with the
additional symmetry implemented. Symmetry is the ability to independently deal
with the data sources, i.e., in a non-blocking way; the processing of the data
sources does not block the join operation.
The ASHJ is the local join operator of the Parallel Hash Ripple Join
algorithm in [143], i.e., it has been designed to run under an online aggregation
system. Since it is the lowest-level algorithm running on such a system, it is
the one responsible for dealing with memory overflow. Moreover, the adaptive
extension of the algorithm is mostly focused on the handling of memory-overflow
events. Indeed, it deals with memory overflow, since it keeps the operation
going even when multiple overflow events occur. However, when a tuple
arrives that was supposed to be allocated in a bucket that has already suffered
overflow, it is immediately flushed to disk and cannot be allocated in memory anymore.
Such behavior greatly decreases the operator's result rate, since flushed-to-disk
tuples are probed only after all tuples have arrived from their corresponding
sources. In the meantime, those disk-resident tuples could be used to produce
more result tuples if probed with memory-resident ones. The algorithm
compares only memory-resident tuples before all tuples have arrived.
For the sake of clarity, let us consider the worst case, which is when the
algorithm cannot allocate even a single bucket pair, and all other buckets (with
different selectivity factors) have been flushed to disk due to the severe lack of
available memory. In that particular case, the algorithm would be continuously
flushing tuples to secondary storage (granted that it has the required
resource) until all involved tuples have arrived. After that, it starts probing all
disk-resident tuples and dispatching the results. As one can see, the algorithm
would be unable to produce any result tuples while still receiving new ones,
greatly reducing its output.
3.7. Hash-Merge Join
The Hash-Merge Join algorithm (HMJ, for short) [78] is an adaptive join
algorithm that handles unpredictable and slow network traffic; its main
application is queries in which the user only needs the very first few results.
The HMJ algorithm is designed with two goals in mind: (i) minimizing the time to
produce the first tuple set, and (ii) producing join results even if the sources are
blocked. The algorithm also has an informal third phase, which is quite similar to
the one implemented in MobiJoin: actually, it is another run of the merging
phase.
The algorithm is built on two phases: the hashing and merging phases,
respectively. The hashing phase produces join results as soon
as data arrive. Once the algorithm runs out of memory, certain buckets are
flushed to disk to make room for the incoming tuples. The hashing phase is
able to produce join results from the unblocked source in the event that one data
source becomes blocked. Furthermore, if both sources become blocked, the
HMJ triggers its merging phase. In that phase, flushed buckets are loaded and
joined by applying a sort-merge routine over the loaded-from-disk tuples. So, the
HMJ algorithm can still produce results when both sources are blocked. As
soon as any of the blocked sources resumes, the algorithm switches from the
merging phase back to the hashing phase. As a matter of fact, it switches back
and forth between the phases until all tuples are received. After that, the
informal third phase takes place: the entire memory content is flushed to disk
and the merging phase produces the final part of the join result.
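The alternation between the two phases can be pictured with the schematic control loop below (an illustrative sketch under names of our own choosing; the actual flushing and sort-merge details are described in [78]).

// Schematic control loop for Hash-Merge Join phase switching (illustrative only).
public class HashMergeJoinDriver {

    interface Source { boolean blocked(); boolean exhausted(); }

    private final Source left, right;

    public HashMergeJoinDriver(Source left, Source right) {
        this.left = left;
        this.right = right;
    }

    public void run() {
        while (!(left.exhausted() && right.exhausted())) {
            if (left.blocked() && right.blocked()) {
                mergingPhase();   // join buckets previously flushed to disk (sort-merge)
            } else {
                hashingPhase();   // consume arriving tuples, flush buckets on memory pressure
            }
        }
        flushAllToDisk();         // "informal third phase": one final merging pass
        mergingPhase();
    }

    private void hashingPhase()   { /* in-memory symmetric hashing, omitted */ }
    private void mergingPhase()   { /* sort-merge over disk-resident buckets, omitted */ }
    private void flushAllToDisk() { /* write the remaining in-memory buckets, omitted */ }
}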
4. AJAX: Adaptive Join Algorithm for Extreme Restrictions
4.1. Introduction
In the context of mobile databases, adaptive query processing plays a key role
[106, 108, 133]. In turn, adaptive query operators represent a critical feature for
adaptive query processing. An adaptive query operator should be able to react
to some events, which are related to increasing response time on mobile
databases access. In order to explain the idea of adaptive operator, consider
that a query Q consisting of a join operation has been submitted, and at
undetermined running time one of the data sources (which is an operand of the
join operation) has suddenly block its tuple delivery for some unknown reason.
In other words, the join operation is blocked, and the query’s response time is
increased until the blocked data source resume to delivery tuples again to the
join operation. This scenario is quite common in mobile databases and
conventional operators are not prepared for such a situation.
This work proposes an adaptive join algorithm called AJAX [5, 25]. The
algorithm implements the relational algebra join operator, using hashing and
pipelining techniques to reduce the overall response time of a query
involving join operations. AJAX is an acronym for Adaptive Join Algorithm for
eXtreme restrictions. We claim that AJAX is an adaptive query operator [9, 90],
since it is able to react to events which may considerably increase response
time when executing queries over mobile databases. AJAX ensures the
following properties:
(i) Incremental production of results as soon as the data become available.
Conventional join operators (like nested-loop join and hash join) only
produce results after at least one involved relation is completely
evaluated. Joins executed over distributed data may therefore take a long time until
users begin to receive the result of the join operation. AJAX uses a
pipeline technique in order to incrementally provide results (to users or to
another query operator) as soon as they become available. By doing this,
users observing the progress of the query processing may, for instance,
decide to stop the execution on the fly, once the results already
obtained are enough to make a decision;
(ii) Progress of the query processing even when the delivery of data is
blocked. The idea is to overlap delays by executing other tasks when tuple
delivery (from the tables involved in the join operation) is blocked;
(iii) Reaction to memory-overflow events. When a memory overflow occurs,
AJAX reacts by flushing part of the data from main memory to disk;
(iv) Prevention of unnecessary disk access. AJAX's predecessors (like XJoin) usually need
to read data previously flushed to disk in another execution phase,
in order to avoid incomplete results and/or duplicates; AJAX is
designed to reduce such disk accesses by avoiding duplicate results at
probing time (there is no need for another phase involving disk access).
The AJAX algorithm keeps running (and so does the whole query)
even when both data sources are blocked. The idea is to overlap delays by
executing other tasks. That is achieved because AJAX is a symmetric operator
and implements intra-operator and inter-operator pipelining, as we show in the
next sections.
This chapter is structured as follows:
Section 4.2. Describes the first execution phase of the algorithm.
Section 4.3. The probing procedure is presented, explaining the mechanism
that avoids the generation of duplicates. The overflow-prevention
routine is also explained in a subsection.
Section 4.4. The second execution phase is discussed, and the
algorithm's pseudocode is depicted at the end of the section.
Section 4.5. The cost estimation model for the algorithm is discussed.
4.2. First Phase
The key goal of the first phase is to join the largest amount of memory-resident
tuples, using the available memory. This phase starts to run as soon as tuples
begin to arrive and continues executing as long as tuples of at least one of the
tables are arriving. When all tuples of both tables involved in the join operation
have arrived, the first phase stops its execution (and the second phase is
triggered). If the arrival of tuples from both tables is interrupted (blocked), the
execution of the first phase is suspended (and the second phase is started),
provided that there are no new tuples to be processed. When new tuples restart
to arrive, the execution of the first phase is resumed. For that reason, AJAX
is a non-blocking join operator.
When the first phase starts to run, two hash tables are created (one for
each table of the join operation). Thereafter, the hash tables can be populated
(as soon as tuples from the tables involved in the join operation arrive) by
applying a hash function h over the join attributes. Each hash table is composed
of buckets. AJAX's hash tables are double-ended queues (deques) [83] of
buckets, where each bucket is a linked list of tuples with no limit on its
size, which characterizes a dynamic data structure. The dynamic bucket is a
way to implement a fairer memory-allocation policy, in contrast to
conventional operators, which have buckets of limited size (incurring
bucket overflow). Such a technique guarantees that all available memory will be
used to allocate as many tuples as possible. Buckets with limited size do not
offer the best allocation, because a great quantity of buckets could be empty or
nearly empty while still holding memory that could be used by tuples
with a greater selectivity factor; i.e., memory is wasted if a given bucket can
hold 100 tuples but only 5 have been allocated in it. Besides that, using
fixed-size buckets implies that memory overflow occurs more frequently, since
the overflow event occurs per bucket and there are several buckets in
each hash table. Such a feature may represent a significant increase in query
response time. In AJAX, a bucket that "deserves" to be larger (in size) will in
fact be larger.
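The data structure described above can be sketched in Java as follows (an illustrative fragment; class and method names are ours, not AJAX's actual implementation). Each bucket is an unbounded linked list, so memory is consumed only by buckets that actually receive tuples.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

// Sketch of an AJAX-style hash table: a deque of dynamically sized buckets,
// where each bucket is an unbounded linked list of tuples.
public class DequeHashTable<T> {
    private final List<LinkedList<T>> buckets = new ArrayList<>();
    private final Deque<LinkedList<T>> scanOrder = new ArrayDeque<>();

    public DequeHashTable(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            LinkedList<T> bucket = new LinkedList<>();
            buckets.add(bucket);
            scanOrder.addLast(bucket);   // the probing loop traverses buckets from head to tail
        }
    }

    // Applies the hash function over the join attribute and appends the tuple to the
    // corresponding bucket; buckets grow freely, so there is no bucket-level overflow.
    public void insert(Object joinAttribute, T tuple) {
        int address = Math.floorMod(joinAttribute.hashCode(), buckets.size());
        buckets.get(address).addLast(tuple);
    }

    public Deque<LinkedList<T>> buckets() {
        return scanOrder;
    }
}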
For each set of tuples received by AJAX, the memory size necessary to
store them is calculated and compared with a minimal memory limit (several
values have been tested in the experimental results chapter), which is a memory
size large enough to keep the algorithm running and to allocate disk pages during
the second phase. This procedure is done to avoid memory overflow in both
phases; AJAX has been designed to run in very resource-constrained
environments (extreme restrictions). After checking that there is enough room
in memory, the tuples are allocated in their corresponding hash tables by
applying a hash function on the join attribute. The result of the hash function is
the address of the bucket which will receive that particular tuple. It is important
to observe that the overflow granularity in AJAX has been moved from the
bucket level to the system (memory) level. As already mentioned, such a
strategy guarantees an increase in AJAX's overall performance, because a
system overflow occurs less frequently than a bucket-overflow event.
The algorithm has been designed to do as many probings as possible in a
specific time interval; to do that, AJAX uses inter- and intra-operator pipelining
[10, 115]. Some additional information is required in order to avoid duplicates
and unnecessary probing during pipelined probing. Tuples that have been allocated
in their corresponding buckets have an index that marks their relative position in the
bucket. In fact, this index represents their insertion order in the bucket. Another
piece of useful information added to tuples is the last probe remembrance, which is
an index indicating the last tuple of the opposite bucket that has been probed with
this particular tuple. This information is used to guarantee the correctness and
completeness of the progressive probing (described in the next section).
4.3. Probing
Probing in AJAX is progressive; this progressiveness avoids duplication and
unnecessary probing. The idea is to run the probing phase as a loop which
compares all pairs of buckets (from both relations) with the same hash address,
comparing tuples that have arrived during the current iteration i with tuples from
iteration (i - 1), i.e., tuples that arrived in the last iteration but have not been
probed yet. To illustrate this idea, observe the execution scenario depicted in
Picture 1 below. Consider that the join relations are Alpha and Beta, and that
there exist two buckets (x and y) in each hash table.
The probing is done by comparing tuples from buckets with the same hash
address; tuples belonging to the bucket with hash address "x" of Alpha are
compared with tuples of the bucket "x" of Beta's hash table, producing
result-tuples incrementally. This procedure is executed for every bucket pair in
the hash tables until the last pair is reached. After that, the probing continues
again from the beginning. For that reason, a deque data structure has been chosen:
the algorithm's loop traverses the bucket list from its beginning to its end in
each iteration. Beginning from the list's head guarantees that tuples received
during a given iteration "i" can be probed in the next iteration (i + 1), since some
tuples may be inserted in bucket "x" of Alpha while the probing is in progress in
the bucket pair (from both hash tables) with hash address "y". Therefore, the
probing is progressive, since tuples which have arrived in iteration "i" may or
may not be probed during this iteration. However, AJAX guarantees that, in such
a scenario, those tuples will be probed in the next iteration (i + 1).
PICTURE 1: RUNNING SCENARIO OF THE PROGRESSIVE PROBING
In order to avoid duplicates in the result of the join operation and
unnecessary probing, AJAX implements the following strategy. Whenever a
tuple is received by AJAX, it adds two columns to the tuple. One column, called
position index, represents the receive order of the tuple in a given bucket. The
other column, denoted last probe remembrance (LPR for short), has the
following functionality; the LPR of a tuple “t”, which belongs to a bucket with
hash address ‘x’, stores the value of the position-index column of the last tuple
of the bucket with hash address ‘x’ of the other hash table which has already
been compared (probed) with “t”.
Thus, for each tuple in a bucket with hash address 'x' of a hash table
Alpha, AJAX compares its LPR with the position index of the tuple in bucket 'x'
of the other hash table, say Beta. Two situations are possible:
(i) The value of the LPR of Alpha's tuple is less than the value of the position
index of Beta's tuple. In that case, the tuples have not been probed
yet. For that reason, their join attributes are compared to check whether
the tuples should be inserted in the result. After that, the LPR of
Alpha's tuple receives the position-index value of Beta's tuple.
This guarantees that those tuples will not be probed again.
(ii) The value of the LPR of Alpha's tuple is greater than or equal to the value of
the position index of Beta's tuple, which means that the tuples have
already been compared and will not be probed again.
Figure 2 illustrates how AJAX uses the LPR and position-index values.
The last attribute of every tuple in Figure 2 represents the LPR column. On the
other hand, the first column represents the position-index value. Observe that
the first tuple belonging to bucket ‘x’ of Alpha’s hash table has position-index
equal to 0 and an LPR equal to 2, which means that it has already been probed
with the tuples of Beta's bucket whose position index is greater than -1 and less
than 3. Tuples with LPR equal to -1 are new tuples (recently allocated),
which have not been compared with any other tuple.
FIGURE 2: LAST PROBE REMEMBRANCE.
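The probing rule based on the position index and the LPR can be sketched in Java as follows (an illustrative fragment assuming that each bucket keeps its tuples in insertion order; class and field names are ours, not AJAX's actual code).

import java.util.List;

// Illustrative probing step between two buckets with the same hash address.
public class ProgressiveProbingSketch {

    // Minimal tuple wrapper carrying the two extra columns AJAX adds on arrival.
    public static class AjaxTuple {
        final Object joinKey;
        final int positionIndex;   // insertion order inside its bucket
        int lpr = -1;              // position index of the last opposite tuple probed with it

        AjaxTuple(Object joinKey, int positionIndex) {
            this.joinKey = joinKey;
            this.positionIndex = positionIndex;
        }
    }

    // Probes bucket alphaX against bucket betaX, emitting the join keys that match.
    public static void probe(List<AjaxTuple> alphaX, List<AjaxTuple> betaX, List<Object> results) {
        for (AjaxTuple a : alphaX) {
            for (AjaxTuple b : betaX) {
                if (a.lpr >= b.positionIndex) {
                    continue;                   // already probed in an earlier iteration
                }
                if (a.joinKey.equals(b.joinKey)) {
                    results.add(a.joinKey);     // result-tuple construction omitted
                }
                a.lpr = b.positionIndex;        // remember the last opposite tuple probed with 'a'
            }
        }
    }
}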
4.3.1. Reacting to Memory Overflow
When a memory-overflow event occurs during the join, an adaptive operator
should be capable of reacting to such an event, avoiding stopping its execution. In
fact, AJAX anticipates its reaction to the occurrence of memory-overflow events.
Such a property is possible because AJAX checks the amount of available
memory at tuple-arrival time. It then calculates the room required to allocate the
tuples. If the tuples can be allocated in the available memory, they
are inserted into their corresponding buckets. Otherwise, the algorithm makes
room to allocate the incoming tuples. In order to free memory space, something
has to be deallocated. Many deallocation policies can be implemented in
AJAX. The default policy chooses the largest bucket pair (the pair with the
largest quantity of tuples). After that, the algorithm flushes those buckets to disk
and deallocates all tuples stored in them. This is a recursive process,
which is executed until there is enough memory for allocating the incoming
tuples.
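The default deallocation policy can be sketched as follows (an illustrative Java fragment; the Bucket interface and the method names are assumptions of this sketch, and serialization details are omitted).

import java.io.IOException;
import java.util.List;

// Sketch of AJAX's default memory-management policy: flush the largest
// corresponding bucket pair to disk until the incoming tuples fit in memory.
public class OverflowPolicySketch {

    interface Bucket {
        int tupleCount();
        long bytesInMemory();
        void flushToDisk() throws IOException;   // written in disk-page-sized partitions
        void deallocate();                       // frees the in-memory tuples after flushing
    }

    static void makeRoom(List<? extends Bucket> alphaBuckets, List<? extends Bucket> betaBuckets,
                         long requiredBytes, long availableBytes) throws IOException {
        while (availableBytes < requiredBytes) {
            int largest = -1;
            int largestCount = 0;
            for (int i = 0; i < alphaBuckets.size(); i++) {   // pick the pair holding the most tuples
                int count = alphaBuckets.get(i).tupleCount() + betaBuckets.get(i).tupleCount();
                if (count > largestCount) {
                    largestCount = count;
                    largest = i;
                }
            }
            if (largest < 0) {
                break;                                        // nothing left in memory to flush
            }
            Bucket a = alphaBuckets.get(largest);
            Bucket b = betaBuckets.get(largest);
            availableBytes += a.bytesInMemory() + b.bytesInMemory();
            a.flushToDisk();                                  // spill both corresponding buckets...
            b.flushToDisk();
            a.deallocate();                                   // ...and release their memory
            b.deallocate();
        }
    }
}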
When both data sources are blocked or EOF is received from them, the
second phase of AJAX is triggered.
4.4. Second Phase
The second phase uses buckets previously flushed to disk to produce result-tuples.
The execution of the second phase is triggered in reaction to different events. First,
when both data sources are blocked (due to external events, like low bandwidth
or an abrupt shutdown), AJAX executes its second phase in order to overlap
delays. Second, when all the tuples of both tables have been received, the second
phase is triggered for the last time in order to ensure the completeness of the join
result produced by AJAX.
The execution of the first and second phases alternates every time the
data sources become blocked. In order to continue generating result-tuples, the
second phase uses the buckets flushed to disk to feed the progressive probing
loop.
Buckets are flushed to disk to avoid memory overflow. However, those
disk-resident buckets could produce another overflow when they are loaded to
be probed with other buckets in memory. This "second overflow" is prevented
using the minimal memory limit. Before flushing the chosen buckets to disk, the
overflow-prevention algorithm divides each bucket into smaller partitions
with the size of a disk page (4 KB, for example); these pages are then
flushed to disk. The loading is done using part (how much is used is a
parameter) of the minimal-memory-limit area; only a few pages are loaded at a
time and probed with other tuples in memory; after the pages are probed,
another set is loaded into memory and the previously loaded pages are
deallocated. This process continues until all pages have been loaded and probed,
or until one of the data sources becomes operational again.
To avoid probing the same set of tuples again, the LPR concept is also used at
the disk-page level: every page has its own LPR indicating the last page that has
already been probed with it. This minimizes disk I/O, increasing AJAX's
performance and avoiding duplicated result-tuples, which would lead to incorrect
results.
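The page-level organization can be sketched as follows (an illustrative Java fragment; the 4 KB page size follows the example above, and all names are assumptions of this sketch).

import java.util.ArrayList;
import java.util.List;

// Sketch of second-phase page handling: buckets are split into disk-page-sized
// partitions, and each page keeps its own last-probe-remembrance (LPR).
public class DiskPageSketch {
    static final int PAGE_SIZE_BYTES = 4 * 1024;   // example page size used in the text

    static class DiskPage {
        final int pageIndex;        // position of this page within its flushed bucket
        int lpr = -1;               // index of the last opposite page already probed with it
        final List<byte[]> tuples = new ArrayList<>();

        DiskPage(int pageIndex) { this.pageIndex = pageIndex; }
    }

    // Splits a serialized bucket into page-sized partitions before flushing it to disk.
    static List<DiskPage> partition(List<byte[]> serializedTuples) {
        List<DiskPage> pages = new ArrayList<>();
        DiskPage current = new DiskPage(0);
        int used = 0;
        for (byte[] t : serializedTuples) {
            if (used + t.length > PAGE_SIZE_BYTES && !current.tuples.isEmpty()) {
                pages.add(current);                 // page is full: start a new one
                current = new DiskPage(pages.size());
                used = 0;
            }
            current.tuples.add(t);
            used += t.length;
        }
        pages.add(current);
        return pages;
    }
}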
Figure 3 shows simplified pseudocode of the algorithm; many details
have been omitted in order to preserve a didactic approach. It is important to
note that the process of receiving tuples from the data sources and the
overflow-prevention process are executed in different threads, which is not
represented in Figure 3.
Procedure TupleArrival(tuple t, sources (A, B))
Begin
1. If allocating t would exceed the minimal memory limit
   (a) Choose two buckets Ax and Bx.
   (b) Probe buckets Ax and Bx.
   (c) Flush buckets Ax and Bx to disk.
   (d) Deallocate Ax and Bx.
2. Calculate the hash of t.
3. Insert t in the correct bucket.
End

Procedure ProgressiveProbing(buckets (Ai, Bi), pages (PAi, PBi))
Begin
1. If bucket Ai or Bi got a new tuple t
   (a) Probe t with all tuples of the opposite bucket whose position index is greater than t's LPR.
   (b) Update the LPR of all probed tuples with the position index of t.
   (c) Update the result stream.
1.1 If both sources are blocked
   (a) Probe bucket Ai with all disk pages whose index is greater than its LPR.
   (b) Probe bucket Bi with all disk pages whose index is greater than its LPR.
   (c) Probe page PAi with all disk pages whose index is greater than its LPR.
   (d) Probe page PBi with all disk pages whose index is greater than its LPR.
   (e) Update the LPR of the involved tuples and pages.
   (f) Update the result stream.
End
FIGURE 3: AJAX SIMPLIFIED.
4.5. Cost Estimation
In order to estimate the cost of executing the AJAX operator, consider the
scenario depicted in Figure 4. M represents a mobile database community
(MDBC). DB1, DB2 and DB3 represent the mobile databases belonging to M.
Those mobile databases reside in mobile hosts C1, C2 and C3, respectively. The
mobile hosts C1, C2 and C3 are interconnected by means of an ad hoc network.
Now suppose that a user submits a query Q in C1. The query Q represents in
fact a join operation (of the relational algebra) between two unsorted tables A
and B, which are stored in DB2 and DB3, respectively. The tuples of relations A
and B should be delivered to the mobile host C1, where the join operation has to
be executed. Thereafter, the result of Q should be delivered to the user who
has submitted Q. We are assuming that the database systems in C1, C2 and C3
are interoperable.
FIGURE 4: RUNNING EXAMPLE.
For the sake of simplicity, let us consider that the tables A and B have the
same cardinality c, the same tuple size and there is an even distribution of
tuples between the two hash tables. The available main-memory area in C1 for
executing AJAX has the capacity of storing s tuples. Without loss of
generality, the measure used for estimating the cost is the number of tuple transfers
to/from disk, instead of block (page) transfers. The cost of AJAX can be
estimated according to the following conditions:
(i)
cs
2
. In this case, there is no overflow. Thus, no tuple is
flushed to disk and the third phase is not triggered, which gives a
cost of 2c (for reading tables A and B);
(ii)
sc
s
<
2
. In this case, there is an overflow of c tuples, since we
have assumed an even distribution of tuples across the hash
tables. The cost is given by
)
22
( )
22
(2
cccc
c ++++
, i.e., 4c.
38
Note that, one term
)
22
(
cc
+
represents the cost of flushing
tuples to disk and the other represents the cost of reading the
tuples from disk for executing the third phase.
(iii)
cs <
. In this situation, we have an overflow of
)
2( sc
tuples,
i.e., an overflow of
)
2
(
s
c
tuples for each table. Thus, the
estimated cost is
)2()2(2 scscc ++
, giving the final
cost of
)
3(2 sc
.
5. Experimental Results and Analysis
5.1. Introduction
In order to investigate the performance of the AJAX operator, we have performed several experiments. In the experiments, a query Q is submitted to a mobile computer C1, where Q represents a join operation between two unsorted tables A and B, each residing in a different mobile computer, C2 and C3, respectively. The tuples of those relations should be delivered to mobile host C1, where the join operation has to be executed.
The experiments were divided into two classes. In the first class of experiments, we have simulated the execution of AJAX in devices with severe memory restrictions interconnected by a wireless network, which may cause a bursty arrival of tuples in C1. In the second class of experiments, we have run the XJoin [120] and Hash-Merge Join [78] (HMJ, for short) operators (using the same testbed) in order to compare AJAX with them. The test environment was composed of 3 Pentium-4 machines with 3.0 GHz processors, connected by a wireless network. The three operators (AJAX, XJoin and HMJ) were implemented in Java. The mobile-like restrictions have been simulated by the Java Virtual Machine.
This chapter is structured as follows:
Section 5.2. Standalone tests simulating memory availability changes are
presented, in order to analyze the algorithm’s viability.
Section 5.3. In this section AJAX’s performance is measured and
compared to other algorithms.
Section 5.4. An analytical discussion of the related work is presented.
5.2. Standalone Tests
In this class of experiments, the goal was to evaluate performance issues of AJAX. The cardinalities of the tables A and B (operands of the join operation) are 10^4 and 6x10^4, respectively. The cardinality of the result (of the join operation) is 6x10^3, and each tuple has an approximate size of 56 bytes. To execute such a join operation without memory overflow, a minimum of 5.7 MB of free memory is necessary in C1 (the device that has requested the query). The bursty arrival of tuples (due to the low bandwidth of the wireless network) was simulated by introducing a delay of 5 seconds for every 10^4 tuples delivered to the algorithm (running in C1).
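One plausible way to implement such a delivery delay, shown here only as an illustrative assumption about the test harness (the original harness code is not reproduced in this dissertation), is to pause the sender thread after every 10^4 tuples:

import java.util.List;
import java.util.function.Consumer;

public class BurstyDelivery {
    static <T> void deliverWithBursts(List<T> source, Consumer<T> operator)
            throws InterruptedException {
        int sent = 0;
        for (T tuple : source) {
            operator.accept(tuple);       // hand the tuple to the join operator
            if (++sent % 10_000 == 0) {
                Thread.sleep(5_000);      // simulated 5-second network stall
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> tiny = List.of("t1", "t2", "t3");
        deliverWithBursts(tiny, t -> System.out.println("delivered " + t));
    }
}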
First, we have performed experiments to investigate AJAX's behavior with different memory sizes and bursty arrival of tuples (in C1). AJAX was used to execute a join operation in a computer C1 with three memory sizes: 64MB, 4MB and 1MB. The results are shown in Figure 5.
FIGURE 5: MEMORY OVERFLOW PREVENTED.
Looking more closely at Figure 5, one can observe that the algorithm has a linear behavior regardless of the available memory size, considering the cardinalities used in the tests. The algorithm also presents an aggressive behavior in the early generation of result-tuples; this can be seen by comparing the 4 MB and 1 MB curves: the time difference between their points is small until 4x10^4 result-tuples are generated and increases only after that point. This is because AJAX requires more disk I/O in the 1 MB scenario when executing the second phase for the last time (when all tuples have arrived). In other words, the algorithm minimizes the impact of low memory on the query's response time. Running in a scenario with 1/6 of the memory required to allocate both relations shows that the algorithm has reacted to the memory limitation. Recall that this situation was simulated with a blocking of 5 seconds for both data sources for each 10^4 received tuples, which implies that the second phase was activated by the algorithm.
In order to verify the importance of the second phase to the overall decrease in the query's response time, another experiment using 1 MB of available memory was performed. For that set of experiments, we have first run the algorithm with the second phase switched off and then with it switched on. Figure 6 shows the results of that experiment. Observe that AJAX has an aggressive strategy to produce results as soon as possible when the second phase is executed.
FIGURE 6: SECOND PHASE EFFICIENCY.
In Figure 6, one can observe that, for a given time interval, the algorithm
has generated a greater quantity of result-tuples when the second phase was
enabled than when the second phase was switched off. Such a property
decreases the query's response time. This result is achieved because the second phase uses tuples on disk to continue the probing when both data sources are blocked; otherwise, the probing mechanism would be blocked as well (the probing "starves" in the absence of tuples).
5.3. Comparative Tests
We have also performed experiments to compare AJAX with XJoin and HMJ, regarding memory restriction (which induces memory overflow) and the occurrence of data delivery interruptions (which induces the execution of the second phase). For the experiments, the tables A and B (operands of the join operation) have cardinalities of 10^4 and 6x10^4 tuples, respectively. We have considered tuples of 20 bytes, and the cardinality of the result is 10^5, which implies that around 2 MB of available memory is required to produce the join result without the occurrence of memory overflow.
Initially, we have compared AJAX with XJoin [120], which is a well-known adaptive join algorithm. Since disk access is the major bottleneck for join algorithms, it has been chosen as the quantity to be measured in the experiments. The results are illustrated in Figures 7 and 8. Observe that both algorithms share a similar behavior until 6x10^4 result-tuples are generated, which is the point at which both algorithms have already flushed a great number of tuples to disk and more disk I/O is required to continue the operation. Looking at the AJAX series, one can see that the algorithm's curve is quite linear even after 6x10^4 result-tuples have been generated, which is not the case for XJoin (see Figure 7). XJoin presents an exponential growth after 6x10^4 result-tuples have been generated, which represents much more disk I/O than AJAX.
As the available memory decreases, the performance similarity between XJoin and AJAX ceases earlier, which can be observed in Figure 8. In Figure 8, one can also observe that, to produce 4x10^4 result-tuples, XJoin increases its disk I/O exponentially, while AJAX increases its own disk I/O in a quasi-linear way (for the used cardinalities).
Figure 7: 1024KB of available memory. Figure 8: 600KB of available memory.
Besides the fact that disk I/O in XJoin grows faster as the available memory decreases, XJoin is machine-precision dependent. Since XJoin applies a timestamp to recently arrived tuples in order to avoid duplicates, it might still generate duplicates, because the timestamp can be duplicated if the machine on which the algorithm runs does not have the required precision to count time. Considering that XJoin runs on a device capable of counting time up to milliseconds, it cannot guarantee that two tuples arriving almost at the same time (in the same millisecond tick) will receive different timestamps. For the sake of clarity, let tuple ti be received at the real time of 12:33:0001 and tuple tk at 12:33:0003; since the given device can only count up to milliseconds, both tuples will receive the same timestamp of 12:33:000. Such machine-precision dependent behavior of XJoin is even worse when it is executing on mobile devices, since such devices may run processors with low time precision and no floating-point support.
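The precision problem can be observed with a few lines of Java: two consecutive calls to System.currentTimeMillis() on current hardware almost always fall within the same millisecond tick, so a millisecond timestamp alone cannot distinguish two tuples that arrive together (an illustrative demonstration, not XJoin's code):

public class TimestampCollision {
    public static void main(String[] args) {
        long t1 = System.currentTimeMillis();  // timestamp of the first "tuple"
        long t2 = System.currentTimeMillis();  // second "tuple", nanoseconds later
        System.out.println("t1 = " + t1 + ", t2 = " + t2
                + ", collision = " + (t1 == t2)); // usually true
    }
}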
Another set of experiments has been performed, but this time the Hash-Merge Join (HMJ) algorithm has been chosen to be compared with AJAX. First, we have simulated devices with 1024KB, 800KB and 600KB of free memory. For devices with 1024KB and 800KB, HMJ produced slightly fewer disk accesses than AJAX. However, such behavior changed when the available memory was shifted to 600KB: HMJ's performance slightly degraded.
Figure 9: AJAX x HMJ with 1024KB of available memory. Figure 10: AJAX x HMJ with 800KB of available memory.
From 800KB to 600KB, HMJ accessed the disk more often than AJAX at higher amounts of available memory. In order to analyze such behavior, the memory has been dropped to 512KB, 256KB and 128KB, and the same behavior has been observed. Based on the data gathered from those tests and closer observation of the algorithms, one can conclude that such behavior is generated by the minimal memory limit imposed by the AJAX algorithm; that limit guarantees that no "second overflow" will occur when the algorithm starts loading disk buckets (pages) into main memory. This "second overflow" is not handled by HMJ, which always uses all available memory while running, which leads to another overflow while trying to load disk pages to compare with tuples still allocated in memory. From 600KB to 128KB, AJAX's performance advantage over HMJ continued to increase (see Figures 11 and 12), even using a minimal memory limit as low as 16KB, which can allocate 2 disk pages (8KB) and handle the algorithm's overhead (around 8KB).
Figure 11: HMJ performance degrades (with 600KB of available memory). Figure 12: HMJ performance degrades (with 128KB of available memory).
It is important to note that HMJ requires the execution of an in-memory sort algorithm in both of its phases (see Section 4). Of course, such a sort procedure has a computational cost, which impacts HMJ's overall performance. That cost is estimated and discussed in the next section.
5.4. Analysis of the Related Work
In this subsection, we analyze some of the published proposals of adaptive join operators. The properties used to analyze the proposals are (i) result production rate and (ii) progress of the query processing even when the delivery of data is blocked. The first property makes it possible to identify the number of tuples (belonging to the result) produced by the operator within a given time interval. That property gives an idea of the pipelining level supported by the operator. The second property identifies whether or not the operator interrupts its execution when tuples are not arriving. If the second property is supported by a join operator, the time interval required to process the join operation can be strongly reduced.
The result production rate is a function of the strategy used to process tuples allocated in buckets for which memory overflow has occurred at least once. For the operator Adaptive Symmetric Hash Join (ASHJ) proposed in [143], tuples arriving for a bucket which has suffered at least one overflow are directly flushed to disk, i.e., they are not allocated in memory anymore. Such a strategy reduces the result production rate, since tuples allocated on disk will be processed only when all tuples of both relations have already arrived. Moreover, those tuples have to be read from disk again to be processed by the ASHJ. For that reason, ASHJ may present a worse performance than the conventional hash join operator, since a large number of tuples has to be read twice from disk, considering a scenario with severe memory restrictions. The Incremental Symmetric Flush Join, also proposed by Ives et al. [143], presents a strategy similar to that of the ASHJ to process tuples arriving at buckets which have suffered from the bucket-overflow phenomenon at least once. Thus, the Symmetric Flush Join suffers from the same restrictions as the ASHJ w.r.t. result production rate. In [143], another adaptive join operator is proposed, the Incremental Left Flush Join. According to the algorithm of that operator, on the occurrence of the first memory overflow, the Incremental Left Flush Join chooses one of the relations of the join operation, say A, and stops receiving tuples from A, only receiving (reading) tuples of the other relation, say B. From that point on, whenever a memory overflow occurs, in-memory buckets containing tuples of A are flushed to disk and the respective memory area can be used for allocating tuples of B which are arriving. Only after all tuples of relation B have arrived does the Incremental Left Flush Join operator resume receiving tuples of relation A. Note that if the arrival of tuples of relation B is interrupted (blocked), the operator stops its execution. Such a characteristic may represent a strong reduction of the result production rate. AJAX continuously processes tuples of both relations involved in a join operation independently of the occurrence or not of memory overflow. For that reason, AJAX presents a higher result production rate than the aforementioned operators.
Comparing the adaptive join operators with respect to how the operator reacts when the delivery of data is blocked, only AJAX [5, 25], XJoin [120] and Hash-Merge Join [78] continue their execution. The other operators stop their execution when the delivery of data is blocked. In order to react to the data-delivery blocking event, AJAX, XJoin and Hash-Merge Join use a similar strategy: they compare disk-resident tuples of both relations.
The Hash-Merge Join algorithm has two phases: the hashing phase, which joins incoming tuples stored in in-memory hash buckets based on their hash values; and the merging phase, which joins tuples that were previously flushed to disk using a merging strategy. Although Hash-Merge Join has the same hashing phase as XJoin and AJAX, the merging phase may introduce two critical problems into the HMJ operator. First, when it is necessary to flush an in-memory bucket to disk, the algorithm requires that all tuples in that bucket be sorted. Such a sort procedure has a computational cost, which impacts HMJ's performance (mainly in devices with restricted processing capabilities). To estimate the sort cost in HMJ, let k be the number of buckets of each hash table and n be the cardinality of each relation involved in the join operation. Considering that the tuples of both relations are uniformly distributed among the buckets of both hash tables, the cost (complexity) for sorting the tuples in a given bucket is (n/k) log (n/k); thus the cost for sorting all tuples during the execution of HMJ is 2n log (n/k). Observe that this cost depends on the cardinality of both relations involved in the join operation, which may dramatically increase the query's response time. It is worthwhile to emphasize that AJAX does not suffer from that problem.
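For completeness, the total sort cost quoted above follows from summing the per-bucket cost over the 2k buckets kept by the two hash tables; in LaTeX notation, under the stated uniform-distribution assumption:

\[
\underbrace{2k}_{\text{buckets in both hash tables}} \cdot
\underbrace{\frac{n}{k}\log\frac{n}{k}}_{\text{sort cost per bucket}}
\;=\; 2n\log\frac{n}{k}
\]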
The second critical problem generated by the merging phase of HMJ stems from the fact that the adaptive flushing policy adopted by HMJ may produce disk fragmentation. That is because each bucket flushed to disk represents at least one newly allocated block on disk, even when the bucket has just one tuple. It is important to note that disk partitions only start to be merged by Hash-Merge Join when both relations involved in the join processing are blocked. Thus, when that algorithm experiences long periods without processing the second phase, the disk fragmentation becomes even worse, considering the number of disk accesses required to merge all disk-resident buckets in order to join the tuples in those buckets. We argue that AJAX does not favor disk fragmentation, since each bucket has just one in-memory partition and, when overflow occurs, one in-disk portion. Thus, tuples sent to disk are just appended at the end of the final block allocated for their corresponding in-disk bucket. Furthermore, in case of memory overflows, AJAX chooses as victims buckets that have already been flushed to disk before (if they exist), which also reduces disk fragmentation.
It is important to note that XJoin may produce duplicate and/or incomplete results. To avoid such an undesirable phenomenon, XJoin implements a mechanism based on timestamps. To implement that mechanism, XJoin adds two timestamps to each tuple (recording when it was processed during the first and second phases [120]). Moreover, for each disk-resident bucket processed (read) during XJoin's second phase, a linked list is created. Each element in the linked list stores two timestamps. One timestamp registers when the last tuple of the bucket was flushed to disk. The other timestamp registers when the second phase read the disk-resident bucket. Regarding the strategy implemented by XJoin to avoid duplicate and/or incomplete results when executing a join operation, there are two critical problems. First, the proposed strategy fails when the XJoin operator receives, at the same time, a set of tuples belonging to the relations involved in the join operation, since, depending on the machine's clock precision, the timestamp set for those tuples will be the same. Second, XJoin requires too many control structures in order to avoid duplicate/incomplete results. AJAX avoids duplicates and unnecessary probing during the probing itself; it does not need another execution phase, which certainly may degrade performance.
6. Conclusions
In environments with support for mobile computing, data access becomes unpredictable (due to transient disconnections from the network) and mobile hosts (in which mobile databases reside) may suffer from limited available main memory to process query operators, such as the join. For that reason, traditional distributed query processing techniques and operators are inadequate for processing queries on mobile databases. Traditional operators like Nested-Loop Join, Hash Join and many others have been designed to run on desktop computers connected by wired networks. In fact, traditional algorithms have been designed for the sole purpose of executing the join operation, not to deal with network-related problems or resource restrictions.
Unlike traditional operators, adaptive operators are designed to solve problems beyond the join operation itself. Those operators may adapt the entire join operation to run under adverse circumstances, like tuple delivery delay or bursty delivery. In that class are algorithms like XJoin, Ripple Join and Double Pipeline Hash Join; XJoin can handle blocking data sources properly, for instance.
However, most of the adaptive algorithms have not been designed to run on mobile devices or other resource-restricted platforms, which lack the computing resources that desktop computers (the target of the classical adaptive algorithms) have in abundance, namely main memory and processing power. Although Hash-Merge Join has been designed for the Internet and desktop platforms, its properties and techniques allow it to properly execute join operations on mobile devices; still, a mobile device is not its optimal running scenario (as one can see in the experimental results chapter).
For that reason, a new adaptive join operator, denoted AJAX (Adaptive Join Algorithm for eXtreme Restrictions), has been proposed: an operator able to run properly on very resource-restricted platforms. The proposed operator ensures:
(i) incremental production of results while receiving new tuples;
(ii) query processing continuity even if both data sources are blocked; and
(iii) reaction to low-memory contexts, by handling memory overflows in order to keep the operator running and producing results.
The algorithm's modus operandi has been validated through standalone experiments in the experimental results chapter. In that chapter, one can see that the above-mentioned properties hold in ideal circumstances (namely abundance of memory and bandwidth) and in very resource-restricted scenarios. Although the mentioned standalone experiments validate the algorithm, comparative tests have also been run in order to measure AJAX's performance and its viability compared to already established algorithms, like XJoin and Hash-Merge Join. As one can see in Section 5.3, AJAX has proven to be more suited to very resource-restricted running scenarios than the other algorithms.
Moreover, an analytical comparison (Section 5.4) has been made with related work considering the following properties:
(i) result production rate; and
(ii) progress of the query processing even when the delivery of data is blocked.
In the analytical comparison section one can see that AJAX continuously processes tuples of both relations, independently of the occurrence of memory overflow, unlike Adaptive Symmetric Hash Join and Incremental Symmetric Flush Join, for example, whose strategies may decrease their result production rate. AJAX also does not increase its computational cost with a sort-merge-like routine to keep the join operation from blocking, as Hash-Merge Join does. For that reason, one can infer that the greater the cardinality of the relations, the greater the query response time of Hash-Merge Join operations; AJAX does not incur such a sort cost, which grows with the size of the relations. AJAX also does not favor disk fragmentation, since each bucket has just one in-memory partition and (when overflow occurs) one in-disk portion; newly flushed tuples are simply appended to that specific disk block, which prevents excessive disk fragmentation, unlike Hash-Merge Join. AJAX avoids duplicates and unnecessary probing during the probing itself; it does not need another execution phase, which certainly may degrade performance, unlike the additional phase in XJoin. That algorithm also fails at avoiding duplicates, because its duplicate-avoidance strategy is based on timestamps, which can have duplicate values for two tuples arriving simultaneously at the host machine, depending on its time-resolution capabilities.
For the reasons presented, AJAX can be called an adaptive query operator, reacting to events that could otherwise increase query processing time. Future work may consist of a new version of AJAX intended to join data streams in wireless sensor networks.
References
1. A. Brayner and E. Vasconcelos. MobiJoin: um operador de junção para bancos de dados móveis. In: Workshop
de Comunicação sem Fio e
Computação Móvel, 6., 2004, Fortaleza. Anais... Fortaleza, 2004. p. 153-
160.
2. A. Brayner and J. A. Filho. "Sharing Mobile Databases in Dynamically
Configuration Environments". Advanced Information Systems
Engineering. Lecture Notes in Computer Science, 2681. Springer-Verlag,
2003. pages 724-737.
3. A. Brayner and J. A. M. Filho. "AMDB: An Approach for Sharing Mobile
Databases in Dynamically Configurable Environments". Proceedings of
the XVII Brazilian Symposium on Databases (SBBD). October, 14 - 16.
Gramado, Rio Grande do Sul, Brazil. 2002.
4. A. Brayner and J. A. M. Filho. "Increasing Mobile Transaction
Concurrency in Mobile Database Communities". Proceedings of the V
Wireless Communication Workshop (WCSF03). October, 27 - 30. São
Lourenço, Minas Gerais, Brazil. 2003.
5. A. Brayner, J. A. M. Filho, M. Holanda and E. Werbet, n.d. “Ensuring
Mobile Databases Interoperability in Ad Hoc Configurable Environments:
A Plug-and-play Approach” in
New Trends in Database Systems.
Methods, Tools and Applications, Springer-
Verlag. Submitted for
publication.
6. A. Aboulnaga and S. Chaudhuri, “Self-tuning histograms: building
histograms without looking at data,” in SIGMOD ’99: Proceedings of the
1999 ACM SIGMOD international conference on Management of data,
(New York, NY, USA), pp. 181-192, ACM Press, 1999.
7. A. Arasu, S. Babu, and J. Widom, “The CQL continuous query language:
Semantic foundations and query execution,” The VLDB Journal, vol. 15,
no. 2, pp. 121-142, 2006.
8. A. Condon, A. Deshpande, L. Hellerstein, and N.
Wu, “Flow algorithms
for two pipelined filter ordering problems,” in PODS ’06: Proceedings of
the twenty-fifth ACM SIGMOD-SIGACT-
SIGART symposium on
Principles of database systems, (New York, NY, USA), pp. 193-202,
ACM Press, 2006.
9. A. Deshpande and J. M. Hellerstein, “Lifting the burden of history from
adaptive query processing,” in VLDB ’04: Proceedings of the 30th
International Conference on Very Large Data Bases,
Toronto, Canada,
August 29-September 3 2004.
10. A. Deshpande and L. Hellerstein, “Flow algorithms for parallel query
optimization,” Tech. Rep. CS-TR-4873, University of Maryland at College
Park, 2007.
11. A. Deshpande, “An initial study of overheads of eddies,” SIGMOD Rec.,
vol. 33, no. 1, pp. 44-49, 2004.
12. A. Deshpande, C. Guestrin, W. Hong, and S. Madden, “Exploiting
correlated attributes in acquisitional query processing,” in ICDE ’05:
Proceedings of the 21st International Conference on Data Engineering
(ICDE’05), (Washington, DC, USA), pp. 143-
154, IEEE Computer
Society, 2005.
13. A. Deshpande, M. Garofalakis, and R. Rastogi, “Independence is good:
dependency-based histogram synopses for high-dimensional data,” in
SIGMOD ’01: Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 199-210,
ACM Press, 2001.
14. A. Hulgeri and S. Sudarshan, “AniPQO: Almost non-intrusive parametric
query optimization for nonlinear cost functions.,” in
VLDB ’03:
Proceedings of 29th International Conference on Very Large Data Bases,
September 9-12, 2003, Berlin, Germany, pp. 766-777, 2003.
15. A. Hulgeri and S. Sudarshan, “Parametric query optimization for linear
and piecewise linear cost functions.,” in VLDB ’02: Proceedings of 28th
International Conference on Very Large Data Bases, August 20-23,
2002, Hong Kong, China, pp. 167-178, 2002.
16. A. N. Wilschut and P. M. G. Apers, “Dataflow query execution in a
parallel main-memory environment,” in PDIS ’91: Proceedings of the First
International Conference on Parallel and Distributed Information
Systems, Fontainebleu Hilton Resort, Miami Beach, FL, pp. 68-77, IEEE
Computer Society, 1991.
17. B. Babcock and S. Chaudhuri, “Towards a robust query optimizer: a
principled and practical approach,” in
SIGMOD ’05: Proceedings of the
2005 ACM SIGMOD international conference on Management of data,
(New York, NY, USA), pp. 119-130, ACM Press, 2005.
18. C. M. Chen and N. Roussopoulos, “Adaptive selectivity estimation using
query feedback,” in SIGMOD ’94: Proceedings of the 1994 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 161-172, ACM Press, 1994.
19.
D. B. Lange and M. Oshima. “Programming and Deploying Java Mobile
Agents with Aglets”. Addison-Wesley. Massachusetts, USA. 1998. ISBN
0201325829.
20. D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of
Experiments. Springer, October 1985.
21. D. A. Huffman, “A method for the construction of minimum redundancy
codes,” in Proc. Inst. Radio Eng., pp. 1098-1101, 1952.
22. D. Daniels, “Query compilation in a distributed database system,” Tech.
Rep., IBM, 1982. Research Report RJ 3423.
23. D. Kossmann, “The state of the art in distributed query processing,” ACM
Comput. Surv., vol. 32, no. 4, pp. 422-469, 2000.
24. D. Zhang, J. Li, K. Kimeli, and W. Wang, “Sliding window based multi-join
algorithms over distributed data streams,” in
ICDE ’06: Proceedings of
the 22nd International Conference on Data Engineering (ICDE’06),
(Washington, DC, USA), p. 139, IEEE Computer Society, 2006.
25. E. Werbet and A. Brayner. “AJAX - Adaptive Join Algorithm for Extreme
Restrictions”.
Proceedings of the XXII Brazilian Symposium on
Databases (SBBD). October, 15 -
19. Joao Pessoa, Paraiba, Brazil.
2007.
26. E. A. Rundensteiner, L. Ding, T. M. Sutherland, Y. Zhu, B. Pielech, and
N. Mehta, “CAPE: Continuous query engine with heterogeneous-grained
adaptivity,” in VLDB ’04: Proceedings of the Thirtieth International
Conference on Very Large Data Bases, Toronto, Canada, pp. 1353-
1356, 2004.
27. E. F. Codd, “A relational model of data for large shared data banks,”
Commun. ACM, vol. 26, no. 1, pp. 64-69, 1983.
28. E. Viglas and S.-J. F. Naughton,
Novel Query Optimization and
Evaluation Techniques
. PhD thesis, University of Wisconsin at Madison,
2003.
29. E. Wong and K. Youssefi, “Decomposition --
strategy for query
processing,” ACM Transactions on Database Systems, vol. 1, no. 3,
pp. 223-241, 1976.
30. F. Chu, J. Halpern, and J. Gehrke, “Least expected
cost query
optimization: what can we expect?,” in
PODS ’02: Proceedings of the
twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of
database systems, (New York, NY, USA), pp. 293-
302, ACM Press,
2002.
31. F. Chu, J. Y. Halpern, and P. Seshadri, “Least expected cost query
optimization: an exercise in utility,” in
PODS ’99: Proceedings of the
eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of
database systems, (New York, NY, USA), pp. 138-
147, ACM Press,
1999.
32. F. Tian and D. J. DeWitt, “Tuple routing strategies for distributed eddies,”
in VLDB ’03: Proceedings of 29th International Conference on Very
Large Data Bases, pp. 333-
344, Berlin, Germany: Morgan Kaufmann,
September 9-12 2003.
33. G. Z. Ives, D. Florescu, M. Friedman, A. Levy and D. S. Weld. “An Adaptive Query Execution System for Data Integration”. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. May 31 - Jun 03. Philadelphia, Pennsylvania, United States. 1999. pages 299-310.
34. G. Antoshenkov and M. Ziauddin, “Query processing and optimization in
Oracle Rdb,” The VLDB Journal, vol. 5, no. 4, pp. 229-237, 1996.
35. G. Graefe and K. Ward, “Dynamic query evaluation plans,” in SIGMOD
’89: Proceedings of the 1989 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 358-
366, ACM Press,
1989.
36. G. Graefe and W. J. McKenna, “The Volcano optimizer generator:
Extensibility and efficient search,” in ICDE ’93: Proceedings of the Ninth
International Conference on Data Engineering, Vienna, Austria, pp. 209-
218, IEEE Computer Society, April 19-23 1993.
37. G. Graefe, “Query evaluation techniques for large databases,” ACM
Comput. Surv., vol. 25, no. 2, pp. 73-169, 1993.
38. G. Graefe, “The Cascades framework for query optimization,” IEEE Data
Engineering Bulletin, vol. 18, no. 3, pp. 19-29, 1995.
39. G. Graefe, R. Bunker, and S. Cooper, “Hash joins and hash teams in
Microsoft SQL Server,” in VLDB ’98: Proceedings of 24th International
Conference on Very Large Data Bases, pp. 86-
97, Morgan Kaufman,
August 24-27 1998.
40. H. Guo, P.-Å. Larson, R. Ramakrishnan, and J. Goldstein, “Relaxed
currency and consistency: how to say ”good enough” in SQL,” in
SIGMOD ’04:
Proceedings of the 2004 ACM SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 815-826,
ACM Press, 2004.
41. H. Kaplan, E. Kushilevitz, and Y. Mansour, “Learn
ing with attribute
costs,” in STOC ’05: Proceedings of the thirty-
seventh annual ACM
symposium on Theory of computing, (New York, NY, USA), pp. 356-365,
ACM Press, 2005.
42. H. Simon and J. Kadane, “Optimal problem-solving search: All-or-none
solutions,” Artificial Intelligence, vol. 6, pp. 235-247, 1975.
43. H.-G. Li, S. Chen, J. Tatemura, D. Agrawal, K. S. Candan, and W.-P.
Hsiung, “Safety guarantee of continuous join queries over punctuated
data streams,” in
VLDB ’06: Proceedings of the 32nd international
conference on Very large data bases, pp. 19-
30, VLDB Endowment,
2006.
44. H.-T. Chou and D. J. DeWitt, “An evaluation of buffer management
strategies for relational database systems,” in VLDB ’85: Proceedings of
11th International Conference on Very Large Data Bases, pp. 127-141,
Stockholm, Sweden: Morgan Kaufmann, August 21-23 1985.
45. I. S. Mumick, S. J. Finkelstein, H. Pirahesh, and R. Ramakrishnan,
“Magic is relevant,” in
SIGMOD ’90: Proceedings of the 1990 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 247-258, ACM Press, 1990.
46. J. A. M. Filho. “AMDB - An approach for Sharing Mobile Databases”.
Master Dissertation. Universidade de Fortaleza (UNIFOR). 2003.
47.
J. P. Haas, G. Luo, C. J. Ellmann and J. F. Naughton. “A Scalable Hash
Ripple Join Algorithm”. Proceeding
s of the ACM SIGMOD International
Conference on Management of Data. June 4-
6, Madison, Wisconsin,
USA. 2002.
48.
J. P. Macker and M. S. Corson. “Mobile Ad Hoc Networks and the IETF”,
Internet Engineering Task Force, Mobile Ad Hoc Network (MANET)
Working Group.
49. J. Chen, D. J. DeWitt, F. Tian, and Y. Wang, “NiagaraCQ: a scalable
continuous query system for internet databases,” in SIGMOD ’00:
Proceedings of the 2000 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 379-390, ACM Press,
2000.
50. J. Claussen, A. Kemper, G. Moerkotte, K. Peithner, and M. Steinbrunn,
“Optimization and evaluation of disjunctive queries,”
IEEE Transactions
on Knowledge and Data Engineering, vol. 12, no. 2, pp. 238-260, 2000.
51. J. M. Hellerstein, “Optimization techniques for queries with expensive
methods,” ACM Transactions on Database Systems, vol. 23, no. 2,
pp. 113-157, 1998.
52. J. M. Hellerstein, R. Avnur, A. Chou, C. Hidber, C. Olston, V. Raman,
T. Roth, and P. J. Haas, “Interactive data analysis: The Control project,”
Computer, vol. 32, no. 8, pp. 51-59, 1999.
53. J. Naughton, D. DeWitt, D. Maier, A. Aboulnaga, J. Chen, L. Galanis,
J. Kang, R. Krishnamurthy, Q. Luo, N. Prakash, R. Ramamurthy,
J. Shanmugasundaram, F. Tian, K. Tufte, S. Viglas, Y. Wang, C. Zhang,
B. Jackson, A. Gupta, and R. Chen, “The niagara internet query system,”
IEEE Data Engineering Bulletin, June 2001.
54. J. Shanmugasundaram, K. Tufte, D. J. DeWitt, J. F. Naughton, and
D. Maier, “Architecting a network query engine for producing partial
results,” in ACM SIGMOD Workshop on the Web (WebDB) 2000, Dallas,
TX, pp. 17-22, 2000.
55. J.-H. Hwang, M. Balazinska, A. Rasin, U. Cetintemel, M. Stonebraker,
and S. Zdonik, “High-
availability algorithms for distributed stream
processing,” in
ICDE ’05: Proceedings of the 21st International
Conference on Data Engineering (ICDE’05)
, (Washington, DC, USA),
pp. 779-790, IEEE Computer Society, 2005.
56. K. Loudon. Mastering Algorithms in C. O’Reilly, New York. USA. 1999.
ISBN: 1-56592-453-3.
57. K. Gao, S. Harizopoulos, I. Pandis, V. Shkapenyuk, and A. Ailamaki,
“Simultaneous pipelining in QPipe: Exploiting work sharing opportunities
across queries,” in ICDE ’06: Proceedings of the 22nd International
Conference on Data Engineering (ICDE’06)
, (Washington, DC, USA),
p. 162, IEEE Computer Society, 2006.
58. K. Munagala, S. Babu, R. Motwani, and J.
Widom, “The pipelined set
cover problem.,” in
ICDT ’05: Proceedings of the 10th International
Conference, Edinburgh, UK, pp. 83-98, 2005.
59. K. Munagala, U. Srivastava, and J. Widom, “Optimization of continuous
queries with shared expensive filters,” in PODS ’07: Proceedings of the
twenty-sixth ACM SIGMOD-SIGACT-
SIGART symposium on Principles
of database systems, (New York, NY, USA), pp. 215-
224, ACM Press,
2007.
60. L. Amsaleg, M. J. Franklin, A. Tomasic, and T. Urhan, “Scrambling query
plans to cope with unexpected delays,” in Proceedings of the Fourth
International Conference on Parallel and Distributed Information
Systems, Miami Beach, FL, pp. 208-219, IEEE Computer Society,
December 18-20 1996.
61. L. Bellatreche, K. Karlapalem, M. K. Mohania, and M. Schneider, “What
can partitioning do for your data warehouses and data marts?,” in IDEAS
’00:
Proceedings of the 2000 International Symposium on Database
Engineering & Applications, (Washington, DC, USA), pp. 437-446, IEEE
Computer Society, 2000.
62. L. Ding and E.
A. Rundensteiner, “Evaluating window joins over
punctuated streams,” in
CIKM ’04: Proceedings of the thirteenth ACM
international conference on Information and knowledge management,
(New York, NY, USA), pp. 98-107, ACM Press, 2004.
63. L. F. Mackert and G.
M. Lohman, “R* optimizer validation and
performance evaluation for distributed queries,” in
VLDB ’86:
Proceedings of the 12th Internationa
l Conference on Very Large Data
Bases, (San Francisco, CA, USA), pp. 149-
159, Morgan Kaufmann
Publishers Inc., 1986.
64. L. Getoor, B. Taskar, and D. Koller, “Selectivity estimation using
probabilistic models,” in SIGMOD ’01: Proceedings of the 2001 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 461-472, ACM Press, 2001.
65. L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh, “Extensible
query processing in starburst,” in
SIGMOD ’89: Proceedings of the 1989
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 377-388, ACM Press, 1989.
66. L. Raschid and S. Y. W. Su, “A parallel processing strategy for evaluating
recursive queries,” in VLDB ’86: Proceedings of the 12th International
Conference on Very Large Data Bases, pp. 412-419, Morgan Kaufmann
Publishers Inc., 1986.
67. Large Data Bases, (San Francisco, CA, USA), pp. 128-137, Morgan
Kaufmann Publishers Inc., 1986.
68. M. A. Shah, J. M. Hellerstein, and E. Brewer, “Highly available, fault-
tolerant, parallel dataflows,” in SIGMOD ’04: Proceedings of the 2004
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 827-838, ACM Press, 2004.
69. M. A. Shayman and E. Fernandez-Gaucherand, “Risk-sensitive decision-
theoretic diagnosis,” IEEE Transactions on Automatic Control, vol. 46,
pp. 1166-1171, 2001.
70. M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker, “Fault-
tolerance in the Borealis distributed stream processing system,” in
SIGMOD ’05: Proceedings of the 2005 ACM
SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 13-24,
ACM Press, 2005.
71. M. Herbster and M. K. Warmuth, “Tracking the best expert,” Machine
Learning, vol. 32, pp. 151-178, August 1998.
72. M. Kearns and S. Singh, “Near-optimal reinforcement learning in
polynomial time,” Machine Learning, vol. 49, pp. 260-268, November
2002.
73. M. Kutsch, P. J. Haas, V. Markl, N. Megiddo, and T. M. Tran, “Integrating
a maximum-entropy cardinality estimator into DB2 UDB,” in EDBT ’06:
Proceedings of the 10th International Conference on Extending
Database Technology, 2006.
74. M. Stillger, G. Lohman, V. Markl, and M. Kandil, “LEO - DB2’s LEarning
Optimizer,” in
VLDB ’01: Proceedings of 27th International Conference
on Very Large Data Bases, Morgan Kaufmann, September 11-14 2001.
75. M. Stonebraker, E. Wong, P. Kreps, and G. Held, “The design and
implementation of Ingres,” ACM Transactions on Database Systems,
vol. 1, no. 3, pp. 189-222, 1976.
76. M. Templeton, H. Henley, E. Maros, and D. J.
V. Buer, “InterViso:
Dealing with the complexity of federated database access,” The VLDB
Journal, vol. 4, no. 2, 1995.
77. M.S. Chen, M. L. Lo, P. S. Yu, H. C. Young, Using segmented right-deep
trees for the execution of pipelined hashjoins.," in Proc 18th VLDB Conf,
Vancouver, Canada, August 23-27,1992.
78. Mokbel, Lu et al. “Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results”. Proceedings of the International Conference on Data Engineering, 2004.
79. N. Bruno and S.
Chaudhuri, “Exploiting statistics on query expressions
for optimization,” in
SIGMOD ’02: Proceedings of the 2002 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 263-274, ACM Press, 2002.
80. N. Bruno, S. Chaudhuri, and L. Gravano, “STHoles: a multidimensional
workload-aware histogram,” in
SIGMOD ’01: Proceedings of the 2001
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 211-222, ACM Press, 2001.
81. N. Kabra and D. J. DeWitt, “Efficient mid-query re-optimization of sub-
optimal query execution plans,” in SIGMOD ’98: Proceedings of the 1998
ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 106-117, ACM Press, 1998.
82. N. Polyzotis, “Selectivity-based partitioning: A divide-and-union paradigm
for effective query optimization,” in CIKM ’05: Proceedings of the 14th
ACM International Conference on Information and knowledge
management, pp. 720-727, New York, NY: ACM Press, 2005.
83.
NIST. Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/.
84. O. Etzioni, S. Hanks, T. Jiang, R. M. Karp, O. Madani, and O. Waarts,
“Efficient information gathering on the internet,” in FOCS96:
Proceedings of the 37th Annual Symposium on Foundations of Computer
Science, pp. 234-243, IEEE Computer Society, 1996.
85.
P. J. Haas and J. M. Hellerstein. “Ripple Joins for Online Aggregation”.
Proceedings of the 1999 ACM SIGMOD International Conference on
Management of Data. May, 31 Jun, 03. Philadelphia, Pennsylvania,
United States.1999. pages 287-298.
86. P. A. Tucker and D. Maier, “Exploiting punctuation semantics in data
streams,” in
ICDE ’02: Proceedings of the 18th International Conference
on Data Engineering, (Washington, DC, USA), p.
279, IEEE Computer
Society, 2002.
87. P. Bizarro and D. DeWitt, “Adaptive and robust query processing with
SHARP,” Tech. Rep. 1562, University of Wisconsin - Madison, CS Dept.,
2006.
88. P. Bizarro, S. Babu, D. DeWitt, and J. Widom, “Content-based routing:
different plans for different data,” in VLDB ’05: Proceedings of the 31st
international conference on Very large data bases, pp. 757-768, VLDB
Endowment, 2005.
89. P. D. Bra and J.
Paredaens, “Horizontal decompositions and their impact
on query solving,” SIGMOD Rec., vol. 13, no. 1, pp. 46-50, 1982.
90. P. G. Bizarro, Adaptive Query Processing: Dealing with Incomplete and
Uncertain Statistics. PhD thesis, University of Wisconsin - Madison,
2006.
91. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G.
Price, “Access path selection in a relational database management
system,” in
SIGMOD ’79: Proceedings of the 1979 ACM SIGMOD
International Conference on Management of Data, 1979.
92. P. J. Haas and J. M. Hellerstein, “Ripple joins for online aggregation,” in
SIGMOD ’99: Proceedings of the 1999 ACM SIGMOD international
conference on Management of data, (New York, NY, USA), pp. 287-298,
ACM Press, 1999.
93. P. Roy, S. Seshadri, S. Sudarshan, and S.
Bhobe, “Efficient and
extensible algorithms for multi query optimization,” in
SIGMOD ’00:
Proceedings of the
2000 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 249-
260, ACM Press,
2000.
94. P. Seshadri, H. Pirahesh, and T. Y. C. Leung, “Complex query
decorrelation,” in ICDE ’96: Proceedings of the Twelfth International
Conference on Data Engineering, New Orleans, LA, pp. 450-458,
February 26-March 1 1996.
95. P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. Y.
C. Leung,
R. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan,
“Cost-based optimization for magic: Algebra and implementation,” in
SIGMOD ’96: Proceedings of the 1996 ACM
SIGMOD International
Conference on Management of Data, pp. 435-446, ACM Press, 1996.
96. R. Avnur and J. M. Hellerstein, “Eddies: continuously adaptive query
processing,” in SIGMOD ’00: Proceedings of the 2000 ACM SIGMOD
international conference on Management of data, (New York, NY, USA),
pp. 261-272, ACM Press, 2000.
97. R. Goldman and J. Widom, “DataGuides: Enabling query formulation and
optimization in semistructured databases,” in VLDB ’97: Proceedings of
23rd International Conference on Very Large Data Bases, pp. 436-445,
Athens, Greece: Morgan Kaufman, August 25-29 1997.
98. R. Goldman and J.
Widom, “WSQ/DSQ: a practical approach for
combined querying of databases and the web,” in
SIGMOD ’00:
Proceedings of the 2000
ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 285-
296, ACM Press,
2000.
99. R. H. Arpaci-Dusseau, E. Anderson, N. Treuhaft, D. E. Culler, J. M.
Hellerstein, D. Patterson, and K. Yelick, “Cluster I/O with River: making
the fast case common,” in IOPADS ’99:
Proceedings of the sixth
workshop on I/O in parallel and distributed systems, (New York, NY,
USA), pp. 10-22, ACM Press, 1999.
100. R. Krishnamurthy, H. Boral, and C. Zaniolo, “Optimization of
nonrecursive queries,” in VLDB ’86: Proceedings of the 12th International
Conference on Very
101. R. L. Cole and G.
Graefe, “Optimization of dynamic query evaluation
plans,” in
SIGMOD ’94: Proceedings of the 1994 ACM SIGMOD
International Conference on Management of Data, pp. 150-160,
Minneapolis, MN: ACM Press, May 24-27 1994.
102. R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar,
G. Manku, C. Olston, J. Rosenstein, and R. Varma, “Query processing,
resource management, and approximation in a data stream management
system,” in
CIDR ’03: First Biennial Conference on Innovative Data
Systems Research, Asilomar, CA, 2003.
103. S. Babu and J. Widom, “Continuous queries over data streams,
SIGMOD Rec., vol. 30, no. 3, pp. 109-120, 2001.
104. S. Babu and J.
Widom, “StreaMon: an adaptive engine for stream
query processing,” in
SIGMOD ’04: Proceedings of the 2004 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 931-932, ACM Press, 2004.
105. S. Babu and P.
Bizarro, “Adaptive query processing in the looking
glass,” in CIDR ’05: Sec
ond Biennial Conference on Innovative Data
Systems Research, pp. 238-249, Asilomar, CA, 2005.
106. S. Babu, K. Munagala, J. Widom, and R. Motwani, “Adaptive caching
for continuous queries,” in
ICDE ’05: Proceedings of the 21st
International Conference on Data Engineering (ICDE’05), (Washington,
DC, USA), pp. 118-129, IEEE Computer Society, 2005.
107. S. Babu, P. Bizarro, and D. DeWitt, “Proactive re-optimization,” in
SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international
conference on Management of data, Baltimore, MD, pp. 107-118, New
York, NY: ACM Press, June 14-16 2005.
108. S. Babu, R. Motwani, K. Munagala, I. Nishizawa, and J. Widom,
“Adaptive ordering of pipelined stream filters,” in SIGMOD ’04:
Proceedings of the 2004 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 407-
418, ACM Press,
2004.
109. S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M.
Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss,
and M. A. Shah, “TelegraphCQ: Continuous dataflow processing for an
uncertain world,” in
CIDR ’03: First Biennial Conference on Innovative
Data Systems Research, Asilomar, CA, 2003.
110. S. Chaudhuri and K. Shim, “Including group-by in query optimization,
in
VLDB ’94: Proceedings of the 20th International Conference on Very
Large Data Bases, (San Francisco, CA, USA), pp. 354-
366, Morgan
Kaufmann Publishers Inc., 1994.
111. S.
Chaudhuri, “An overview of query optimization in relational
systems,” in PODS ’98: Proceedings of the seventeenth ACM SIGACT-
SIGMOD-SIGART symposium on Principles of database systems, (New
York, NY, USA), pp. 34-43, ACM Press, 1998.
112. S. Chaudhuri, U. Dayal, and T.
W. Yan, “Join queries with external
text sources: Execution and optimization techniques,” in SIGMOD ’95:
Proceedings
of the 1995 ACM SIGMOD International Conference on
Management of Data, pp. 410-422, ACM Press, May 26 1995.
113. S. Ewen, H. Kache, V. Markl, and V. Raman, “Progressive query
optimization for federated queries,” in EDBT ’06: Proceedings of the 10th
International Conference on Extending Database Technology, pp. 847-
864, 2006.
114. S. Ganguly, “Design and analysis of parametric query optimization
algorithms,” in VLDB ’98: Proce
edings of the 24th International
Conference on Very Large Data Bases, pp. 228-238, Morgan Kaufmann,
August 24-27 1998.
115. S. Ganguly, W. Hasan, and R. Krishnamurthy, “Query optimization for
parallel execution,” in
SIGMOD ’92: Proceedings of the 1992 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 9-18, ACM Press, 1992.
116. S. Madden, M. Shah, J. M. Hellerstein, and V. Raman, “Continuously
adaptive continuous queries over streams,” in SIGMOD ’02: Proceedings
of
the 2002 ACM SIGMOD International Conference on Management of
Data, pp. 49-60, ACM Press, 2002.
117. S. V. U. M. Rao, Parametric Query Optimization: A Non-Geometric
Approach. Master’s thesis, IIT Kanpur, 1999.
118. S. Viglas, J. F. Naughton, and J. Burger, “Maximizing the output rate
of multi-way join queries over streaming information sources,” in VLDB
’03:
Proceedings of the 29th International Conference on Very Large
Data Bases, Berlin, Germany: Morgan Kaufmann, September 9-12 2003.
119.
Soares, M. and Brayner, A. “MXQUERY: An XQuery like
Multidatabase Language”. Submitted for publication. 2003.
120. T. Urhan and M. J. Franklin. “XJoin: A Reactively-Scheduled Pipelined Join Operator”. IEEE Data Engineering Bulletin, Vol. 23, No. 2, June 2000, pages 27-33.
121. T. Urhan and M. J. Franklin. “XJoin: A Reactively-Scheduled
Pipelined Join Operator”. IEEE Data Engineering Bulletin. Vol. 23 No. 2.
June, 2000. pages 27-33.
122. T. Ibaraki and T. Kameda, “On the optimal nesting order for
computing N-relational joins,” ACM Transactions on Database Systems,
vol. 9, no. 3, pp. 482-502, 1984.
123. T. K. Sellis, “Multiple-query optimization,”
ACM Trans. Database
Syst., vol. 13, no. 1, pp. 23-52, 1988.
124. T. Legler, W. Lehner, and A.
Ross, “Data mining with the SAP
NetWeaver BI accelerator,” in
VLDB ’06: Proceedings of the 32nd
international conference on Very large data bases, pp. 1059-1068, VLDB
Endowment, 2006.
125. T. Urhan and M. J. Franklin, “XJoin: a reactively-scheduled pipelined
join operator,” IEEE Data Engineering Bulletin, vol. 23, no. 2, pp. 27-33,
2000.
126. T. Urhan, M. J. Franklin, and L. Amsaleg, “Cost based query
scrambling for initial delays,” in
SIGMOD ’98: Proceedings of the 1998
ACM SIGMOD International Conference on Management of Data,
pp. 130-141, Seattle, WA: ACM Press, June 2-4 1998.
127. U. Feige, L. Lovász, and P. Tetali, “Approximating min-
sum set
cover,” Algorithmica, vol. 40, no. 4, pp. 219-234, 2004.
128. U. Srivastava, K. Munagala, and J. Widom, “Operator placement for
in-network stream query processing,” in PODS ’05: Proceedings of the
Twenty-Fourth ACM SIGMOD-SIGACT-
SIGART Symposium on
Principles of Database Systems, pp. 250-258, 2005.
129. U. Srivastava, K. Munagala, J. Widom, and R.
Motwani, “Query
optimization over web services,” in
VLDB ’06: Proceedings of the 32nd
international conference on Very large data bases, pp. 355-366, VLDB
Endowment, 2006.
130. V. G. V. Prasad, Parametric Query Optimization: A Geometric
Approach. Master’s thesis, IIT Kanpur, 1999.
131. V. Markl, V. Raman, D. Simmen, G. Lohman, H.
Pirahesh, and
M. Cilimdzic, “Robust query
processing through progressive
optimization,” in SIGMOD ’04: Proceedings
of the 2004 ACM SIGMOD
international conference on Management of data, (New York, NY, USA),
pp. 659-670, ACM Press, 2004.
132. V. Raman and J.
M. Hellerstein, “Partial results for online query
processing,” in SIG
MOD ’02: Proceedings of the 2002 ACM SIGMOD
International Conference on Management of Data, pp. 275-
286, ACM
Press, 2002.
133. V. Raman, A. Deshpande, and J.
M. Hellerstein, “Using state
modules for adaptive query processing.,” in ICDE ’03: Proceedings of the
19th International Conference on Data Engineering, Bangalore, India,
pp. 353-364, 2003.
134. V. Raman, B. Raman, and J.
M. Hellerstein, “Online dynamic
reordering for interactive data processing,” in VLDB ’99: Proceedings of
the 25th International Conference on Very Large Data Bases, pp. 709-
720, Edinburgh, Scotland: Morgan Kaufmann, 1999.
135. W. Kim, “On optimizing an SQL-like nested query,” ACM Transactions
on Database Systems, vol. 7, no. 3, pp. 443-469, 1982.
136. W.-S. Li, V. S. Batra, V. Raman, W. Han, and I. Narang, “QoS-based
data access and placement for federated systems,” in
VLDB ’05:
Proceedings of the 31st internatio
nal conference on Very large data
bases, pp. 1358-1362, VLDB Endowment, 2005.
137. Y. E. Ioannidis and S.
Christodoulakis, “On the propagation of errors
in the size of join results,” in SIGMOD ’91: Proceedings of the 1991 ACM
SIGMOD international conference on Management of data, (New York,
NY, USA), pp. 268-277, ACM Press, 1991.
138. Y. E. Ioannidis, “Query optimization,” ACM Computing Surveys,
vol. 28, no. 1, pp. 121-123, 1996.
139. Y. E. Ioannidis, R. T. Ng, K. Shim, and T. K. Sellis, “Parametric query
optimization,” The VLDB Journal, vol. 6, no. 2, pp. 132-151, 1997.
140. Y. Zhu, E. A. Rundensteiner, and G. T. Heineman, “Dynamic plan
migration for continuous queries over data streams,” in SIGMOD ’04:
Proceedings of the 2004 ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 431-442, ACM Press,
2004.
141. Z. G. Ives and N. E. Taylor, “Sideways information passing for push-
style query processing,” Tech. Rep. MS-CIS-07-
14, University of
Pennsylvania, 2007.
142. Z. G. Ives, A. Y. Halevy, and D.
S. Weld, “Adapting to source
properties in processing data integration queries,” in
SIGMOD ’04:
Proceedings of the 2004
ACM SIGMOD international conference on
Management of data, (New York, NY, USA), pp. 395-
406, ACM Press,
2004.
143. Z. G. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld, “An adaptive query execution system for data integration,” in SIGMOD ’99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data, (New York, NY, USA), pp. 299-310, ACM Press, 1999.
144. Z. G. Ives, Efficient Query Processing for Data Integration
. PhD
thesis, University of Washington, August 2002.
145. Z. Zabret, Dynamic Query Scheduling in Data Integration Systems,
Proceedings of the 16th International Conference on Data Engineering,
p.425, February 28-March 03, 2000.