Knežević, Predrag (2007)
High Data Availability and Consistency for Distributed Hash Tables Deployed in Dynamic Peer-to-peer Communities.
Technische Universität Darmstadt
Dissertation, first publication
Abstract
Decentralized and peer-to-peer computing, as a subset of distributed computing, is seen as an enabling technology for future Internet applications, many of which require some form of data management. Apart from the currently popular P2P file sharing, there are already application scenarios that need data management comparable to existing distributed data management, but deployable in highly dynamic environments. Because peer availability changes frequently, an essential issue in peer-to-peer data management is keeping data highly available and consistent with a very high probability. Usually, data are replicated at creation time, in the hope that at least one replica is available when needed. However, due to the unpredictable behavior of peers, their availability varies, and the configured number of replicas might not guarantee the intended data availability. Instead of fixing the number of replicas, the requested guarantees should be achieved by adapting the number of replicas at run-time in an autonomous way.

This thesis presents a decentralized and self-adaptable replication protocol that guarantees high data availability and consistency fully transparently in a dynamic Distributed Hash Table (DHT). The proposed solution is generic and can be applied on top of any DHT implementation that supports the common DHT API. The protocol detects changes in peer availability and adjusts the replication factor to the new conditions, keeping or recovering the requested guarantees. It rests on two important assumptions: (1) the ability to build and manage a decentralized replica directory, and (2) the ability to measure the actual peer availability in the system precisely.

The replica directory is built on top of the DHT by using a key-generation scheme and by wrapping replicas with additional system information such as a version and a replica ordinal number. The way replicas are managed in the DHT also yields a measurement technique for estimating peer availability: peers cannot be probed directly, because the common DHT API does not reveal any details about the underlying DHT topology, so peer availability is computed from the measured availability of replicas. With the help of confidence-interval theory, it is possible to determine the number of probes sufficient to produce results with an acceptable error. Finally, two variants of the protocol are defined: one that assumes that data are immutable, and one without this limitation.

A theoretical model is developed to determine the number of replicas needed to deliver the pre-configured guarantees. If a peer detects that the current peer availability and replication factor are not sufficient for maintaining the guarantees, a sufficient replication factor is determined from the measured peer availability. Knowing the previous and the new replication factor, the peer can insert additional replicas of the data managed in its local storage into the DHT. Conversely, if the number of replicas is higher than needed, the peer removes unnecessary replicas from its storage, reducing the storage overhead.
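The abstract does not state the thesis's exact formulas, so the following is only a minimal sketch of how the pieces described above could fit together, assuming the standard independence model (a data item is available if at least one of its r replicas resides on an online peer) and a normal-approximation confidence interval for the probe count. The function names and the replica-key format (`#replica:<ordinal>`) are illustrative assumptions, not taken from the dissertation.

```python
import math

def required_replicas(peer_availability: float, target_availability: float) -> int:
    """Smallest replication factor r with 1 - (1 - p)^r >= A_target,
    assuming replicas are stored on independently available peers."""
    if not (0.0 < peer_availability < 1.0) or not (0.0 < target_availability < 1.0):
        raise ValueError("availabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_availability) / math.log(1.0 - peer_availability))

def probes_needed(peer_availability_estimate: float, max_error: float, confidence_z: float = 1.96) -> int:
    """Number of replica probes n so that the estimated availability stays within
    +/- max_error at the chosen confidence level (normal approximation)."""
    p = peer_availability_estimate
    return math.ceil((confidence_z ** 2) * p * (1.0 - p) / (max_error ** 2))

def replica_key(original_key: str, ordinal: int) -> str:
    """Derive the DHT key of the ordinal-th replica from the original key
    (hypothetical key-generation scheme)."""
    return f"{original_key}#replica:{ordinal}"

if __name__ == "__main__":
    # If peers are online 40% of the time and 99% data availability is requested,
    # about 10 replicas are needed; probing with a 5% error at 95% confidence
    # takes roughly 369 samples.
    print(required_replicas(0.4, 0.99))        # -> 10
    print(probes_needed(0.4, 0.05))            # -> 369
    print(replica_key("user:42/profile", 3))   # -> user:42/profile#replica:3
```

Under these assumptions, a drop in measured peer availability from 0.6 to 0.4 would raise the replication factor required for 99 % availability from 6 to 10, which is the kind of run-time adjustment the protocol performs.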
Replication makes data consistency harder to maintain: every logical update is translated into a set of replica updates, and due to the dynamic nature of the DHT, many replicas can be unavailable at the time an application issues an update request. Such conditions force the use of a weak-consistency model that updates all available replicas and eventually synchronizes the remaining ones when they come back online. Until this has happened, existing weak-consistency approaches can give no guarantees about the consistency of the accessed data. The proposed protocol implements a weak-consistency mechanism that, unlike the others, provides arbitrarily high probabilistic guarantees about the consistency of available data before all replicas are synchronized. This is achieved by updating the available replicas and by inserting the new version in place of all offline replicas. As soon as a replica becomes available again, it is informed about the missed updates and merged with the new version. Such an approach ensures that, with high probability, at least one consistent replica is available when data are requested.

The presented approach was evaluated with a custom-made simulator. The requested availability and consistency levels are fully guaranteed in a DHT with stable or increasing peer availability. During churn (periods when the peer availability decreases), the guarantees are maintained only when the churn dynamics are low, i.e. when peer availability decreases slowly. Faster changes cannot be compensated fully, but eventually, after the system stabilizes, enough replicas are generated and the guarantees are recovered.
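To make the update rule described above concrete, here is a hedged, self-contained sketch in the same vein: a toy in-memory stand-in for the common DHT API (put/get), replicas wrapped with a version and an ordinal number as described in the abstract, and an update that overwrites every reachable replica and inserts the new version under the keys of unreachable ones. The rejoin-and-merge step of the real protocol (an offline replica learning about missed updates when it returns) is not modeled, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Replica:
    """Replica wrapper: payload plus version and replica ordinal number."""
    ordinal: int
    version: int
    value: object

class ToyDHT:
    """In-memory stand-in for the common DHT API; get() returning None models
    a replica whose responsible peer is currently offline or unknown."""
    def __init__(self) -> None:
        self._store: dict[str, Replica] = {}
    def put(self, key: str, value: Replica) -> None:
        self._store[key] = value
    def get(self, key: str) -> Optional[Replica]:
        return self._store.get(key)

def replica_key(original_key: str, ordinal: int) -> str:
    """Hypothetical key-generation scheme for the decentralized replica directory."""
    return f"{original_key}#replica:{ordinal}"

def weak_consistency_update(dht: ToyDHT, original_key: str, new_value: object,
                            replication_factor: int, new_version: int) -> None:
    """Update every reachable replica and insert the new version under the keys
    of unreachable or stale replicas, so that at least one up-to-date replica
    is very likely to be reachable afterwards."""
    for ordinal in range(replication_factor):
        key = replica_key(original_key, ordinal)
        current = dht.get(key)
        if current is None or current.version < new_version:
            dht.put(key, Replica(ordinal, new_version, new_value))

def read_latest(dht: ToyDHT, original_key: str, replication_factor: int) -> Optional[object]:
    """Return the value of the newest reachable replica, or None if all are offline."""
    newest: Optional[Replica] = None
    for ordinal in range(replication_factor):
        r = dht.get(replica_key(original_key, ordinal))
        if r is not None and (newest is None or r.version > newest.version):
            newest = r
    return newest.value if newest else None
```

A read then simply returns the newest reachable version; the protocol's probabilistic guarantee is that, given a sufficient replication factor, at least one such up-to-date replica is reachable with the configured probability.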
| Entry type: | Dissertation |
|---|---|
| Published: | 2007 |
| Author(s): | Knežević, Predrag |
| Type of entry: | First publication |
| Title: | High Data Availability and Consistency for Distributed Hash Tables Deployed in Dynamic Peer-to-peer Communities |
| Language: | English |
| Referees: | Aberer, Prof. Dr. Karl ; Buchmann, Prof. Dr. Alejandro ; Steinmetz, Prof. Dr.-Ing. Ralf |
| Advisor: | Neuhold, Prof. Dr. Erich J. |
| Year of publication: | 27 July 2007 |
| Place: | Darmstadt |
| Publisher: | Technische Universität |
| Date of oral examination: | 10 July 2007 |
| URL / URN: | urn:nbn:de:tuda-tuprints-8547 |
| Uncontrolled keywords: | decentralized, peer-to-peer, data management, replication, data availability, data consistency, distributed hash table, dht |
| Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science |
| Department(s): | 20 Department of Computer Science |
| Date deposited: | 17 Oct 2008 09:22 |
| Last modified: | 26 Aug 2018 21:25 |