Improving DHT reliability
In this post I am going to make a few notes on how the reliability of a Distributed Hash Table (DHT) can be improved. A Distributed Hash Table is a decentralized peer-to-peer system that provides a lookup service similar to a regular hash table. One popular application of a DHT is the file sharing used in the BitTorrent protocol, which allows files to be downloaded without the use of a tracker.
Let's take a quick look at how the Pastry DHT works. Other DHT implementations are similar, with a few differences. The main concepts are:
- Each data object has a unique identifier (key), computed as a hash of its content.
- All keys belong to a 160-bit circular key space.
- Data objects are stored on multiple nodes (peer machines of equal standing).
- Each node has a random unique identifier in the same key space.
- A data object is stored on the live node whose id is closest to the object's key. A special routing function is used to find that node.
- Once a data object has been pushed, it cannot be deleted from the DHT.
- A replica function transforms a key into another key, which can be used to store a copy of the data object.
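The concepts above can be sketched in Python. This is a minimal illustration, not any particular DHT's API; the names `object_key`, `closest_node`, and `replica_key` are assumptions made for the example:

```python
import hashlib

KEY_SPACE_BITS = 160
KEY_SPACE_SIZE = 2 ** KEY_SPACE_BITS

def object_key(content: bytes) -> int:
    """Key of a data object: SHA-1 hash of its content (160 bits)."""
    return int.from_bytes(hashlib.sha1(content).digest(), "big")

def circular_distance(a: int, b: int) -> int:
    """Distance between two points on the circular key space."""
    d = abs(a - b)
    return min(d, KEY_SPACE_SIZE - d)

def closest_node(key: int, node_ids: list) -> int:
    """The node whose id is closest to the key stores the object."""
    return min(node_ids, key=lambda nid: circular_distance(nid, key))

def replica_key(key: int, replica_index: int = 1) -> int:
    """A hypothetical replica function: deterministically maps a key to
    a different key so a copy lands on another node (illustrative only)."""
    salted = key.to_bytes(20, "big") + bytes([replica_index])
    return int.from_bytes(hashlib.sha1(salted).digest(), "big")
```

Because both object keys and node ids live in the same key space, "closest id" is a well-defined storage rule, and the replica function simply produces a second point on the circle for the copy.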
The DHT structure is quite scalable: if there is not enough storage in the system, you can add new nodes and some of the data will be redistributed automatically. For example, if only one node exists in the system, all data objects are stored on that node. Suppose its id is 7ab1d3241f. When a new node is added to the DHT, it is assigned a random id, e.g. 12f539f2c5. Roughly half of the data objects stored on the first node should then be transferred to the second node, because their keys are closer to the second node's id than to the first. There is also a routing function that provides fast lookup of a node by its id; it uses routing tables that are stored on the nodes and kept up to date.
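That rebalancing step can be sketched as follows, reusing the example ids from the text. The `join` helper and the in-memory key set are assumptions for illustration, not a real protocol:

```python
KEY_BITS = 160
RING = 2 ** KEY_BITS

def distance(a: int, b: int) -> int:
    """Distance on the circular key space."""
    d = abs(a - b)
    return min(d, RING - d)

def owner(key: int, node_ids: list) -> int:
    """The node whose id is closest to the key owns the object."""
    return min(node_ids, key=lambda n: distance(n, key))

def join(new_node_id: int, node_ids: list, stored_keys):
    """When a node joins, every key that is now closer to the new node
    than to its current owner is handed over (hypothetical sketch)."""
    node_ids = node_ids + [new_node_id]
    transferred = {k for k in stored_keys if owner(k, node_ids) == new_node_id}
    return node_ids, transferred
```

With node ids drawn uniformly at random, each join takes over the region of the key space nearest to the new id, which is why roughly half of a single node's objects move when a second node appears.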
Data Loss Problems
While adding new nodes to a DHT is an easy task that solves the lack of storage, removing nodes can bring new problems. If a node is disconnected only for a short time, data availability is not affected, because the nodes holding replicas of the data objects stored on the disconnected node keep working.
When a node is permanently removed from the DHT, all data objects stored on it are lost, although the replicated copies remain available on other nodes. From this point of view, the nodes that store the original or a replicated data object should check for the presence or absence of the other "replica nodes" for that object, and if those nodes do not respond for a long time, the data should be pushed again.
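One way to sketch such a liveness check is a small per-node monitor that re-pushes an object when one of its replica nodes has been silent past a timeout. The `ReplicaMonitor` name, the timeout value, and the `push` callback are assumptions, not an existing API:

```python
import time

REPUSH_TIMEOUT = 300.0  # seconds a replica node may be silent before re-push

class ReplicaMonitor:
    """Hypothetical monitor run on each node: tracks when every replica
    node for a key last responded and re-pushes the object if one of
    them has been silent longer than REPUSH_TIMEOUT."""

    def __init__(self, push):
        self.push = push     # callable that re-inserts an object by key
        self.last_seen = {}  # (key, node_id) -> time of last response

    def record_response(self, key, node_id, now=None):
        self.last_seen[(key, node_id)] = time.monotonic() if now is None else now

    def check(self, key, replica_nodes, now=None):
        now = time.monotonic() if now is None else now
        for node_id in replica_nodes:
            seen = self.last_seen.get((key, node_id), now)
            if now - seen > REPUSH_TIMEOUT:
                self.push(key)  # restore replication for this object
                return
```

The explicit `now` parameter is only there to make the sketch testable; a real node would use its own clock and fold this check into its periodic maintenance loop.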
If a node is removed deliberately, it should notify the other nodes so they can update their routing tables, and push its own data back into the system. The same should be done if the node's id changes.
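A graceful departure could look roughly like this; the `Node` structure and the `graceful_leave` function are illustrative assumptions only:

```python
class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.store = {}             # key -> data object held locally
        self.routing_table = set()  # ids of known peers

def graceful_leave(node: Node, peers, push):
    """Sketch of a deliberate departure: tell every known peer to drop
    this node from its routing table, then re-push all locally stored
    objects so they land on other nodes."""
    for peer in peers:
        peer.routing_table.discard(node.id)
    for key, value in node.store.items():
        push(key, value)
    node.store.clear()
```

An id change would be the same sequence followed by a fresh join: leave under the old id, then re-enter the system with the new one.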
The ideas described under Data Loss Problems may already be implemented in most DHT systems, but I have not seen them addressed explicitly. Any feedback on this is appreciated.