Decentralized peer-to-peer (P2P) overlay networks are distributed systems by nature, with no hierarchical organization or centralized control. They are typically divided into two main classes: structured and unstructured [39]. Structured P2P overlay networks have tightly controlled topologies, and content is placed at specific locations so that queries can be resolved efficiently. Well-known examples include Content Addressable Network (CAN) [44], Chord [15], and Pastry [45]. Such overlays use a Distributed Hash Table (DHT) as a substrate, in which data objects (or values) are deterministically placed on the peers whose identifiers match the objects' unique keys.

In DHT-based systems, node identifiers are assigned to peers uniformly at random from a large identifier space. Unique identifiers called keys, drawn from the same identifier space, are computed from the data objects using a hash function. Each key is then mapped by the overlay network protocol to a unique active peer in the overlay. A structured P2P overlay network thereby supports scalable storage and retrieval of {key, value} pairs: given a key, the operations put(key, value) and get(key) store and retrieve the corresponding data object, respectively, by routing the request to the peer responsible for that key. However, such systems support only exact-match queries and are heavily affected by peer churn [31].

A Content Addressable Network [44] is designed around a virtual d-dimensional Cartesian coordinate space (a d-torus), which is partitioned among peers and used to assign identifiers to data resources. Each peer knows the IDs and IP addresses of its neighbors. Upon receiving a request, each node forwards the message to the neighbor...
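The deterministic key-to-peer mapping described above can be illustrated with a minimal sketch, assuming a Chord-style circular identifier space where a key is assigned to the first peer whose identifier is greater than or equal to the key's hash. All names here (ToyDHT, peer names) are illustrative, not part of any cited system.

```python
import hashlib
from bisect import bisect_left

class ToyDHT:
    """Illustrative Chord-style DHT: each key maps to the first peer whose
    identifier is >= hash(key) on a circular identifier space (with wraparound)."""

    def __init__(self, peer_names):
        # Assign each peer an identifier by hashing its name into the space.
        self.peers = sorted((self._hash(p), p) for p in peer_names)
        self.store = {p: {} for p in peer_names}

    @staticmethod
    def _hash(s):
        # 160-bit identifier space, as produced by SHA-1.
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def _responsible_peer(self, key):
        # Successor of hash(key) on the ring; wrap around to the first peer.
        ids = [pid for pid, _ in self.peers]
        i = bisect_left(ids, self._hash(key)) % len(self.peers)
        return self.peers[i][1]

    def put(self, key, value):
        # Route the pair to the peer responsible for the key and store it there.
        self.store[self._responsible_peer(key)][key] = value

    def get(self, key):
        # The same deterministic mapping locates the stored value.
        return self.store[self._responsible_peer(key)].get(key)

dht = ToyDHT(["peerA", "peerB", "peerC"])
dht.put("song.mp3", "bytes...")
print(dht.get("song.mp3"))  # resolves to the same peer -> "bytes..."
```

In a real deployment the routing step is itself distributed (each hop forwards toward the responsible peer using a small routing table), whereas this sketch centralizes the lookup for brevity; it also makes clear why only exact key matches can be resolved.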
...that takes advantage of the fact that nodes sharing a large number of files tend to remain in the network longer; by querying this small portion of nodes, the success rate increases and search traffic decreases. To reduce the number of probed hosts, and consequently the overall search load, replicating the data across multiple hosts has been proposed [67]. The location and number of replicas vary with the replication strategy. Thampi et al. [41] describe three main site-selection policies: owner replication, where the object is replicated on the requesting node, so the number of replicas grows in proportion to the file's popularity; random replication, where replicas are distributed randomly; and path replication, where the requested file is copied to all nodes on the path between the requesting node and the source.
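The three site-selection policies can be sketched as follows. This is a simplified illustration, not code from [41]: the function name, node labels, and the choice to give random replication the same replica count as path replication are all assumptions made here for concreteness.

```python
import random

def select_replica_sites(policy, path, rng=random.Random(0)):
    """Pick replica sites for a file located via `path`: the list of nodes
    from the requesting node (path[0]) to the source node (path[-1]).
    Hypothetical sketch of the three site-selection policies."""
    if policy == "owner":
        # Owner replication: one copy at the requesting node; popular files
        # accumulate replicas in proportion to how often they are requested.
        return [path[0]]
    if policy == "path":
        # Path replication: copy at every node between requester and source
        # (the source itself already holds the file).
        return path[:-1]
    if policy == "random":
        # Random replication: place the copies at randomly chosen nodes
        # (here: the same number of copies as path replication would make).
        return rng.sample(path, len(path) - 1)
    raise ValueError(f"unknown policy: {policy}")

print(select_replica_sites("owner", ["n1", "n2", "n3", "n4"]))  # ['n1']
print(select_replica_sites("path",  ["n1", "n2", "n3", "n4"]))  # ['n1', 'n2', 'n3']
```

Owner replication is the cheapest per query, while path replication seeds many copies along popular routes at higher storage cost; random replication decouples replica placement from query routes.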