Efficient deduplication using hadoop

Manjunath R. Hudagi; Sachin A. Urabinahatti

International Journal of Latest Trends in Engineering and Technology

Index Copernicus, ICV - 77.02/100
Index Copernicus, ICV - 7.39/10
e-ISSN : 2278-621X
Cosmos Impact Factor - 4.490/10
Global Impact Factor - 0.685/1
p-ISSN : 2319-3778

Volume 10 Issue 3 - May 2018

Pages : 236-238

DOI : http://dx.doi.org/10.21172/1.103.40
Keywords : Cloud storage, Deduplication, Hadoop, Hadoop distributed file system, Hadoop database.
Abstract :
In cloud computing, we found that when a user uploads the same file twice under the same file name, the system does not allow the file to be saved again. It also does not allow saving a file that has the same name but different content. Hadoop is a high-performance distributed data storage and processing system, but it does not provide an effective data deduplication solution. For example, if a popular video or movie file is uploaded to HDFS by one million users, Hadoop replication stores it as three million files, which wastes disk space. With the proposed system, only the space of a single file is occupied, so duplicate files are removed completely. Before uploading data to HDFS, we calculate the hash value of the file and store that hash value in a database for later use. When the same user or another user later tries to upload a file with the same name and the same content, an SHA algorithm is used to calculate its hash value, which is then checked against HBase (HBase is called the Hadoop database because it is a NoSQL database that runs on top of Hadoop). If the hash value matches a stored hash value, the system reports that “File already exists”.
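
The check described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name DedupIndex and its upload method are hypothetical, an in-memory dictionary stands in for the HBase table, SHA-256 is assumed because the abstract only says "an SHA algorithm", and the actual write to HDFS is left out.

```python
import hashlib
import io


class DedupIndex:
    """Hypothetical in-memory stand-in for the hash table kept in HBase."""

    def __init__(self):
        # Maps a hex digest to the path of the first file stored with that content.
        self._path_by_hash = {}

    def upload(self, path, stream, chunk_size=1 << 20):
        """Hash the stream and report whether its content is already stored."""
        # SHA-256 is an assumption; the source does not name the SHA variant.
        digest = hashlib.sha256()
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
        key = digest.hexdigest()

        if key in self._path_by_hash:
            # Same content was uploaded before: skip the write.
            return "File already exists"

        # New content: record the hash (the HDFS write would happen here).
        self._path_by_hash[key] = path
        return "File stored"


index = DedupIndex()
first = index.upload("/videos/movie.mp4", io.BytesIO(b"movie bytes"))
second = index.upload("/videos/copy.mp4", io.BytesIO(b"movie bytes"))
```

Because the lookup is keyed on content rather than on file name, a second upload of identical bytes is rejected even under a different name. In the sketch, a file that reuses an existing name but carries different content yields a new hash and is treated as new data.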

Citing this Journal Article :
Manjunath R. Hudagi, Sachin A. Urabinahatti, "Efficient deduplication using hadoop", Volume 10 Issue 3 - May 2018, 236-238
