Idris Hanafi
Big Data Researcher - Software Developer - Mobile App Developer

I am currently a master's student at the University of Michigan, Ann Arbor, and a graduate researcher in the field of Big Data Systems. I went to college at the age of 11 and received my Bachelor's Degree in Computer Science (Honors) in the Spring of 2015 at the age of 16. My research interests comprise Big Data Systems, Databases, Machine Learning, Image Extraction, and Computer Vision.


The increasing amount and size of data being handled by data analytic applications running on Hadoop has created a need for faster data processing. One of the effective methods for handling big data sizes is compression. Data compression not only makes network I/O processing faster, but also provides better utilization of resources. However, this approach defeats one of Hadoop’s main purposes, which is the parallelism of map and reduce tasks. The number of map tasks created is determined by the size of the file, so by compressing a large file, the number of mappers is reduced, which in turn decreases parallelism. Consequently, standard Hadoop takes longer to process. In this paper, we propose the design and implementation of a Parallel Compressed File Decompressor (P-Codec) that improves the performance of Hadoop when processing compressed data. P-Codec includes two modules: the first module decompresses data upon retrieval by a data node during the phase of uploading the data to the Hadoop Distributed File System (HDFS). This process reduces the runtime of a job by removing the burden of decompression during the MapReduce phase. The second P-Codec module is a decompressed map task divider that increases parallelism by dynamically changing the map task split sizes based on the size of the final decompressed block. Our experimental results using five different MapReduce benchmarks show an average improvement of approximately 80% compared to standard Hadoop.
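The loss of parallelism described above can be sketched numerically. The sizes below are illustrative assumptions (a common 128 MB HDFS split size), not values taken from the paper:

```python
import math

def num_map_tasks(file_size_mb, split_size_mb, splittable=True):
    """Number of map tasks Hadoop launches for one input file.

    A non-splittable compressed file (e.g. a gzip archive) is handed
    to a single mapper regardless of its size, which is the loss of
    parallelism that P-Codec addresses.
    """
    if not splittable:
        return 1
    return math.ceil(file_size_mb / split_size_mb)

# A 1280 MB uncompressed file with a 128 MB split size -> 10 mappers.
print(num_map_tasks(1280, 128))                    # 10

# The same file compressed (non-splittable) -> 1 mapper, no parallelism.
print(num_map_tasks(400, 128, splittable=False))   # 1
```

Decompressing during upload restores the splittable case, so the job runs with the full mapper count again.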
View Research Paper here.
Hadoop-MapReduce is a common software framework for processing parallelizable problems across big datasets using a distributed cluster of processors or stand-alone computers. Hadoop-MapReduce can scale incrementally in the number of processing nodes, and it is designed to provide a processing platform with powerful computation. Hadoop has been used successfully to process big data. However, Hadoop assumes that all nodes in a cluster are homogeneous, whether in memory, speed, or other resources. There have been many attempts to address these homogeneity assumptions, the most common being scheduling algorithms that account for node processing speeds. For our project, we address Hadoop's assumption that every node in the cluster has sufficient free memory: Hadoop splits a file into fixed-size blocks and assigns them to nodes while ignoring whether each node actually has that much free space. After collecting an extensive list of papers that tackle these homogeneity assumptions, we conclude that, to the best of our knowledge, this problem has not been addressed before. We evaluate our algorithm based on two test cases: (1) a heterogeneous cluster with one node that has low memory and (2) performance based on time. We will be running our experiments on a 100 GB text file over 5 heterogeneous systems in a cluster.
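The memory-aware splitting idea can be sketched as dividing a file among nodes in proportion to each node's free memory instead of using a fixed split size. The node names and memory figures below are hypothetical, chosen only to mirror the five-node, one-low-memory-node test case:

```python
def memory_aware_splits(file_size_mb, free_mem_mb):
    """Divide a file among nodes in proportion to each node's free
    memory, rather than Hadoop's default fixed-size splits.

    free_mem_mb maps node name -> free memory in MB (illustrative).
    Returns node name -> assigned share of the file in MB.
    """
    total_mem = sum(free_mem_mb.values())
    return {node: round(file_size_mb * mem / total_mem)
            for node, mem in free_mem_mb.items()}

# Five heterogeneous nodes; node4 is the low-memory node of test case (1).
free_mem = {"node1": 4096, "node2": 4096, "node3": 2048,
            "node4": 512, "node5": 2048}

# The low-memory node receives a proportionally smaller split
# instead of the same fixed-size block as everyone else.
print(memory_aware_splits(12800, free_mem))
```

With a fixed split size, node4 would be assigned the same 2560 MB share as its peers despite having only 512 MB free; the proportional scheme keeps each node's assignment within its capacity.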
Download my updated version of Hadoop here.
Inventory tracking and item pricing are becoming increasingly difficult to perform: as more items are purchased, employees need assistance keeping track of them. This is especially true in distributed environments with multiple warehouses/branches. So we developed an Android application in which a user can manage inventory by scanning a UPC code with an attached camera and can then enter additional details about the product into a database, such as a location, description, price, and adding/removing quantities.
Every business where inventory management is an issue could benefit from this application. When items arrive at or leave the premises, an employee can add or remove quantities in the database. In businesses where many items are for sale, employees can scan the items and receive an accurate total cost without guessing the price of an item.
If you are on an Android device, you can install it now here. Also, make sure "Install from Unknown Sources" is enabled on your device.
For a user guide download the handbook: Scanner App User Guide

Use the arrow keys to move, reach the portal to go to the next level, and dodge lava and falling rocks.

Or check my GitHub for more of my projects:

About Me

Setting Up Hadoop on 20 Machines.
Find it here.
I graduated from Central Connecticut State University at the age of 16. It felt pretty normal, except that I was not eligible for a full-time job due to being underage. I was mentioned here.

Coming Soon