Accomplished Big Data / Analytics scientist with 2-3 years’ experience using Big Data tools and frameworks (Hadoop, MapReduce) to analyze large, high-volume data sets and to develop current and predictive models for performance improvement and data-oriented decision making.
Wrote a BNF grammar (using yacc/bison) to parse HTML data obtained from NASDAQ and extract the option chain sheet. The sheet is then loaded into a database for further processing and analysis.
Used the Apriori algorithm to find associations between different types of complaints in New York City. Explored 311 complaint data in Qlik Sense Cloud, formed a hypothesis on a small 100 MB data file, and generated additional columns. Validated the hypothesis by applying the Apriori algorithm to the full 6 GB data file.
R-trees are widely used in mobile applications. Built and queried an R-tree as described in Guttman's 1984 paper, using a provided data file of 100,000 two-dimensional points (specified as x, y coordinates). Used the Hilbert curve function to build the tree.
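To illustrate how a Hilbert curve can drive R-tree construction, here is a minimal sketch rather than the project's actual code. It assumes non-negative integer coordinates on a 2^order by 2^order grid, and the names `hilbert_index`, `pack_leaves`, and `capacity` are hypothetical. Points are sorted by their position along the curve and packed into fixed-size leaves, each with a minimum bounding rectangle (MBR).

```python
def hilbert_index(order, x, y):
    """Distance of cell (x, y) along the Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def pack_leaves(points, order, capacity):
    """Sort points by Hilbert index and group them into leaves of (MBR, points)."""
    ordered = sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))
    leaves = []
    for i in range(0, len(ordered), capacity):
        chunk = ordered[i:i + capacity]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        mbr = (min(xs), min(ys), max(xs), max(ys))  # (xmin, ymin, xmax, ymax)
        leaves.append((mbr, chunk))
    return leaves
```

Upper tree levels could be built the same way by packing the leaf MBRs; points that are close on the curve tend to be close in the plane, which keeps the rectangles compact.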
In this project, we developed a specialized R program to crawl, parse, and extract the price history of houses and associated housing characteristics for a specific city from the online real estate platform Zillow (https://www.zillow.com).
Generated association rules for transactional databases using the Apriori algorithm. Given minimum support and minimum confidence thresholds, predicted the items customers will buy together, along with the support and confidence percentages.
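The role of the two thresholds can be shown with a small sketch. This is an illustrative, unoptimized version of level-wise Apriori, not the project's implementation; the function names `frequent_itemsets` and `association_rules` are hypothetical. Support is the fraction of transactions containing an itemset, and the confidence of a rule A -> B is support(A and B) / support(A).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count support for the current candidates and keep the frequent ones.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules
```

For example, with the four baskets {bread, milk}, {bread, butter}, {bread, milk, butter}, and {milk}, a minimum support of 0.5 and a minimum confidence of 0.8 leave a single rule: butter -> bread.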
Developed a Big Data/Analytics Oozie workflow comprising several coordinated MapReduce jobs to analyze 130 million flight records (over a 22-year period) and compute performance statistics such as on-time arrivals, short taxi times, and reasons for customer dissatisfaction (e.g. cancellation). I further developed a model to predict various airline performance measures to be used to improve airline ratings, profitability, and customer satisfaction.
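As a rough illustration of one such job, the sketch below mimics the map and reduce phases for an on-time arrival rate per carrier in plain Python. It is not the Hadoop code from the project; the record fields `carrier` and `arr_delay_min`, and the rule that a non-positive delay counts as on time, are assumptions made for the example.

```python
from collections import defaultdict


def map_record(record):
    """Map phase: emit (carrier, (on_time_flag, 1)) for one flight record."""
    on_time = 1 if record["arr_delay_min"] <= 0 else 0  # assumed on-time rule
    yield record["carrier"], (on_time, 1)


def reduce_on_time(pairs):
    """Reduce phase: sum the flags and counts per carrier, then compute the rate."""
    totals = defaultdict(lambda: [0, 0])
    for carrier, (on_time, count) in pairs:
        totals[carrier][0] += on_time
        totals[carrier][1] += count
    return {carrier: on / total for carrier, (on, total) in totals.items()}


def on_time_rates(records):
    """Run the map phase over all records and feed the output to the reducer."""
    pairs = (pair for record in records for pair in map_record(record))
    return reduce_on_time(pairs)
```

Emitting a (flag, count) pair rather than a bare flag keeps the reduce step associative, so partial sums could be combined before the final division.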
In this project, we developed a high-performance concurrent web proxy to handle a mix of hypertext and file requests. Our server handled 500 MB of data with a memory footprint of 30 MB over a set of concurrent connections. As an extension to this project, we developed an Internet ‘radio’ that broadcast audio over lossy (UDP) links. We analyzed packet loss and delay, and developed an adaptive model to overcome delays and stream data in near real time.
Analyzed a series of data sets to arrive at a prudent product mix, product positioning, and marketing strategy applicable for at least a decade. The analysis identified the motivators for continuous adult education, occupations poised for growth and decline over the next 10 years, regions with potential for growth across potential industries, and the financial capacity of consumers across regions and demographics.
Developed Android applications such as Employee Tracking System, Location Based Reminder, and Home Automation. Taught courses on Android application development to colleagues.