Wipro’s hiring hackathon – sustainability machine learning challenge – concluded on February 14, 2022. The hackathon had close to 1,880 participants and 550+ solutions posted on the leaderboard. The top three winners of the hackathon will receive cash prizes worth INR 3.5 lakh.
AIM spoke to the winners to understand their data science journey, winning approach, and their overall experience at MachineHack.
Rank 1: Rajat Ranjan
Ranjan currently works at TheMathCompany as a data scientist. A former Infoscion, he has worked in data science and analytics in the IT services and product industries. He is skilled in machine learning, modelling and visualisation using Python, and is also experienced in onsite client interaction and in analysing stakeholders’ needs.
“I love interacting with data and creating models which best suit the business needs, along with participating in different ML hackathons to learn new technology and grow professionally,” said Ranjan.
- Ranjan said the seasonality and trends across days in the clear-sky data helped him devise a cross-validation strategy based on a group time series split, with the data grouped by date (grouping by year also worked, but grouping by day gave much better results).
- Feature engineering using relevant clear-sky metrics such as temperature, transmittance and air mass helped provide context for the clear-sky calculations.
- Signal analysis of the peaks within each day also proved informative.
- Feature engineering using lags and rolling and expanding statistics forms the basis of any time series forecasting problem.
- The final predictions come from an ensemble of models trained under different cross-validation strategies, with LightGBM (LGBM) as the base model.
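A group time series split keeps every row from the same day together and only ever validates on days that come after the training days. The following is a minimal sketch, not Ranjan's code, assuming each row carries an ISO date string as its group label; the function name `group_time_series_split` is hypothetical. It mirrors the expanding-window logic of scikit-learn's `TimeSeriesSplit`, but applies it to groups rather than individual rows.

```python
from typing import Iterator, List, Sequence, Tuple


def group_time_series_split(
    groups: Sequence[str], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs that never split a group.

    Groups (e.g. ISO dates) are sorted chronologically; each fold validates on
    the next block of groups and trains on every group that precedes it.
    """
    ordered = sorted(set(groups))  # ISO date strings sort chronologically
    fold_size = len(ordered) // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough distinct groups for the requested splits")
    # As in TimeSeriesSplit, any remainder goes to the first training window.
    first_start = len(ordered) - n_splits * fold_size
    for start in range(first_start, len(ordered), fold_size):
        train_groups = set(ordered[:start])
        valid_groups = set(ordered[start : start + fold_size])
        train_idx = [i for i, g in enumerate(groups) if g in train_groups]
        valid_idx = [i for i, g in enumerate(groups) if g in valid_groups]
        yield train_idx, valid_idx
```

For example, with eight rows spread over four dates and `n_splits=3`, the first fold trains on the first date's rows and validates on the second date's. Grouping by year instead only requires passing year labels in place of dates.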
“I have been part of MachineHack since its inception. Hackathons like these boost the confidence of any aspiring data scientist and help us become more technically proficient in the ML/DS field, as consistency is the key to learning and growing,” said Ranjan.
Check out the code here.
Rank 2: Tapas Das
Tapas Das currently works as a data engineer at TheMathCompany. He became interested in machine learning and deep learning in 2018. “I went through different MOOCs like the Andrew Ng ML course and the Deep Learning Specialisation on Coursera,” said Das.
He said he then spent a significant amount of time learning the basics of Python programming before picking up diverse projects from online sources like Kaggle, HackerEarth and DrivenData to build proficiency.
He also participates in various hackathons to stay ahead of the curve. “I was inspired and overwhelmed by the ability of ML algorithms to solve a variety of real-world problems,” he added.
Das started with extensive EDA on the training dataset, which yielded a few interesting insights, including:
- Both DNI and GHI values are 0 when DHI = 0
- All target variables are 0 for Solar Zenith Angle >= 93 degrees
- All target variables are 0 for hours between 1 AM and 9 AM (just before sunrise)
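These rules translate into a simple filter. The sketch below assumes rows are dicts keyed by hypothetical column names (`Solar Zenith Angle`, `Hour`, `DHI`, `DNI`, `GHI`), that hours are integers on a 0 to 23 clock, and that the 1 AM to 9 AM range is inclusive; the real column names and boundaries may differ. Assuming DHI, DNI and GHI are themselves targets, the DHI rule can only be checked on training data, while the zenith and hour rules can filter both train and test rows.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]  # hypothetical: one record keyed by column name


def is_known_zero(row: Row) -> bool:
    """True when the EDA rules say every target for this row is 0."""
    # Rule: all targets are 0 when the solar zenith angle is >= 93 degrees.
    if row["Solar Zenith Angle"] >= 93:
        return True
    # Rule: all targets are 0 for hours 1 to 9 (assumed 0-23 clock, inclusive).
    return 1 <= row["Hour"] <= 9


def satisfies_dhi_rule(row: Row) -> bool:
    """Training-data check: DNI and GHI should both be 0 whenever DHI is 0."""
    return row["DHI"] != 0 or (row["DNI"] == 0 and row["GHI"] == 0)


def split_known_zeros(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows to model from rows whose targets the rules fix at 0."""
    to_model: List[Row] = []
    known_zero: List[Row] = []
    for row in rows:
        (known_zero if is_known_zero(row) else to_model).append(row)
    return to_model, known_zero
```

Only the `to_model` rows then need a trained model; the `known_zero` rows can be assigned 0 for every target directly.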
After this, he filtered out all records in both the train and test datasets that satisfied the above conditions, and applied feature engineering to the remaining records, generating the following feature types:
- Date/time features (Quarter, Week, DayOfWeek, IsWeekend). He also encoded the date variables cyclically so that models could interpret them more easily.
- Manual interaction features like “Dew Point / Temperature”, “Temperature / Pressure”, “Humidity * Wind”, etc.
- Lead/lag features, created by shifting the records by different periods.
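Two of these feature types are easy to sketch in plain Python. Cyclic encoding maps a periodic value onto a sine/cosine pair, so that, for example, DayOfWeek 6 and DayOfWeek 0 end up as close together as any two adjacent days; lead/lag features shift a time-ordered series backwards or forwards. The helper names and the periods (7 for DayOfWeek, 4 for Quarter, and so on) are assumptions for illustration, and the shift assumes rows are sorted in time at a regular interval.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cyclic_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value (hour, day of week, quarter) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def shift(values: Sequence[float], periods: int) -> List[Optional[float]]:
    """Shift a series: positive periods give lags, negative periods give leads.

    Positions with no source value are filled with None.
    """
    n = len(values)
    out: List[Optional[float]] = [None] * n
    for i in range(n):
        j = i - periods  # index of the value that moves into position i
        if 0 <= j < n:
            out[i] = values[j]
    return out
```

In pandas, `Series.shift(periods)` behaves the same way for positive (lag) and negative (lead) `periods`, with `NaN` in place of `None`.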
“I changed the problem statement from time-series forecasting to a pure regression problem and trained different tree-based models on it. Finally, I used a weighted average ensemble of LightGBM, CatBoost and XGBoost models to generate the final predictions. I also used the Optuna library for hyperparameter search for the different models,” said Das.
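A weighted-average ensemble reduces to a normalised weighted sum of each model's predictions. The sketch below assumes each model's test predictions are already available as equal-length lists; the model names and weights in the usage are hypothetical, since the article does not give Das's actual weights.

```python
from typing import Dict, List, Sequence


def weighted_average(
    predictions: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Blend per-model predictions using weights normalised to sum to 1."""
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(next(iter(predictions.values())))
    blended = [0.0] * length
    for name, preds in predictions.items():
        if len(preds) != length:
            raise ValueError(f"{name} has {len(preds)} predictions, expected {length}")
        w = weights[name] / total  # this model's share of the blend
        for i, p in enumerate(preds):
            blended[i] += w * p
    return blended


# Hypothetical usage: equal weights average the two models' outputs.
print(weighted_average({"lgbm": [1.0, 2.0], "catboost": [3.0, 4.0]},
                       {"lgbm": 1.0, "catboost": 1.0}))
```

One common way to choose the weights is to score blended out-of-fold predictions against the validation targets and keep the combination that performs best.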
“Competitive DS is a whole different ball game. The winning solutions of most of these challenges involve techniques that are seldom taught in academia, but are used in many production systems,” said Das.
He has been participating in hackathons on the MachineHack platform for a while. He said he loves how the platform lets anyone, regardless of background or prior experience, compete on a level playing field where the only thing that matters is optimising a metric.
“Winning solutions from previous hackathons are an invaluable learning resource that I highly encourage aspiring participants to leverage. It is fun to compete with the greatest minds in the area of data science,” added Das.
Check out the code here.
Rank 3: Gopi Durgaprasad
“An AppliedAI workshop at my college was my first step, after which I took some courses on Coursera. After completing the Deeplearning.ai courses in my third year of B.Tech, I got a summer internship and then my first job offer. I became a Kaggle notebook expert and eventually a grandmaster. Then I got a full-time position, and now I am working at Karmalife.ai as a data scientist,” said Durgaprasad.
In the data processing step, after some experiments, Durgaprasad found some correlation across the years, so he used each year as one fold and trained 10 folds in this way. He also found that the public leaderboard (LB) was calculated on 30 percent of the data. “This was a multi-label regression problem,” said Durgaprasad. He trained a model on each fold and saved the out-of-fold (OOF) predictions and the test predictions for each fold.
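The year-as-fold scheme can be sketched as a leave-one-year-out loop that fills in out-of-fold (OOF) predictions and averages the test predictions of the per-fold models. This is an illustration only: `fit_mean` and `year_fold_cv` are hypothetical names, `fit_mean` is a stand-in that predicts the training mean (Durgaprasad's solution used CatBoost and LSTM models), and a multi-label problem would repeat the loop per target or use a multi-output model.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]
FitFn = Callable[[List[Row], List[float]], Predictor]


def fit_mean(X: List[Row], y: List[float]) -> Predictor:
    """Hypothetical stand-in model: always predicts the training mean."""
    avg = fmean(y)
    return lambda row: avg


def year_fold_cv(
    X: Sequence[Row],
    y: Sequence[float],
    years: Sequence[int],
    X_test: Sequence[Row],
    fit: FitFn = fit_mean,
) -> Tuple[List[float], List[float]]:
    """Use each year as one validation fold.

    Returns OOF predictions for every training row and test predictions
    averaged over the per-fold models.
    """
    fold_years = sorted(set(years))
    oof = [0.0] * len(y)
    test_pred = [0.0] * len(X_test)
    for year in fold_years:
        train_idx = [i for i, yr in enumerate(years) if yr != year]
        valid_idx = [i for i, yr in enumerate(years) if yr == year]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in valid_idx:
            oof[i] = model(X[i])  # predicted by a model that never saw this year
        for j, row in enumerate(X_test):
            test_pred[j] += model(row) / len(fold_years)
    return oof, test_pred
```

The OOF predictions give an honest validation score across all years, and they are also the natural input for choosing ensemble weights later.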
In the feature engineering step, he used sample code to create rolling and shift features. For modelling, he tried multi-label regression with CatBoost and an LSTM without any feature engineering, and achieved good results.
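Rolling and expanding statistics, which both Ranjan and Durgaprasad mention, can be sketched as trailing-window and cumulative means over a time-ordered series. The helper names are hypothetical; in pandas, `Series.rolling(window).mean()` and `Series.expanding().mean()` serve the same purpose.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


def rolling_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing mean over the last `window` values (None until the window fills)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    buf: Deque[float] = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        out.append(total / window if len(buf) == window else None)
    return out


def expanding_mean(values: Sequence[float]) -> List[float]:
    """Mean of all values seen so far, including the current one."""
    out: List[float] = []
    total = 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out
```

When these statistics are computed on a target column, shift the series by at least one step first so that a row's feature never includes its own target value.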
“MachineHack is one of my favourite platforms for learning from and collaborating with others,” said Durgaprasad.
Check out the code here.