Meet the Hackathon Weekend Winners

The Hackathon Weekend Edition #2 – The Last Hacker Standing music genre classification challenge was successfully completed on August 12, 2021. The challenge was to create a music genre classification model that generalizes well to unseen data. It drew nearly 300 participants, with more than 120 actively competing on the leaderboard.

Based on the leaderboard scores, we have the top three winners of the Music Genre Classification Challenge, who will get free passes to the Virtual DevCon 2021 on Deep Learning, which will be held on September 23-24, 2021. Here, we take a look at the winners’ solution approaches and experiences at MachineHack.

Please note that most of the winning solutions are shared voluntarily within a given time frame, so we present the top three solutions in order of their leaderboard ranking.

Rank 2 – Eric Vos

Eric studied industrial computing and robotics 30 years ago, and the basics of traditional AI were covered as part of his course. A few years ago, he became interested in newer machine learning techniques such as neural networks and deep learning and started taking relevant courses taught by Andrew Ng, Geoffrey Hinton and others. Eric now participates in most data science competitions and hackathons to practice his newly learned machine learning skills.

Approach

Eric is happy to have worked on such a unique problem statement and dataset. He started with exploratory data analysis using AutoViz and AutoViML and, in the process, realized that the duration column, double-encoded in minutes and milliseconds, needed more than a simple transformation, so he decided to keep the original time format in a separate feature. In addition to the existing features, he added standard NLP and language features, bringing the total to 29 features for modeling. After several experiments with different models, a single CatBoost model produced by AutoViML gave the best result.
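To make the idea concrete, here is a minimal sketch of that kind of pipeline: keep the mixed-unit duration column as-is, add a normalized copy plus a couple of simple text features, and fit a plain CatBoost classifier. The column names (duration_in_min/ms, track_name, artist_name, genre), the file name and the minutes-vs-milliseconds threshold are assumptions for illustration, and the sketch calls CatBoost directly rather than through AutoViML.

    import pandas as pd
    from catboost import CatBoostClassifier

    train = pd.read_csv("train.csv")  # hypothetical file name

    # Keep the original mixed-unit duration column and add a normalized copy:
    # small values are assumed to already be minutes, large ones milliseconds.
    dur = train["duration_in_min/ms"]
    train["duration_min"] = dur.where(dur < 100, dur / 60000.0)

    # Simple NLP-style features derived from the track name.
    name = train["track_name"].fillna("")
    train["track_name_len"] = name.str.len()
    train["track_name_words"] = name.str.split().str.len()

    # CatBoost cannot take NaN in categorical columns, so fill them first.
    train["artist_name"] = train["artist_name"].fillna("unknown").astype(str)

    features = ["artist_name", "duration_in_min/ms", "duration_min",
                "track_name_len", "track_name_words"]
    model = CatBoostClassifier(loss_function="MultiClass", verbose=0)
    model.fit(train[features], train["genre"], cat_features=["artist_name"])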

Experience

Eric is a serial MachineHacker and has learned a lot from the published solutions shared by top hackers. He says, “MachineHack is a great place to improve my machine learning skills and play around with various original datasets. I like the ‘weekend’ format; it is now my weekly brain sport.”

Check out his solution here.

Rank 3 – Anand Kumar

Anand graduated in Electronics and Communication Engineering from Anna University, Chennai, and has nearly 10 years of data science and machine learning experience. He is currently working as an associate director at a large analytical research firm.

Approach

Anand thinks that the multiclass nature of the problem statement makes it slightly different. In particular, the Track Name variable had 15,000 unique values, and a variable with such high cardinality can be quite difficult to manage.

He tried label encoding the categorical features and different techniques for imputing missing values, such as mean/median imputation. What worked best was converting the two categorical features, “Artist Name” and “Track Name”, to strings and filling the NA values in “Popularity”, “Key” and “Instrumentalness” with a single zero.

Anand used the fast and lightweight FLAML AutoML library to get the best model and to further tune its hyperparameters. He converted the two categorical features to strings and then, while fitting the CatBoostClassifier model, passed them as “cat_features”. He also used loss_function = ‘MultiClass’ in this case.
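The sketch below puts those steps together: the preprocessing described above, a FLAML search restricted to CatBoost, and a final CatBoostClassifier fit with the string columns passed as cat_features. The column names, the target name genre, the file name and the time budget are assumptions for illustration, not Anand’s exact code.

    import pandas as pd
    from flaml import AutoML
    from catboost import CatBoostClassifier

    df = pd.read_csv("train.csv")                 # hypothetical file name
    cat_cols = ["artist_name", "track_name"]      # assumed column names

    # Cast the high-cardinality categoricals to strings and fill missing
    # numeric values with zero, as described above.
    df[cat_cols] = df[cat_cols].astype(str)
    num_cols = ["popularity", "key", "instrumentalness"]
    df[num_cols] = df[num_cols].fillna(0)

    X, y = df.drop(columns=["genre"]), df["genre"]    # assumed target column

    # Step 1: let FLAML search for a strong model and hyperparameters.
    automl = AutoML()
    automl.fit(X_train=X, y_train=y, task="classification",
               time_budget=600, estimator_list=["catboost"])

    # Step 2: fit a multiclass CatBoost model, declaring the string columns
    # as categorical features.
    model = CatBoostClassifier(loss_function="MultiClass", verbose=0)
    model.fit(X, y, cat_features=cat_cols)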

Experience

Anand says: “It was a great experience and a learning at the same time”.

Check out his solution here.

Rank 4 – Harshad Patil

Harshad started his data science journey about five years ago. He previously worked as a business analyst but always found deep learning fascinating. He started learning from online sources, which helped him with the basics. He then moved from an analyst role to a data scientist role and started participating in hackathons on different websites such as MachineHack, HackerEarth, Zindi Africa, etc. He carefully followed the top three solutions in each hackathon and incorporated some of their approaches. He eventually won a competition and received a cash prize of $500.

Approach

Harshad decided to use tree-based models due to the large number of unique values in Artist Name and Track Name. First, he separated out all the outliers and label-encoded all the categorical data, which gave him a good baseline score. Using a Winsorizer to deal with the outliers further improved the score. He then engineered a total of 55 features, many of which were simple aggregation features, but after a principal component analysis he settled on 20 features, which earned him a good leaderboard score. Finally, he used a CatBoost model, which is particularly robust at handling categorical features.
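Below is a hedged sketch of that pipeline: label-encode the categoricals, cap outliers with a Winsorizer, reduce the feature matrix to up to 20 principal components, and fit CatBoost. The file name, column names and target are assumptions, the 55 aggregation features are omitted for brevity, and the import assumes feature_engine 1.x for the Winsorizer.

    import pandas as pd
    from feature_engine.outliers import Winsorizer     # assumes feature_engine >= 1.0
    from sklearn.preprocessing import LabelEncoder
    from sklearn.decomposition import PCA
    from catboost import CatBoostClassifier

    df = pd.read_csv("train.csv")     # hypothetical file name
    y = df.pop("genre")               # assumed target column
    df = df.fillna(0)                 # simple imputation so PCA can run

    # Label-encode the categorical columns (Artist Name, Track Name, ...).
    for col in df.select_dtypes(include="object").columns:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))

    # Cap outliers in all (now numeric) columns using IQR-based winsorization.
    df = Winsorizer(capping_method="iqr", tail="both", fold=1.5).fit_transform(df)

    # Reduce the engineered feature set to at most 20 principal components.
    X = PCA(n_components=min(20, df.shape[1])).fit_transform(df)

    model = CatBoostClassifier(loss_function="MultiClass", verbose=0)
    model.fit(X, y)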

Experience

Harshad says, “This is a great website to challenge your knowledge. And, nowadays, companies look at candidate ranks to get a glimpse of the candidate. So, keeping a higher ranking on this website can give the person an advantage.”

Check out his solution here.

Once again, join us in congratulating the winners of this exciting hackathon – who were indeed the “last hackers standing” in the Music Genre Classification – Weekend Hackathon Edition #2. We’ll be back next week with the winning solutions for the ongoing challenge – Tea Story.