Is logistic regression the COBOL of machine learning? – Analytics India Magazine

Joseph Berkson developed logistic regression as a general statistical model in 1944. Today, logistic regression is one of the main pillars of machine learning. From predicting Trauma and Injury Severity Score (TRISS) to sentiment analysis of movie reviews, logistic regression has umpteen applications in ML.

In a recent tweet, Bojan Tunguz, senior software engineer at NVIDIA, compared logistic regression to COBOL, a high-level programming language used for business applications. 

“It would be great if we could replace all of the logistic regressions with more advanced algos, but realistically we will never completely get rid of them,” said Bojan.

COBOL first gained currency in 1970, when it became the de facto programming language for business applications on mainframe computers around the world.

COBOL’s relevance is chalked up to its simplicity, ease of use and portability.

Logistic regression is a simple classification algorithm used to model the probability of a discrete outcome from a given set of input variables. LR is used in supervised learning for binary classification problems.
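
To make the idea concrete, here is a minimal sketch of a binary logistic regression in plain Python. It is illustrative only: the function names, the toy "hours studied" data, the learning rate and the number of epochs are all assumptions made for this example, and batch gradient descent is just one of several ways to fit the model (production libraries typically use more sophisticated solvers).

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    # Two branches keep exp() from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that the binary outcome is 1 for feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def fit(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data: hours studied -> passed (1) or failed (0).
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit(X, y)
p_low = predict_proba(weights, bias, [1.0])
p_high = predict_proba(weights, bias, [4.0])
print(f"P(pass | 1.0 h) = {p_low:.2f}, P(pass | 4.0 h) = {p_high:.2f}")
```

The model is a weighted sum of the inputs passed through the sigmoid, so each learned weight can be read directly as the effect of its input on the log-odds of the outcome. That readability is a large part of why the respondents quoted later in this article call logistic regression understandable and easy to vet.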

Origins of logistic regression

Pierre François Verhulst published the first logistic function in 1838. Logistic regression was used in the biological sciences in the early twentieth century. 

After Verhulst’s initial discovery of the logistic function, the most notable developments were the probit model, developed by Chester Ittner Bliss in 1934, and maximum likelihood estimation, by Ronald Fisher in 1935.

In 1943, Wilson and Worcester used the logistic model in bioassay, the first known application of its kind. In 1973, Daniel McFadden connected the multinomial logit to the theory of discrete choice, specifically Luce’s choice axiom. This gave logistic regression a theoretical foundation and earned McFadden a Nobel Prize in 2000.

COBOL

According to the global survey by Micro Focus, COBOL is viewed as strategic by 92 % of respondents.

Key findings of the Micro Focus COBOL Surveys include:

  • More than 800 billion lines of code running on production systems and in daily use, far exceeding any previous estimates.
  • Nearly half of the survey’s respondents expect the amount of COBOL in use at their organisation to increase in the next 12 months. Furthermore, last year’s research report showed that over half of respondents (52 %) expect their organisations’ COBOL applications to remain for at least the next decade, with more than four in five expecting that COBOL will still be in use when they ultimately retire. This creates a need for continued COBOL investment and modernisation for next-gen developers.
  • 92 % of respondents stated that their organisations’ COBOL applications are strategic, with future IT strategy and application portfolio alignment with new technology listed as the key drivers for COBOL modernisation.
  • As opposed to a rip and replace approach, 64 % of respondents intend to modernise their COBOL applications and 72 % of respondents see modernisation as an overall business strategy. 
  • 43 % of the survey’s respondents stated that their COBOL applications do and will support cloud by the end of the year. In addition, 41 % stated that new business projects require integration with existing COBOL systems.

The importance of different AI/ML topics in organisations worldwide (2019).  Source: Statista

Twitterati is split

Bojan Tunguz’s tweet drew responses both for and against.

While many said a simple solution that works should not be messed with, the opposite camp said complex algorithms like XGBoost provide better results. 

Andreu Mora, an ML and Data science expert at Adyen payments, said: “If a simple algorithm gets you a good performance it might not be a wise move to increase operational work by 500% for a 5% performance uplift.”

To this, Bojan replied: “Depends on the use case. If a 5% improvement in performance can save you $5B, then you totally should consider it.”

Amr Malik, a research fellow at Fast.ai, said: “For this scenario to be true, you’d need to be supporting a $100 Billion dollar business operation with LR based models. That’d be a gutsy bet on a really big farm”.

We have picked the best responses from the tweet thread:

  1. Simple models are useful as a pedagogical tool when teaching and for setting a baseline. COBOL is around because of infrastructure built around it: Brandon Behring, data scientist at Prudential Financial
  2. I disagree. I don’t think it’d be “great” to replace all logistic regressions with more advanced algos. In some scenarios, it’s the best choice like applications with high stakes and bias risk (ex: pay equity analysis) and models that need to be vetted by regulators (ex: insurance): Brydon Parker, senior data scientist at Shopify
  3. The thing I love about logistic regression is that I’m able to understand it, and it is freaking reliable. I could get a 3 or 4 % improvement in accuracy with other more complex algorithms, but I don’t feel comfortable running something I don’t understand to a reasonable degree: Kelmer Carvalho, automation engineer at Ericsson
  4. And it makes a lot of sense indeed, if your performance constraints are loose, why bother yourself deploying a fancy thing? Going online with a (e.g.) transformer is an actual pain: Sergio Rozada, director of engineering, TSK

On LinkedIn, Damien Benveniste, an ML tech lead at Meta AI, said he never uses algorithms like logistic regression, Naive Bayes, SVM, LDA, KNN, Feed Forward Neural Network, etc. and relies only on XGBoost. 
