Data Scientist to Machine Learning Engineer: 3 things I wish I knew

Read Time ~4 Minutes

I wanted to write about something a bit more specific and personal: the things I wish I had known before I pivoted from Data Science (DS) to Machine Learning Engineering (MLE).

My background is in mathematics, and I leaned heavily toward data analysis and statistical testing. These skills served me well as a data scientist but took a back seat the closer I got to software engineering, which is largely what machine learning engineering is. So here are 3 things I wish I had known before becoming a machine learning engineer.

1. Data Structures and Algorithms

As a DS, coding was just a means to an end: a way to analyze data and answer my questions.

As an MLE, coding is a much more integral part of the job. Your code will most likely be deployed or end up in the hands of end users. Because of that, you have to pay special attention to the data structures you use and the algorithms you employ, since they can differ in speed, memory footprint, and the compute needed to run your code. Even if your code runs fine, it might be suboptimal and take longer than it needs to. When I was a DS, I really didn’t mind sitting there and waiting for my code to run, but that often isn’t an option for a deployed model or an end user.

A core topic to study here, at least for algorithms, is Big O notation. It is a tool for describing the time complexity of an algorithm, that is, how its running time grows as the input size n grows. The four you will see most often are O(1), O(log n), O(n), and O(n^2), although there are others. For large inputs, O(1) scales best, and each one further to the right in that list scales worse. I could spend the rest of the article and then some explaining how Big O notation works, but I’ll just say it is something I’m glad I learned.

A graph of different Big O notations.
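To make the gap between O(n) and O(log n) a bit more concrete, here is a minimal sketch that counts comparisons instead of measuring time. The function names and the step counters are my own illustration, not from any library, and the binary search assumes the input is already sorted.

```python
def linear_search(items, target):
    """O(n): check each element in turn until the target is found."""
    steps = 0
    for index, value in enumerate(items):
        steps += 1
        if value == target:
            return index, steps
    return -1, steps


def binary_search(sorted_items, target):
    """O(log n): halve the remaining search range on every iteration."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, steps


data = list(range(1_000_000))  # already sorted
target = 999_999               # last element: worst case for the linear scan

linear_index, linear_steps = linear_search(data, target)
binary_index, binary_steps = binary_search(data, target)
print(f"linear search steps: {linear_steps}")
print(f"binary search steps: {binary_steps}")
```

On a million elements the linear scan needs a million comparisons for the last element, while the binary search needs at most 20, because 2^20 is just over a million.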

The other side of this subject is data structures. When I was a DS, a Python list or any other data structure was just something I took for granted. But understanding how that list is implemented can help you decide whether it is the best way to store your data. Different data structures come with different trade-offs, and you need to understand which is best for a given situation.
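As one small example of such a trade-off, checking whether a value is in a Python list scans the elements one by one, while a set uses hashing and usually answers in roughly constant time. The sketch below times both with the standard library's `timeit`. The sizes and variable names are arbitrary choices for illustration, and the exact timings will differ from machine to machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # not present, so the list has to scan every element

# Same question, two data structures.
in_list = missing in as_list
in_set = missing in as_set

list_seconds = timeit.timeit(lambda: missing in as_list, number=100)
set_seconds = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list membership: {list_seconds:.4f} s for 100 lookups")
print(f"set membership:  {set_seconds:.6f} s for 100 lookups")
```

The set is not free: it uses more memory, does not keep insertion order, and only holds hashable values. That is exactly the kind of trade-off worth knowing about before picking one.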

2. Object-Oriented Programming (OOP)

Again, most of the work I was doing as a DS was analysis and a bit of machine learning. I knew a bit of OOP but never really needed it as simple functions were good enough for my daily tasks.

While I still rely heavily on functions as an MLE, I find myself leaning more and more towards OOP and the use of classes. Inheritance, higher levels of abstraction, and polymorphism are things I keep reaching for. While I still believe most of what you need to get done can be accomplished with functions, OOP is a nice tool to have in your toolbox.

While not strictly an OOP topic, OOP tends to lend itself to well-structured code and modules. Understanding how to lay out your program cleanly will allow you to scale it without all your code turning into the spaghetti monster.
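Here is a minimal sketch of what inheritance, abstraction, and polymorphism can look like in practice. The class names (`BaselineModel`, `MeanModel`, `LastValueModel`) and the `evaluate` helper are hypothetical and made up for this example. They are not from any ML framework.

```python
from abc import ABC, abstractmethod
from statistics import mean


class BaselineModel(ABC):
    """Abstraction: callers only need to know about fit and predict."""

    @abstractmethod
    def fit(self, values):
        """Learn from a list of numbers and return self."""

    @abstractmethod
    def predict(self):
        """Return a single predicted value."""

    def fit_predict(self, values):
        # Inheritance: every subclass gets this method for free.
        return self.fit(values).predict()


class MeanModel(BaselineModel):
    """Predicts the average of everything it has seen."""

    def fit(self, values):
        self.value_ = mean(values)
        return self

    def predict(self):
        return self.value_


class LastValueModel(BaselineModel):
    """Predicts the most recent value it has seen."""

    def fit(self, values):
        self.value_ = values[-1]
        return self

    def predict(self):
        return self.value_


def evaluate(models, history):
    # Polymorphism: the same call works for any BaselineModel subclass.
    return {type(model).__name__: model.fit_predict(history) for model in models}


history = [2.0, 4.0, 9.0]
results = evaluate([MeanModel(), LastValueModel()], history)
print(results)  # {'MeanModel': 5.0, 'LastValueModel': 9.0}
```

Adding a new model means writing one more small class. `evaluate` does not have to change, which is the kind of structure that keeps a growing codebase manageable.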

3. Using a Debugger

How many times have you tried to get a piece of code working and ended up with 15 different print statements scattered throughout it to figure out why it won’t work? I know I’m guilty.

Good news is there is an easy fix and you can learn it in less than 15 minutes. Use a debugger.

An example of the debugger in VS Code.

A debugger lets you control your code and see what is happening while it executes. You can run your code step by step, inspect variables, set breakpoints that pause execution at a given line (or only when a certain condition is met), and more.
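If you are not in an IDE, Python ships with its own debugger, `pdb`, and the built-in `breakpoint()` function (Python 3.7+) drops you into it by default. The sketch below is a made-up example: `running_average` and its `debug` flag are hypothetical names, and the `if` check imitates a conditional breakpoint by hand.

```python
def running_average(values, debug=False):
    """Return the average of the values seen so far at each position."""
    total = 0.0
    averages = []
    for count, value in enumerate(values, start=1):
        total += value
        # Hand-rolled conditional breakpoint: pause only on suspicious input.
        if debug and value < 0:
            breakpoint()  # opens the pdb prompt at this line
        averages.append(total / count)
    return averages


print(running_average([2, 4, 9]))  # [2.0, 3.0, 5.0]
```

Calling `running_average([2, -4, 9], debug=True)` pauses on the negative value. At the `(Pdb)` prompt, `p total` prints a variable, `n` runs the next line, `s` steps into a function call, and `c` continues until the next breakpoint. An IDE debugger gives you the same controls through buttons and a variables panel.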

This was definitely something I was lacking when most of my coding was taking place in a Jupyter Notebook. While I still use notebooks, a proper IDE with a debugger is becoming more and more of a necessity as my projects progress.

Wrapping it up

If you noticed, none of what I talked about refers to specific machine learning concepts or frameworks. It is all basic software engineering. While knowing machine learning is a prerequisite for the job, good software engineering fundamentals help ensure that your code is deployable, scalable, and readable.

I hope this helps someone who finds themselves in a situation similar to mine, or just someone who wants to become a bit more well-rounded.

Until next time

Andrew-

Have something you want me to write about?

Head over to the contact page and drop me a message. I will be more than happy to read it over and see if I can provide any insights!
