4 Proven Ways Newbie Analysts Can Become Machine Learning Pros
These four recommendations can help prepare you -- or the novice analyst on your team -- for a career in this burgeoning field.
- By Scott McKinney
- January 25, 2018
When Aurora Peddycord-Liu started as an analytical education intern at SAS in the summer of 2017, she came with a solid educational background from Worcester Polytechnic Institute and NC State's computer science Ph.D. program.
These programs prepared her well for her current position at SAS, where she uses data to derive actionable insights on the design and use of SAS e-learning courses, but she has had to adapt her skill set to meet the challenges of a real-world analytics position.
To learn how newbie analysts can prepare for their work in this hot new age of machine learning, I spoke with Peddycord-Liu and Dan Olley, global CTO at Elsevier. Here are their recommendations.
Recommendation #1: Don't be overwhelmed -- just get started
Don't be intimidated by the powerful tools at your disposal; find a point to start and dive in. "We're entering the era of symbiotic technology. Machine learning allows us to deal with unstructured data in a very powerful way," says Olley. "That's allowing us to work in a much more symbiotic way with the digital world." Olley warns that he sees many analysts talking about what they are going to do rather than getting started, testing, and learning from their mistakes.
Getting started matters all the more because analytics training programs tend to focus on technical tools, whereas real-world problems tend to involve the data itself, according to Peddycord-Liu. In the real world, data may contain collection errors, or parts of it may be misleading. For example, Peddycord-Liu's team analyzed an educational math game using a variable for problem completion time. They noticed that students using version A scored the same as students using other versions but completed the problems significantly faster.
At first, this led the team to conclude that version A taught students to complete the problems more quickly, but closer analysis revealed that the time difference was due to a shorter animation time for the problems in version A. When the team corrected the time variable to count only non-animation time, the result was more trustworthy.
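To make the version A example concrete, here is a minimal Python sketch of the kind of correction the team made. The record layout, the `mean_time` helper, and all numbers are hypothetical and invented for illustration; they are not the team's data or code. The sketch compares mean completion time per game version, first on raw time and then on active (non-animation) time.

```python
from statistics import mean

# Hypothetical records: (game_version, total_seconds, animation_seconds).
# These values are invented for illustration only.
records = [
    ("A", 40.0, 5.0),
    ("A", 44.0, 5.0),
    ("B", 52.0, 15.0),
    ("B", 56.0, 15.0),
]


def mean_time(rows, version, exclude_animation):
    """Mean completion time for one version, optionally minus animation time."""
    times = [
        total - anim if exclude_animation else total
        for v, total, anim in rows
        if v == version
    ]
    return mean(times)


# Compare the raw variable with the corrected (active-time) variable.
for exclude in (False, True):
    label = "active time" if exclude else "raw time"
    a = mean_time(records, "A", exclude)
    b = mean_time(records, "B", exclude)
    print(f"{label}: A={a:.1f}s B={b:.1f}s gap={b - a:.1f}s")
```

With these toy numbers, version A's apparent 12-second advantage on raw time shrinks to 2 seconds once animation time is excluded, the kind of shift that can overturn a conclusion drawn from the uncorrected variable.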
"You need a longer process to understand what the data really records, what it means, and how to use it well," Peddycord-Liu says. A practical way to do this is to focus on the underlying problem you're trying to solve.
Recommendation #2: Consider what knowledge (in context) you need
Before starting any analysis, consider what data you need to solve the problem. "In fact, getting your data under control is a key first step in my mind," Olley says. Does the data contain potential errors, or parts that could be misleading? Getting data under control is something many organizations are focusing on, and anyone can start now.
Next, what knowledge do you need to understand the problem and the context of the question you want to answer? According to Olley, knowledge and context serve as a guide for solving any problem in any domain. They will also help you identify data you need but don't have, and from there you can plan how to collect it.
Also consider the people who are interested in your analysis. "Ask yourself, Who is my audience and how can I best use this data to benefit them?" says Peddycord-Liu. What business or other problems are you trying to solve with the data at your disposal?
Recommendation #3: Understand the regulatory environment
As machine learning becomes more powerful and sophisticated, expect increased oversight from citizens and governments. Olley notes that this is a particular concern with deep learning. With older analytical tools, analysts choose which aspects of the data to use in an analysis, so they can explain their work.
With deep learning, a "black box" chooses which aspects of the data are important for a decision. The data selection process is performed by the machine, not the researcher, so it is not as clear which aspects of the data are used in making the decision.
Methods that look inside the black box are evolving, but it is still early days for these approaches. "When deciding which algorithms you're going to use to build these predictive models, if you think you're in a regulatory environment where you might have to explain why the machine gave the answer it did, you will need to be careful which techniques you're going to use because some are quite difficult to do that for certain," says Olley. Hence, keep the regulatory environment in the back of your mind as you choose and prepare your algorithms.
Recommendation #4: Don't give up
Data analysis is a good field to get into, according to Olley, because machine learning is unlocking a blend of the digital and human worlds in ways we haven't considered yet. "Machine learning, coupled with the continued miniaturization of technology, AR devices, IoT, and hyperconnectivity, are opening up huge opportunities in terms of operational efficiencies and product possibilities," Olley explains.
"In the next decade, all companies will have data products or services within their portfolio." Data will be the currency of the future.
Stick with it. Follow these recommendations to get going with your machine learning projects and you'll be well rewarded.
About the Author
Scott McKinney is a freelance business writer based in South Carolina. He draws on his background as a math education professional in his work as a specialist in the technology and training industries. You can reach him at scott.nextpage@gmail.com or on LinkedIn at www.linkedin.com/in/scottamckinney/.