As companies and industries move online, data has overtaken oil as the world's most valuable commodity. It is hardly surprising that data scientist has been called "the sexiest job of the 21st century" and that data science is among the most in-demand skills today.
In short, data scientists ask questions related to a fundamental business problem, then work with raw data: collecting, organising and analysing it. They create and use algorithms to identify the patterns and trends that help answer those questions.
But to pursue and succeed in the hottest profession of the new era, professionals need much more than an ordinary knowledge of programming. Let Mastt's data scientist Domenic Prestia walk you through some must-have skills for the job. Dom is working on the future of data science and data discovery at Mastt, applying machine learning, data analysis and big data management to bring never-before-possible insights to the construction industry.
An understanding of statistics is a fundamental requirement. Many subjects fall under the topic of statistics, and you should understand the basics (random variables, basic probability, probability distributions, etc.).
It allows you to explain and interpret data through concepts such as:
Exploratory Data Analysis
Experimental Design
Multi-variable Calculus and Linear Algebra - the pillars that machine learning is built on.
Learning them will enhance your skills.
Machine learning algorithms are largely available out of the box nowadays, and even fine-tuning them can be done with minimal knowledge of what is going on under the hood. So knowing calculus and linear algebra may not be required to get started, but to get the absolute most out of these techniques and be confident you have the best results, knowing these areas is a must.
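To give a feel for what sits under the hood, here is a minimal sketch of fitting a straight line by gradient descent. It is only an illustration: the data, the learning rate and the function name `fit_line` are assumptions made up for this example, not part of any library.

```python
# Fit y = w * x + b by gradient descent on the mean squared error (MSE).
# The data and hyperparameters below are made up for illustration.

def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Error of the current line on every point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error ** 2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

The two gradient lines are where the calculus appears. With many input features, the same update is usually written with vectors and matrices, which is where linear algebra comes in.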
Essential to being a data scientist is putting your knowledge into practice. Being proficient in a language is all well and good, but practising good programming principles will keep your projects from becoming hot messes of bugs and failure.
Key design principles for data science are:
KISS (Keep It Simple, Stupid)
DRY (Don't Repeat Yourself)
Single Responsibility Principle
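As a small sketch of how these principles can look in practice, the snippet below keeps each function to a single job and puts the parsing logic in one place instead of repeating it. The function names and the sample figures are hypothetical, chosen only for this illustration.

```python
# Each function does one job (Single Responsibility), and the parsing
# logic lives in one place (DRY). All names here are hypothetical.

def parse_amount(text):
    """Turn a string such as ' 1,250.50 ' into a float."""
    return float(text.strip().replace(",", ""))

def total(amounts):
    """Sum amounts that have already been parsed."""
    return sum(amounts)

def format_report(label, value):
    """Build the line shown to the reader."""
    return f"{label}: {value:.2f}"

raw_costs = ["1,250.50", " 300 ", "49.50"]
costs = [parse_amount(c) for c in raw_costs]
print(format_report("Total cost", total(costs)))  # Total cost: 1600.00
```

Because parsing, arithmetic and formatting are separate, each piece can be tested or replaced on its own, which is also what keeps the code simple.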
Python is a general-purpose language that is easy to learn and has a lot of support for machine learning and data science.
R is another language used for machine learning and data analysis, but it is not general-purpose like Python.
Julia is a relatively new general-purpose language, like Python, that is designed to offer a significant speed advantage.
SQL is essential for accessing databases.
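As a hedged illustration of SQL and Python working together, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `projects` table and its rows are invented for the example.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, region TEXT, budget REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("Bridge", "North", 120.0), ("School", "North", 80.0), ("Depot", "South", 50.0)],
)

# Aggregate in SQL rather than pulling every row into Python.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(budget) FROM projects "
    "GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, n_projects, total_budget in rows:
    print(region, n_projects, total_budget)
```

Letting the database do the grouping and summing is usually the better choice once tables grow beyond what fits comfortably in memory.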
Data will always be messy, and cleaning it is essential before it can be useful in modelling. Cleaning can be as simple as correcting typos or fixing dates, or as hard as interpolating missing data.
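The three kinds of cleaning mentioned above can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the sample records, the typo table `KNOWN_TYPOS`, the accepted date formats (slashed dates are assumed to be day-first) and the choice of simple linear interpolation.

```python
from datetime import datetime

# Hypothetical raw records: mixed date formats, a typo, a missing value.
raw = [
    {"date": "2021-03-01", "site": "Perth", "value": 10.0},
    {"date": "02/03/2021", "site": "Pertth", "value": None},
    {"date": "2021-03-03", "site": "perth ", "value": 14.0},
]

KNOWN_TYPOS = {"pertth": "perth"}        # assumed correction table
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed: slashed dates are day-first

def parse_date(text):
    """Try each accepted format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

def clean_site(text):
    """Normalise case and whitespace, then apply known typo fixes."""
    key = text.strip().lower()
    return KNOWN_TYPOS.get(key, key)

def interpolate(values):
    """Fill interior None gaps linearly between the nearest known values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

values = interpolate([r["value"] for r in raw])
cleaned = [
    {"date": parse_date(r["date"]), "site": clean_site(r["site"]), "value": v}
    for r, v in zip(raw, values)
]
for row in cleaned:
    print(row)
```

Linear interpolation is only one option, and it leaves gaps at the very start or end of a series untouched. Whether it is appropriate depends on how the data was collected.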
Data visualisation is needed for exploring data.
Exploration allows you to familiarise yourself with the data before modelling or analysing it. How you decide to test and model data is heavily influenced by what you find during exploration.
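A first look at a dataset does not need much tooling. The sketch below, which uses a made-up list of project durations, prints summary statistics and a crude text histogram with Python's built-in `statistics` module. In real work a plotting library would normally draw the chart.

```python
import statistics
from collections import Counter

# Hypothetical sample: project durations in weeks, with one suspicious value.
durations = [12, 14, 13, 15, 14, 13, 52, 14]

summary = {
    "count": len(durations),
    "min": min(durations),
    "max": max(durations),
    "mean": statistics.mean(durations),
    "median": statistics.median(durations),
    "stdev": statistics.stdev(durations),
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")

# A crude text histogram is often enough to spot odd values early.
for value, count in sorted(Counter(durations).items()):
    print(f"{value:>3} | {'#' * count}")
```

The mean sits well above the median here because of the single value of 52. That is the kind of finding that should shape how the data is later cleaned and modelled.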
Data visualisation is also critical for storytelling, as large amounts of data need to be transformed into something that is easy to comprehend. This helps stakeholders make data-driven decisions. For example, your audience may not understand p-values or correlation coefficients. Communicating results clearly is therefore essential, and visualisation helps with this.
To master visualisation, you need to learn:
Basic chart types
Best visualisation tools for your chosen programming language
Which charts are useful for which scenario
You will need a complete understanding of fundamental ML algorithms, including:
Linear Regression and Logistic Regression
Support Vector Machines
K-means
You also need to know which algorithms are appropriate for which situations. The No Free Lunch Theorem states that "all optimisation algorithms perform equally well when their performance is averaged across all possible problems". This implies there is no single best machine learning algorithm for predictive modelling problems, so having a wide breadth of experience with different algorithms will help you reach the best possible solution.
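To make one of the listed algorithms concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The sample points and the starting centroids are made up, and the starting centroids are passed in explicitly so that the run is repeatable. A library implementation would normally choose them for you.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm for 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

K-means works well on compact, well-separated groups like these, but tends to struggle with elongated or unevenly sized clusters. This is a small illustration of why knowing several algorithms, and when each one fits, matters.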
Interested in what we do at Mastt? Get in touch now via email at email@example.com.