Career in Data Science
In recent times, Data Scientist (along with related roles such as Data Manager, Statistician and Data Analyst) has become one of the most sought-after career paths. Several top universities have started offering programs dedicated to data science to fill the skill gaps the industry currently faces.
Allured by the tremendous opportunities, great compensation and visibility to business leaders, many people are moving towards the Data Scientist career path without carefully assessing the day-to-day responsibilities of the role, the attitude it requires, and the balance of technical and business skills it demands. The aim of this blog is to highlight the roles and responsibilities of a data scientist and to help aspirants understand the challenges of the field and the skills required to overcome them.
Before we go into further details, let's first understand who data scientists are. Data scientists are the ones who fine-tune the statistical and mathematical models that are applied to data. When somebody applies their theoretical knowledge of statistics and algorithms to find the best way to solve a data science problem, or builds a model to predict the number of credit card defaults next month, they are wearing the data scientist hat.
A data scientist is able to take a business problem and translate it into a data question, create predictive models to answer the question, and tell a story about the findings. Statisticians who focus on implementing statistical approaches to data, and Data Managers who focus on running data science teams, tend to fall into the data scientist role. Data scientists are the bridge between the programming and implementation of data science, the theory of data science, and the business implications of data.
On any given day, a data scientist may be required to:
Conduct undirected research and frame open-ended industry questions
Extract huge volumes of data from multiple internal and external sources
Employ sophisticated analytics programs, machine learning and statistical methods to prepare data for use in predictive and prescriptive modeling
Thoroughly clean and prune data to discard irrelevant information
Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or opportunities
Devise data-driven solutions to the most pressing challenges
Invent new algorithms to solve problems and build new tools to automate work
Communicate predictions and findings to management and IT departments through effective data visualizations and reports
Recommend cost-effective changes to existing procedures and strategies
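To make a few of these tasks concrete, here is a minimal sketch of the clean, explore and predict steps, applied to the credit card default example mentioned earlier. The monthly figures are made up for illustration, the function names are hypothetical, and a straight-line fit stands in for whatever model a real project would actually choose.

```python
from statistics import mean

# Hypothetical monthly counts of credit card defaults (illustrative only).
# None marks a month whose record is missing.
raw_defaults = [120, 132, None, 141, 150, 158]


def clean(series):
    """Drop missing observations, keeping (month_index, value) pairs."""
    return [(i, v) for i, v in enumerate(series) if v is not None]


def fit_line(points):
    """Ordinary least squares fit of value = slope * month + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in points)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, month):
    """Evaluate the fitted line at a given month index."""
    return slope * month + intercept


# Clean: discard the missing month.
points = clean(raw_defaults)

# Explore: a quick summary of the cleaned data.
average_defaults = mean(v for _, v in points)

# Model: fit a trend and forecast the month after the last observation.
slope, intercept = fit_line(points)
next_month = len(raw_defaults)
forecast = predict(slope, intercept, next_month)

print(f"Average defaults per month: {average_defaults:.1f}")
print(f"Forecast for month {next_month}: {forecast:.0f} defaults")
```

In practice most of the effort goes into the steps this sketch glosses over: getting the data out of multiple sources, deciding what counts as irrelevant, and explaining the forecast to the people who have to act on it.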
The main challenges data scientists face in their jobs are evolution, confusion and ambiguity. New technologies that let data scientists analyze data and visualize trends come up constantly, so what you learn over a period of time might become redundant in the near future. Some problems have never been solved before, so more often than not data scientists are venturing into unexplored areas. Choosing the best tool to get the job done in the simplest way is another issue when one is spoilt for choice. The “Skill” versus “Type of Job” matrix below gives insight into the skills required to carry out the role of a data scientist efficiently.
Fig: Skill versus Type of Job matrix (Source: Udacity)
There are many roles that people pursuing data science can fit into, depending on their experience. The table below shows, for each role, the skills required and the industries interested in hiring such talent. There are two distinct career tracks: data administration, and data science, where a career can start with the role of data analyst or data engineer and progress to data scientist and data architect. There is also a lot of scope for becoming an independent consultant or an entrepreneur once you have gained enough knowledge. The role of Statistician cuts across the field and is often mentioned together with the other roles. The industries mapped to each role are known to have hired people with these skills, but this by no means suggests that people with such skills can apply only to these industries. The fact is that there are many data science jobs open currently and too little talent around to fill them.
Role/Job function | Tools/Languages | Skills Required | Industry match |
Statistician | R, SAS, SPSS, Matlab, Stata, Python, Perl, Hive, Spark, Pig, SQL | · Statistical theories · Data mining and machine learning · Distributed computing (Hadoop) · Database systems (SQL and NoSQL based) · Cloud tools | Healthcare, Market Analysis companies |
Database Administrator | SQL, Java, Ruby on Rails, XML, C#, Python | · Backup and recovery · Data modeling & design · Distributed computing · Database systems (SQL and NoSQL based) · Data security · ERP & business knowledge | Cuts across all industries that have anything to do with data management |
Data Analyst | R, Python, HTML, JavaScript, C/C++, SQL | · Spreadsheet tools (Excel) · Database systems (SQL and NoSQL based) · Math, Stats, Machine Learning | Logistics, IoT companies |
Data Engineer | SQL, Hive, Pig, R, Matlab, SAS, SPSS, Python, Java, Ruby, C++, Perl | · Database systems (SQL and NoSQL based) · Data modeling & ETL tools · Data APIs · Data warehousing solutions | E-commerce, SNS |
Data Scientist | R, SAS, Python, Matlab, SQL, Hive, Pig, Spark | · Distributed computing · Predictive modeling · Story-telling & visualizing | Web Services, Enterprise Software |
Data Architect | SQL, XML, Hive, Pig, Spark | · Data warehousing solutions · In-depth knowledge of database architecture · Extract, Transform and Load (ETL), spreadsheet and BI tools · Systems development | Banking, Computer Hardware |