Skills Needed to Solve Problems with Data: Getting Started

I’m often asked about the skills required for a career working with data. As a data analyst, you may be responsible for data mining, analyzing existing data assets, and presenting insights based on your company’s needs. As a data scientist, you may also be responsible for finding ways to acquire high-volume, high-velocity data, building machine learning models, and making predictions using mathematical and scientific methods. If you’re interested in careers that solve problems with data, here are some fundamental skills to develop:

Fundamental Skills To Develop

  1. Python – A general-purpose, object-oriented programming language, extensible with a wide range of libraries.
    • 66% of data scientists are using Python daily and 84% of them use it as their main language 
    • Top Python libraries: TensorFlow (ML solutions), Keras (deep learning), scikit-learn (ML), NumPy (numerical computing), PyTorch (deep learning)
  2. R – An open-source programming language for statistical computing, with robust visualization libraries (ggplot2, plotly)
    • 47% of data scientists are using R. 
    • It is used by 70% of data miners 
    • Good for: statistical analysis and modeling, analyzing structured and unstructured data
  3. Structured Query Language (SQL) – The standard language for querying and managing data in relational databases, supporting both analytical queries and transactions.
    • 32% of data scientists use SQL
    • Good for: data management, querying, and transactional workloads.
  4. Data Visualization – For exploring data, storytelling, and communicating quick insights.
  5. Statistics – To support a general understanding of probability, distributions, sampling, hypothesis testing, confidence intervals, and variables.
  6. Spreadsheets – The all-purpose tool for reviewing data and running quick calculations.
  7. Algebra & Calculus – To support a general understanding of how algorithms work under the hood.
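To illustrate how a few of these skills fit together, here is a minimal sketch that uses only Python’s standard library: SQL (through the built-in `sqlite3` module) aggregates a small table, and the `statistics` module computes a confidence interval for the mean. The `orders` table, the sample numbers, and the helper function names are hypothetical, made up purely for illustration.

```python
import math
import sqlite3
import statistics

# Hypothetical sample data (made up for illustration): order totals by region.
ORDERS = [
    ("north", 120.0), ("north", 135.5), ("north", 128.0), ("north", 142.25),
    ("south", 98.0), ("south", 110.5), ("south", 105.0), ("south", 101.5),
]


def load_orders(rows):
    """Create an in-memory SQLite database and load the sample rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, total REAL)")
    conn.executemany("INSERT INTO orders (region, total) VALUES (?, ?)", rows)
    return conn


def region_summary(conn):
    """SQL side: count orders and average the totals for each region."""
    query = """
        SELECT region, COUNT(*) AS n, AVG(total) AS avg_total
        FROM orders
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


def region_totals(conn, region):
    """Fetch the raw order totals for one region, in insertion order."""
    rows = conn.execute(
        "SELECT total FROM orders WHERE region = ? ORDER BY rowid", (region,)
    )
    return [total for (total,) in rows]


def mean_confidence_interval(values, critical):
    """Statistics side: confidence interval for the mean.

    Computes mean +/- critical * (sample standard deviation / sqrt(n)).
    The caller supplies the critical value, e.g. from a t-table.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean - critical * standard_error, mean + critical * standard_error


def main():
    conn = load_orders(ORDERS)
    # Each region has 4 orders in this sample, so use the two-sided 95%
    # t critical value for n - 1 = 3 degrees of freedom (about 3.182).
    t_critical = 3.182
    for region, n, avg_total in region_summary(conn):
        low, high = mean_confidence_interval(region_totals(conn, region), t_critical)
        print(f"{region}: n={n}, mean={avg_total:.2f}, 95% CI ({low:.2f}, {high:.2f})")
    conn.close()


if __name__ == "__main__":
    main()
```

The t critical value is hard-coded because the standard library has no t-distribution; with larger samples, the t value approaches the familiar normal value of 1.96.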

Your Development Space: IDEs & Dev Tools

Tools for writing software.

  • Jupyter notebooks – Provides an interactive programming interface in a notebook environment. Good for: rapid prototyping, visualization.
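  • PyCharm – A Python IDE that supports single-file and multi-file projects, including multi-language projects. Good for: writing production code.
  • VS Code for Python – Visual Studio Code, a lightweight, general-purpose code editor, with Python support through Microsoft’s Python extension.
  • GitHub – Hosting for Git repositories: version control, tracking and recording code changes.
  • RStudio – An IDE for R.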

Safe Spaces to Practice: Communities & Open Datasets

Finding open communities & data sources for practice projects.

  • Google Dataset Search – over 25 million datasets indexed
  • Data.gov – The U.S. Government’s open data portal
  • Kaggle – Real-world datasets shared with the Kaggle community for collaborative problem solving

Thinking About Solving Problems With Data

Here are more resources and considerations for people who want to solve problems with data.

Characteristics of Machine Learning solutions

It is estimated that 87% of data science projects never reach production. One pitfall in developing a production-ready machine learning solution is failing to determine whether machine learning is an appropriate tool for the problem at all. Not every problem should be solved with machine learning.

Systems Thinking Resources

Systems thinking is a discipline for understanding systems in order to produce a desired effect. It provides methods for "seeing wholes and a framework for seeing interrelationships rather than things, for seeing patterns of change rather than static snapshots." The intent is to increase understanding and identify the point of “highest leverage”, the places in the system where a small change can make a big impact.