Skills Needed to Solve Problems with Data : Getting Started

I’m often asked about the skills required to have a career working with data. As a data analyst, you may be responsible for data mining, analyzing existing data assets, and presenting insights based on the needs of your company. As a data scientist, you may also be responsible for identifying methods for acquiring high volume/velocity data, creating machine learning models, and making predictions using mathematical & scientific methods. If you’re interested in careers that solve problems with data, here are some fundamental skills to develop:

Fundamental Skills To Develop

  1. Python – A general-purpose, object-oriented programming language, extensible through a wide range of libraries. 
    • 66% of data scientists use Python daily, and 84% of them use it as their main language 
    • Top Python libraries: TensorFlow (machine learning), Keras (deep learning), scikit-learn (machine learning), NumPy (numerical computing), PyTorch (deep learning)
  2. R – An open-source programming language for statistical analysis, with robust visualization libraries (ggplot2, plotly) 
    • 47% of data scientists are using R. 
    • It is used by 70% of data miners 
    • Good for: statistical analysis and modeling, analyzing structured and unstructured data
  3. Structured Query Language (SQL) – The standard language for querying and managing data in relational databases.
    • 32% of data scientists use SQL
    • Good for: querying, data management, transactional workloads.
  4. Data Visualization – for exploration, storytelling and communicating quick insights. 
  5. Statistics – To support a general understanding of probability, distributions, sampling, hypothesis testing, confidence intervals, and variables.
  6. Spreadsheets – The all-purpose tool for reviewing data and running quick calculations.
  7. Algebra & Calculus – To support a general understanding of how algorithms work under the hood.
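
To show how a few of these skills fit together, here is a minimal sketch that uses Python, SQL, and basic statistics on a tiny dataset. The order data, the table name, and both helper functions are invented for illustration; only the standard library is used. The confidence interval relies on a normal approximation, which is rough for a sample this small (a t-distribution would usually be a better fit).

```python
import sqlite3
import statistics
from math import sqrt

# Hypothetical sample data: (region, order_total). Invented for illustration.
ORDERS = [
    ("east", 120.0), ("east", 80.0), ("east", 100.0),
    ("west", 60.0), ("west", 90.0), ("west", 75.0),
]


def average_by_region(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, total REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, AVG(total) FROM orders "
            "GROUP BY region ORDER BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / sqrt(len(values))
    return mean - z * std_err, mean + z * std_err


if __name__ == "__main__":
    print(average_by_region(ORDERS))
    totals = [total for _, total in ORDERS]
    print(mean_confidence_interval(totals))
```

The SQL query handles the grouping and averaging, while Python handles loading the data and the follow-up statistics. That split is a common pattern, not the only reasonable one.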

Your Development Space : IDEs & Dev Tools

Tools for writing software.

  • Jupyter notebooks – Provides an interactive programming interface in a notebook environment. Good for: rapid prototyping, visualization.
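  • PyCharm – A Python IDE that supports single-file, multi-file, and multi-language projects. Good for: writing production code.
  • VS Code for Python – Microsoft's Visual Studio Code editor with the Python extension. It is a separate product from the Visual Studio IDE.
  • GitHub – A hosting platform for Git repositories. Good for: version control, tracking and recording code changes.
  • RStudio – IDE for R.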

Safe Spaces to Practice : Communities & Open Datasets

Finding open communities & data sources for practice projects.

  • Google Dataset Search – over 25 million datasets indexed
  • – Open data lake provided by the U.S. Government
  • Kaggle – Real-world datasets provided to the Kaggle community for collaborative problem solving

Thinking About Solving Problems With Data

Here are more resources and considerations for people who want to solve problems with data.

Related Posts

Simulation & Modeling Resources

Simulation and computer modeling tools allow engineers to model and evaluate real-world events in a computer-generated environment. Here are a few simulation tools and projects to get started:

Characteristics of Machine Learning solutions

It is estimated that 87% of data science projects never reach production. One pitfall in developing a production-ready machine learning solution is failing to determine whether machine learning is an appropriate tool for the problem. Not every problem should be solved with machine learning.