BigThinking.io

Skills Needed to Solve Problems with Data : Getting Started

I’m often asked about the skills required for a career working with data. As a data analyst, you may be responsible for data mining, analyzing existing data assets, and presenting insights based on the needs of your company. As a data scientist, you may also be responsible for identifying methods for acquiring high-volume, high-velocity data, creating machine learning models, and making predictions using mathematical and scientific methods. If you’re interested in careers that solve problems with data, here are some fundamental skills to develop:

Fundamental Skills To Develop

  1. Python – A general-purpose, object-oriented programming language, extensible with a wide range of libraries.
    • 66% of data scientists are using Python daily and 84% of them use it as their main language 
    • Top Python libraries: TensorFlow (ML solutions), Keras (deep learning), Scikit-learn (ML), NumPy (data analysis/ML), PyTorch (deep learning)
  2. R – An open-source programming language used for statistical computing, with robust visualization libraries (ggplot2, plotly)
    • 47% of data scientists are using R. 
    • It is used by 70% of data miners 
    • Good for: statistical analysis and modeling, analyzing structured and unstructured data
  3. Structured Query Language (SQL) – The standard language for querying and managing data in relational databases; combines analytics with transactional capabilities.
    • 32% of data scientists use SQL
    • Good for: data management, transactional capabilities.
  4. Data Visualization – for exploration, storytelling and communicating quick insights. 
  5. Statistics – To support a general understanding of probability, distributions, sampling, hypothesis testing, confidence intervals, and variables.
  6. Spreadsheets – The all-purpose data reviewer and calculator.
  7. Algebra & Calculus – To support a general understanding of how algorithms work under the hood.
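
To make a few of these skills concrete, here is a minimal sketch that combines Python, SQL, and basic statistics using only Python's standard library. The sample rows, the table name `orders`, and the helper names `totals_by_region` and `mean_confidence_interval` are all invented for illustration. The confidence interval uses a simple normal approximation, which is an assumption; a t-distribution would be the more careful choice for a sample this small.

```python
import math
import sqlite3
import statistics

# Hypothetical sample data: (region, order_total) rows invented for illustration.
ORDERS = [
    ("east", 120.0),
    ("east", 80.0),
    ("west", 200.0),
    ("west", 100.0),
    ("west", 150.0),
]


def totals_by_region(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, total REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, COUNT(*), SUM(total) FROM orders "
            "GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


def mean_confidence_interval(values, z=1.96):
    """Return (mean, low, high) using a normal approximation.

    z=1.96 corresponds to roughly 95% confidence under that approximation.
    """
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * std_err, mean + z * std_err


if __name__ == "__main__":
    for region, count, total in totals_by_region(ORDERS):
        print(f"{region}: {count} orders, total {total:.2f}")
    mean, low, high = mean_confidence_interval([t for _, t in ORDERS])
    print(f"mean {mean:.2f}, approx. 95% CI ({low:.2f}, {high:.2f})")
```

The SQL query does the grouping and summing, while Python handles the statistics. In practice, libraries such as NumPy or Scikit-learn would take over once the analysis grows beyond a few lines.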

Your Development Space : IDEs & Dev Tools

Tools for writing software.

  • Jupyter notebooks – Provides an interactive programming interface in a notebook environment. Good for: rapid prototyping, visualization.
  • PyCharm – A Python IDE that supports single-file and multi-file, multi-language projects. Good for: writing production code.
  • VS Code for Python – Visual Studio Code, a lightweight code editor from Microsoft, with Python support provided by an extension
  • GitHub – hosting platform for Git version control, tracking and recording code changes
  • RStudio – IDE for R.

Safe Spaces to Practice : Communities & Open Datasets

Finding open communities & data sources for practice projects.

  • Google Dataset Search – over 25 million datasets indexed
  • Data.gov – Open data portal provided by the U.S. Government
  • Kaggle – Real-world datasets provided to the Kaggle community for collaborative problem solving
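
Once you have downloaded a dataset from one of these sources, a useful first step is a quick profile of its shape and gaps. The sketch below assumes the dataset is a CSV file with a header row. The inline sample data and the helper name `profile_csv` are hypothetical, standing in for a real downloaded file.

```python
import csv
import io

# Hypothetical stand-in for a downloaded CSV file; columns and values are invented.
SAMPLE_CSV = "\n".join([
    "id,category,value",
    "1,a,10.5",
    "2,b,",
    "3,a,7.0",
    "4,c,3.25",
])


def profile_csv(handle):
    """Count rows, list columns, and count empty cells per column."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    columns = reader.fieldnames or []
    # A cell counts as missing when it is empty or absent from the row.
    missing = {col: sum(1 for row in rows if not row[col]) for col in columns}
    return {"rows": len(rows), "columns": columns, "missing": missing}


if __name__ == "__main__":
    summary = profile_csv(io.StringIO(SAMPLE_CSV))
    print(f"{summary['rows']} rows, columns: {', '.join(summary['columns'])}")
    for column, count in summary["missing"].items():
        print(f"{column}: {count} missing")
```

To profile a real file, the same function could be called with an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` wrapper.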

Thinking About Solving Problems With Data

Here are more resources and considerations for people that want to solve problems with data.

Kishau Rogers

Kishau Rogers is the editor and founder of the bigThinking project. bigThinking is a resource and collaborative innovation center which promotes the principles of systems thinking. Our mission is to empower the next generation of innovators to think bigger, to think better, and to create solutions that make a significant impact in the areas that matter. Kishau Rogers is an award-winning entrepreneur with a deep background in Computer Science, over twenty-five years of experience in the technology industry, and more than 15 years of entrepreneurial leadership. She currently serves as the Founder & CEO of Time Study, Inc., a high-growth startup offering solutions for using machine learning, advanced natural language processing, and data science to automatically tell a story of how enterprise employees spend their time.
