Which data science skills matter most?
This article comes from the WeChat official account “科文路”. Follows and comments are welcome; reposts must credit the source.
by Matt Dancho
| Area | Skills and Tools |
|---|---|
| Machine Learning | Supervised Classification, Supervised Regression, Unsupervised Clustering, Dimensionality Reduction, Local Interpretable Model-agnostic Explanations; H2O Automatic Machine Learning, parsnip (XGBoost, SVM, Random Forest, GLM), K-Means, UMAP, recipes, lime |
| Data Visualization | Interactive and static visualizations; ggplot2 and plotly packages |
| Data Wrangling & Cleaning | Working with outliers, missing data, reshaping data, aggregation, filtering, selecting, calculating, and many more critical operations; dplyr and tidyr packages |
| Data Preprocessing & Feature Engineering | Preparing data for machine learning, engineering features (dates, text, aggregates); recipes package |
| Time Series | Working with date/datetime data; aggregating, transforming, and visualizing time series; timetk package |
| Forecasting | ARIMA, Exponential Smoothing, Prophet, Machine Learning (XGBoost, Random Forest, GLMnet, etc.), Deep Learning (GluonTS), Ensembles, Hyperparameter Tuning, Scaling to 1000s of forecasts; modeltime package |
| Text | Working with text data; stringr package |
| NLP | Machine learning, Text Features |
| Functional Programming | Making reusable functions, sourcing code |
| Iteration | Loops and mapping; purrr package |
| Reporting | R Markdown, interactive HTML, static PDF |
| Applications | Building Shiny web applications, Flexdashboard |
| Deployment | Cloud (AWS, Azure, GCP), Docker, Git |
| Databases | SQL (for data import), MongoDB (for apps) |
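The table lists R packages. As a rough illustration of three operations from the Data Wrangling & Cleaning row (handling missing data, filtering, and aggregation), here is a minimal sketch in plain Python using only the standard library. It is an analogue, not dplyr itself. The toy records and the function names `drop_missing` and `group_mean` are hypothetical, chosen only for this example. Each step is written as a small reusable function, in line with the Functional Programming row.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (region, sales); None marks a missing value.
records = [
    ("north", 120.0),
    ("south", None),
    ("north", 80.0),
    ("south", 200.0),
    ("east", 50.0),
]

def drop_missing(rows):
    """Filter: keep only rows whose value is present (roughly dplyr's filter(!is.na(value)))."""
    return [row for row in rows if row[1] is not None]

def group_mean(rows):
    """Aggregate: mean value per key (roughly group_by() followed by summarise())."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Sort keys so the result has a stable, readable order.
    return {key: mean(values) for key, values in sorted(groups.items())}

summary = group_mean(drop_missing(records))
print(summary)  # {'east': 50.0, 'north': 100.0, 'south': 200.0}
```

Composing small single-purpose functions mirrors a dplyr pipeline: each step takes rows in and returns rows (or a summary) out.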