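Which Data Science Skills Matter Most?
This article comes from the WeChat official account “科文路”. Follows and comments are welcome. Please credit the source when reposting.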
by Matt Dancho
Plan | Skills |
---|---|
Machine Learning | Supervised Classification, Supervised Regression, Unsupervised Clustering, Dimensionality Reduction, Local Interpretable Model Explanation - H2O Automatic Machine Learning, parsnip (XGBoost, SVM, Random Forest, GLM), K-Means, UMAP, recipes, lime |
Data Visualization | Interactive and Static Visualizations, ggplot2 and plotly |
Data Wrangling & Cleaning | Working with outliers, missing data, reshaping data, aggregation, filtering, selecting, calculating, and many more critical operations; dplyr and tidyr packages |
Data Preprocessing & Feature Engineering | Preparing data for machine learning, Engineering Features (dates, text, aggregates), recipes package |
Time Series | Working with date/datetime data, aggregating, transforming, visualizing time series, timetk package |
Forecasting | ARIMA, Exponential Smoothing, Prophet, Machine Learning (XGBoost, Random Forest, GLMnet, etc.), Deep Learning (GluonTS), Ensembles, Hyperparameter Tuning, Scaling to 1000s of forecasts, modeltime package |
Text | Working with text data, stringr |
NLP | Machine learning, Text Features |
Functional Programming | Making reusable functions, sourcing code |
Iteration | Loops and Mapping, using purrr package |
Reporting | R Markdown, Interactive HTML, Static PDF |
Applications | Building Shiny web applications, flexdashboard |
Deployment | Cloud (AWS, Azure, GCP), Docker, Git |
Databases | SQL (for data import), MongoDB (for apps) |