How to organize and execute ML and DS code. Learn more about it here.
This may be more relevant for the team lead, but it is important to know how GitHub works and how your team can manage its code. Here is a guide on team collaboration with GitHub. You can use a team GitHub repository to host team notebooks and other scripts.
Your team will need access to GPUs in order to train deep learning networks. There are a number of ways to access GPUs...
This article does a good job of explaining the typical workflow of an ML/DS project.
This first step is where the objective is defined. An understanding of how the machine learning system's solution will ultimately be used is important. This step is also where comparable scenarios and current workarounds to a given problem are discussed, assumptions are laid out and contemplated, and the degree of need for human expertise is determined. Other key technical items to frame in this step include determining which type of machine learning problem (supervised, unsupervised, etc.) applies, and adopting appropriate performance metric(s).
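The choice of metric follows directly from how the problem is framed. As a rough sketch (using scikit-learn and made-up labels purely for illustration), accuracy, F1, and ROC AUC can tell very different stories on an imbalanced classification problem:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical ground-truth labels and model outputs, just to illustrate metric choice
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]          # imbalanced: only 2 positives
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]          # hard predictions (one positive missed)
y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.45, 0.9]  # predicted probabilities

print("accuracy:", accuracy_score(y_true, y_pred))   # looks high despite the missed positive
print("f1:      ", f1_score(y_true, y_pred))         # penalizes the missed positive
print("roc_auc: ", roc_auc_score(y_true, y_score))   # threshold-independent ranking quality
```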
This step is data-centric: determine how much data is needed, what type of data is needed, where to get the data, assess legal obligations surrounding data acquisition... and get the data. Once you have the data, ensure it is appropriately anonymized, make certain you know what type of data it actually is (time series, observations, images, etc.), convert the data to the format you require, and create training, validation, and testing sets as warranted.
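As a minimal sketch of that last part, assuming scikit-learn and a toy dataset standing in for your team's data, a stratified train/validation/test split might look like:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Toy dataset stands in for whatever data your team has acquired
X, y = load_iris(return_X_y=True)

# Hold out a test set first, then split the remainder into train/validation,
# stratifying so class proportions are preserved in each split.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20%
```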
This step in the checklist is akin to what is often referred to as Exploratory Data Analysis (EDA). The goal is to gain insights from the data prior to modeling. Recall that in the first step assumptions about the data were to be identified and explored; this is a good time to investigate those assumptions more deeply. Human experts can be of particular use in this step, answering questions about correlations that may not be obvious to the machine learning practitioner. Studying features and their characteristics is done here, as is general visualization of features and their values (think of how much easier it is, for example, to quickly identify outliers by box plot than by numerical interrogation). Documenting the findings of your exploration for later use is good practice.
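A rough illustration of this kind of exploration, assuming pandas and matplotlib with a toy dataset in place of your actual data:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Toy dataset as a stand-in for your project's data
iris = load_iris(as_frame=True)
df = iris.frame

# Quick numeric summary of each feature
print(df.describe())

# Correlations between features (and with the target)
print(df.corr())

# Box plots make outliers and spread easy to spot at a glance
df.drop(columns="target").plot(kind="box", subplots=True, layout=(2, 2), figsize=(8, 6))
plt.tight_layout()
plt.show()
```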
Time to apply the data transformations you identified as worthwhile in the previous step. This step also includes any data cleaning you would perform, as well as both feature selection and feature engineering. Any feature scaling for value standardization and/or normalization would occur here as well.
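A minimal sketch of such a preparation step, assuming scikit-learn and a small made-up numeric matrix, chaining imputation (cleaning) with standardization (scaling) in a reusable pipeline:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric feature matrix with a missing value
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 260.0]])

# Chain cleaning (impute missing values with the median) and scaling
# (standardize each feature to zero mean, unit variance)
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

X_prepared = prep.fit_transform(X)
print(X_prepared)
```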
Time to model the data and whittle the initial set of models down to what appear to be the most promising ones. (This is similar to the first modeling step in Chollet's process: good model → "too good" model, which you can read more about here.) Such attempts may involve using samples of the full dataset to keep training times reasonable for preliminary models, which should cut across a wide spectrum of categories (trees, neural networks, linear models, etc.). Models should be built, measured, and compared to one another; the types of errors made by each model should be investigated, as should the most significant features for each algorithm used. The best-performing models should be shortlisted and can then be fine-tuned afterwards.
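A sketch of such shortlisting, assuming scikit-learn and a toy dataset: fit a handful of model families under the same cross-validation scheme and metric, then compare their scores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset stands in for a (possibly subsampled) version of your training data
X, y = load_iris(return_X_y=True)

# A handful of candidate models from different families
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "knn": KNeighborsClassifier(),
}

# Evaluate each with the same cross-validation scheme and metric, then shortlist the best
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```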