Coding Interview Questions and Answers - Machine Learning / Mock Interview Show #6

Duration: 1:34:31

In this episode, we shift gears into the world of data science and machine learning engineering.
Join us as Mykhailo Kuznietsov steps into a real interview setting, answering 30+ questions covering ML theory, data pipelines, prompt engineering, and more.
Hosted by Oleksii Malashyna, with Denys Soloviov conducting the interview, this session is packed with ML best practices and an engaging live coding challenge: transforming raw user data into a structured format using Python (a minimal sketch of that kind of task follows below).
Whether you're preparing for an ML role or just want to learn how top candidates think on the spot, this episode offers deep insights and honest feedback to help you grow!
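
The episode's actual dataset and requirements aren't reproduced in these show notes, so the record format, field names, and helper below are purely hypothetical; this is only a minimal sketch of the kind of "raw to structured" transformation the practical task involves.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical raw input: the episode's real data isn't shown in these
# notes, so the delimiter and field layout here are illustrative only.
raw_users = [
    "1;Alice;ALICE@example.com;29",
    "2;Bob;bob@example.com;n/a",  # malformed age field
]

@dataclass
class User:
    user_id: int
    name: str
    email: str
    age: Optional[int]  # None when the raw value can't be parsed

def parse_user(line: str) -> User:
    """Split one raw record and coerce each field to its target type."""
    user_id, name, email, age = line.split(";")
    return User(
        user_id=int(user_id),
        name=name.strip(),
        email=email.strip().lower(),  # normalize emails while structuring
        age=int(age) if age.strip().isdigit() else None,
    )

structured: List[User] = [parse_user(line) for line in raw_users]
print(structured)
```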

NAVIGATION
0:00 - Intro
05:02 - First part. Could you tell us more about your experience?
13:04 - Second part. What sampling methods for obtaining training data do you know?
17:23 - What is the main disadvantage of simple random sampling?
18:57 - Hand labels and natural labels
23:34 - The problem of lacking labels
31:51 - What feature engineering operations do you know?
33:19 - Handling missing values: comparing deletion and imputation
38:07 - What is the bias/variance tradeoff during training?
40:40 - What is ensemble learning?
43:40 - What is the difference between batch inference and online inference?
46:25 - Compare deployments to the cloud and to edge devices. Their pros and cons.
50:21 - What is data distribution shift? What methods exist to detect data distribution shifts?
53:45 - How would you standardize the development environment across different workstations?
01:00:17 - What prompt engineering techniques do you know to get higher-quality responses from LLMs?
01:03:54 - What is RAG, and what is its purpose?
01:12:28 - When is it beneficial to fine-tune a language model?
01:14:40 - Third part. Practical task
01:25:34 - Feedback session

WHERE TO WATCH US AND LISTEN
🔸 YouTube: https://youtu.be/cUQpQiX5jcE
🔸 Google Podcasts: https://bit.ly/awclub-en-google
🔸 Apple Podcasts: https://bit.ly/awclub-en-apple
🔸 Spotify: https://bit.ly/awclub-en-spotify
🔸 Download mp3: https://anywhereclub.simplecast.com/episodes/41

ADDITIONAL QUESTIONS

  1. What sampling methods for obtaining training data do you know?
  2. What is the main disadvantage of simple random sampling? (Here I may also ask about stratified sampling and its pros and cons.)
  3. What are hand labels and natural labels? What are the pros and cons of hand labeling?
  4. The label multiplicity problem: how do you minimize disagreement among annotators?
  5. What techniques do you know for handling a lack of labels? (Briefly explain how each technique works.)
  6. Explain the class imbalance problem and approaches to handling it.
  7. What is data augmentation? What problems does it solve?
  8. What feature engineering operations do you know?
  9. Compare deletion and imputation for handling missing values. What are their advantages and drawbacks?
  10. What is data leakage? How to detect it?
  11. What should we consider when selecting an appropriate algorithm for model training? The model will be used by an app serving external clients. (You can mention and compare specific algorithms while answering this question.)
  12. What is the bias/variance tradeoff?
  13. What is cross-validation? Why is it needed?
  14. What is model regularization, and what is its goal? What regularization techniques do you know?
  15. What is ensemble learning? What algorithms do you know that leverage ensemble learning?
  16. What types of neural networks do you know? Map types and suitable tasks for them.
  17. What is the difference between grid search and randomized search for hyperparameter tuning? (See the sketch after this list.)
  18. What model evaluation methods do you know to ensure that a model can be deployed to the production environment? (Discuss perturbation tests, slice-based evaluation, and so on.)
  19. What is the difference between batch inference and online inference? Describe their use cases.
  20. What model compression methods do you know to reduce a model's size and inference latency?
  21. Compare deployments to the cloud and to edge devices. Their pros and cons.
  22. What is data distribution shift? Provide examples. (Here you can also mention that this is one of the reasons model retraining is important.)
  23. What methods exist to detect data distribution shifts?
  24. What techniques do you know for model testing in production?
  25. How would you standardize the development environment across different workstations? What can be done for environment reproducibility and why is it important?
  26. What is Docker? How does it help with autoscaling? Why is Docker Compose needed?
  27. What is a model store, and what is its purpose?
  28. What is a feature store, and what is its purpose?
  29. What prompt engineering techniques do you know to get higher-quality responses from LLMs?
  30. What is RAG, and what is its purpose?
  31. When is it beneficial to fine-tune a language model?
  32. What is parameter-efficient fine-tuning, and what techniques for it do you know?
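
Question 17 lends itself to a quick code illustration. The sketch below is not from the episode; the model, parameter ranges, and toy data are assumptions chosen only to contrast an exhaustive grid with a fixed sampling budget, using scikit-learn's GridSearchCV and RandomizedSearchCV.

```python
# Toy comparison of the two tuning strategies (question 17). The estimator,
# parameter ranges, and data are illustrative, not taken from the episode.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=42)

# Grid search: exhaustively tries every combination (3 * 3 = 9 candidates).
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
    cv=3,
)

# Randomized search: samples a fixed budget of candidates from distributions,
# which scales much better as the number of hyperparameters grows.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={"n_estimators": randint(50, 300), "max_depth": randint(2, 10)},
    n_iter=9,  # same budget as the grid above, for a fair comparison
    cv=3,
    random_state=42,
)

for search in (grid, rand):
    search.fit(X, y)
    print(type(search).__name__, search.best_params_, round(search.best_score_, 3))
```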