Magdalena Konkiewicz

Machine Learning Data Sets



Introduction

You have read several articles about data analysis, followed several tutorials, and watched people apply basic machine learning algorithms. Now you are ready to try some of these techniques yourself, but where do you start? You probably do not have millions of rows of data sitting on your laptop waiting for analysis. In this article, you will be introduced to five different data sources you can search to find a first data set that is a good fit for a Data Science project.



Kaggle

Kaggle is probably the most popular resource where aspiring or established Data Scientists find data sets for side projects. Currently, there are almost 25,000 publicly available data sets on this website, and they span a variety of topics. You can search for data sets using keywords and see the work of other data scientists who have worked on a data set, provided they have decided to share it by creating a publicly available kernel. Additionally, the website hosts Machine Learning competitions in case you would like to compete for prizes with other Data Scientists around the globe. Once you have learned enough, you can also search its job board for Data Science related jobs.



DrivenData

DrivenData is a website similar to Kaggle's competitions, as its data sets are published in the form of prize contests. It does not have as large a collection of data sets, but its main theme is social impact and making a difference in the world. Many therefore find its data sets inspiring and definitely worth working on.



UCI Machine Learning Repository

The UCI Machine Learning Repository currently has almost five hundred data sets ready for you to download and start using. It is definitely worth searching, as the data sets are mostly clean and well organized.



US Gov Data

US Gov Data is another place with a large number of data sets. You can currently browse through more than 250,000 different government data sets. They are divided into categories such as Agriculture, Climate, Consumer, Ecosystems, Education, Energy, Finance, Health, Local Government, Manufacturing, Maritime, Ocean, Public Safety, and Science & Research.



Conclusion

I have presented my favorite resources for finding Machine Learning and Data Science data sets. Now it’s your turn, so get browsing, downloading, and loading data in your Jupyter notebooks.
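If you would like a quick first look at a file you have just downloaded, a small helper like the one below might be a reasonable starting point. It is only a sketch under a few assumptions that are not part of any of the sites above: the file is a comma-separated text file, its first row holds the column names, and missing values are marked by an empty cell, "?" or "NA". The function name summarize_csv and the marker list are made up for illustration.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_csv(path, missing_markers=("", "?", "NA")):
    """Return the row count, column names, and per-column missing-value counts.

    Assumptions (adjust for your file): comma-separated values, a header row,
    UTF-8 encoding, and missing cells marked by one of `missing_markers`.
    """
    missing = Counter()
    n_rows = 0
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)  # first row is read as column names
        columns = list(reader.fieldnames or [])
        for row in reader:
            n_rows += 1
            for name in columns:
                value = row.get(name)
                # A short row yields None for the cells it lacks.
                if value is None or value.strip() in missing_markers:
                    missing[name] += 1
    return {"rows": n_rows, "columns": columns, "missing": dict(missing)}
```

In a notebook cell you could then call summarize_csv("my_dataset.csv"), where my_dataset.csv is a placeholder for whatever file you downloaded, and check the row count and missing values before you start modelling.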

