Where to find datasets?
You can find interesting datasets on Kaggle.
The data should be in CSV format and should contain at least 3 columns and 150 rows.
You can also create a new dataset on Kaggle by uploading a CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable)
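To check quickly whether a CSV file meets the size requirement above (at least 3 columns and 150 rows), something like the following sketch may help. The function name `meets_minimum_size` and its parameters are made up for illustration, and it assumes the first line of the file is a header row.

```python
import csv


def meets_minimum_size(csv_path, min_columns=3, min_rows=150):
    """Return True if the CSV has at least min_columns columns
    and at least min_rows data rows (the header is not counted)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # assumed to be the header row
        if header is None:
            return False  # empty file
        # Count non-empty data rows after the header
        row_count = sum(1 for row in reader if row)
    return len(header) >= min_columns and row_count >= min_rows
```

Calling `meets_minimum_size('my-data.csv')` (a placeholder file name) returns `True` or `False`, so you can run it before uploading a file to Kaggle.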
How to download a dataset within Jupyter?
Datasets can be downloaded within Jupyter using the opendatasets Python library. Here’s some sample code for downloading the US Elections Dataset:
import opendatasets as od

# URL of the Kaggle dataset page
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
# Download the dataset from Kaggle
od.download(dataset_url)
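Once the download finishes, you may want to see which CSV files you actually got. The sketch below uses only the standard library. It assumes the files were saved in a folder named after the last part of the dataset URL (`./us-elections-dataset`); adjust `data_dir` if your files ended up somewhere else. The helper name `find_csv_files` is made up for illustration.

```python
import os


def find_csv_files(data_dir):
    """Return a sorted list of paths to all .csv files under data_dir."""
    matches = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.lower().endswith(".csv"):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Assumption: the download folder is named after the dataset URL's last segment
data_dir = "./us-elections-dataset"
for path in find_csv_files(data_dir):
    print(path)
```

Any of the printed paths can then be opened with your preferred CSV reader.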
Some interesting datasets
- Video Games sales: https://www.kaggle.com/gregorut/videogamesales
- World University Rankings: https://www.kaggle.com/mylesoneill/world-university-rankings
- Netflix TV Shows and Movies: https://www.kaggle.com/shivamb/netflix-shows/notebooks
- StackOverflow Developer Survey: https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey
- Google Play Store Android Apps Data: https://www.kaggle.com/lava18/google-play-store-apps
- Indian Stock Market Data: https://www.kaggle.com/rohanrao/nifty50-stock-market-data
- Indian Air Quality: https://www.kaggle.com/rohanrao/air-quality-data-in-india
- Worldwide Covid-19 Cases: https://www.kaggle.com/imdevskp/corona-virus-report
- USA Covid-19 Cases: https://www.kaggle.com/sudalairajkumar/covid19-in-usa
- US Election Results (2012): https://www.kaggle.com/tunguz/us-elections-dataset
- US Stock Market: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs/
- Crop production in India: https://www.kaggle.com/srinivas1/agricuture-crops-production-in-india
- Agricultural raw material prices: https://www.kaggle.com/kianwee/agricultural-raw-material-prices-19902020
- Agricultural land values: https://www.kaggle.com/jmullan/agricultural-land-values-19972017
- Digital payments in India: https://www.kaggle.com/lazycipher/upi-usage-statistics-aug16-to-feb20
- US Unemployment Rate Data: https://www.kaggle.com/jayrav13/unemployment-by-county-us
- India Road accident Data: https://community.data.gov.in/statistics-of-road-accidents-in-india/
- Data Science Jobs Data:
- Youtube Trending Videos: https://www.kaggle.com/datasnaek/youtube-new
- Asteroid Dataset: https://www.kaggle.com/sakhawat18/asteroid-dataset
- Solar flares Data: https://www.kaggle.com/khsamaha/solar-flares-rhessi
- F-1 Race Data: https://www.kaggle.com/cjgdev/formula-1-race-data-19502017
- Automobile Insurance: https://www.kaggle.com/aashishjhamtani/automobile-insurance
- PUBG video game matches: https://www.kaggle.com/skihikingkevin/pubg-match-deaths
- CounterStrike GO (video game)
- Dota 2 (video game): https://www.kaggle.com/devinanzelmo/dota-2-matches
- Cricket One-Day Internationals Data: https://www.kaggle.com/jaykay12/odi-cricket-matches-19712017
- Cricket Indian Premier League Data: https://www.kaggle.com/nowke9/ipldata
- Basketball (NCAA): https://www.kaggle.com/ncaa/ncaa-basketball
- Basketball NBA Players Stats: https://www.kaggle.com/ncaa/ncaa-basketball
- Football datasets:
- Hotel Booking Demand: https://www.kaggle.com/jessemostipak/hotel-booking-demand
- New York Airbnb listings: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
Other sources to look for datasets:
If you use an external source other than Kaggle, you’ll need to create a new dataset on Kaggle by uploading a CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable using opendatasets).
Downloading Personal data for EDA
You can also use your own personal data for exploratory data analysis (EDA). It can be downloaded from the following sources:
- WhatsApp chat data https://jovian.ai/PrajwalPrashanth/whatsapp-chat-data-analysis/v/10#C2
- Google Apps data from https://takeout.google.com/
  - Chrome (https://medium.com/free-code-camp/understanding-my-browsing-pattern-using-pandas-and-seaborn-162b97e33e51)
  - Contacts
  - Calendar
  - Drive
  - Fit
  - Google Pay
  - Maps
  - …
- Data from Apple’s Apps https://appleinsider.com/articles/18/05/23/how-to-request-your-personal-data-using-apples-data-privacy-portal
- Instagram Data https://www.instagram.com/accounts/login/?next=/download/request/
- Fitbit Data https://help.fitbit.com/articles/en_US/Help_article/1133.htm
- LinkedIn Data https://www.linkedin.com/help/linkedin/answer/50191/downloading-your-account-data?lang=en
- Shopping analysis, Amazon data https://www.amazon.com/gp/help/customer/display.html?nodeId=G5NBVNN2RHXD5BUW
- Spending analysis: check your bank’s website; you should be able to export CSV/Excel statements for at least a year.
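As a starting point for the spending analysis idea, here is a rough sketch that totals amounts per month from an exported statement. Every bank uses a different export format, so the column names (`Date`, `Amount`) and the date format are assumptions you will almost certainly need to change. The function name `monthly_spending` is made up for illustration.

```python
import csv
from collections import defaultdict
from datetime import datetime


def monthly_spending(csv_path, date_col="Date", amount_col="Amount",
                     date_format="%Y-%m-%d"):
    """Sum the amount column per calendar month.

    The column names and date format are assumptions about the
    exported statement; adjust them to match your bank's file.
    """
    totals = defaultdict(float)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Reduce each date to a "YYYY-MM" key
            month = datetime.strptime(row[date_col], date_format).strftime("%Y-%m")
            totals[month] += float(row[amount_col])
    # Return months in chronological order
    return dict(sorted(totals.items()))
```

The result is a dictionary mapping each month to its total, which is easy to plot or inspect further.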
Use this thread for sharing interesting datasets.