Course Project on Exploratory Data Analysis - Discuss and Share Your Work

I got this error

Now it's a problem with the date parsing. Try changing the '12hr' argument to '24hr' (where you call the rawToDf function).
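
rawToDf isn't shown in this thread, so this is only a guess at what the '12hr'/'24hr' switch does: choose the timestamp format used to parse each chat line. The sketch assumes day/month/year timestamps such as `25/12/2020, 18:45`. The format strings and `parse_timestamp` are hypothetical and should be adapted to your own export.

```python
from datetime import datetime

# Hypothetical format strings; adjust them to match the timestamps in your own export.
FORMATS = {
    "12hr": "%d/%m/%Y, %I:%M %p",  # e.g. "25/12/2020, 6:45 pm"
    "24hr": "%d/%m/%Y, %H:%M",     # e.g. "25/12/2020, 18:45"
}

def parse_timestamp(text, key):
    """Parse a chat timestamp with the chosen format, or return None if it doesn't match."""
    try:
        return datetime.strptime(text, FORMATS[key])
    except ValueError:
        return None

stamp = "25/12/2020, 18:45"
print(parse_timestamp(stamp, "12hr"))  # None: %I only accepts hours 1-12 and %p expects AM/PM
print(parse_timestamp(stamp, "24hr"))  # 2020-12-25 18:45:00
```

A 24-hour timestamp fails under the 12-hour format because `%I` rejects an hour like 18, which is likely why switching the argument made the error go away.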

Thank you, I don't get the error anymore. You are the best. Thanks a lot.

I am not getting any data. What should I do? Is it because I haven't uploaded the data properly? If so, how should I upload it?

On the jovian.ai page, in your notebook's files, can you see your file there?
Did you run Binder after uploading it?

I am not running it on jovian.ai; I am using my own Jupyter notebook.

I just want to analyse it personally. I do not want to share it.

OK, so are you sure this file is located next to your notebook (inside the same directory)?

Try running this:

import os
os.path.exists("chat.txt")

I got True.

So the file is there.

My next guesses:

  1. The file is empty for some reason.
  2. The data format changed so the rawToDf is no longer valid.

In the first case, you should make sure that the data export was successful.

In the second case, I would try placing some print calls inside the rawToDf function to show intermediate values (such as usernames, date_time, user_msg, msgs). This checks whether anything actually gets processed.
PLEASE NOTE: if this is data from your WhatsApp account, you might not want to share it publicly, because these variables will contain your messages etc.
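
Both guesses can be checked without printing the whole chat. This is a sketch under my own assumptions: `diagnose_chat` is a hypothetical helper, and `MESSAGE_START` guesses an export line such as `25/12/2020, 18:45 - Alice: hi`. If the file has bytes but zero matching lines, the export format probably differs from what rawToDf expects (some exports, for example, wrap the timestamp in square brackets).

```python
import os
import re

# Assumed pattern for the start of a message line, e.g. "25/12/2020, 18:45 - Alice: hi".
# This is a guess; the export layout varies by phone, app version and locale.
MESSAGE_START = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}")

def diagnose_chat(path, show=3):
    """Report file size, line count, and how many lines look like message starts."""
    size = os.path.getsize(path)
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    matching = [line for line in lines if MESSAGE_START.match(line)]
    print(f"{size} bytes, {len(lines)} lines, {len(matching)} look like messages")
    # Show only the first few raw lines; don't post them publicly if the chat is private.
    for line in lines[:show]:
        print(repr(line))
    return size, len(lines), len(matching)
```

Usage: `diagnose_chat("chat.txt")`. A size of 0 points to the first case (failed export); lines present but 0 matches points to the second (changed format).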

If you want, you can PM me and we'll try to resolve it privately, because this problem seems to go beyond the usual Python errors, and we might end up producing hundreds of messages here :stuck_out_tongue:.

I'm texting you. Please check.

How do I save a dataset for repeated use? I created a folder and saved a dataset there, but the next time I looked, it was gone. I also saved it as an artifact, but I was unable to retrieve it. Please help; uploading the dataset repeatedly is annoying.
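
I can't tell from here exactly why the folder disappears; one plausible cause is that files written inside a temporary session (such as a Binder instance) are not kept after it shuts down. A workaround that doesn't depend on storage is to keep the dataset's URL in the notebook and download it only when the local copy is missing. `get_dataset` is a hypothetical helper; `fetch` defaults to `urllib.request.urlretrieve` and can be swapped out.

```python
import os
from urllib.request import urlretrieve

def get_dataset(url, path, fetch=urlretrieve):
    """Download url to path only if path doesn't already exist; return the local path."""
    if os.path.exists(path):
        print(f"Using cached copy at {path}")
    else:
        # Create the target folder first if it doesn't exist yet.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        fetch(url, path)
        print(f"Downloaded {url} to {path}")
    return path
```

Hypothetical usage: `get_dataset("https://example.com/data.csv", "data/data.csv")`. In a fresh session the first call downloads the file; later calls in the same session reuse it.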

I wonder if you have tried to install this module.

I found a very interesting dataset on Kaggle, but the file is huge (about 10 GB) because it contains a very large number of measurement rows.
I'm almost certain that uploading that .csv file to the Jupyter notebook is not viable. I'm running the code locally, but I'm not sure whether that's allowed.
How should I proceed in this case?

Hey guys, I'm having trouble loading data from the "recommended datasets for the course project" into my Jovian project. I know we went over urlretrieve, but I don't know how to get the right URL. Is it better to download the data onto my computer? If someone could give me some guidance on how to load the datasets, that would be much appreciated! Thank you!
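
A minimal sketch, assuming the dataset is a CSV file reachable at a direct download link. For files hosted on GitHub, the link behind the "Raw" button is usually such a link; Kaggle downloads require signing in, so for those it may be simpler to download the file to your computer and upload it. `load_csv_from_url` and the placeholder URL are hypothetical.

```python
import pandas as pd
from urllib.request import urlretrieve

def load_csv_from_url(url, filename):
    """Save the file at url locally as filename, then read it into a DataFrame."""
    urlretrieve(url, filename)
    return pd.read_csv(filename)

# Hypothetical URL: replace it with the direct ("raw") link to your dataset's CSV file.
# df = load_csv_from_url("https://raw.githubusercontent.com/<user>/<repo>/main/data.csv", "data.csv")
```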

Can I team up with my friend, who is taking the same course here, on the course project? Our project would be the same. Would that be a problem?

Why use Jupyter locally? Use it online instead and fetch that 100 GB dataset directly by passing its link.

I'm encountering the same issue too. Can someone please help? I'm not able to sort the values in a Series.

Just my 2 cents: you can look into the data and make a digestible sample, cutting it down by rows, by columns, or both.

Personally, I don't think we need a very big dataset (or several) to come up with a good project. On the contrary, working with big datasets is time-consuming and, even worse, more likely to cloud your thinking and investigation.
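
The sampling idea above can be sketched with pandas, assuming the big file is a CSV: `read_csv` with `chunksize` streams the file piece by piece so it never has to fit in memory at once, `usecols` drops columns you don't need, and a random fraction of each chunk is kept. `sample_large_csv` and its defaults are illustrative; if the first rows are representative enough, `pd.read_csv(path, nrows=100_000)` is an even simpler cut.

```python
import numpy as np
import pandas as pd

def sample_large_csv(path, columns=None, chunksize=100_000, frac=0.01, seed=42):
    """Stream a large CSV in chunks and keep a random fraction of the rows."""
    rng = np.random.RandomState(seed)  # one shared generator, so each chunk gets a different draw
    pieces = []
    # usecols=None keeps every column; pass a list of names to load fewer.
    for chunk in pd.read_csv(path, usecols=columns, chunksize=chunksize):
        pieces.append(chunk.sample(frac=frac, random_state=rng))
    if not pieces:  # header only, no data rows
        return pd.read_csv(path, usecols=columns, nrows=0)
    return pd.concat(pieces, ignore_index=True)

# Hypothetical usage: keep 1% of the rows and two columns, then save the sample for reuse.
# sample = sample_large_csv("measurements.csv", columns=["timestamp", "value"])
# sample.to_csv("measurements_sample.csv", index=False)
```

Saving the sample once means later sessions only need to load the small file.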

Welcome! Even if 10 people use the same dataset, I don't think any two of their assignments will be similar. Everyone may have a different set of questions and approaches.

To make sure you don't both hit the same wall, you can sit down together and brainstorm the questions. If each of you picks 2-4 questions that point in different directions, you should be safe. After all, you only need to answer a minimum of 5 questions. Likewise, you can each explore the dataset with different operations. Unless the dataset itself is very small, you should both be able to go in different directions in either part.

Explore and enjoy!

I don't quite understand your issue, but here is an example; I hope it helps you resolve it.

import pandas as pd

t = pd.Series([10, 2, 4, 6, 1, 4, 2, 9, 7, 9])
t.sort_values()  # returns a new Series sorted in ascending order; t itself is unchanged
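
Since the original issue isn't shown, this is a guess at a common stumbling block: `sort_values` returns a new Series instead of modifying `t`, so the result is lost unless you assign it. The sketch shows ascending and descending sorts and how to keep the sorted result with a fresh 0..n-1 index.

```python
import pandas as pd

t = pd.Series([10, 2, 4, 6, 1, 4, 2, 9, 7, 9])

ascending = t.sort_values()                  # new Series, smallest first; t is unchanged
descending = t.sort_values(ascending=False)  # largest first
t = t.sort_values().reset_index(drop=True)   # keep the sorted result and renumber the index

print(ascending.tolist())           # [1, 2, 2, 4, 4, 6, 7, 9, 9, 10]
print(descending.head(3).tolist())  # [10, 9, 9]
```

`t.sort_values(inplace=True)` also exists, but reassigning the result tends to be easier to follow in a notebook.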