Lesson 1 - Linear Regression with Scikit Learn

My model got approximately 75% accuracy. How can I improve it?

Hey @yashwanthtelukuntla,
A few steps to improve the model's accuracy:

  • Include standardization/scaling of the input data.
  • Apply different algorithms like decision tree, random forest, etc.
  • Increasing the size of the data (if possible) might help to train a better model.

A few more ways to improve accuracy will be taught in future lectures.
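To make the first two suggestions concrete, here is a minimal sketch, assuming scikit-learn is available. The data is synthetic (not the course dataset) and the feature names are only illustrative. Note that for plain least-squares regression, standardizing the inputs does not change the predictions; it mainly makes the weights comparable and matters more for regularized or gradient-based models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the course dataset): two features on
# different scales and a mildly non-linear target.
rng = np.random.default_rng(42)
age = rng.uniform(18, 64, size=500)
bmi = rng.uniform(16, 50, size=500)
X = np.column_stack([age, bmi])
y = 250 * age + 10 * bmi**2 + rng.normal(0, 500, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Two candidates: a scaled linear model and a different algorithm.
models = {
    "scaled linear regression": make_pipeline(StandardScaler(), LinearRegression()),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on held-out data
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Comparing the scores on held-out data, rather than on the training data, gives a fairer picture of which change actually helps.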


Wow thanks, it worked.
Is it possible to have an r_score of 9%?

In this assignment we have to visualize targets and predictions. I want to plot both of them in a single graph so that I can look at the graph and see which target and prediction values differ.
I can’t figure out which visualization technique I should use.
Can you please help me?

Why is my Plotly graph not showing?

Hey @mariamuhammadyounas4,
You may use a scatter plot for the targets and a line plot for the predictions (or vice versa), or you can use two scatter plots or two line plots for targets and predictions.
Note that this is just a hint and not the only option; there are many different types of plots that can be used.
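As a rough illustration of the hint above, here is a minimal sketch using matplotlib (one option among many). The numbers are made up for the example; replace them with your own targets and model predictions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical values; replace with your own targets and predictions.
targets = np.array([1200.0, 3400.0, 2100.0, 8000.0, 4500.0])
predictions = np.array([1500.0, 3000.0, 2600.0, 7200.0, 4700.0])
index = np.arange(len(targets))

fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(index, targets, label="Target", alpha=0.7)             # targets as points
ax.plot(index, predictions, color="tab:red", label="Prediction")  # predictions as a line
ax.set_xlabel("Row index")
ax.set_ylabel("Charges")
ax.legend()
```

Rows where a point sits far from the line are the ones where the prediction differs most from the target. In a script, call plt.show() at the end to display the figure.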

@tanushagupta4 Please check the post below.

What is the reason behind the error?

Hey @manish-mishra1513,
First, please check whether the estimate_charges() function is already defined. If it is, change the variable named estimate_charges in this picture to any other name. When a variable has the same name as a function, the name is rebound to the variable's value, so the function can no longer be called through that name.
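A small sketch of what goes wrong, assuming estimate_charges is a simple helper like the hypothetical one below (the real definition in your notebook may differ):

```python
def estimate_charges(age, w, b):
    """Hypothetical helper: a straight-line estimate of charges from age."""
    return w * age + b

# Storing the result under a different name keeps the function callable.
estimated = estimate_charges(30, 50, 100)
print(estimated)  # 1600

# Reusing the function's name rebinds it to a number...
estimate_charges = estimate_charges(30, 50, 100)

# ...so the next call fails, because an int is not callable.
try:
    estimate_charges(40, 50, 100)
except TypeError as error:
    print("TypeError:", error)
```

In a notebook, re-running the cell that defines the function restores it.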
Tip: You don’t need to post the same thing twice.
Thank You.

What will be the correlation in this graph?

The line is parallel to the x-axis, which means that as the value of x increases or decreases, the value of y does not change. Therefore, the correlation between x and y is 0 (or tending to 0).
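This can be checked numerically. The sketch below uses synthetic data and NumPy's corrcoef; a little noise is added because, for a perfectly constant y, the Pearson coefficient is undefined (y has zero variance) and corrcoef returns nan.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 1000)

# y hovers around a constant level regardless of x (a flat trend plus noise).
y_flat = 5.0 + rng.normal(0, 1, size=x.size)
r_flat = np.corrcoef(x, y_flat)[0, 1]

# For contrast, y that rises with x.
y_rising = 2.0 * x + rng.normal(0, 1, size=x.size)
r_rising = np.corrcoef(x, y_rising)[0, 1]

print(f"flat trend:   r = {r_flat:.3f}")    # close to 0
print(f"rising trend: r = {r_rising:.3f}")  # close to 1
```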

What is the reason behind the error?

As you can see, there is a KeyError which says that none of the indexes are present in your data…
If you take a look at `medical_df` you will see there are no columns with the names
‘northeast’, ‘northwest’, ‘southeast’ or ‘southwest’.

As @vinaypratapsingh609 said, the columns are not present in the dataset, so indexing (i.e. df[[column_name]]) will not work here. But you can add new columns to a DataFrame using df[column_name] = data. (Just a hint)
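One possible way to follow the hint is sketched below, assuming the region names live in a 'region' column. The tiny DataFrame is a stand-in for medical_df, and pd.get_dummies is just one way to build the 0/1 columns (the lesson may use a different encoder).

```python
import pandas as pd

# Tiny stand-in for medical_df; only the 'region' column matters here.
medical_df = pd.DataFrame({
    "age": [19, 33, 45, 52],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

# At this point medical_df[["northeast", "northwest"]] would raise a KeyError,
# because those columns do not exist yet.

# Build one 0/1 column per region value, then add each one
# with df[column_name] = data.
region_dummies = pd.get_dummies(medical_df["region"], dtype=int)
for column_name in region_dummies.columns:
    medical_df[column_name] = region_dummies[column_name]

print(medical_df[["northeast", "northwest", "southeast", "southwest"]])
```

Once the columns exist, selecting them with double brackets works as expected.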


How can I create a model for this exercise? I am confused about how to select the dataset. What are the steps required for the smoker/non-smoker dataset?

Assuming you’ve done the previous exercises, you should already have both dataframes, one with the data of smokers and one with the data of non-smokers…
If not, you can easily create them like this:
smoker_df = medical_df[medical_df.smoker == 'yes']

Now, for this exercise specifically, you have to do on both datasets the same thing that we’ve done on the whole set…
Essentially, it would look something like this…

Now you have to do the same for non-smokers, and the loss will tell you which approach performs better (a single model or separate models).
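The workflow described above can be sketched roughly as follows. This is not the original poster's code: the data is synthetic, using only age as the input is an assumption, and fit_and_score is a hypothetical helper introduced for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def rmse(targets, predictions):
    """Root mean squared error, used here as the loss."""
    return np.sqrt(np.mean(np.square(targets - predictions)))

# Synthetic stand-in for medical_df (NOT the real dataset).
rng = np.random.default_rng(1)
n = 200
medical_df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "smoker": rng.choice(["yes", "no"], size=n),
})
base = np.where(medical_df["smoker"] == "yes", 20000, 2000)
medical_df["charges"] = base + 260 * medical_df["age"] + rng.normal(0, 1000, size=n)

def fit_and_score(df):
    """Hypothetical helper: fit charges ~ age on df and return the training RMSE."""
    inputs, targets = df[["age"]], df["charges"]
    model = LinearRegression().fit(inputs, targets)
    return rmse(targets, model.predict(inputs))

smoker_df = medical_df[medical_df.smoker == "yes"]
non_smoker_df = medical_df[medical_df.smoker == "no"]

losses = {
    "single model (all rows)": fit_and_score(medical_df),
    "smokers only": fit_and_score(smoker_df),
    "non-smokers only": fit_and_score(non_smoker_df),
}
for name, loss in losses.items():
    print(f"{name}: RMSE = {loss:.2f}")
```

On this made-up data the separate models have a much lower loss than the single model, because the two groups sit at very different charge levels; whether that holds for the real dataset is what the exercise asks you to check.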

Got it. Thank you Brother

I have created models for both smoker and non_smoker separately. Can anyone please tell me whether my model is correct?

The code seems good to me, but…
My non-smoker loss was 4573.07498879549
The difference between your loss and mine is very small, so I would say your model is good.

Maybe @birajde Sir can help us out here…

I am not sure why the values are different, but my guess is that it is because of different data types. Probably @vinaypratapsingh609 was using something like float32, whereas @zaid441997 was using something like float64, so one had more precision and gave slightly different results. (This is just a guess.)
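To show how much precision alone can move a number of this size, here is a small NumPy sketch using the loss value quoted above. It only illustrates the float32 versus float64 effect; it does not confirm that this was the actual cause of the difference.

```python
import numpy as np

loss64 = np.float64(4573.07498879549)  # value quoted above, at double precision
loss32 = np.float32(loss64)            # same value rounded to single precision

print(f"float64: {float(loss64):.10f}")
print(f"float32: {float(loss32):.10f}")
print(f"difference: {abs(float(loss32) - float(loss64)):.2e}")
```

float32 keeps roughly 7 significant decimal digits, so two results that agree only in the first several digits are consistent with a precision difference somewhere in the computation.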

