In the lecture video, we have the following code:
year = pd.to_datetime(raw_df.Date).dt.year
train_df = raw_df[year < 2015]
However, I don’t understand how the Series ‘year’ is linked to raw_df.
There is no column named ‘year’, so how can we select the rows with year < 2015 in the second line of code?
Can anyone explain this? Thanks a lot.
You will understand this if you run year < 2015 separately. Basically, year is a Series derived from the raw_df DataFrame, so the index of year matches the index of the DataFrame. When we evaluate year < 2015, the Series is converted into a boolean Series of True and False values. Passing that boolean Series to raw_df[...] returns the rows of raw_df where year < 2015 is True.
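To see this step by step, here is a minimal sketch. The small raw_df below is a made-up stand-in (the Rainfall column and the four dates are assumptions for illustration, not the lecture's data); only the two lines from the question are taken from the original code.

```python
import pandas as pd

# Hypothetical stand-in for raw_df; the real lecture data is not shown here.
raw_df = pd.DataFrame({
    "Date": ["2013-05-01", "2014-12-31", "2015-01-01", "2016-07-04"],
    "Rainfall": [0.0, 1.2, 3.4, 0.6],
})

# A new Series that inherits raw_df's index (0, 1, 2, 3).
year = pd.to_datetime(raw_df.Date).dt.year

# Comparing a Series with a scalar gives a boolean Series on the same index.
mask = year < 2015
print(mask.tolist())  # [True, True, False, False]

# Boolean indexing keeps the rows whose index label maps to True.
train_df = raw_df[mask]
print(train_df.index.tolist())  # [0, 1]

# year is not a column of raw_df; only the index is shared.
print("year" in raw_df.columns)         # False
print(year.index.equals(raw_df.index))  # True
```

So year never needs to be a column: raw_df[mask] only needs a boolean Series whose index lines up with raw_df's index.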
OK, so the link between year and raw_df comes from the first line. I thought it created a separate Series with no connection to raw_df. Thanks a lot.