Issue with opening the file in downloaded dataset

Dear Course Support Team,

I have already downloaded the dataset, extracted it to a folder, and loaded it into a pandas DataFrame.
However, when I click on fileopen and go into the corresponding folder to select the CSV file containing the descriptions of the columns in the various data files, I see the following error:

Could you please take a look at this issue?

The fileopen functionality is reserved for opening notebooks.

You need to use the read_csv() function from pandas to open such a file.

Hi Sebastian,

Thank you for your reply.
I already tried that, but it raises the following error:

According to the error (although it would be great if you showed the whole error next time), there’s no data in this file.

Make sure that it has data. Try opening this on your system and see what it contains.
Perhaps during download something went wrong.
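A quick way to check whether the file has any data, without worrying about its encoding, is to look at its size and first raw bytes. This is only a minimal sketch: peek_file is a hypothetical helper, and the demo writes a small stand-in file rather than touching the real dataset.

```python
import os
import tempfile


def peek_file(path, n_bytes=200):
    """Return (size_in_bytes, first n_bytes of the file as raw bytes)."""
    size = os.path.getsize(path)
    # Binary mode skips decoding, so this cannot raise UnicodeDecodeError.
    with open(path, "rb") as fh:
        head = fh.read(n_bytes)
    return size, head


# Stand-in content for the demo; point demo_path at the real CSV instead.
demo_bytes = b"Table,Row,Description\napplication,SK_ID_CURR,ID of loan\n"

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as fh:
        fh.write(demo_bytes)
    size, head = peek_file(demo_path)

print("size in bytes:", size)
print("first bytes:", head)
```

A size of 0 would mean the download or extraction went wrong; anything else means the data is there and the problem lies elsewhere.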

Hi Sebastian,

Thank you for the follow-up.

All the other CSV files inside the zipped folder loaded successfully, as the following screenshot shows.

The column description CSV file is the one that seems to be problematic: it raises the following UnicodeDecodeError (I have copied the whole error here, along with a screenshot, for your convenience):

That is why I tried opening that CSV file manually via fileopen, going into the corresponding folder to see its content, which showed an error:
… columns_description.csv is not UTF-8 encoded

UnicodeDecodeError Traceback (most recent call last)
/tmp/ipykernel_37/ in
1 # Columns Description data
----> 2 cols_desc_df = pd.read_csv('home-credit-default-risk/HomeCredit_columns_description.csv')

/opt/conda/lib/python3.9/site-packages/pandas/util/ in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
313 return wrapper

/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/ in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
584 kwds.update(kwds_defaults)
--> 586 return _read(filepath_or_buffer, kwds)

/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/ in _read(filepath_or_buffer, kwds)
481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
484 if chunksize or iterator:

/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/ in __init__(self, f, engine, **kwds)
809 self.options["has_index_names"] = kwds["has_index_names"]
--> 811 self._engine = self._make_engine(self.engine)
813 def close(self):

/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/ in _make_engine(self, engine)
1038 )
1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
1042 def _failover_to_python(self):

/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/ in __init__(self, src, **kwds)
67 kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
68 try:
---> 69 self._reader = parsers.TextReader(self.handles.handle, **kwds)
70 except Exception:
71 self.handles.close()

/opt/conda/lib/python3.9/site-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

/opt/conda/lib/python3.9/site-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._get_header()

/opt/conda/lib/python3.9/site-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

/opt/conda/lib/python3.9/site-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 1283: invalid start byte

In this case, I would try playing around with the encoding argument of the read_csv function. I have no idea which one you should use; perhaps the dataset description contains some info.
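One way to play around with the encoding argument is to try a few candidates in order and keep the first that decodes. The sketch below is only an illustration: read_csv_try_encodings is a hypothetical helper, the candidate list is a guess rather than something the dataset documents, and the demo file is a stand-in. Note that latin-1 accepts every byte, so it never fails but may show the wrong characters; it is listed last for that reason.

```python
import os
import tempfile

import pandas as pd


def read_csv_try_encodings(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order; return (DataFrame, encoding that worked)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next candidate
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error


# Stand-in file containing byte 0x85, which is not valid UTF-8 on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as fh:
        fh.write(b"Row,Description\nAMT_CREDIT,Credit amount\x85\n")
    df, used = read_csv_try_encodings(demo_path)

print("encoding used:", used)
print(df)
```

A successful decode does not prove the encoding is the right one, so it is worth reading a few of the resulting descriptions to confirm they look sensible.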

By opening, I meant with any spreadsheet software. Such programs usually give a more meaningful description of the error and might even autodetect the encoding.
You could also open it with any text editor such as Notepad. Since it’s CSV (Comma Separated Values), you would see, well, the values separated by commas :stuck_out_tongue:
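The same kind of inspection can be done from the notebook by reading the raw bytes and looking at the text around the byte that fails to decode. This is a rough sketch with hypothetical helper names and stand-in data; for the real file, raw would come from open(path, "rb").read(). As a hint only: in Windows-1252 (cp1252) the byte 0x85 is the ellipsis character, while in latin-1 it is a control character.

```python
def find_bad_byte(raw, encoding="utf-8"):
    """Return the offset of the first byte that fails to decode, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start
    return None


def decode_context(raw, position, width=15, encodings=("cp1252", "latin-1")):
    """Decode the bytes around `position` under each candidate encoding."""
    chunk = raw[max(0, position - width): position + width]
    return {enc: chunk.decode(enc, errors="replace") for enc in encodings}


# Stand-in bytes for the demo, with 0x85 in the middle of a description.
raw = b"Row,Description\nAMT_CREDIT,Credit amount\x85 of the loan\n"
pos = find_bad_byte(raw)
views = decode_context(raw, pos)

print("first undecodable byte at offset:", pos)
for enc, text in views.items():
    print(enc, repr(text))
```

Whichever candidate turns the surrounding text into something readable is a reasonable value to pass as the encoding argument.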

It might also be the case that only this one file got screwed up during download, but I would consider checking other solutions first before redownloading.

Hi Sebastian,

Thank you for your reply.
Sure. I will make use of the encoding parameter of the read_csv API to deal with files in different encodings.

Thank you.