Possible extra topics#

One of the rubric items for the course project is to include something “extra” that wasn’t covered in Math 10. Here are a few possibilities. It’s even better if you find your own extra topic; it can be anything in Python that interests you.

K-Nearest Neighbors#

An understandable supervised machine learning model is K-Nearest Neighbors. It can be used for classification or regression, but is typically used for classification. This topic also provides a good example of the potential for overfitting (when a small number of neighbors is used). There is some information about this topic in the course notes by Christopher Davis from Winter 2022.
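To make the overfitting remark concrete, here is a minimal from-scratch sketch of K-Nearest Neighbors classification. It is not taken from the course notes: the data is randomly generated, and the helper names (`knn_predict`, `accuracy`, `make_data`) are made up for this illustration. In a project you would more likely use scikit-learn's `KNeighborsClassifier`.

```python
import math
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_points, train_labels, points, labels, k):
    """Fraction of `points` whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_points, train_labels, p, k) == y
        for p, y in zip(points, labels)
    )
    return correct / len(points)


def make_data(n, rng):
    """Two overlapping 2-D Gaussian clusters (made-up illustrative data)."""
    points, labels = [], []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 1.5
        points.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        labels.append(label)
    return points, labels


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(200, rng)

# Compare a very small and a moderate number of neighbors.
for k in (1, 15):
    train_acc = accuracy(X_train, y_train, X_train, y_train, k)
    test_acc = accuracy(X_train, y_train, X_test, y_test, k)
    print(f"k={k:>2}: train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

With `k=1`, every training point is its own nearest neighbor, so the training accuracy is perfect by construction. On overlapping data like this, the test accuracy is typically noticeably lower, which is the signature of overfitting.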

Neural Networks#

Neural networks are a fundamental (maybe the most fundamental) area of modern Machine Learning. If you want to try learning about them, that would be a great extra topic. This 3Blue1Brown video is a great introduction. For an interactive visualization and exploration of neural networks, see here.

Choosing parameters#

See the scikit-learn user guide. A Machine Learning topic I would like to understand better is how to choose parameters (for example, the number of clusters when doing clustering, or the depth of a decision tree). The user guide provides some guidance, but it is a big topic and there are many different approaches.
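One common approach, though not necessarily the one the linked guide emphasizes, is a cross-validated grid search. The sketch below assumes scikit-learn is installed and uses the built-in iris dataset. The list of candidate depths and the choice of 5 folds are arbitrary, for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate depths are an arbitrary illustrative choice.
param_grid = {"max_depth": [1, 2, 3, 4, 5, 6]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation
)
search.fit(X, y)

print("Best depth:", search.best_params_["max_depth"])
print("Mean cross-validated accuracy:", round(search.best_score_, 3))
```

Each candidate depth is scored on held-out folds rather than on the training data, so a deeper tree is not rewarded just for memorizing the training set.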

pandas styler#

See these examples in the pandas documentation. The styler provides a way to highlight certain cells in a pandas DataFrame, and is good practice using apply and applymap (in pandas 2.1 and later, applymap is deprecated in favor of map).
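As a small illustration of the column-wise `apply` style, here is a sketch that bolds the largest value in each column. The DataFrame contents and the helper names `highlight_max` and `style_scores` are made up for this example. The styling function must return one CSS string per cell.

```python
import pandas as pd


def highlight_max(col):
    """Return one CSS string per cell: bold the largest value in the column."""
    is_max = col == col.max()
    return ["font-weight: bold" if flag else "" for flag in is_max]


# Made-up scores, purely for illustration.
df = pd.DataFrame({"quiz": [7, 9, 8], "midterm": [81, 77, 92]})


def style_scores(frame):
    """Apply highlight_max column by column and return a pandas Styler."""
    return frame.style.apply(highlight_max, axis=0)
```

In a notebook, ending a cell with `style_scores(df)` should display the table with the maximum of each column in bold.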

Kaggle#

A general way to get ideas is to browse Kaggle. Go to a competition or dataset you find interesting, and then click on the Code tab near the top. You will reach a page like this one about Fashion-MNIST. Any one of these notebooks is likely to contain many possibilities for extra topics.

Big Data(sets)#

Deepnote does not allow files bigger than 100 MB to be uploaded, and many real-world datasets are bigger than this. Deepnote can definitely work with larger datasets, though. If you end up using a larger dataset, describe how you made it work in Deepnote. Some general guidelines are listed in the Deepnote documentation.

Different Python libraries#

If you want to use a Python library that isn’t installed by default in Deepnote, you can install it yourself within Deepnote, using a line of code like the following, which installs the vega_datasets library. Notice the exclamation point at the beginning (which probably won’t appear in the documentation you find for the library).

!pip install vega_datasets

Other libraries#

Here are a few other libraries that you might find interesting. (Most of these are already installed in Deepnote.)

  • sympy for symbolic computation, like what you did in Math 9 using Mathematica.

  • Pillow for image processing.

  • re for advanced string methods using regular expressions.

  • Seaborn and Plotly. We introduced these plotting libraries briefly together with Altair early in the quarter, and we have used Seaborn frequently for importing datasets. Their syntax is similar to Altair’s.

  • ipywidgets provides a way to add interactivity to a Jupyter notebook, but last I checked, not all of it works in Deepnote.
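As a small taste of the re library listed above, the following sketch pulls clock times out of a string and then rewrites them. The sample text is made up for this example. Only standard-library functions (`re.compile`, `findall`, `sub`) are used.

```python
import re

# Hypothetical text, made up for this example.
text = "Office hours: Mon 10:30, Wed 14:00, and Fri 9:15."

# \b word boundary, one or two digits, a colon, then exactly two digits.
time_pattern = re.compile(r"\b(\d{1,2}):(\d{2})\b")

# With two capture groups, findall returns a list of (hour, minute) tuples.
times = time_pattern.findall(text)
print(times)

# re.sub can rewrite every match; here each time is wrapped in brackets.
bracketed = time_pattern.sub(r"[\g<0>]", text)
print(bracketed)
```

Plain string methods such as `split` or `find` would struggle here because the times have varying lengths and positions. The pattern describes their shape instead.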

ChatGPT#

You could get help writing your project from ChatGPT, documenting along the way how the process is working. That would be interesting; just make it clear which parts are your own work and which were provided by ChatGPT.
