Coding, Cats, and Calculations: Navigating Life with Data and Mathematics

Image generated using GPT-4 (DALL·E 3) 

GPT-4 for Parents: Having fun and being responsible with AI - 9 ideas to try 


By: Rebecca Hadi (assisted by GPT-4) 

The world is buzzing with news about OpenAI and GPT. It can be hard to know how this tool plugs into your daily life, and whether you need to be an engineer or data scientist to use it. For parents, there are several fun applications that are great for kids. It can also be a mechanism to teach kids important lessons about plagiarism, the quality of sources, and conducting their own research. 

Here are some ideas to help you get started. 

Get started (this assumes you have access to GPT-4; if not, GPT-3.5 is suitable for most use cases): 

The fun ideas 

1. Crafting Custom Bedtime Stories

Imagine tucking your child into bed with a unique story that's never been told before. With GPT-4, you can create personalized bedtime stories. Just provide a few details like your child's favorite animals, colors, or a magical world they dream of, and voila! You have a story that speaks directly to your child's imagination. 

Sample Prompt: Write a bedtime story for a 5-year-old featuring sharks and volcanoes. 

2. Unleashing Creativity with Image Generation

Kids love to see their imaginations come to life, and the image generation feature can do just that. Whether it's a drawing of a fantastical creature they've described or a scene from the story you just read, GPT-4/DALL·E 3 can generate images that will amaze and inspire them. It's a wonderful way to visually engage your child with their own creativity.

Sample Prompt: Generate an image of a cat looking out the window into space, synthwave style, digital painting 

3. Learning and Education

GPT-4 is a treasure trove of knowledge. Whether your child has questions about space, dinosaurs, or historical events, GPT-4 can provide informative and kid-friendly explanations. It can also help with language learning, offering practice in reading, writing, and even conversing in new languages.

Sample prompt: Where does rain come from? 

4. Interactive Games and Puzzles

Keep your kids entertained and intellectually stimulated with custom-made puzzles and games. GPT-4 can create crossword puzzles, word searches, or trivia quizzes on topics your child is interested in, making learning fun and interactive.

Sample Prompt: Create a crossword puzzle for a 5-year-old, with the theme of video games

5. Party Planning and Inspiration

Planning a child's birthday party? Use GPT-4 for creative party theme ideas, decoration suggestions, and even unique recipes for party snacks. It can help brainstorm fun activities and games, ensuring your child's party is a hit.

Sample prompt: Give me 20 ideas for a 5-year-old's birthday party. 

6. Art and Craft Ideas

Stuck for ideas on a rainy day? Ask GPT-4 for craft and art project suggestions suitable for your child's age. It can guide you through the steps, making for a fun and educational activity you can do together.

Sample prompt: What arts and crafts can I do easily at home with my child? Give me 15 ideas. 

7. Homework Helper

For older kids, GPT-4 can be a helpful resource for homework. It can explain complex concepts in a simpler way or provide a different perspective on a topic they're learning about in school.

Sample Prompt: Explain GPT models to a 5-year-old. 

The responsible ideas 

8. Teaching Children about Skepticism and GPT's Limitations

While GPT-4 is a powerful tool, it's crucial to teach children about its limitations and the importance of skepticism. GPT-4, like any AI, can "hallucinate" or generate information that isn't accurate. This presents an excellent opportunity to educate kids about critical thinking and fact-checking.

Understanding AI Hallucinations

Explain to your children that GPT-4 can sometimes make mistakes or create facts that aren't true. This is known as "hallucination" in AI terminology. Use simple examples to show how the AI might be incorrect and emphasize the importance of double-checking information, especially for school projects or learning new facts.

Encouraging Critical Thinking

Use GPT-4's outputs as a springboard for discussions on critical thinking. Ask your child whether they think the information provided by the AI is accurate and why. Encourage them to look up information in books or trusted online sources to verify or dispute what GPT-4 says.

Balancing Technology and Human Judgment

Help your children understand that while technology like GPT-4 is powerful, human judgment and critical thinking are irreplaceable. Encourage them to use both the AI's capabilities and their own reasoning skills to come to conclusions.

Safe Internet Practices

GPT-4’s interactions can also be a lesson in safe internet practices. Remind your children not to share personal information with any online platform, including AI tools like GPT-4.

9. Instilling integrity of work 

Using GPT-4 with your children also presents an invaluable opportunity to teach them about the integrity of work. Understanding the difference between assistance and plagiarism, and developing a sense of ownership and originality, is crucial in the digital age.

Defining Assistance vs. Plagiarism

It's essential to clarify to your children when it's appropriate to use AI like GPT-4 for help and when it crosses the line into plagiarism. Explain that using GPT-4 to generate ideas or understand a concept is fine, but passing off AI-generated content as their own in school assignments or projects is not ethical.

Fostering Original Thinking

While GPT-4 can be a great tool for inspiration, encourage your children to add their own thoughts, ideas, and creativity to what the AI provides. This practice helps in nurturing their original thinking and creativity, making them not just consumers of AI-generated content but also innovators and creators in their own right.

Understanding Authorship and Credit

Teach your children the importance of giving credit where it's due. If they use ideas or content generated by GPT-4, they should understand how to acknowledge it. This practice not only upholds academic integrity but also teaches them respect for others' work, a value that extends beyond academic boundaries.

While keeping these tips in mind, GPT-4 can provide hours of fun for you and your little ones! Believe it or not, a majority of this article was written by GPT-4 and revised by me. It's amazing what these models can accomplish and how they fit into daily life at a variety of age levels. Start using GPT-4 (or 3.5) in your life, responsibly, today! 

Estimating the proportion of peanuts in a bag of trail mix 

Yesterday, I was eating trail mix, and it got me thinking about the importance of representative samples. If I drew a handful of trail mix, got back only peanuts, analyzed their contents, and tried to generalize to a true population that included both peanuts and raisins, I would have a bad time. While this is a simplistic example, it underscores the importance of a representative sample when it comes to inference in our everyday lives (e.g. surveys, poll estimates). I'm a big fan of the bootstrap, so I decided to simulate the estimation of the proportion of peanuts using the hypergeometric distribution (where the 'bag' is full of raisins and peanuts, and I draw a handful of size 4 without replacement). I modeled the true numbers of raisins and peanuts as Poisson processes. 

As you would expect, as the number of samples increases, the estimated mean (the blue line) converges to the true mean (the red line). I love to see the law of large numbers in action! The results are in the image below. 
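For readers who want to tinker, here is a minimal Python sketch of the same idea. The original analysis was in R; for simplicity this version fixes the bag's composition at hypothetical counts rather than drawing them from Poisson processes, as the post does:

```python
import random

random.seed(42)

# Hypothetical bag: 60 peanuts and 40 raisins (the post draws these
# counts from Poisson processes; they are fixed here for simplicity)
bag = ["peanut"] * 60 + ["raisin"] * 40
true_prop = bag.count("peanut") / len(bag)

# Repeatedly draw handfuls of size 4 without replacement, which makes
# each handful's peanut count a hypergeometric random variable
handful_size, n_samples = 4, 20_000
total_peanuts = sum(
    random.sample(bag, handful_size).count("peanut") for _ in range(n_samples)
)

estimate = total_peanuts / (n_samples * handful_size)
print(f"true proportion: {true_prop:.2f}, estimate: {estimate:.3f}")
```

As the number of handfuls grows, the running estimate settles on the bag's true peanut proportion.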

Code here:

Disneyland! gganimate! ggmap! oh my. 

In a few weeks, my family and I are going on vacation to Disneyland and California Adventure. I started to think about how the trip could be optimized based on the distance between rides. The first step was to get the coordinates of rides (e.g. Splash Mountain, Star Wars: Rise of the Resistance) and plot them using ggmap. 

Then, I created a random ordering of rides and used gganimate to simulate traveling between rides. I currently have this set up for 12 rides whose coordinates I manually pulled from Google Maps and saved to a CSV file. 

Starting at the entrance to the park, I calculate the distance (using distGeo) to each candidate ride, travel to the nearest one, remove it from consideration, and repeat with the remaining rides. 
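That greedy nearest-neighbor loop can be sketched in a few lines. The original is in R with distGeo; below is a hedged Python version where the haversine formula stands in for geodesic distance and the ride names and coordinates are invented for illustration:

```python
import math

def dist_m(a, b):
    # Haversine distance in meters between two (lat, lon) points,
    # a simple stand-in for distGeo's geodesic distance
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def greedy_route(start, rides):
    # From the current position, repeatedly visit the nearest
    # unvisited ride and remove it from consideration
    order, here, remaining = [], start, dict(rides)
    while remaining:
        nearest = min(remaining, key=lambda name: dist_m(here, remaining[name]))
        order.append(nearest)
        here = remaining.pop(nearest)
    return order

# Hypothetical coordinates near the park entrance
entrance = (33.8121, -117.9190)
rides = {"Ride A": (33.8125, -117.9188), "Ride B": (33.8140, -117.9170)}
print(greedy_route(entrance, rides))
```

Note that this nearest-neighbor heuristic gives a decent route quickly, but it is not guaranteed to find the shortest possible tour.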

The final ride order based on this approach and the ones I have recorded is below.  

> df$name
 [1] "Space Mountain"                     "Buzz Lightyear Astro Blasters"      "Finding Nemo"
 [4] "Mr Toad Wild Ride"                  "Sleeping Beauty Castle Walkthrough" "Dumbo"
 [7] "It's a Small World"                 "Big Thunder Railroad"               "Splash Mountain"
[10] "Pirates of the Carribean"           "Main Street"                        "Buzz Lightyear Astro Blasters"
[13] "Star Wars: Rise of the Resistance"

Limitations and potential next steps: 

Code can be found here:


A few weeks ago I had the opportunity to attend and present at rstudio::conf(2022) in Washington, D.C. It was easily the most inclusive conference I have ever attended. One of my favorite things was the "Pac-Man" style of conversation, where you leave an open space in a circle of people so others can jump in. The inclusivity was also evident in the food choices, as there was a mixture of meat, vegetarian, and vegan options, including the best root vegetable casserole I've ever tasted! 

I also presented at the conference! My talk was about Query Optimization, and the recording can be found here. The highlights: use an explain plan, and take advantage of distribution keys and sort keys. 

I highly recommend checking out the keynote talks - specifically about Quarto and the past and future of Shiny!  As someone who loves to build Shiny apps, it's exciting that Python users will also be able to get in on the fun.



After three years, I finished my Master's degree in Applied and Computational Mathematics from Johns Hopkins University (while working full time!).  The graduation ceremony in Baltimore was wonderful. It is a surreal feeling.  It's bittersweet as I love being a student and learning new things, but it will be nice to have a lot more free time (and maybe finish some side projects?).  I'll also miss my institutional access to research papers.  

I had the opportunity to work on some fun projects through my coursework that I uploaded to my Github.  The implementations use a mixture of R and Python.  

Diabetes Prediction Using Probabilistic Graphical Models 

Mortality Prediction in Heart Failure Patients Admitted to the Intensive Care Unit (ICU)

Kalman Filters in Remote Patient Monitoring: A Review and Application of Literature 

RMarkdown is my favorite mechanism for writing papers as it makes beautiful documents.  There's a lot of R in my life lately, as at the end of July I'll be speaking at rstudio::conf(2022) in Washington DC.  It will be my first in-person conference since COVID.  

Until next time. 

Book Topics using Project Gutenberg 

There has been a lot of discussion in the media recently about banned books. I won't pretend that I'm educated on this topic; however, I do believe it's important to learn from the past (especially the ugliest parts of history), and literature is a great way to do that. With some books being banned in some areas, I wanted a way to quickly summarize a given book and extract meaning/topics/key terms. 

Building upon code by Andrea Perlato, I created a Shiny app that takes a book title as an input and returns topics and terms.  TW: some terms may be culturally explicit.  

Try it out! The book must be available on Project Gutenberg because of how the script sources text data. An improvement on this project could be leveraging a new data source to be able to model more books. The topic modeling is done using Latent Dirichlet Allocation (LDA) and implemented in R.  
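The app itself is implemented in R, but the mechanics of LDA are easy to see in miniature. Below is a toy collapsed Gibbs sampler in Python over a made-up four-"document" corpus; the corpus, hyperparameters, and function are illustrative sketches, not the app's actual code:

```python
import random
from collections import defaultdict

random.seed(1)

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01):
    # Collapsed Gibbs sampling: resample each token's topic conditioned
    # on every other token's current assignment
    V = len({w for d in docs for w in d})
    z = [[random.randrange(k) for _ in d] for d in docs]
    ndk = [[0] * k for _ in docs]                 # doc -> topic counts
    nkw = [defaultdict(int) for _ in range(k)]    # topic -> word counts
    nk = [0] * k                                  # topic totals
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [
                    (ndk[di][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                    for j in range(k)
                ]
                t = random.choices(range(k), weights=weights)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return nkw

# Invented four-"document" corpus with two obvious themes
docs = [["whale", "sea", "ship"], ["ship", "sea", "captain"],
        ["ring", "quest", "mountain"], ["quest", "ring", "fellowship"]]
topics = lda_gibbs(docs, k=2)
for t, counts in enumerate(topics):
    top = sorted(counts, key=counts.get, reverse=True)[:3]
    print(f"topic {t}: {top}")
```

Real implementations (like the R topicmodels package the LDA literature builds on) add many refinements, but the core update above is the whole idea: words that co-occur in documents drift into the same topic.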

PS - if you've never built a Shiny app (and like programming), I highly recommend it. They are so fun with lots of code examples online! 

Book Ranking - 2021 Edition

In 2021, I tried to keep data as the year progressed on books I read and how I felt about them at the time.  Emphasis on try - I'm disappointed that I didn't capture start/end dates very well.  I suppose I will have to give in to syncing my Kindle with GoodReads so that type of information is more easily accessible.  With the exception of LOTR, this list contains net new reads.  

Here it is. 

The Champions (Top 5)

The Others 

Cheers to another year of life, literature, and better data collection.

Our House is on Fire 

What action will you take today to reduce your carbon footprint?

Like many others, I regularly experience climate related anxiety.  It feels overwhelming and hopeless.  I worry about my son's future. I worry about suffering on a global scale.  Worry often turns into spiraling and hyperventilating panic and it can be difficult to have hope for humanity. 

Today, the Fridays for Future organization organized a global climate strike.   I am inspired by this group of young people. They motivate me to be an agent of change instead of a powerless victim.

Lacking the courage to protest in person, I took today off work to research tangible ways I can make a difference, and take action based on that research.  I'm grateful to work at a company where I am able to take the time to do this - a luxury I realize many do not have.   In light of my privilege, I'm writing this post to share my findings. 

Take action today by:

Take action over time by: 

We can do this.  


The Risk of Christmas During a Pandemic 

As we continue to find ourselves in the midst of a terrible pandemic, it can be difficult to navigate holiday plans. So far in the US, ~215K people have died from Covid-19.  It is imperative that we do everything in our power to prevent the spread of the virus, inclusive of making the difficult choice to limit exposure during traditional gathering times (e.g. Thanksgiving, Christmas) and wearing masks. 

One way to mitigate the risk of gatherings is to have attendees get tested. We know that the tests in place today are not perfect, with a sensitivity ranging from 80% to 90% depending on the type of test (rapid antigen vs. RT-PCR). My inner statistician had the urge to estimate the likelihood of exposure, given that all event attendees are tested for the virus, using a mixture of Bayes' Theorem and the binomial distribution. 
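As a back-of-the-envelope illustration of the binomial half of that idea (skipping the fuller Bayesian conditioning in the write-up), here is a short Python sketch; the 10 guests, 2% prevalence, and 85% sensitivity are all hypothetical numbers:

```python
def p_exposure(n_guests, prevalence, sensitivity):
    # Each guest independently both carries the virus (prevalence) and
    # tests falsely negative (1 - sensitivity); under the binomial model,
    # the chance that no infected guest slips through is (1 - p)^n
    p_missed = prevalence * (1 - sensitivity)
    return 1 - (1 - p_missed) ** n_guests

# Hypothetical gathering: 10 guests, 2% community prevalence,
# 85% test sensitivity
print(f"{p_exposure(10, 0.02, 0.85):.1%}")
```

Even with universal testing, the risk is not zero, and it grows with the guest list.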

Find it here: 

Gitlab repo:

Neural Networks

May 26, 2020

It's been a while since my last post. My focus for the past year has been on coursework toward my master's degree in Applied and Computational Mathematics at Johns Hopkins University. Between school, work, and family, I haven't had as much time to work on my side projects. In the past year I have taken courses related to statistical methods and data analytics (highly reminiscent of Actuarial Exam P), Linear Algebra, Statistical Models and Regression, and most recently, Neural Networks. One of the assignments in that last class was to code a multi-layer perceptron feed-forward backpropagation (FFBP) network from scratch (using NumPy in Python). 

My code can be found here: 

An FFBP network is trained with supervised learning, because each input has a desired output. The network takes inputs, passes them through the hidden layer with a weight on each edge, and then on to the output node or nodes. FFBP networks are a good candidate for regression or classification problems, but have other use cases as well. 
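To give a flavor of what "from scratch" means here, the snippet below (pure Python, not the actual coursework) runs one forward pass through a tiny 2-2-1 sigmoid network, computes two weight gradients by backpropagation, and checks them against finite-difference estimates; the weights and training example are made up:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_h, w_o, x):
    # Two hidden sigmoid units and one sigmoid output;
    # the last weight in each row is the bias
    h = [sigmoid(ws[0] * x[0] + ws[1] * x[1] + ws[2]) for ws in w_h]
    y = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, y

def loss(w_h, w_o, x, t):
    return 0.5 * (forward(w_h, w_o, x)[1] - t) ** 2

# Made-up weights and a single training example
w_h = [[0.1, -0.2, 0.05], [0.4, 0.3, -0.1]]
w_o = [0.2, -0.5, 0.15]
x, t = [1.0, 0.0], 1.0

# Backpropagation: delta at the output, chained back through the net
h, y = forward(w_h, w_o, x)
d_o = (y - t) * y * (1 - y)                            # dL/d(output pre-activation)
grad_w_o0 = d_o * h[0]                                 # gradient of output weight 0
grad_w_h00 = d_o * w_o[0] * h[0] * (1 - h[0]) * x[0]   # gradient of hidden weight (0,0)

# Sanity-check both gradients against finite differences
eps = 1e-6
w_o2 = w_o[:]; w_o2[0] += eps
w_h2 = [row[:] for row in w_h]; w_h2[0][0] += eps
num_o = (loss(w_h, w_o2, x, t) - loss(w_h, w_o, x, t)) / eps
num_h = (loss(w_h2, w_o, x, t) - loss(w_h, w_o, x, t)) / eps
print(abs(grad_w_o0 - num_o), abs(grad_w_h00 - num_h))
```

The finite-difference check is the standard way to convince yourself a hand-rolled backprop implementation is correct before training anything.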

Example Multi-Layer Perceptron Topology

Spotify Year in review: The API Version

January 23, 2019

This post has been a long time in the making. Back in November, I thought it would be fun to explore spotipy, a Python package that can export data from Spotify on a variety of metrics (e.g. top artists, songs, recent listening history).

Work and family life became busy over the course of November and December, and other things took precedence (such as Elon getting back-to-back ear infections, poor bud!). In that time, Spotify released Spotify Wrapped, showcasing user data. I'd be very surprised if, under the hood, the Year in Review didn't leverage some of the API data. What a fun exercise in personalization that allows me to bask in a subtle and sharable narcissism.

This post is going to be much less interesting now that you can just go look at your own Spotify Year in Review. But if you want to see a version using Python, today is your lucky day. If I had more time, I would produce a word-cloud visualization of the genre data I end up exporting. I'm going to be starting my master's program in Applied and Computational Mathematics at Johns Hopkins University next week, so it may be a while before I pick this Spotify project back up. Go Blue Jays! I feel very fortunate to live in the age of virtual learning.

It's about time

December 16, 2018

This time on the blog, I explore time series forecasting in R using the packages forecast and tseries (along with the tidyverse) on data from Washington, DC's bike share program: counts of bike shares over time. Given that it's almost Christmas, it felt right to do a post dealing with seasonality.
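The post itself uses R's forecasting tooling, but the simplest baseline for seasonal data, the seasonal-naive forecast (predict each future day with the value from one season earlier), fits in a few lines. A hedged Python sketch with invented daily ride counts:

```python
def seasonal_naive(series, period, horizon):
    # Forecast each future step with the observation one full season back
    return [series[-period + (h % period)] for h in range(horizon)]

# Invented daily bike-share counts with weekly (period-7) seasonality:
# quiet weekdays, busy weekends, repeated for four weeks
daily_rides = [120, 90, 95, 100, 110, 300, 340] * 4
print(seasonal_naive(daily_rides, period=7, horizon=3))
```

Proper models (ARIMA, ETS, and friends) should beat this baseline, and comparing against it is a useful sanity check on any seasonal forecast.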

Click the link below to be directed to the hosted HTML file of my R markdown file.

In other news, I am working on another post using Python to extract data using Spotipy, a Python library for accessing Spotify's API, and then get the top most recent artists.  Similar idea to Spotify's end of the year summary, which I suspect the API has a similar code base.

Code on gitlab:

Baby Elon

November 2, 2018

It's been a quiet summer on the blog. My first child, Elon, was born in June, and now that the newborn phase has ended and I have (eagerly) returned to work, it's time to get back to blogging! 

Check out the post below on my analysis of some of the data we captured for Elon during his first few months of life.  Data are everywhere! Even in your offspring. 

Game of Thrones - You win or you die

May 10, 2018

While eagerly awaiting season 8 of Game of Thrones (or The Winds of Winter) to be released, check out a quick analysis I did on the percentage of living members by house allegiance. Shout out to whoever gathered these data (link to Kaggle contained in the HTML file). 

For now, my data watch has ended!

Code can be found at 

Seattle Pets!

April 25, 2018

As a pet lover, I was excited when I found some data on pet licenses in Seattle. Shout out to Kaggle for serving as an excellent repository for fun data sets to play with.  Lately I've been particularly interested in spatial analysis/playing with maps, so I was also looking for a data set that had some geographic attributes.

Click the link below to view!

I've heard good things about, but I personally haven't found data sets there that I'm interested in; perhaps it would be a good source of data for a future post.

For this month's blog post, I am trying to find the balance between producing posts entirely in R Markdown (then attaching the PDF or HTML output) and manually copying the desired results and commentary into the post. Since I'm a big fan of reproducibility, I lean toward linking the R Markdown output hosted on my GitHub page, so that if there are any changes or corrections, the blog post points to the most up-to-date version! 

The downside of that approach is that I believe it's more user friendly from a blog perspective to only have to go to one page to read content (vs. clicking on another link).  I imagine my preferences will continue to evolve as I continue to blog.  I also want to play more with Github pages as this may meet my need. Until then, I've linked a hosted version of my HTML output file that's on my github.

Lights, Camera, Analysis 

April 5, 2018 

I recently completed a course on data analysis and modeling using R.  I thought the final project for the class was a fun challenge in finding a data set, formulating a question that can be answered using data, then analyzing that question.

For my project, I found a data set on Kaggle on movies, with features such as genre, votes, budget, and revenue. I was curious to what extent these features could model the probability that a movie would be profitable. More detail on my project can be found below! The Shiny app is particularly fun to play around with. Even though this data set was published to Kaggle, a significant portion of the project involved cleaning and transforming the data before any modeling actually took place.

The main principles I took away from this class were those of validity and reproducibility.  

Validity seems obvious, but it's an important principle to keep in mind when using data to answer a question.  Do these data accurately measure the question at hand?  What limitations exist within the data set and what impact does that have on our ability to measure the question? 

The second principle of reproducibility is one I feel can't be emphasized enough.  Not only does creating reproducible code and processes make it easier to understand your work as time goes by, it improves collaboration since others can understand your assumptions and method of analysis. This is a principle I try to employ in my personal and professional work.  

Pokemon Go - Gyms in Seattle (i.e. having fun with ggmap)

March 18, 2018 

Remember the summer of 2016 when everyone had the urge to go outside, get some exercise, and observe wildlife?   I remember something close to that. Except... the wildlife were Pokemon and the exercise was an unintended consequence of searching the city for their nests.

I recently completed an R course on data analysis and modeling that featured a lab on analyzing spatial data with the ggmap package. I was blown away by how simple it was to pull Google Maps data (thanks, Google API!) and layer points on top of it. Now that the course has ended, I wanted to see what lat/long data I could easily find online to plot using ggmap.

That's where ggmap meets Pokemon Go in this blog post. I found this website that contained the latitude and longitude of Pokemon Go gyms in the Seattle area. Using the archaic method of copying and pasting, I created an Excel spreadsheet of the data on the site. Since I'm just playing around with this data, I'm OK with that approach, but ideally I would have found a more automated way to export the data or access it within R directly via an API or some other means.
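For the curious, one lighter-weight alternative to copy/paste is parsing a saved page with Python's built-in html.parser; the HTML fragment and gym names below are invented stand-ins for the real listing:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    # Collects the text of every <td> cell, grouped by <tr> row
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

# A made-up fragment shaped like a gym listing: name, lat, lon
html = """
<table>
  <tr><td>Gas Works Gym</td><td>47.6456</td><td>-122.3344</td></tr>
  <tr><td>Space Needle Gym</td><td>47.6205</td><td>-122.3493</td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
print(parser.rows)
```

From there, the rows could be written to CSV and read into R, skipping the spreadsheet step entirely.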

Having missed out on the height of Pokemon Go's popularity (the smartphone I had at the time couldn't handle the UI), I'm not sure whether the data I downloaded is an exhaustive set. I'm also not familiar with the locations of gyms vs. other popular sites in the Pokemon Go realm. Since my purpose here is just to play with adding data to a ggmap object, I'm not bothered by these shortcomings.

Here is the first map I created that contained all of the data points. I was surprised that a "Seattle" dataset had points closer to Renton (and seemingly none in between?).   

To get a better glimpse of the downtown region, I altered the zoom in the ggmap.

In this view, there don't appear to be any gyms outside the Downtown/Queen Anne area. Ideally I would be able to zoom in more, but I'm not sure how to change the center of the map and increase the zoom without cutting off more points.  It's surprising that there aren't any gyms in Fremont/U-District/Ballard neighborhoods (or any other neighborhoods of Seattle, really), so that makes me question the integrity of this data set.  Perhaps the initial points in Renton were entered incorrectly? Perhaps there is another page that contains the data for other gyms in the city of Seattle?

The code and data and I used to create these maps can be found here: