One of my goals when I migrated my personal site from WordPress to a static site was to write more often. I was happy with how quickly I was able to bang out a few posts, but I soon ran out of topics that I wanted to write about. I made a list to hold ideas as they came to me, but I haven’t added anything in months.
I recently came across a question that I couldn’t get answered with a few quick Google searches, so I had to figure it out myself. This happens a couple of times a month, whether because what I’m searching for is novel, or niche, or simply phrased differently from how an expert might approach the topic. Perhaps these gaps are opportunities to publish content for searches that are result-poor, which could also drive some site traffic.
So I decided to dig into my Google search history to see topics that I have searched for in the past, but for which I had to discover the answers on my own. Here’s how:
- Go to Google Takeout, the data export feature of your Google Account
- In the “Select data to include” section, under the “Products” header, click Deselect all
- Select “My Activity”, click “all activity data selected”, deselect all, then select the products you are interested in. I chose just “Search” for this proof-of-concept, but you may also want to include Books, Image Search, Shopping, Video Search and YouTube.
- Click the “Multiple formats” button and select JSON instead of HTML for the “Activity records” option. HTML is the default, but if you are going to do anything interesting with your data, I recommend JSON.
- Click “Next Step”, choose a one-time export, then create the archive.
You’ll receive an email confirming that a data archive has been requested, then a second one once it’s available for download.
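Once the archive has downloaded, it can be worth a quick sanity check before any real analysis. The following is a minimal sketch, assuming the export is a zip file that contains `Takeout/My Activity/Search/MyActivity.json` and that each record has a `title` field in which searches start with “Searched for ”. The helper names `load_activity` and `count_searches` are illustrative, not part of any Google tooling.

```python
import json
import zipfile

# Assumed location of the Search activity file inside the archive.
SEARCH_MEMBER = "Takeout/My Activity/Search/MyActivity.json"


def load_activity(archive_path, member=SEARCH_MEMBER):
    """Read one activity JSON file straight out of a Takeout zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member) as handle:
            # The file is expected to hold a JSON list of activity records.
            return json.load(handle)


def count_searches(records):
    """Count the records whose title marks them as an actual search."""
    return sum(
        1 for record in records
        if record.get("title", "").startswith("Searched for ")
    )
```

Calling `count_searches(load_activity("takeout.zip"))`, where `takeout.zip` stands in for whatever your archive is actually named, should give a rough idea of how many searches the export contains.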
Here’s some simple experimenting I did as part of the proof-of-concept.

    import pandas as pd
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # Read the JSON into a data frame, convert the time column into datetime
    # and add a flag that distinguishes between searches and other activity.
    # You could also dump the JSON back to a CSV if you want to analyze in Excel.
    activity = pd.read_json('Takeout/My Activity/Search/MyActivity.json')
    activity['time'] = pd.to_datetime(activity['time'])
    activity['searches'] = activity['title'].apply(
        lambda x: 'Yes' if x[:13] == "Searched for " else "No")

    # Create a temporary copy of the activity DF, but only include
    # actual searches since the beginning of the year.
    # Strip off the first 13 chars from the title to leave just the query,
    # then put the queries in a list.
    recent = activity[(activity['searches'] == 'Yes') &
                      (activity['time'] >= "2019-01-01")].copy()
    recent['query'] = recent['title'].apply(lambda x: x[13:])
    queries = list(recent['query'])

    # Join each query into a single string, then run it through WordCloud.
    text = " ".join(queries)
    wordcloud = WordCloud(max_font_size=50, max_words=25,
                          background_color="white").generate(text)

    # Display the word cloud, then save it as a PNG.
    plt.figure()
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    wordcloud.to_file("my_search_history.png")
And the result:
A couple of notes:
- If you download data from multiple products, they will be zipped up into separate folders by product. You could step through the folders and join them into a single data frame if so inclined, or look at them separately. For instance, I also conduct a lot of YouTube searches looking for “how-to” content. I expect video content for these queries, so I might mine video search and YouTube data for ideas of video content to create, but may not find many blog-worthy ideas there.
- I use multiple Google accounts: one personal, and one for each business email domain I have. I have created a separate Chrome profile for each account, so I can mine the history for each account to find more specific types of activity. For example, the above is from my personal account. But when I look at the search history tied to my main work account, I find a lot more searches about Python, data science, and other platforms my company uses.
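Stepping through the product folders could look something like the sketch below. It assumes each product folder under `Takeout/My Activity` holds its own `MyActivity.json` with the same `title` and `time` fields as the Search export, and that searches in every product start with “Searched for ”, which may not hold for all of them. The function names are illustrative.

```python
import csv
import json
from pathlib import Path

PREFIX = "Searched for "  # assumed to mark a search in every product


def collect_queries(activity_root):
    """Gather search queries from every product folder under activity_root."""
    rows = []
    for json_path in sorted(Path(activity_root).glob("*/MyActivity.json")):
        product = json_path.parent.name  # folder name, e.g. "Search"
        with json_path.open(encoding="utf-8") as handle:
            records = json.load(handle)
        for record in records:
            title = record.get("title", "")
            if title.startswith(PREFIX):
                rows.append({
                    "product": product,
                    "time": record.get("time", ""),
                    "query": title[len(PREFIX):],
                })
    return rows


def write_csv(rows, csv_path):
    """Dump the combined queries to a CSV file, e.g. for analysis in Excel."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["product", "time", "query"])
        writer.writeheader()
        writer.writerows(rows)
```

Passing the result to `pd.DataFrame(collect_queries("Takeout/My Activity"))` gives a single data frame with a `product` column to filter or group on. Running the same function against each account's export keeps the personal and work histories separate.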