Creating an AI-Powered Chrome Extension: A Comprehensive Guide
Chapter 1: Introduction to AI in App Development
Hello there! 👋
Are you enthusiastic about artificial intelligence? Do you want to develop AI-driven applications and see practical examples of its capabilities? This article is tailored just for you!
I'm genuinely excited about the recent breakthroughs in AI and eager to start building innovative applications. There's nothing quite like creating tools that I can personally benefit from. To kick things off, I've put together a collection of ideas to automate some of my daily tasks. Recently, I built a Chrome extension that can accomplish one of these tasks in mere seconds—it's truly incredible! 🤯
As a technical content creator, I find writing engaging social media posts for my articles quite laborious. I have to distill my content, pinpoint key messages, choose the right emojis for visual impact, and carefully pick my wording to grab attention. Tools like ChatGPT and Gemini can handle these tasks in seconds, often producing results that surpass my own efforts.
This scenario exemplifies a fantastic use case for AI, allowing me to learn while automating tasks, thus freeing me to focus on what truly matters: crafting exceptional content! ❤️
In this guide, I will walk you through the entire process, from concept to execution, of building an AI-powered Chrome extension designed to generate engaging social media posts from articles:
→ 🧰 Required Tools
→ 🧠 The Extension’s Brain
→ ✍️ Crafting High-Quality Prompts
→ 🌈 User Interface Design
→ 🏗️ Bringing Everything Together
→ 🚀 Extension Deployment
→ 💰 Cost Overview
→ 👀 Demonstration
By the end of this guide, you'll understand how to use generative AI, along with a few other tools, to create your own AI-enabled applications. You don't need to be an expert in AI or LLMs; all you need is enthusiasm and motivation!
This extension is open-source, and contributions to enhance its functionality are warmly welcomed. ⚠️ This article is quite extensive but packed with excitement. I appreciate your patience as we explore each section!
Section 1.1: Required Tools
Many fantastic tools have been developed by creative minds, making my journey smooth and efficient:
LangChain:
This widely used framework has excellent documentation covering many use cases, and it works with either Python or JavaScript. I think of it as a set of interlocking puzzle pieces. For instance, to summarize a PDF, we can combine a LangChain document loader, an LLM (such as Gemini, an OpenAI GPT model, or Llama), and the load_summarize_chain function, which work together to produce a summary. We can even add another piece, a PromptTemplate, to give the model directives like "write a concise summary, maximum of 200 tokens."
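To make the puzzle analogy concrete, here is a dependency-free sketch of what a map-reduce summarize chain does conceptually (map_reduce is one of the chain types LangChain's summarize chain supports). This is not LangChain's API: load_document, SUMMARY_PROMPT, and summarize_chain are hypothetical names, and llm is assumed to be any function that takes a prompt and returns text, so a Gemini or OpenAI client call could be plugged in.

```python
from string import Template
from typing import Callable, List

# Hypothetical "puzzle pieces"; these are NOT LangChain classes or functions.
SUMMARY_PROMPT = Template(
    "Write a concise summary, maximum of $max_tokens tokens:\n\n$text"
)


def load_document(raw_text: str) -> List[str]:
    """Stand-in for a document loader: split raw text into page-like chunks."""
    return [chunk.strip() for chunk in raw_text.split("\n\n") if chunk.strip()]


def summarize_chain(
    pages: List[str], llm: Callable[[str], str], max_tokens: int = 200
) -> str:
    """Map-reduce: summarize each page, then summarize the partial summaries."""
    partials = [
        llm(SUMMARY_PROMPT.substitute(text=page, max_tokens=max_tokens))
        for page in pages
    ]
    combined = "\n".join(partials)
    return llm(SUMMARY_PROMPT.substitute(text=combined, max_tokens=max_tokens))
```

Keeping each piece a plain function is what makes them swappable: a different loader, template, or model changes one argument, not the whole pipeline.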
Large Language Models (LLMs):
LLMs are a type of AI capable of generating text, translating languages, writing various kinds of creative content, and answering questions conversationally. For this extension, I used Gemini and OpenAI models, which analyze my articles and generate creative social media content!
Additional Tools:
I used Gemini, ChatGPT, and DALL·E to learn Python (my first time coding in this language 😅), build the extension's UI, write prompts, and design the extension's logo. I used Google Colab for Python coding and Google Cloud Functions to deploy the extension's brain. For the UI design, I chose Canva. Figma might be better suited to the task, but I'm more used to Canva for quick mockups.
Section 1.2: The Extension’s Brain
Now, let's delve into the most thrilling aspect: developing the extension's brain! We will break it down into three main phases:
Content Extraction:
For the initial version of this extension, I focused on two blogging platforms: Medium and dev.to. I used Python's regex module (re) to match the domains the extension accepts articles from. To extract text from dev.to, I used BeautifulSoup4. Scraping content from Medium proved difficult, so I used the Medium Rapid API instead.
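Here is a minimal sketch of both steps, assuming the article URL arrives as a string. It uses only the standard library: the regex patterns and function names are hypothetical, and ParagraphExtractor is a simplified stand-in for the BeautifulSoup4 approach (with bs4 you would typically call soup.find_all("p")). The Medium branch would call the Medium Rapid API rather than parse HTML, and that call isn't shown here.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical host patterns for the two supported platforms.
SOURCES = {
    "medium": re.compile(r"(^|\.)medium\.com$"),  # medium.com and its subdomains
    "devto": re.compile(r"^(www\.)?dev\.to$"),
}


def detect_source(url: str) -> Optional[str]:
    """Return the platform key for a URL, or None if it isn't supported."""
    host = (urlparse(url).hostname or "").lower()
    for name, pattern in SOURCES.items():
        if pattern.search(host):
            return name
    return None


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element (simplified stand-in for bs4)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_paragraph = False
        self._buffer: List[str] = []
        self.paragraphs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_paragraph:  # inline tags like <b> still contribute their text
            self._buffer.append(data)


def extract_paragraphs(html: str) -> List[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Note that Medium publications on custom domains would not match these patterns; a real implementation would need a broader check.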
Content Summarization (Optional):
If the extracted text is long, I use AI to summarize it 🪄. This keeps the input within the maximum number of tokens the LLM can process (its context window). I expect this step to become less necessary as LLMs support ever larger contexts. I currently rely on Gemini's next-generation model, which can consistently handle up to 1 million tokens, allowing me to streamline the post generation process.
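A rough sketch of the "summarize only when needed" logic follows. It assumes about 4 characters per token, a common rule of thumb for English text (real tokenizers differ, so leave some headroom), and takes the summarizer as a plain function, which in practice would be an LLM call. All names are hypothetical.

```python
from typing import Callable, List

CHARS_PER_TOKEN = 4  # rough heuristic for English text; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Approximate token count (ceiling of characters / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def chunk_text(text: str, max_tokens: int) -> List[str]:
    """Split text into consecutive pieces that each fit the estimated budget."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def fit_to_budget(text: str, max_tokens: int, summarize: Callable[[str], str]) -> str:
    """Return text unchanged if it fits; otherwise summarize chunks until it does."""
    while estimate_tokens(text) > max_tokens:
        shorter = "\n".join(summarize(chunk) for chunk in chunk_text(text, max_tokens))
        if len(shorter) >= len(text):
            # Guard against looping forever if the summarizer doesn't shrink the text.
            raise ValueError("summarizer did not shorten the text; cannot fit budget")
        text = shorter
    return text
```

With a 1-million-token model, the while loop usually never runs, which is exactly why summarization becomes optional.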
Post Generation:
Finally, I present the article or its summary to the AI models and let them work their magic.
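As an illustration, the generation step can send one prompt to several models. This sketch assumes each model is wrapped in a function that takes a prompt and returns text; POST_PROMPT and generate_posts are hypothetical, and a real prompt would be far more detailed.

```python
from typing import Callable, Dict, Tuple

# Hypothetical prompt; the real extension's prompt is more detailed.
POST_PROMPT = (
    "Write an engaging social media post about the article below. "
    "End with a 'Read more' line linking to {url}.\n\nARTICLE:\n{article}"
)


def generate_posts(
    article: str, models: Dict[str, Callable[[str], str]], read_more_url: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Send the same prompt to every model; return (posts, errors) keyed by model name."""
    prompt = POST_PROMPT.format(url=read_more_url, article=article)
    posts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for name, model in models.items():
        try:
            posts[name] = model(prompt)
        except Exception as exc:  # keep going so one failing provider doesn't block the other
            errors[name] = str(exc)
    return posts, errors
```

Collecting errors separately means a quota error from one provider doesn't hide the post the other provider produced.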
Before proceeding, we’ll need API keys for Gemini, OpenAI, and the Medium Rapid API.
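One simple way to supply these keys is through environment variables, failing fast when one is missing. The variable names below are hypothetical. Google Cloud Functions can expose Secret Manager secrets as environment variables, so a check like this could run unchanged there.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY", "RAPIDAPI_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read all required keys, failing fast with a list of any that are missing."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Passing env explicitly (instead of always reading os.environ) also makes the function easy to test.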
Section 1.3: Crafting High-Quality Prompts
The way we communicate with an LLM significantly affects the quality of its responses; a well-crafted prompt leads to a high-quality answer! Through extensive research, article reading, video watching, and plenty of experimentation, I’ve learned that a high-quality prompt should include specific elements arranged in a particular order:
- Persona: Define the character you want the model to emulate based on the task.
- Goal: Specify what you need from the AI (begin with a verb).
- Context: Explain why you require this information.
- Source: Mention which information sources or examples the LLM can reference.
- Format: Visualize the desired output and specify its format (paragraphs, bullet points, etc.).
- Tone: Use tone descriptors (e.g., friendly, confident) to guide the AI.
Order of Importance: Persona, Task/Goal, Context, Sources/Exemplars (optional), Format, Tone.
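As a sketch, these elements can be assembled in that order with a small helper. PromptSpec and its field names are hypothetical; the source element is optional and is left out when empty. The example values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptSpec:
    """The prompt elements listed above (hypothetical helper)."""

    persona: str
    goal: str
    context: str
    output_format: str
    tone: str
    source: Optional[str] = None  # optional sources or examples

    def render(self) -> str:
        # Emit elements in the recommended order, skipping any left empty.
        parts = [
            ("Persona", self.persona),
            ("Goal", self.goal),
            ("Context", self.context),
            ("Source", self.source),
            ("Format", self.output_format),
            ("Tone", self.tone),
        ]
        return "\n".join(f"{label}: {value}" for label, value in parts if value)


spec = PromptSpec(
    persona="You are a social media manager for a developer blog.",
    goal="Write a LinkedIn post promoting the article below.",
    context="I want readers to click through to the full article.",
    output_format="A hook, three emoji bullet points, and a 'Read more' line.",
    tone="Friendly and confident.",
)
print(spec.render())
```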
Section 1.4: User Interface Design
Let’s break down the UI creation for my extension. I began with a basic mockup using Canva.
Next, I harnessed AI! I provided a description to DALL·E and was amazed by the logo options it generated. One of them perfectly encapsulated the essence of the extension: transforming articles into social media posts with the help of AI.
Finally, I moved on to coding the UI. I used LLMs like Gemini and ChatGPT to generate the code from my design, with an emphasis on semantic HTML and accessibility.
I took the generated code and made some adjustments, adding hover effects for posts and loading animations for a seamless user experience. I also selected fonts by prompting ChatGPT to suggest suitable options based on current trends and the nature of the extension.
Section 1.5: Bringing Everything Together
I deployed the extension's brain on Google Cloud Functions, adding several elements to the code: a section for retrieving secrets, error handling with appropriate HTTP status codes, the Functions Framework decorator on the main (entry-point) function, and CORS handling for allowed origins.
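Here is a minimal, framework-agnostic sketch of such an entry point, assuming the extension POSTs JSON like {"url": "..."}. handle, ALLOWED_ORIGINS, the EXTENSION_ID placeholder, and generate_posts_for_url are hypothetical. It returns a (body, status, headers) tuple, a response shape Flask (and therefore the Python Functions Framework) accepts; the commented lines at the bottom show roughly how it would be attached with @functions_framework.http.

```python
import json
import logging
from typing import Callable, Dict, Optional, Tuple

# Placeholder: replace EXTENSION_ID with your published extension's ID.
ALLOWED_ORIGINS = {"chrome-extension://EXTENSION_ID"}

Response = Tuple[str, int, Dict[str, str]]


def handle(
    method: str,
    origin: Optional[str],
    payload: Optional[dict],
    generate: Callable[[str], Dict[str, str]],
) -> Response:
    """Route one HTTP request: CORS preflight, validation, generation, errors."""
    headers: Dict[str, str] = {"Content-Type": "application/json"}
    if origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = origin
        headers["Vary"] = "Origin"

    if method == "OPTIONS":  # CORS preflight: declare allowed methods and headers
        headers.update({
            "Access-Control-Allow-Methods": "POST",
            "Access-Control-Allow-Headers": "Content-Type",
            "Access-Control-Max-Age": "3600",
        })
        return "", 204, headers
    if method != "POST":
        headers["Allow"] = "POST, OPTIONS"
        return json.dumps({"error": "method not allowed"}), 405, headers
    if not payload or "url" not in payload:
        return json.dumps({"error": "missing 'url' field"}), 400, headers
    try:
        posts = generate(payload["url"])
    except Exception:
        logging.exception("post generation failed")  # details stay in server logs
        return json.dumps({"error": "internal error"}), 500, headers
    return json.dumps(posts), 200, headers


# Wiring with the Functions Framework would look roughly like this:
# import functions_framework
#
# @functions_framework.http
# def main(request):
#     return handle(request.method, request.headers.get("Origin"),
#                   request.get_json(silent=True), generate_posts_for_url)
```

Keeping the logic out of the decorated function lets you test status codes and CORS headers without deploying anything.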
To build the extension, the following components are necessary:
- Manifest file (manifest.json) to configure the extension: its name, description, UI, service worker, etc.
- Service worker (background.js)
- Popup HTML, CSS, and JS for the extension's UI and interactions.
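For reference, a minimal Manifest V3 configuration for this layout might look like the dictionary below, built in Python so it can be dumped straight to manifest.json. The version and description values are hypothetical, and the real extension may declare more keys, such as icons.

```python
import json
from typing import Any, Dict


def build_manifest(name: str, version: str, description: str) -> Dict[str, Any]:
    """Minimal Manifest V3 config: popup UI, background service worker, activeTab."""
    return {
        "manifest_version": 3,
        "name": name,
        "version": version,
        "description": description,
        "action": {"default_popup": "popup.html"},  # the popup HTML/CSS/JS UI
        "background": {"service_worker": "background.js"},
        "permissions": ["activeTab"],
    }


manifest = build_manifest(
    "ArticleToPosts", "1.0.0", "Turn articles into social media posts."  # example values
)
# To write it out: open("manifest.json", "w").write(json.dumps(manifest, indent=2))
print(json.dumps(manifest, indent=2))
```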
Section 1.6: Extension Deployment
It's time to publish the extension on the Chrome Web Store. Here are the steps to follow:
- Create a Google developer account if you don't have one (registration costs a one-time fee of $5).
- Upload a zip file containing all extension files.
- Describe the extension's purpose and justify each permission it requests (such as activeTab).
- Upload a demo video and screenshots.
- Create a privacy policy (I used Google Sites for a quick and efficient solution).
- Finally, save and submit for review (it took about a day to receive feedback from the Google team).
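Packaging can be scripted too. This sketch zips an extension folder with Python's zipfile module, keeping manifest.json at the root of the archive and skipping a few common junk files; the folder layout and the EXCLUDE set are assumptions.

```python
import zipfile
from pathlib import Path
from typing import Union

EXCLUDE = {".git", "__pycache__", ".DS_Store"}  # assumed junk to leave out
PathLike = Union[str, Path]


def package_extension(src_dir: PathLike, zip_path: PathLike) -> Path:
    """Zip every file under src_dir, with manifest.json at the archive root."""
    src, out = Path(src_dir), Path(zip_path)
    if not (src / "manifest.json").is_file():
        raise FileNotFoundError("manifest.json must sit at the top of the extension folder")
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src)
            if path.is_dir() or path.resolve() == out.resolve():
                continue  # directories are implied; never zip the archive into itself
            if any(part in EXCLUDE for part in rel.parts):
                continue
            zf.write(path, rel.as_posix())  # archive paths are relative to src_dir
    return out
```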
And voilà! 😁 The ArticleToPosts extension is now available on the Chrome Web Store!
🔗 Get the extension
Section 1.7: Cost Overview
I can't give an exact total, since my OpenAI token usage wasn't limited to this extension, but here are the main expenses involved:
- The Google developer account has a one-time fee of $5.
- The OpenAI API charges per token, but the rates are reasonable.
- The Gemini API was free at the time of writing, thanks to Google 🙏. However, I needed a VPN to access it from France.
- The Medium Rapid API is free for a limited number of calls; once exceeded, charges apply. I ended up paying around $4 due to my excitement about the results 😅.
- Google Cloud Functions is also free within certain limits, but I opted for additional memory, which led to a charge of $0.07 at the time of writing.
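Adding up the figures above gives roughly $9.07, not counting OpenAI tokens. The sketch below keeps that tally with Decimal to avoid floating-point rounding, plus a hypothetical token_cost helper for estimating OpenAI spend. No prices are assumed here; plug in the current per-million-token rates from OpenAI's pricing page.

```python
from decimal import Decimal

# Costs reported above, in USD; OpenAI token spend is estimated separately.
KNOWN_COSTS = {
    "Google developer account (one-time)": Decimal("5.00"),
    "Medium Rapid API calls beyond the free tier (approx.)": Decimal("4.00"),
    "Cloud Functions extra memory": Decimal("0.07"),
}


def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_million: Decimal,
               output_price_per_million: Decimal) -> Decimal:
    """Cost of API usage, given separate per-million-token input/output prices."""
    return (Decimal(input_tokens) * input_price_per_million
            + Decimal(output_tokens) * output_price_per_million) / Decimal(1_000_000)


total_known = sum(KNOWN_COSTS.values(), Decimal("0"))
print(f"Known costs: ${total_known}")  # prints "Known costs: $9.07"
```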
The extension itself is free for users, so enjoy! 😊 I would appreciate your ratings and comments on the Chrome Web Store. 🙏
Chapter 2: Demo and Conclusion
Here it is! As you can see, the output has a clear structure while being both engaging and informative. Gemini and OpenAI both highlight the key takeaways and usually pair each main idea with a fitting emoji. This saves me the time of choosing emojis myself, and their picks are quite apt.
After experimenting with both models, I’ve discovered that Gemini's output often meets my expectations better than OpenAI's. However, neither consistently follows all my instructions, such as including the hyperlink for the "Read more" section. Nonetheless, I’m pleased with the results.
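Since the models don't always include the "Read more" link, one option is to enforce it after generation instead of relying on the prompt alone. ensure_read_more is a hypothetical helper: it leaves the post untouched when the URL is already present, replaces a trailing line that mentions "Read more" but lacks the link, and otherwise appends one.

```python
def ensure_read_more(post: str, url: str, label: str = "Read more") -> str:
    """Guarantee the post ends with a working 'Read more' link to the article."""
    if url in post:
        return post  # the model already included the link
    lines = post.rstrip().splitlines()
    if lines and label.lower() in lines[-1].lower():
        lines[-1] = f"{label}: {url}"  # replace a dangling "Read more" with no URL
    else:
        lines.extend(["", f"{label}: {url}"])
    return "\n".join(lines)
```

A deterministic check like this costs nothing per call, whereas re-prompting the model to fix a missing link costs tokens and may still fail.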
What's fascinating about using different models is seeing how their outputs differ, likely because of differences in their training data. I can pick one, or test both by posting the Gemini output on LinkedIn and the OpenAI output on X, for instance, to see which one gets more engagement. 🤭
If you’ve made it this far, congratulations! 🙌 Your patience and interest in AI are commendable. AI is not just a trend; it's the future, and I hope you’ve enjoyed this journey. I encourage you to keep exploring and creating AI-powered applications, contributing to the growth of this captivating field.
Thank you! ❤️
That wraps things up for today! If you have any questions or feedback, feel free to comment or reach out to me on LinkedIn—I’m always open to discussions!
Want to buy me a coffee? ☕️
If you enjoyed this article, please clap 👏, share 🔗, and subscribe 🔔 to stay updated on my latest posts.
Let’s connect on Medium, LinkedIn, Facebook, Instagram, YouTube, or Twitter!