Your Comprehensive Guide to AI Chatbots: A Programming Perspective
I’ve been involved in technology long enough that very little excites me, and even less surprises me. But when OpenAI’s ChatGPT wrote a working WordPress plugin for my wife’s e-commerce site, it got my attention. That experiment kicked off a long-running exploration of chatbots and AI-assisted programming.
Since then, I’ve put 14 large language models (LLMs) through four real-world coding tests, and their performance varies widely. Roughly two years after that first experiment, five of the 14 still fail to produce a working plugin. This article analyzes how each LLM performed on my tests.
The Evaluation Criteria
Before diving into the results, it’s worth understanding how I test. The four tests gauge each LLM’s ability to handle real-world programming work, covering tasks that include generating a WordPress plugin, creating regular expressions, debugging, and developing user interfaces.
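To make the regular-expression task more concrete, here is a minimal Python sketch of one way to check a chatbot’s regex answers against a handful of inputs. Everything in it is a hypothetical illustration, not one of the actual tests used in this article: the task (matching a dollar amount with optional two-digit cents), the two candidate patterns, the test cases, and the `failures` helper are all assumptions.

```python
import re

# Hypothetical candidate patterns, standing in for two alternative answers
# a chatbot might return for "match a dollar amount with optional cents".
CANDIDATES = {
    "a": r"^\d+(\.\d{2})?$",
    "b": r"^\d+\.?\d*$",
}

# Illustrative test cases: (input string, whether it should match).
CASES = [
    ("0", True),
    ("12", True),
    ("12.50", True),
    ("12.", False),
    ("12.5", False),
    ("12.505", False),
    ("abc", False),
    ("", False),
]


def failures(pattern: str) -> list[str]:
    """Return the inputs on which the pattern disagrees with the expected result."""
    compiled = re.compile(pattern)
    # fullmatch() requires the whole string to match, so the ^...$ anchors
    # in the candidates are redundant here but harmless.
    return [text for text, expected in CASES
            if (compiled.fullmatch(text) is not None) != expected]


if __name__ == "__main__":
    for name, pattern in CANDIDATES.items():
        bad = failures(pattern)
        print(name, "passes" if not bad else f"fails on {bad}")
```

Running it reports that pattern `a` passes and flags pattern `b` for accepting inputs such as `12.` and `12.5`. A small table of expected matches like this is a quick way to separate a working answer from one that merely looks plausible.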
Performance Comparison
The following table summarizes how the top performers among the 14 LLMs fared on my four coding tests:
Chatbot | Tests Passed | Price | Models |
---|---|---|---|
ChatGPT Plus | 4/4 | $20/mo | GPT-4o, GPT-4, GPT-3.5 |
Perplexity Pro | 4/4 | $20/mo | Multiple LLMs |
Grok | 3/4 | Free (for now) | Grok-1 |
ChatGPT Free | 3/4 | Free | GPT-4o, GPT-3.5 |
Perplexity Free | 3/4 | Free | GPT-3.5 |
DeepSeek V3 | 3/4 | Free (API fees) | DeepSeek MoE |
Detailed Reviews of Top Chatbots
ChatGPT Plus
Price: $20/month
LLM: GPT-4o, GPT-4, GPT-3.5
Tests Passed: 4/4
ChatGPT Plus emerged as the best overall AI chatbot for coding. It passed all four of my tests, and it also offers a dedicated Mac app. One test with GPT-4o returned two alternative answers, but a quick check identified the correct one. For more consistent results, I recommend the GPT-4 setting.
Perplexity Pro
Price: $20/month
LLM: GPT-4o, Claude 3.5 Sonnet, and others
Tests Passed: 4/4
Perplexity Pro is another standout: it lets you choose among multiple LLMs and displays the search criteria behind its answers. It lacks a dedicated desktop app and relies primarily on email-based logins, but it offers robust coding assistance and varied research capabilities.
Grok
Price: Free (for now)
LLM: Grok-1
Tests Passed: 3/4
I initially underestimated Grok, available on X (formerly Twitter), but it delivered commendable coding support and faltered on only one test. Backed by the AI expertise of Elon Musk’s companies, including Tesla and SpaceX, it is a promising candidate to watch.
ChatGPT Free
Price: Free
LLM: GPT-4o, GPT-3.5
Tests Passed: 3/4
ChatGPT’s free version offers substantial coding help, though it comes with limits: prompts are throttled, and under heavy traffic it may fall back to GPT-3.5. Even with these constraints, it outperforms many paid alternatives.
Perplexity Free
Price: Free
LLM: GPT-3.5
Tests Passed: 3/4
Perplexity’s free version excels both as a coding assistant and as a research tool, providing structured responses with cited sources. That combination makes it valuable for programming as well as in-depth research.
DeepSeek V3
Price: Free (API fees)
LLM: DeepSeek MoE
Tests Passed: 3/4
DeepSeek V3, an open-source chatbot from China, passed three of my four coding tests. It still needs work in more obscure programming environments, but it outperforms competitors such as Google’s Gemini and Microsoft’s Copilot.
Chatbots to Avoid for Programming
Several chatbots, including Microsoft’s Copilot and Google’s Gemini, fell short as reliable coding assistants. Notable examples include:
- DeepSeek R1 - Struggled with basic regex tasks despite its advanced reasoning capabilities.
- GitHub Copilot - Often produces incorrect code blocks, which poses a risk when integrating its output into projects.
- Meta AI and Meta Code Llama - Delivered inconsistent results on straightforward programming challenges.
- Claude 3.5 Sonnet - Promoted for programming, but failed most of my tests.
- Gemini Advanced - Helpful with some niche languages, but performed poorly on standard tasks.
Conclusion
Choosing the right AI chatbot for programming depends largely on your needs and budget. ChatGPT Plus and Perplexity Pro passed all four of my tests, while their free counterparts each passed three and still provide valuable help within their limits. It’s always wise to understand each tool’s limitations and choose the one best suited to your requirements.