Building production-ready Generative AI apps on AWS—Lessons from bringing AI into ZEPIC’s platform

Bharathi Kannan Ravikumar

Co-founder, Head of Engineering
December 26, 2024

Hey tech folks!

Like many of you, we at ZEPIC have been exploring how to architect AI into the foundation of our product. As the co-founder and head of engineering, I've had the privilege of leading this transformation. Having worked with AI technologies for over a decade, well before the current Gen-AI revolution, I've witnessed how the AI landscape has evolved and the massive opportunity it now presents.

Along the way of building our AI engine, we’ve made our share of mistakes to understand what truly works in production. I wanted to share some practical insights and lessons from building Gen AI applications on AWS. If you're considering a similar path, here’s what we've discovered— Read along and I hope it helps.

Core Technology Decisions

Programming Language and Frameworks

The choice of Python was a no-brainer due to the wide range of libraries available for machine learning and generative AI purposes. For frameworks, we evaluated several options, including LlamaIndex, Langchain, and Autogen. For web server implementation, FastAPI proved to be our best choice.

AWS Offerings

Before settling on AWS Bedrock, we had evaluated several AI providers including OpenAI, Anthropic, and Cohere. We went with AWS Bedrock for its wide range of models, high availability, and robust guardrails. Here is why I believe that AWS's offerings are better suited for building generative AI applications—

1. Bedrock: Bedrock's strength lies in its diverse model selection, allowing us to choose the right model for specific use cases. Some of the popular models include Claude models for complex reasoning, LLama models for general-purpose tasks, Cohere for embeddings, and Jurassic for specialized applications. We loved Bedrock's flexibility, where you can even import custom models from HuggingFace to suit your specific needs.

Source

2. SageMaker: For custom model deployment and training

3. Guardrails: For security and content control

Choosing the Right Model

Start with a smaller model like LLama 3.1 (available in 8B, 70B, 405B versions). While Claude Sonnet is the most intelligent model available in the AWS marketplace, remember that smaller models mean less latency and lower costs. 

Production Challenges and Solutions

When selecting an AI provider, we found three factors to be critical—the range of available models, security features, and system availability. Here is how we addressed each of these in our production environment:

Security with AWS Guardrails

Security sits at the heart of ZEPIC's AI architecture. We really wanted to ensure that our ZENIE AI is secure. So, we built an app firewall with AWS Guardrails, which masks PII data, prevents prompt injection, enables response grounding, filters profanity, and blocks violence-related content.

High Availability

AWS supports cross-region inference for certain models, but this is limited. To overcome this, we built our custom model route algorithm which distributes the load across multiple regions. This includes deploying Claude instances in both US and Europe, implementing round-robin traffic routing between regions. For enterprises with predictable, high-volume needs, AWS offers provisioned throughput, but comes with a hefty price tag—one model unit for Sonnet costs $29,262.

Output Control and Optimization

Most of the time, we want the model to return structured output. We use prompt techniques by providing examples, format instructions, and adding fallback mechanisms to make the model output as deterministic as possible.

Prompt Caching

If you're using Anthropic models, try implementing prompt caching to improve performance and save money.

Model Deployment

For high-volume traffic, especially if you're a dedicated GenAI company, consider deploying your own models using SageMaker. For example, for image generation, you can deploy Stable Diffusion.

How to build advanced AI capabilities?

Simple prompting doesn't always meet complex implementation needs. When working with custom implementations, particularly where specific output formats are important, you might need to consider fine-tuning.

Fine-Tuning

While most use cases can be solved with good prompt engineering, some scenarios require fine-tuning your model. AWS offers two ways to implement fine-tuning:

  • Full Fine-tuning: While powerful, this requires substantial computational resources and a large dataset.
  • Performance Efficient Fine-tuning (PEFT/LORA): A more resource-efficient approach available for certain models on AWS, reducing computational costs while maintaining performance.
A simple flow of fine-tuning a model(Source)

RAG (Retrieval Augmented Generation)
For some use cases, we wanted to build AI capabilities on top of our own data—be it documents, knowledge base articles, or other content. This is where RAG (Retrieval Augmented Generation) comes in. As the name suggests, it augments the model's responses by retrieving relevant information from our data sources.

These were our key considerations for RAG implementation:

  • Choose the right embedding model (we use Cohere)
  • Select the appropriate vector database (we evaluated Milvus, Weaviate, ES)
  • Implement an effective chunking strategy
  • Focus on retrieval and reranking

A typical RAG implementation(Source)
Our Production Journey

Organizational Structure

We established a dedicated team focused on building AI solutions for the long term. We are also educating our team members about GenAI and providing access to tools like Amazon Q and GitHub Copilot to developers and QA.

Current Implementation

Our production journey has been focused and iterative, with our entire infrastructure managed through Terraform CDK to ensure consistency and reproducibility across deployments. Implementation time has been surprisingly reasonable, particularly for API-based integrations, allowing us to move faster while maintaining reliability.

The decision to build everything with API-first architecture has been the right choice for us. Our trial customers have responded positively, and our approach has gained recognition—ZEPIC was ranked as a top AI marketing tool on Good AI Tools immediately after launch, and continues to be featured on leading AI directories including The Rundown AI, Futurepedia, AI Tools Inc, and The Neuron.

Looking back at our journey, one thing is clear: Generative AI isn't just another hype—AI can now understand text, multiple languages, audio, video, images, and more. The key is starting small and growing based on actual needs. Remember—before jumping into fine-tuning or complex implementations, do what you can with prompt engineering. Focus on security and availability, and let real user needs guide your expansion.

Hope these pointers help you out - Good luck with your AI journey! If you have any thoughts, please feel free to share.

Desperate times call for desperate Google/Chat GPT searches, right? "Best Shopify apps for sales." "How to increase online sales fast." "AI tools for ecommerce growth."

Been there. Done that. Installed way too many apps.


But here's what nobody tells you while you're doom-scrolling through Shopify app reviews at 2 AM—that magical online sales-boosting app you're searching for? It doesn't exist. Because if it did, Jeff Bezos would've bought (or built!) it yesterday, and we (fellow eCommerce store owners) would all be retired in Bali by now.


Growing a Shopify store and increasing online sales isn’t easy—we get it. While everyone’s out chasing the next “revolutionary” tool/trend (looking at you, DeepSeek), the real revenue drivers are probably hiding in plain sight—right there inside your customer data.
After working with Shopify stores like yours (shoutout to Cybele, who recovered almost 25% of their abandoned carts with WhatsApp automation), we’ve cracked the code on what actually moves the needle.


Ready to stop app-hopping and start actually growing your sales by using what you already have? Here are four fixes that will get you there!

Fix #1: Convert abandoned carts instantly (Like, actually instantly)

The Painful Truth: You're probably losing about 70% of your potential sales to cart abandonment. That's not just a statistic—it's real money walking out of your digital door. And looking for yet another Shopify app for abandoned cart recovery isn't going to fix it if you're not getting the fundamentals right.

The Quick Fix: Everyone knows you need multi-channel recovery that hits the sweet spot between "Hey, did you forget something?" and "PLEASE COME BACK!" But here's the reality—most recovery apps are a one-trick pony. They either do email OR WhatsApp, not both. And don't even get us started on personalizing offers based on cart value—that usually means toggling between three different dashboards while praying your apps talk to each other.

Enter ZEPIC: This is where we come in. With ZEPIC's automated Flows, you can:
Launch WhatsApp recovery messages (with 95% open rates!)
Set up perfectly timed email sequences (or vice versa)
Create personalized recovery offers not just on cart value but based on your customer’s behavior/preferences
Track and optimize everything from one dashboard

Fix #2: Reactivate past customers today

The Painful Truth: You're probably losing about 70% of your potential sales to cart abandonment. That's not just a statistic—it's real money walking out of your digital door. And looking for yet another Shopify app for abandoned cart recovery isn't going to fix it if you're not getting the fundamentals right.

The Quick Fix: Everyone knows you need multi-channel recovery that hits the sweet spot between "Hey, did you forget something?" and "PLEASE COME BACK!" But here's the reality—most recovery apps are a one-trick pony. They either do email OR WhatsApp, not both. And don't even get us started on personalizing offers based on cart value—that usually means toggling between three different dashboards while praying your apps talk to each other.

Enter ZEPIC: This is where we come in. With ZEPIC's automated Flows, you can:
Launch WhatsApp recovery messages (with 95% open rates!)
Set up perfectly timed email sequences (or vice versa)
Create personalized recovery offers not just on cart value but based on your customer’s behavior/preferences
Track and optimize everything from one dashboard

Offering light at the end of the tunnel is Google’s Privacy Sandbox which seeks to ‘create a thriving web ecosystem that is respectful of users and private by default’. Like the name suggests, your Chrome browser will take the role of a ‘privacy sandbox’ that holds all your data (visits, interests, actions etc) disclosing these to other websites and platforms only with your explicit permission. If not yet, we recommend testing your websites, audience relevance and advertising attribution with Chrome’s trial of the Privacy Sandbox.

Top 3 impacts of the third-party cookie phase-out

Who’s impacted

How

What next

Digital advertising and
acquisition teams
Lack of cookie data results in drastic fall in website traffic and conversion rate
Review all cookie-based audience acquisition. Sign up for Chrome’s trial of the Privacy Sandbox
Digital Customer Experience
Customers are not served relevant, personalised experiences: on the web, over social channels and communication media
Multiply efforts to collect first-party customer data. Implement a Customer Data Platform
Security, Privacy and Compliance teams
Increased scrutiny from regulators and questions from customers about data storage and usage
Review current cookie and communication consent management, ensure to align with latest privacy regulations