Talkative Photo: Designing a Scalable Image Analysis and Voice Explanation App with Python

Bilal Tonga
3 min readDec 5, 2023
The Sixth Sense (1999)


Imagine a world where pictures not only capture our attention but also tell their stories. Introducing “Talkative Photo,” an app that combines image analysis with voice explanation. This series explores the development of a scalable Python app using microservices architecture, integrating GPT-4 with Vision, OpenAI Text to Speech, and other features. This journey will explore software design and practical applications of this new technology.

System Overview

Our application is a harmonious blend of several interconnected microservices, each playing a unique role:

  • Authentication/Authorization Microservice
  • Image Analysis and Voice Explanation Microservice
  • Notification Service
  • Download Service
  • Postgresql Database
  • AWS S3 for Storage
  • RabbitMQ Message Queue

Each microservice is designed to perform its specific task efficiently and communicate with others as needed.

Microservices Architecture: The Backbone of Our Application

Why use microservices?

Microservices bring several benefits:

1. Scalability

Microservices are much easier to scale than the monolithic approach. Developers can scale specific services, rather than an entire application, and execute customized tasks and requests together with greater efficiency.

2. Improved data security

Microservices communicate with each other through secure APIs, which can provide development teams with better data security than the monolithic method.

3. Faster development

Microservices lead to faster development cycles because developers focus on specific services that need to be deployed or debugged.

Authentication/Authorization Microservice

Tech Stack: FastAPI, OAuth2, JWT.

Why OAuth2 and JWT?: For secure, token-based authentication and to ensure horizontal scaling due to its stateless nature.

Key Features:

  • Implementing OAuth2 with JWT for secure, token-based authentication.
  • Stateless architecture to facilitate horizontal scaling.
  • Integration with identity providers like Keycloak.

Image Analysis and Voice Explanation Microservice

Tech Stack: FastAPI, integrated with OpenAI’s GPT-4 with Vision for image analysis, and text-to-speech conversion.

Scalability Secrets: Leveraging RabbitMQ to manage request loads efficiently.

Deployment and Storage: Containerization with Docker, orchestrated by Kubernetes, and data storage on AWS S3, featuring secure S3 bucket policies and data lifecycle management.

Storage on AWS S3

Integration: Utilizing Boto3 for S3 interactions.

Security: Implementing robust S3 bucket policies and IAM roles.

Data Management: S3 lifecycle policies for data retention.

Notification Service

Tech Stack: FastAPI, SMTP for email functionalities.

Scalability Strategies: Implementing non-blocking I/O models or message queues to manage high email volumes gracefully.

Download Service

Tech Stack: FastAPI and AWS S3.

Optimization Techniques: Rate limiting and caching for enhanced performance and efficiency.

UML Diagram for Visualization

To provide a clearer understanding of our system’s architecture and data flow, we’ve designed UML diagram:

Component UML Diagram

What’s Next?

This post lays the foundation for our “Talkative Photo” series. In next posts, we’ll dive deeper into each microservice, discussing the challenges, solutions, and code snippets that bring our application to life. Stay tuned for a journey that not only explores the technicalities of software design, but also uncovers the potential impact of our application in the real world.