Add Gemini Vision interactive CLI tool for image analysis #1105

emredeveloper · 2025-03-30T01:58:06Z

Create interactive CLI tool using smolagents and Gemini Vision API
Add image analysis, code extraction, and comparison features
Include documentation and requirements
Implement user-friendly command interface
Support code detection and automatic file saving

…lity
- Implement focused CodeAgent with strict system instructions
- Add screenshot capture and analysis capabilities
- Improve tool interaction with more deterministic behaviors
- Fix issue where tools would perform unintended additional actions
- Add auto-correction for common command syntax
- Update documentation with improved usage examples
- Support direct function calling in CodeAgent mode

aymeric-roucher · 2025-03-31T09:46:37Z

examples/gemini_vision_agent/gemini_vision_agent.py

+        analyze_screenshot
+    ]
+
+    # Özel bir sistem prompt'u tanımla


Some Turkish comments here and there: please translate them to English!

This commit translates all Turkish comments in the gemini_vision_agent.py file to English to maintain consistent documentation and improve code readability for international developers. Changes include:
- Translating temperature reduction comment
- Translating system prompt definition comment
- Translating max_steps adjustment comment
- Translating base_tools configuration comment
- Translating system_prompt parameter comment

The code functionality remains unchanged; this is purely a documentation improvement.

aymeric-roucher · 2025-03-31T10:12:02Z

examples/gemini_vision_agent/gemini_vision_agent.py

+        verbosity_level=LogLevel.INFO,
+        max_steps=7,  # Reduce processing steps
+        add_base_tools=False,  # Disable basic tools to focus on vision capabilities
+        system_prompt=system_prompt  # Add custom system instructions


Did you test the agent with the latest version?

I missed running some of the tests; I'll fix them and get back to you with feedback.

make quality
ruff check examples src tests utils
All checks passed!
ruff format --check examples src tests utils
64 files already formatted
python utils/check_tests_in_ci.py
✅ All good!

gemini_vision_agent.py code is not the source of these issues.

This commit fixes formatting issues in the Gemini Vision Agent example to follow the smolagents code style guidelines. The following changes were made:
- Fixed whitespace in blank lines within docstrings
- Applied proper quotation style (double quotes instead of single quotes)
- Adjusted spacing around operators and commas
- Improved indentation consistency
- Added trailing comma in multi-line collections
- Fixed line breaks according to ruff formatting rules

All quality checks are now passing: ruff check and ruff format.

aymeric-roucher · 2025-03-31T11:45:22Z

@emredeveloper please test your PR with the latest version of smolagents. In its current state, it's clear you didn't. I'll let you try and find out why 😉

- Modified CodeAgent initialization to be compatible with smolagents 1.13.0.dev0
- Removed system_prompt parameter and used description instead
- Fixed test suite by properly mocking dependencies
- Ensured display_image test passes by using effective mocking
- Fixed test_create_smolagent to work with the current codebase
- Optimized imports and error handling
- Updated requirements.txt to use smolagents version 1.12.0

emredeveloper · 2025-03-31T12:51:16Z

@aymeric-roucher I think I've understood and solved it. Hopefully I'm not wrong, haha

aymeric-roucher · 2025-03-31T13:24:41Z

@emredeveloper that's the direction! The system_prompt argument is deprecated. But this is good news, because you don't need to change the system prompt at all! Also, since your agent is not a managed agent, it does not need a description.

aymeric-roucher · 2025-03-31T13:31:57Z

Also, I only did superficial checks at first. Looking deeper: we support using a VLM as the main model for a CodeAgent (as shown here), so using tools to do this is more setup for no benefit.

We prefer to highlight efficient, short setups in examples. So if you want to add one, it would be better to just show an agent natively using Gemini to analyse images: your CodeAgent's initialization should directly take a Gemini VLM for its model argument, with no dedicated tools for image analysis. Just make a tool that loads an image from a folder and adds it to memory in observations_images.
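
A minimal sketch of the setup described above, under stated assumptions. It is not code from this PR. The helper names (`list_images`, `make_image_callback`, `build_agent`) and the `gemini/gemini-2.0-flash` model id are hypothetical, and a step callback is swapped in for the suggested tool, since a callback is one mechanism that can write to `observations_images` directly. `build_agent` needs smolagents, Pillow, and a Gemini API key in the environment; it is defined but not called here.

```python
from pathlib import Path

# Assumption: which file suffixes count as images.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def list_images(folder: str) -> list[Path]:
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS
    )


def make_image_callback(folder: str):
    """Build a step callback that attaches the folder's images to each step."""

    def attach_images(memory_step, agent=None):
        from PIL import Image  # imported lazily so the helpers above stay stdlib-only

        images = []
        for path in list_images(folder):
            with Image.open(path) as img:
                images.append(img.copy())  # copy so the data outlives the file handle
        # The VLM sees these images as part of the step's observations.
        memory_step.observations_images = images

    return attach_images


def build_agent(folder: str, model_id: str = "gemini/gemini-2.0-flash"):
    """Create a CodeAgent whose main model is a Gemini VLM (hypothetical model id)."""
    from smolagents import CodeAgent, LiteLLMModel

    model = LiteLLMModel(model_id=model_id)
    # No dedicated image-analysis tools: the model reads the images natively.
    return CodeAgent(
        tools=[],
        model=model,
        step_callbacks=[make_image_callback(folder)],
    )
```

With this shape, something like `build_agent("./images").run("Describe the images")` would let the model look at the images itself instead of routing them through analysis tools.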

emredeveloper · 2025-03-31T16:08:47Z

@aymeric-roucher okay...

emre added 3 commits March 30, 2025 04:54

update

ba222d9

aymeric-roucher reviewed Mar 31, 2025

emre and others added 3 commits March 31, 2025 13:56

Make test :

a5ff929

gemini_vision_agent.py code is not the source of these issues.

Merge branch 'huggingface:main' into main

241df12

emredeveloper requested a review from aymeric-roucher March 31, 2025 11:08

emre added 3 commits March 31, 2025 15:48

Merge branch 'main' of https://github.com/emredeveloper/smolagents

c41dd1c
