Add Gemini Vision interactive CLI tool for image analysis #1105
emredeveloper
commented
Mar 30, 2025
- Create interactive CLI tool using smolagents and Gemini Vision API
- Add image analysis, code extraction, and comparison features
- Include documentation and requirements
- Implement user-friendly command interface
- Support code detection and automatic file saving
…lity
- Implement focused CodeAgent with strict system instructions
- Add screenshot capture and analysis capabilities
- Improve tool interaction with more deterministic behaviors
- Fix issue where tools would perform unintended additional actions
- Add auto-correction for common command syntax
- Update documentation with improved usage examples
- Support direct function calling in CodeAgent mode
    analyze_screenshot
]

# Özel bir sistem prompt'u tanımla
Some Turkish comments here and there: please translate them to English!
I did
This commit translates all Turkish comments in the gemini_vision_agent.py file to English to maintain consistent documentation and improve code readability for international developers. Changes include:
- Translating temperature reduction comment
- Translating system prompt definition comment
- Translating max_steps adjustment comment
- Translating base_tools configuration comment
- Translating system_prompt parameter comment
The code functionality remains unchanged; this is purely a documentation improvement.
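One way to catch comments that may still need translating is to scan the file for comments containing non-ASCII characters. The sketch below is only a heuristic (a Turkish comment written entirely in ASCII would slip through), and `find_non_ascii_comments` is a hypothetical helper, not part of this PR.

```python
import io
import tokenize


def find_non_ascii_comments(source):
    """Return (line_number, comment_text) for comments with non-ASCII characters."""
    hits = []
    # tokenize separates real comments from '#' characters inside strings.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits


# Example usage on a file (the path is illustrative):
# with open("gemini_vision_agent.py", encoding="utf-8") as f:
#     for line_no, comment in find_non_ascii_comments(f.read()):
#         print(line_no, comment)
```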
    verbosity_level=LogLevel.INFO,
    max_steps=7, # Reduce processing steps
    add_base_tools=False, # Disable basic tools to focus on vision capabilities
    system_prompt=system_prompt # Add custom system instructions
Did you test the agent with the latest version?
I missed running some of the tests; I'll fix that and get back to you with feedback.
make quality
ruff check examples src tests utils
All checks passed!
ruff format --check examples src tests utils
64 files already formatted
python utils/check_tests_in_ci.py
✅ All good!
gemini_vision_agent.py code is not the source of these issues.
This commit fixes formatting issues in the Gemini Vision Agent example to follow the smolagents code style guidelines. The following changes were made:
- Fixed whitespace in blank lines within docstrings
- Applied proper quotation style (double quotes instead of single quotes)
- Adjusted spacing around operators and commas
- Improved indentation consistency
- Added trailing comma in multi-line collections
- Fixed line breaks according to ruff formatting rules
All quality checks are now passing: ruff check and ruff format.
@emredeveloper please test your PR with the latest version of smolagents. In the current state it's clear you didn't; I'll let you try and find out why 😉
- Modified CodeAgent initialization to be compatible with smolagents 1.13.0.dev0
- Removed system_prompt parameter and used description instead
- Fixed test suite by properly mocking dependencies
- Ensured display_image test passes by using effective mocking
- Fixed test_create_smolagent to work with the current codebase
- Optimized imports and error handling
- Updated requirements.txt to use smolagents version 1.12.0
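For illustration, dropping the `system_prompt` argument can be sketched as filtering it out of the keyword arguments before the agent is constructed. This is a minimal sketch, not the PR's actual code: `DEPRECATED_KWARGS`, `clean_agent_kwargs` and `build_agent` are hypothetical names, and the only assumption about smolagents is that `CodeAgent` accepts `tools` and `model` plus the keyword arguments that appear in the diff.

```python
# Hypothetical set of keyword arguments that should no longer be passed.
DEPRECATED_KWARGS = {"system_prompt"}


def clean_agent_kwargs(kwargs):
    """Return a copy of kwargs without deprecated keyword arguments."""
    return {k: v for k, v in kwargs.items() if k not in DEPRECATED_KWARGS}


def build_agent(model, tools, **kwargs):
    """Construct a CodeAgent from the remaining, still-supported arguments."""
    # Imported lazily so clean_agent_kwargs works without smolagents installed.
    from smolagents import CodeAgent

    return CodeAgent(tools=tools, model=model, **clean_agent_kwargs(kwargs))
```

With the values from the diff, `clean_agent_kwargs({"max_steps": 7, "add_base_tools": False, "system_prompt": "..."})` keeps only `max_steps` and `add_base_tools`.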
@aymeric-roucher I think I've understood and solved it, hopefully, I'm not wrong, haha
@emredeveloper that's the direction! The system_prompt argument is deprecated. But this is good news, because you don't need to change the system prompt at all! Also, since your agent is not a managed agent, it does not need a
Also, I only did superficial checks, but looking deeper: we support using a VLM as the main model for a
We prefer to highlight efficient/short setups in examples, so if you want to add one, it would be better to just show an agent natively using Gemini to analyse images, as in: your
@aymeric-roucher okay...
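As a rough sketch of the shorter setup suggested in the review, an agent can use a vision-capable Gemini model as its main model and receive images directly through `run`, with no custom tools or system prompt. The reviewer's own snippet is not preserved here, so everything below is an assumption: the function names are hypothetical, the model id `gemini/gemini-2.0-flash` and the `GEMINI_API_KEY` environment variable are placeholders, and the smolagents calls used are `LiteLLMModel`, `CodeAgent` and `agent.run(..., images=...)`.

```python
import os


def build_vision_agent(model_id="gemini/gemini-2.0-flash"):
    """Create a CodeAgent whose main model is a vision-capable Gemini model."""
    # Imported lazily so this module can be imported without smolagents installed.
    from smolagents import CodeAgent, LiteLLMModel

    # Model id and environment variable name are placeholders (assumptions).
    model = LiteLLMModel(model_id=model_id, api_key=os.environ.get("GEMINI_API_KEY"))
    return CodeAgent(tools=[], model=model, max_steps=7)


def analyse_images(agent, question, images):
    """Ask the agent a question about already-loaded images (e.g. PIL images)."""
    return agent.run(question, images=list(images))


# Example usage (requires smolagents, Pillow and a valid API key):
# from PIL import Image
# agent = build_vision_agent()
# print(analyse_images(agent, "What does this screenshot show?", [Image.open("screenshot.png")]))
```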