Overview
DataGen provides AI tools for content analysis, structured data extraction, and intelligent processing. These tools use large language models to transform unstructured data into structured formats, write content, and understand multimedia files.
OpenAI-Powered Tools
Extract Structured Output
Transform unstructured content into structured data using AI-powered analysis with the `extract_structured_output` tool. This tool is particularly useful for classification, data extraction, and parameter extraction.
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| instruction_prompt | string | Yes | Instruction describing fields to extract and constraints |
| content | string | Yes | The content to extract structured data from |
| structured_output_type | type[BaseModel] | Yes | Pydantic model type defining the output structure |
Output Data
Returns an instance of the provided `structured_output_type`, populated with the extracted data according to your Pydantic model definition.
Use Cases
Content Classification
Example: Classify LinkedIn comments by engagement level
Data Extraction
Example: Extract person information from text
Parameter Extraction
Example: Extract tool parameters from user queries
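Each of these use cases comes down to the Pydantic model you pass as `structured_output_type`. Below is a minimal sketch for the LinkedIn-comment classification example, assuming Pydantic v2. The class name, field names, and engagement labels are hypothetical. This page doesn't document how `extract_structured_output` is imported or invoked, so the sketch validates a sample JSON payload to show what the returned instance looks like.

```python
from typing import Literal

from pydantic import BaseModel, Field


# Hypothetical schema for classifying a LinkedIn comment by engagement level.
# The labels are illustrative; DataGen does not define them.
class CommentClassification(BaseModel):
    engagement_level: Literal["high", "medium", "low"] = Field(
        description="How engaged the commenter appears to be"
    )
    is_question: bool = Field(description="True if the comment asks a question")
    reason: str = Field(description="One-sentence justification for the label")


# Stand-in for the tool's output: a payload validated into the model.
# In real use, extract_structured_output returns the CommentClassification instance directly.
sample = '{"engagement_level": "high", "is_question": true, "reason": "Asks a detailed follow-up."}'
result = CommentClassification.model_validate_json(sample)
print(result.engagement_level)  # high
```

Using `Literal` restricts the model to a fixed label set, so any value outside it fails validation instead of silently passing through.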
AI Writer
Generate high-quality content based on instruction prompts using the `ai_writer` tool. Commonly used for writing emails, summaries, and other text content.
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| instruction_prompt | string | Yes | Instructions for what type of content to write |
| content | string | Yes | Source content or context for writing |
Output Data
Returns a string containing the AI-generated content based on your instructions and input content.
Use Cases
Email Writing
Example: Generate personalized emails
Content Summarization
Example: Summarize research findings
Report Generation
Example: Create analytical reports
Extract Tool Parameters
Use the `extract_tool_params` function to extract the parameters that other tools need from a user query. It is a wrapper around `extract_structured_output` optimized for tool parameter extraction.
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| instruction | string | Yes | Instructions for parameter extraction |
| query | string | Yes | User query to extract parameters from |
| tool_params_type | type[BaseModel] | Yes | Pydantic model defining expected parameters |
Example Usage
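Here is a minimal sketch of the inputs `extract_tool_params` expects, assuming Pydantic v2. `WebSearchParams` and its fields are hypothetical parameters for some downstream search tool. The dictionary keys mirror the documented parameter names, but the import path and calling convention (sync or async) are not covered here. The validated instance at the end illustrates the kind of result to expect.

```python
from pydantic import BaseModel, Field


# Hypothetical parameter schema for a downstream search tool.
class WebSearchParams(BaseModel):
    query: str = Field(description="Search query distilled from the user's request")
    max_results: int = Field(default=5, ge=1, le=20, description="Number of results to return")


# Arguments matching the documented inputs of extract_tool_params.
call_args = {
    "instruction": "Extract the search query and, if stated, how many results the user wants.",
    "query": "Find me the top 3 articles on vector databases",
    "tool_params_type": WebSearchParams,
}

# Illustrative result: extracted values validated against the schema.
params = WebSearchParams.model_validate({"query": "vector databases articles", "max_results": 3})
print(params.model_dump())  # {'query': 'vector databases articles', 'max_results': 3}
```

The `default=5` lets extraction succeed when the user doesn't mention a count, and the `ge`/`le` bounds reject out-of-range values before they reach the downstream tool.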
Gemini-Powered Understanding Tools
Document Understanding
Analyze and extract structured data from documents (HTML, PDF, Text) using Google’s Gemini model with the `doc_understanding_tool`.
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| instruction_prompt | string | Yes | Instructions for data extraction and constraints |
| url | string | Yes | Document URL (HTML/PDF/Text) |
| file_type | string | Yes | Document type: PDF, HTML, or Text |
| structured_output_type | type[BaseModel] | Yes | Pydantic model for structured output |
Supported Formats
- PDF: Application documents, reports, research papers
- HTML: Web pages, online articles, documentation
- Text: Plain text files, transcripts, notes
Example Usage
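Because `file_type` is a required argument, you have to choose it for every URL. The sketch below guesses it from the URL path suffix. `infer_doc_file_type` is a hypothetical helper, not part of DataGen. It assumes extension-less paths are web pages, which is only a heuristic; checking the server's `Content-Type` header is more reliable.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from URL path suffix to the documented file_type values.
_SUFFIX_TO_TYPE = {".pdf": "PDF", ".html": "HTML", ".htm": "HTML", ".txt": "Text"}


def infer_doc_file_type(url: str) -> str:
    """Guess the file_type argument for doc_understanding_tool from a URL.

    Extension-less paths are assumed to be HTML pages. Raises ValueError for
    suffixes outside the supported PDF/HTML/Text set.
    """
    # urlparse separates the query string, so "?dl=1" does not affect the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if not suffix:
        return "HTML"
    try:
        return _SUFFIX_TO_TYPE[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type {suffix!r} for {url}") from None


print(infer_doc_file_type("https://example.com/reports/q3.pdf"))  # PDF
```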
Image Understanding
Analyze images and extract structured data using the `image_understanding_tool`. Supports JPEG and PNG formats.
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| instruction_prompt | string | Yes | Instructions for image analysis |
| url | string | Yes | Image URL (JPEG/PNG only) |
| file_type | string | Yes | Image type: JPEG, JPG, or PNG |
| structured_output_type | type[BaseModel] | Yes | Pydantic model for structured output |
Use Cases
Chart Analysis
Extract data from charts and graphs
Document Scanning
Extract text and data from scanned documents
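For chart analysis, a nested model lets the tool return a list of data points instead of free text. The following is a minimal sketch assuming Pydantic v2. `ChartData` and `DataPoint` are hypothetical names, and the sample dictionary stands in for what `image_understanding_tool` might return.

```python
from typing import Optional

from pydantic import BaseModel, Field


class DataPoint(BaseModel):
    label: str = Field(description="Category or x-axis value as printed on the chart")
    value: float = Field(description="Numeric value read from the chart")


class ChartData(BaseModel):
    # Optional fields tolerate charts without a visible title or unit.
    title: Optional[str] = Field(default=None, description="Chart title, if visible")
    y_axis_unit: Optional[str] = Field(default=None, description="Unit shown on the y-axis")
    points: list[DataPoint] = Field(default_factory=list)


# Illustrative payload standing in for the tool's output.
sample = {
    "title": "Quarterly revenue",
    "y_axis_unit": "USD millions",
    "points": [{"label": "Q1", "value": 12.5}, {"label": "Q2", "value": 14.0}],
}
chart = ChartData.model_validate(sample)
print(sum(p.value for p in chart.points))  # 26.5
```

Values read from a chart image are estimates, so it is worth spot-checking them against the source before relying on them downstream.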
Audio Understanding
Analyze audio files and extract structured information using the `audio_understanding_tool`. Supports MP3 and WAV formats.
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| instruction_prompt | string | Yes | Instructions for audio analysis |
| url | string | Yes | Audio file URL |
| file_type | string | Yes | Audio type: MP3 or WAV |
| structured_output_type | type[BaseModel] | Yes | Pydantic model for structured output |
Use Cases
Meeting Transcription
Extract key information from meeting recordings
Customer Call Analysis
Analyze customer service calls
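For meeting recordings, a model with typed action items makes the output usable in downstream automation. Below is a minimal sketch assuming Pydantic v2. `MeetingNotes` and `ActionItem` are hypothetical, and the dictionary stands in for the output of `audio_understanding_tool`. Declaring `due` as a `date` means an ISO string such as `"2024-07-01"` is parsed into a real date object.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field


class ActionItem(BaseModel):
    owner: str = Field(description="Person responsible, as named in the recording")
    task: str = Field(description="What needs to be done")
    due: Optional[date] = Field(default=None, description="Due date, if one was stated")


class MeetingNotes(BaseModel):
    topics: list[str] = Field(description="Main topics discussed")
    decisions: list[str] = Field(default_factory=list)
    action_items: list[ActionItem] = Field(default_factory=list)


# Illustrative payload standing in for the tool's output.
notes = MeetingNotes.model_validate({
    "topics": ["Q3 roadmap"],
    "action_items": [{"owner": "Dana", "task": "Draft launch plan", "due": "2024-07-01"}],
})
print(notes.action_items[0].due.isoformat())  # 2024-07-01
```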
Video Understanding
Analyze YouTube videos and extract structured data using the `video_understanding_tool`. Currently supports YouTube URLs only.
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| instruction_prompt | string | Yes | Instructions for video analysis |
| url | string | Yes | YouTube video URL |
| structured_output_type | type[BaseModel] | Yes | Pydantic model for structured output |
Supported URLs
- `youtube.com/watch?v=...`
- `youtu.be/...`
- `youtube.com/shorts/...`
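Since only these three URL shapes are supported, checking URLs before calling the tool avoids wasted requests. `youtube_video_id` is a hypothetical pre-flight helper built on the standard library. It returns the video ID for the listed forms (with or without `www.`) and `None` otherwise. It expects a full URL with a scheme, such as `https://`, and it does not cover other hosts such as `m.youtube.com`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID if url matches a supported YouTube form, else None."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    if host == "youtube.com":
        if parsed.path == "/watch":
            # youtube.com/watch?v=<id>
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        if len(parts) == 2 and parts[0] == "shorts":
            # youtube.com/shorts/<id>
            return parts[1]
        return None
    if host == "youtu.be" and len(parts) == 1:
        # youtu.be/<id>
        return parts[0]
    return None


print(youtube_video_id("https://www.youtube.com/watch?v=abc123&t=5"))  # abc123
print(youtube_video_id("https://vimeo.com/123"))  # None
```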
Use Cases
Content Analysis
Extract key information from educational or business videos
Competitive Analysis
Analyze competitor product demos
Configuration & Performance
Model Configuration
- OpenAI Tools: Use GPT-5 series models, chosen to balance speed and accuracy
- Gemini Tools: Uses Gemini for multimodal understanding
- Temperature: Set to 0.1-0.2 for consistent, factual outputs
- Response Format: Enforced JSON structure for reliable parsing
Rate Limits & Usage
- Rate Limiting: 60 calls per minute across all AI tools
- Daily Limits: 1,000 calls per day per tool
- Credit System: 1 credit per request
- Caching: 1-hour TTL for repeated queries
- Retry Logic: 3 attempts with exponential backoff
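If your own code also calls these tools in loops, a client-side retry wrapper helps with transient failures and rate-limit pressure. Below is a minimal sketch of exponential backoff with jitter using only the standard library. `call_with_backoff` is hypothetical and not part of DataGen. Which exception types count as transient depends on how you invoke the tools, so `retry_on` is an assumption you should adjust. The `sleep` parameter is injectable so the delays can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with exponential backoff plus jitter.

    Delays are base_delay * 2**attempt (1s, 2s, ...) plus up to 10% random jitter.
    The last exception is re-raised once all attempts are used.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("attempts must be at least 1")
```

Jitter spreads retries from concurrent workers so they don't all hit the 60-calls-per-minute limit at the same moment.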
Best Practices
Effective Prompting
Guidelines for Better Results:
- Be specific and detailed in instructions
- Provide clear examples when possible
- Define constraints and expected formats
- Use consistent terminology
- Test with sample data first
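One way to apply these guidelines consistently is to assemble `instruction_prompt` from explicit parts. `build_instruction_prompt` is a hypothetical helper, and the section layout (Task, Constraints, Output format, Examples) is one reasonable convention, not a format DataGen requires.

```python
from typing import Sequence


def build_instruction_prompt(
    task: str,
    constraints: Sequence[str] = (),
    examples: Sequence[str] = (),
    output_format: str = "",
) -> str:
    """Assemble a specific, structured instruction prompt from its parts."""
    sections = [f"Task: {task.strip()}"]
    if constraints:
        sections.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    if output_format:
        sections.append(f"Output format: {output_format.strip()}")
    if examples:
        sections.append("Examples:\n" + "\n".join(f"- {e}" for e in examples))
    # Blank lines between sections keep each part visually distinct for the model.
    return "\n\n".join(sections)


prompt = build_instruction_prompt(
    "Classify the comment by engagement level",
    constraints=["Use only: high, medium, low"],
)
```

Keeping the parts separate also makes it easy to reuse the same constraints across prompts, which helps with consistent terminology.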
Pydantic Models
Model Design Tips:
- Use descriptive field names
- Add field descriptions and constraints
- Include default values where appropriate
- Use appropriate data types
- Consider optional vs required fields
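The following sketch applies these tips to the "extract person information" use case, assuming Pydantic v2. `PersonInfo` and its fields are hypothetical. Field descriptions appear in the model's JSON schema, which is how structured-output models typically receive field-level guidance; whether DataGen forwards them exactly this way is an assumption.

```python
from typing import Optional

from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    # Descriptive names plus descriptions give the model clear extraction targets.
    full_name: str = Field(description="Person's full name as written in the text")
    # Optional with defaults: these are often absent from the source text.
    job_title: Optional[str] = Field(default=None, description="Current job title, if mentioned")
    company: Optional[str] = Field(default=None, description="Current employer, if mentioned")
    # Constraints reject implausible values at validation time.
    years_experience: Optional[int] = Field(
        default=None, ge=0, le=70, description="Total years of professional experience"
    )
    skills: list[str] = Field(default_factory=list, description="Skills explicitly named in the text")


person = PersonInfo.model_validate({"full_name": "Ada Lovelace", "skills": ["mathematics"]})
print(person.job_title, person.skills)  # None ['mathematics']
```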
Error Handling
Robust Implementation:
- Validate input parameters
- Handle API timeouts gracefully
- Implement fallback strategies
- Log errors for debugging
- Test edge cases thoroughly
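A common fallback strategy is to retry with a more permissive model when strict validation fails. Here is a minimal sketch assuming Pydantic v2. The `extract` callable is a hypothetical stand-in for a wrapper around `extract_structured_output` that takes the model type. The sketch also assumes that a schema mismatch surfaces as a Pydantic `ValidationError`; check what the tool actually raises.

```python
import logging
from typing import Callable, Optional, Type

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)


class Contact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None


class PartialContact(BaseModel):
    """Permissive fallback: every field optional."""
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


def extract_with_fallback(extract: Callable[[Type[BaseModel]], BaseModel]) -> BaseModel:
    """Try the strict model first; on validation failure, log and retry with the fallback."""
    try:
        return extract(Contact)
    except ValidationError as exc:
        # Record which fields failed so prompts or models can be tuned later.
        logger.warning("Contact validation failed at %s", [err["loc"] for err in exc.errors()])
        return extract(PartialContact)
```

Callers can use `isinstance(result, Contact)` to decide whether the record is complete or needs follow-up.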
Performance Optimization
Efficiency Tips:
- Cache repeated operations
- Batch similar requests
- Use appropriate model sizes
- Optimize prompt length
- Monitor usage patterns
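The platform already caches repeated queries for one hour. You can also avoid spending credits on repeats within your own process with a small client-side cache. `TTLCache` below is a hypothetical standard-library sketch that uses the same one-hour default. Keys are hashed from the tool name and its arguments. `default=str` turns non-JSON values, such as a Pydantic model class, into their string form, so two classes that share a name and module would collide.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s: float = 3600.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def make_key(tool_name: str, **kwargs: Any) -> str:
        # Sorted JSON makes the key independent of argument order.
        raw = json.dumps({"tool": tool_name, "args": kwargs}, sort_keys=True, default=str)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_call(self, key: str, fn: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # fresh entry: skip the call
        value = fn()
        self._store[key] = (now, value)
        return value
```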
Integration Examples
Multi-Step Processing Pipeline
Content Processing Workflow
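A typical multi-step workflow chains structured extraction into writing. The following is a minimal sketch assuming Pydantic v2. The tools are passed in as callables whose positional order mirrors the documented parameters (`instruction_prompt`, `content`, `structured_output_type` for extraction; `instruction_prompt`, `content` for writing). How you wire them to the real `extract_structured_output` and `ai_writer` is not shown here. `LeadInfo` and `lead_to_email` are hypothetical.

```python
from typing import Callable, Type

from pydantic import BaseModel, Field


class LeadInfo(BaseModel):
    name: str = Field(description="Contact's name")
    company: str = Field(description="Contact's company")
    pain_point: str = Field(description="Main problem the contact describes")


# Hypothetical signatures: (instruction_prompt, content, structured_output_type) and
# (instruction_prompt, content).
ExtractFn = Callable[[str, str, Type[BaseModel]], BaseModel]
WriteFn = Callable[[str, str], str]


def lead_to_email(raw_text: str, extract: ExtractFn, write: WriteFn) -> str:
    # Step 1: turn unstructured text into a validated LeadInfo instance.
    lead = extract(
        "Extract the contact's name, company, and main pain point.",
        raw_text,
        LeadInfo,
    )
    # Step 2: give the writer the structured fields as compact JSON context.
    return write(
        "Write a short, friendly follow-up email addressing the pain point.",
        lead.model_dump_json(),
    )
```

Passing validated JSON rather than the raw text keeps the writing step focused on the fields you care about. It also lets you test each step separately with stand-in functions.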
Troubleshooting
Model validation errors
Common causes:
- Pydantic model doesn’t match extracted data structure
- Required fields are missing from the content
- Data types don’t match field definitions
Solutions:
- Review and adjust your Pydantic model
- Make fields optional when data might be missing
- Add validation and default values
- Test with simpler models first
Timeout errors with large files
Optimization strategies:
- Reduce file size when possible
- Use more specific instruction prompts
- Process in smaller chunks
- Increase timeout settings for large files
Recommended file sizes:
- Documents: under 10 MB
- Images: under 5 MB
- Audio: under 25 MB
API key configuration issues
Setup requirements:
- OpenAI tools require `OPENAI_API_KEY`
- Gemini tools require `GEMINI_API_KEY` or `GOOGLE_API_KEY`
- Ensure API keys have proper permissions
- Check rate limits and quotas
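A quick startup check catches missing keys before the first tool call fails. `missing_api_keys` is a hypothetical standard-library helper that checks only the environment variables listed above. It confirms that the keys are present, not that they have the right permissions or remaining quota.

```python
import os
from typing import List, Mapping, Optional


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return problems with the AI tool API key setup (empty list if OK)."""
    env = os.environ if env is None else env
    problems: List[str] = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (needed for OpenAI-powered tools)")
    # Gemini tools accept either variable.
    if not (env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")):
        problems.append("Set GEMINI_API_KEY or GOOGLE_API_KEY (needed for Gemini-powered tools)")
    return problems


for problem in missing_api_keys():
    print(problem)
```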
What’s Next?
Web Research
Combine AI tools with web research capabilities
LinkedIn Tools
Use AI tools to process LinkedIn data
Use Cases
See real-world AI automation examples
Deployment
Deploy AI workflows as production APIs