URLs & PDFs
Pass a URL anywhere in the prompt and it will be crawled and scraped automatically. You can also name specific page tags or elements to improve extraction accuracy. Consider this prompt:
user_content='https://www.houzz.com/professionals/landscape-contractors/southern-turf-co-nashville-pfvwus-pf~382005468 {contractor name, phone_number, website_url}, website_url should be the pro/contractor/business site. Targeting <main>'
Scoping the request this way is likely to be faster and more accurate than relying on the full-page scrape alone.
Additionally, max_tokens requires special treatment: for URLs, PDFs, YouTube links, and similar inputs, max_tokens caps the TOTAL token count, i.e. all scraped content plus the final output. The boundary is approximate, so pass at least 6000 plus your desired response maximum to leave room for the crawled data.
Depending on the response you want, you may need an even larger max_tokens to ensure adequate crawled input data.
Due to built-in timeouts, very long-running requests (typically those with max_tokens above roughly 30-40k) are handled best via a raw HTTP request rather than an OpenAI client package; the endpoint schema, parameters, and response shape are the same either way.