
      OpenAI

      • AI: Projects must prove their worth to anxious boards or risk defunding, and LLMs will go small to reduce operating costs and environmental impact.
      • Cybersecurity: “materiality” will become a reality, but political uncertainty across the globe will usher in a new era of regulatory discord and complexity.
      • Resilience: Organisations will collaborate in new and creative ways with their vendors to avoid getting caught up in the next mass outage or breach.


      OpenAI o3 and o4-mini

      The latest in our o-series of models trained to think for longer before responding. These are the smartest models we’ve released to date, representing a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers.

      For the first time, reasoning models can agentically use and combine every tool within ChatGPT—this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, these models are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems. This allows them to tackle multi-faceted questions more effectively, a step toward a more agentic ChatGPT that can independently execute tasks on your behalf.
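      For developers, the closest analogue to this behavior is function calling in the OpenAI API. The following is a minimal sketch of that loop using the OpenAI Python SDK; the get_weather tool, its schema, and the prompt are hypothetical stand-ins, and tool use inside ChatGPT itself is not exposed this way:

      import json
      from openai import OpenAI

      client = OpenAI()

      # Hypothetical local tool the model may decide to call.
      def get_weather(city: str) -> str:
          return f"Sunny and 22C in {city}"  # stub for illustration

      tools = [{
          "type": "function",
          "function": {
              "name": "get_weather",
              "description": "Get the current weather for a city.",
              "parameters": {
                  "type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"],
              },
          },
      }]

      messages = [{"role": "user", "content": "Should I pack an umbrella for Chicago?"}]
      response = client.chat.completions.create(model="o4-mini", messages=messages, tools=tools)
      message = response.choices[0].message

      # The model reasons about whether a tool call is needed; if so, we run
      # the tool locally and return the result for a final, grounded answer.
      if message.tool_calls:
          messages.append(message)
          for call in message.tool_calls:
              args = json.loads(call.function.arguments)
              messages.append({
                  "role": "tool",
                  "tool_call_id": call.id,
                  "content": get_weather(**args),
              })
          response = client.chat.completions.create(model="o4-mini", messages=messages, tools=tools)

      print(response.choices[0].message.content)

      The key design point is that the model, not the developer, decides when a tool is worth invoking; the application simply executes whatever calls come back and returns the results.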

      The combined power of state-of-the-art reasoning with full tool access translates into significantly stronger performance across academic benchmarks and real-world tasks, setting a new standard in both intelligence and usefulness.

      What’s changed

      OpenAI o3 is our most powerful reasoning model, pushing the frontier across coding, math, science, visual perception, and more. It sets a new SOTA on benchmarks including Codeforces, SWE-bench (without building a custom model-specific scaffold), and MMMU. It’s ideal for complex queries requiring multi-faceted analysis and whose answers may not be immediately obvious. It performs especially strongly at visual tasks like analyzing images, charts, and graphics. In evaluations by external experts, o3 makes 20 percent fewer major errors than OpenAI o1 on difficult, real-world tasks, especially excelling in areas like programming, business/consulting, and creative ideation. Early testers highlighted its analytical rigor as a thought partner and emphasized its ability to generate and critically evaluate novel hypotheses, particularly within biology, math, and engineering contexts.

      OpenAI o4-mini is a smaller model optimized for fast, cost-efficient reasoning—it achieves remarkable performance for its size and cost, particularly in math, coding, and visual tasks. It is the best-performing benchmarked model on AIME 2024 and 2025. Although access to a computer meaningfully reduces the difficulty of the AIME exam, we also found it notable that o4-mini achieves 99.5% pass@1 (100% consensus@8) on AIME 2025 when given access to a Python interpreter. While these results should not be compared to the performance of models without tool access, they are one example of how effectively o4-mini leverages available tools; o3 shows similar improvements on AIME 2025 from tool use (98.4% pass@1, 100% consensus@8).
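      A note on the metrics: pass@1 is the average success rate of independent single attempts, and consensus@8 scores a problem by majority vote over eight sampled answers. A minimal sketch under those conventional definitions (the helper names are ours; the post does not describe OpenAI's exact evaluation harness):

      from collections import Counter

      def pass_at_1(correct_flags: list[bool]) -> float:
          # Average success rate of independent single attempts.
          return sum(correct_flags) / len(correct_flags)

      def consensus_at_k(answers: list[str], reference: str, k: int = 8) -> bool:
          # Majority vote: correct iff the most common of the first k
          # sampled answers matches the reference answer.
          majority_answer, _ = Counter(answers[:k]).most_common(1)[0]
          return majority_answer == reference

      # Eight hypothetical samples for one AIME problem whose answer is 204.
      samples = ["204", "204", "204", "210", "204", "204", "204", "204"]
      print(pass_at_1([a == "204" for a in samples]))  # 0.875
      print(consensus_at_k(samples, "204"))            # True

      This is why consensus@8 can reach 100% while pass@1 sits just below it: occasional wrong samples are outvoted by the majority.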

      In expert evaluations, o4-mini also outperforms its predecessor, o3‑mini, on non-STEM tasks as well as domains like data science. Thanks to its efficiency, o4-mini supports significantly higher usage limits than o3, making it a strong high-volume, high-throughput option for questions that benefit from reasoning. External expert evaluators rated both models as demonstrating improved instruction following and more useful, verifiable responses than their predecessors, thanks to improved intelligence and the inclusion of web sources. Compared to previous iterations of our reasoning models, these two models should also feel more natural and conversational, especially as they reference memory and past conversations to make responses more personalized and relevant.
