What data does Taiyō.AI use?

Taiyō.AI is built on a patent-pending Data-Mesh architecture designed specifically for infrastructure and construction. Instead of relying on a few static databases, we continuously ingest, standardize, and connect data from tens of thousands of official and third-party sources into one AI-ready layer for the industry.

At a high level, Taiyō’s Data-Mesh combines:

  • Official project and procurement sources (governments at all levels)
  • Plans and pipelines (master plans, Capital Improvement Plans, STIPs, etc.)
  • Risk, macro and hazard datasets
  • Building permits and local activity data
  • Corporate, financial, and research content
  • News, web, and satellite data

All of this is standardized, linked in infrastructure knowledge graphs, and exposed to multi-agent, business-task reasoning in ConstructChat.

Below is a deeper breakdown.


1. The Taiyō Data-Mesh: core principles

Our Data-Mesh is described in detail in our technical paper, “An AI-Driven Data Mesh Architecture Enhancing Decision-Making in Infrastructure Construction and Public Procurement,” available on arXiv.

Three design principles drive what data we use and how we handle it:

  1. Original sources first
    • We prioritize official government project sources, public procurement portals, and primary documents rather than scraping secondary aggregators.
    • This includes national, state, provincial, city, county, metropolitan, road, water and energy authorities.
  2. Standardize a fragmented universe
    • The construction ecosystem has no global standard for project or procurement data. Fields, formats, languages, and update practices vary wildly.
    • Taiyō’s Data-Mesh rebuilds this foundational data with a common schema so that millions of heterogeneous records behave like one coherent dataset (a minimal sketch of this mapping step follows this list).
  3. Automate, then enrich with AI
    • We use web-scale automation to ingest data daily from thousands of sources.
    • We then use LLMs, classical ML, and knowledge graphs to fill gaps, normalize entities, and improve completeness while keeping humans in the loop for quality.
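
Here is a minimal sketch of that standardization step: projecting heterogeneous source fields onto one common schema while preserving the raw record. The field names, source keys, and mappings below are illustrative only, not Taiyō’s internal schema.

```python
# Minimal sketch: map heterogeneous source fields onto a common schema.
# All field names and source mappings here are hypothetical examples.
from typing import Any

COMMON_FIELDS = ["title", "agency", "country", "status", "budget_usd", "published"]

# One mapping per source, maintained as sources are onboarded (illustrative).
SOURCE_MAPPINGS: dict[str, dict[str, str]] = {
    "us_state_dot": {"project_name": "title", "sponsor": "agency",
                     "proj_status": "status", "est_cost": "budget_usd",
                     "post_date": "published"},
    "eu_tender_portal": {"title": "title", "contracting_body": "agency",
                         "value_total": "budget_usd", "publication_date": "published"},
}

def standardize(source: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Project a raw source record onto the common schema; keep the raw data."""
    mapping = SOURCE_MAPPINGS[source]
    record: dict[str, Any] = {field: None for field in COMMON_FIELDS}
    for src_field, common_field in mapping.items():
        if src_field in raw:
            record[common_field] = raw[src_field]
    record["_source"] = source   # provenance back to the original source
    record["_raw"] = raw         # original data is always preserved
    return record

print(standardize("us_state_dot",
                  {"project_name": "I-80 Bridge Rehab", "sponsor": "Nevada DOT",
                   "proj_status": "tendering", "est_cost": 42_000_000}))
```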


2. Types of data Taiyō.AI ingests

2.1 Official government project sources

We systematically collect from official government project and program sites across the world, for example:

  • National and federal infrastructure portals
  • State / provincial DOTs and infrastructure agencies
  • City and county capital projects pages
  • Special purpose authorities (ports, airports, transit, water, power, etc.)

From these, we extract:

  • Project announcements and descriptions
  • Sponsoring agencies and departments
  • Locations and geospatial hints
  • Status and phases (planned, tendering, under construction, completed, etc.)
  • Budget, funding source, schedule, major milestones (where available)

These are the “ground truth” records on which much of our Data-Mesh is built.
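
For a sense of what a standardized project record looks like once these fields are extracted, here is an illustrative shape. The field names and example values are hypothetical, not Taiyō’s published schema.

```python
# Illustrative shape of a standardized project record built from official
# sources. Field names are hypothetical, not Taiyō's published schema.
from dataclasses import dataclass, field

@dataclass
class ProjectRecord:
    project_id: str                  # stable internal identifier
    title: str
    sponsor_agency: str              # e.g. a state DOT or water authority
    country: str
    status: str                      # planned / tendering / under_construction / completed
    budget_usd: float | None = None  # where disclosed
    funding_source: str | None = None
    milestones: list[str] = field(default_factory=list)
    source_urls: list[str] = field(default_factory=list)  # links back to the official source

example = ProjectRecord(
    project_id="prj-000123",
    title="Coastal Flood Barrier, Phase 2",
    sponsor_agency="Port Authority",
    country="US",
    status="tendering",
    budget_usd=185_000_000,
    source_urls=["https://example.gov/projects/coastal-barrier-phase-2"],
)
print(example)
```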


2.2 Public procurement and tender data

We collect and standardize public procurement data from:

  • National and regional procurement portals
  • Sector-specific tender systems (e.g., transport, energy, water)
  • Multilateral and development bank procurement systems
  • Other structured bid and award feeds

Data typically includes:

  • RFP / RFQ announcements and tender notices
  • Contract awards, winning bidders, shortlists
  • Procurement method and delivery model (e.g., DB, DBB, PPP)
  • Estimated and awarded values (where disclosed)
  • Procurement timelines and key dates
  • Awarded entities and their roles (prime, JV, subs, designers, etc.)

According to our technical paper, the Data-Mesh tracks tens of millions of tender records over time, forming a historical archive of procurement behavior.
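
As a toy illustration of what such an archive supports, the snippet below aggregates awarded values per agency per year. The column names and figures are invented for the example; they are not Taiyō data.

```python
# Toy example: how a historical tender archive supports questions like
# "how do an agency's award volumes trend over time?"
import pandas as pd

tenders = pd.DataFrame([
    {"agency": "Metro Water District", "award_date": "2021-03-10", "award_value_usd": 12_500_000},
    {"agency": "Metro Water District", "award_date": "2022-07-02", "award_value_usd": 30_000_000},
    {"agency": "Metro Water District", "award_date": "2023-01-15", "award_value_usd": 8_750_000},
    {"agency": "State DOT",            "award_date": "2023-05-20", "award_value_usd": 95_000_000},
])
tenders["award_date"] = pd.to_datetime(tenders["award_date"])

# Annual awarded value per agency: one simple view of procurement behavior.
by_year = (tenders
           .groupby(["agency", tenders["award_date"].dt.year])["award_value_usd"]
           .sum()
           .rename("total_awarded_usd"))
print(by_year)
```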


2.3 Plans, pipelines, and capital programs

In addition to “live” tenders, Taiyō ingests forward-looking planning documents, including:

  • Master plans and sector plans (transport, energy, water, social infrastructure)
  • Capital Improvement Plans (CIPs)
  • State Transportation Improvement Plans (STIPs) and similar planning frameworks
  • Long-term investment frameworks, resilience plans, and climate adaptation strategies

These documents help us infer:

  • Future project pipelines and potential tenders
  • Policy and funding priorities over 5–20 year horizons
  • Risk, resilience, and climate dimensions embedded in planning

LLMs and knowledge-graph pipelines parse unstructured PDFs and web pages to extract project-like objects, spending programs, and priority corridors, turning narrative plans into structured, queryable data.
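
As a rough sketch of that extraction step, the snippet below shows one way an LLM could be prompted to turn a plan excerpt into project-like JSON objects. The prompt, field names, and the call_llm helper are illustrative placeholders, not Taiyō’s actual pipeline.

```python
# Sketch: turning a narrative plan excerpt into structured, project-like
# objects with an LLM. Prompt, fields, and call_llm() are placeholders.
import json

EXTRACTION_PROMPT = """From the plan excerpt below, list every distinct project
or spending program as JSON objects with keys:
  name, sector, location, est_cost, timeframe, priority_hint.
Use null for anything the text does not state.

Excerpt:
{excerpt}
"""

def extract_projects(excerpt: str, call_llm) -> list[dict]:
    """call_llm is any text-in/text-out LLM client (hypothetical here)."""
    response = call_llm(EXTRACTION_PROMPT.format(excerpt=excerpt))
    return json.loads(response)

# A fake LLM stand-in so the sketch runs end to end without any API.
def fake_llm(prompt: str) -> str:
    return json.dumps([{"name": "Riverfront Levee Upgrade", "sector": "water",
                        "location": "District 4", "est_cost": None,
                        "timeframe": "2026-2029", "priority_hint": "high"}])

print(extract_projects("2026-2029 levee upgrade along the riverfront ...", fake_llm))
```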


2.4 Multi-hazard risk, macro, and conditions data

Infrastructure decisions are inseparable from risk and macro conditions. Taiyō’s Data-Mesh therefore incorporates:

  • Multi-hazard and climate risk data (flood, wildfire, sea level, storm, heat, etc.)
  • Price and cost indices (PPI, CPI, construction materials indices)
  • Trade and tariff data (e.g., for steel, cement, equipment)
  • Labor market indicators (wages, shortages, demographic pressures)
  • Event and disruption data (storms, outages, forced shutdowns)

This data is linked to projects, regions, and asset types, allowing tools like GetTaiyoRiskData to provide risk-aware context for pricing, scheduling, or macro conditions in ConstructChat.
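
As a simplified illustration of that linking step, the snippet below attaches a hazard score to project records by region. The region codes and the flood-risk scale are invented for the example.

```python
# Simplified sketch: join a regional hazard index onto project records so
# risk-aware context travels with each project. Values are invented.
flood_risk_by_region = {"US-FL-MIA": 0.82, "US-TX-HOU": 0.74, "US-CO-DEN": 0.21}

projects = [
    {"title": "Seawall Reconstruction", "region": "US-FL-MIA"},
    {"title": "Light Rail Extension",   "region": "US-CO-DEN"},
]

for project in projects:
    # None if no hazard coverage exists yet for that region
    project["flood_risk"] = flood_risk_by_region.get(project["region"])

print(projects)
```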


2.5 Building permits and local activity

Where available, we incorporate building permits and local project approvals, especially for:

  • Private and mixed-use developments which intersect with public infrastructure
  • Local road, water, and utility works not always visible in national systems
  • Industrial and logistics projects (e.g., data centers, manufacturing plants) that shape demand for infrastructure

This layer helps users understand on-the-ground construction activity beyond formal megaprojects and PPPs.


2.6 Corporate, financial, and institutional data

To connect projects with money and organizations, Taiyō collects:

  • Public company data related to contractors, EPCs, suppliers, investors
  • SEC EDGAR filings, FCA NSM filings, fund documents, REIT disclosures, etc., via tools like SecEdgarSearch and FcaNsmSearch
  • Deal and M&A information around infrastructure assets and platforms
  • Institutional investors and fund strategies where disclosed

This data feeds into tools like FinfraSearch and GetMunicipalBonds to link projects and markets with the financial ecosystem behind them.
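
SecEdgarSearch surfaces this material inside ConstructChat. Purely as a point of reference, the sketch below queries the SEC’s public submissions API directly; the endpoint and field names are the SEC’s own, and the CIK shown (Apple Inc.) is just a familiar example unrelated to Taiyō tooling.

```python
# Reference sketch: pull a company's recent filings from the SEC's public
# submissions API (https://data.sec.gov). Not Taiyō's SecEdgarSearch tool.
import requests

CIK = "0000320193"  # Apple Inc., used only as a well-known example
url = f"https://data.sec.gov/submissions/CIK{CIK}.json"
# The SEC asks for a descriptive User-Agent identifying the requester.
headers = {"User-Agent": "example-research you@example.com"}

data = requests.get(url, headers=headers, timeout=30).json()
recent = data["filings"]["recent"]
for form, filed in list(zip(recent["form"], recent["filingDate"]))[:5]:
    print(form, filed)
```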


2.7 Scientific, engineering, and standards knowledge

For deeper technical and engineering context, Taiyō integrates:

  • Engineering journals and conference proceedings relevant to civil, structural, geotechnical, environmental, and energy engineering
  • ASCE and similar professional society content (via tools like SearchASCE)
  • Academic literature via SearchScholar (design practices, new materials, construction methods)

This allows ConstructChat to move from “what is being built?” to “how should we design and build it?” with references to engineering best practice.


2.8 News, web, and media content

We also use news and web sources to capture information that official systems do not systematically track, such as:

  • Project disputes or cancellations
  • Political or community opposition
  • ESG controversies or labor issues
  • Corporate announcements, JV formations, and strategic moves

Tools like SearchNews, SearchWeb, BrowseWeb, and AnalyzeUrl let agents pull very recent, unstructured information into decisions before it is reflected in structured databases.


2.9 Satellite, geospatial, and time-series data

Through tools like GeeTimeseriesData, Taiyō can incorporate Google Earth Engine and other geospatial datasets to analyze:

  • Land-use change around corridors and assets
  • NDVI (vegetation) trends around ports, corridors, or urban expansions
  • Flood risk changes, subsidence, and coastal impacts over time

This enables multi-modal AI (tabular + time series + maps) for infrastructure-relevant spatial analysis.
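
GeeTimeseriesData builds on Google Earth Engine. As a rough illustration of the kind of series involved, the sketch below uses the public Earth Engine Python API directly to compute a mean-NDVI time series over an area of interest; the coordinates, dates, and thresholds are arbitrary, and this is not how the Taiyō tool itself is invoked.

```python
# Illustrative NDVI time series over an area of interest using the public
# Earth Engine Python API (requires an Earth Engine account). Example only.
import ee

ee.Initialize()

area = ee.Geometry.Point([4.0, 51.95]).buffer(5_000)  # ~5 km around a port area
s2 = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
      .filterBounds(area)
      .filterDate("2022-01-01", "2024-01-01")
      .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20)))

def mean_ndvi(img):
    # NDVI from Sentinel-2 NIR (B8) and red (B4), averaged over the area.
    ndvi = img.normalizedDifference(["B8", "B4"])
    stats = ndvi.reduceRegion(ee.Reducer.mean(), area, scale=20)
    return ee.Feature(None, {"date": img.date().format("YYYY-MM-dd"),
                             "ndvi": stats.get("nd")})

series = ee.FeatureCollection(s2.map(mean_ndvi))
print(series.limit(5).getInfo())
```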


3. Scale, history, and depth of coverage

At source level, Taiyō’s Data-Mesh integrates:

  • 35,000+ government, corporate, and institutional sources
  • 65 years of procurement, spend, and project history, plus forward-looking pipelines (where records permit)
  • Daily ingest and refresh cycles

As described in our technical paper, the mesh tracks on the order of:

  • 1.5M+ project records
  • Tens of millions of tender and procurement records
  • Hundreds of thousands of brownfield assets
  • Billions of risk and activity signals and research references

Together, this forms one of the largest AI training and retrieval datasets ever built for construction and infrastructure, feeding both Mechanical AI (data structuring, prediction, enrichment) and Thinking AI (reasoning over complex, multi-modal inputs).


4. How Taiyō keeps data “live” when official sources do not

A key problem in this industry is that no single government or news system reliably updates all its project information:

  • Many portals are fragmented by region, sector, or agency
  • Update frequencies range from daily to yearly
  • Formats change without notice; records disappear or move
  • There is no universal standard for status or fields

Taiyō’s Data-Mesh is explicitly designed to solve this:

  1. Web-scale source discovery and scheduling
    • Automated crawlers and scrapers monitor thousands of project and procurement sites.
    • Each source has its own update schedule based on observed behavior and criticality.
  2. Change detection and incremental updates
    • We detect what changed since the last scan, not just re-download everything (a simplified sketch follows this list).
    • New tenders, updates to statuses, additional documents, or changed dates are captured and versioned.
  3. AI-assisted enrichment of incomplete records
    • When official sources omit fields (e.g., missing value ranges, sectors, roles), we use LLMs and knowledge graphs to infer and enrich where appropriate, while preserving original raw data.
  4. Normalization across thousands of schemas
    • Different agencies use different field names, formats, and taxonomies.
    • Our standardization layer maps them onto a unified schema (projects, tenders, entities, risks, geography, docs).
  5. Human-in-the-loop quality control
    • Domain experts periodically review samples and edge cases (e.g., very large projects, key clients, or new regions) to validate and refine AI-generated augmentations.
  6. Daily updates, not static snapshots
    • The result is a living, breathing industry dataset that updates daily as official sources, financial filings, news, and risk signals change.
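
Here is a highly simplified sketch of the change-detection step: fingerprint each record and only process what actually changed since the last scan. Real pipelines also version records and track field-level diffs; the identifiers and records below are invented.

```python
# Simplified change detection: hash each record's content and compare
# against the previous scan so only new or changed records are processed.
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Stable hash of a record's content (key order normalized)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

previous = {"tender-001": fingerprint({"status": "open", "deadline": "2025-01-10"})}

fresh_scan = {
    "tender-001": {"status": "awarded", "deadline": "2025-01-10"},  # changed
    "tender-002": {"status": "open", "deadline": "2025-03-01"},     # new
}

for tender_id, record in fresh_scan.items():
    fp = fingerprint(record)
    if previous.get(tender_id) != fp:
        print(f"changed or new: {tender_id}")
        previous[tender_id] = fp  # capture and version the update
```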

Where the public web is noisy or inconsistent, Taiyō’s architecture makes the dataset more consistent and more complete over time, without breaking links to the original official sources.


5. How the data is structured and connected

Once ingested, all of this data is turned into standardized data products and connected via an infrastructure knowledge graph:

  • Project nodes – representing physical assets (roads, rail, ports, pipelines, plants, social infrastructure, digital infrastructure, etc.)
  • Tender and contract nodes – representing procurement events and relationships
  • Entity nodes – owners, sponsors, EPCs, designers, consultants, suppliers, investors, funds, agencies
  • Risk and condition nodes – hazards, macro indicators, policy changes, events
  • Document nodes – plans, PDFs, reports, filings, news articles, research papers

Relationships capture:

  • Who is involved with whom
  • Which projects belong to which plans or programs
  • Which risks affect which regions and sectors
  • How entities are linked through deals, JVs, and repeated collaborations

This graph is what allows ConstructChat tools like CanvasSearch, CanvasMarketBreakdown, EntityGetDetails, GetTaiyoRiskData, and others to answer complex, multi-hop questions in natural language.
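
For intuition, here is a toy version of such a graph built with networkx, including one multi-hop query. The node identifiers and edge labels are illustrative only and are not Taiyō’s internal ontology.

```python
# Toy infrastructure knowledge graph: projects, tenders, entities, risks,
# and documents connected by typed edges. Labels are illustrative only.
import networkx as nx

g = nx.MultiDiGraph()

g.add_node("prj:coastal-barrier", kind="project", sector="water")
g.add_node("tnd:2024-118", kind="tender")
g.add_node("ent:acme-epc", kind="entity", role="EPC contractor")
g.add_node("rsk:flood-zone-a", kind="risk")
g.add_node("doc:resilience-plan-2030", kind="document")

g.add_edge("tnd:2024-118", "prj:coastal-barrier", relation="procures")
g.add_edge("ent:acme-epc", "tnd:2024-118", relation="awarded")
g.add_edge("prj:coastal-barrier", "doc:resilience-plan-2030", relation="part_of_plan")
g.add_edge("rsk:flood-zone-a", "prj:coastal-barrier", relation="affects")

# Multi-hop question: which entities won work on projects exposed to flood risk?
at_risk = [v for _, v, d in g.out_edges("rsk:flood-zone-a", data=True)
           if d["relation"] == "affects"]
for project in at_risk:
    tenders = [u for u, _, d in g.in_edges(project, data=True) if d["relation"] == "procures"]
    for tender in tenders:
        winners = [u for u, _, d in g.in_edges(tender, data=True) if d["relation"] == "awarded"]
        print(project, "->", winners)
```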


6. AI, multi-modal reasoning, and workflows

Taiyō’s data is not just “stored”; it is designed for AI:

  • Multi-modal AI
    • Tabular: project records, tender tables, risk series
    • Time series: prices, hazards, workloads, macro indices
    • Text: plans, filings, research, news, internal documents
    • Maps: geocoded projects, corridors, risk layers
  • Multi-agent, business-task reasoning
    • A swarm of tools and agents (Market Report, Project Research, Risk scans, etc.) operate over the Data-Mesh.
    • Agents are optimized for real infra workflows: bidding, BD, investment screening, policy research—not generic chat.

Together, this creates a standardized, connected, AI-ready data + workflow layer for the world’s largest industry.


7. Adding and requesting new sources

We know every client has unique needs, so Taiyō is built to grow with you:

  • Public sources
    • If you rely on a public portal we don’t yet cover, you can request it.
    • Our team can onboard it into the Data-Mesh, with ongoing automated updates.
  • Private / internal sources (Enterprise)
    • For enterprise customers, we can connect your internal systems (documents, data warehouses, APIs, S3 buckets, SFTP, etc.) under strict access and privacy controls, so your own data becomes part of your workspace’s private mesh.
  • Regional and sector expansions
    • If you are entering a new country or sector, we can prioritize coverage of relevant project, procurement, and risk sources as part of a joint roadmap.

If you have a specific source or dataset you’d like us to integrate, you can contact our team with the URL or description, and we’ll assess how quickly it can be brought into the Taiyō Data-Mesh.


In short:

Taiyō.AI doesn’t depend on a single database or a static feed. It is a live, automated, AI-enriched Data-Mesh that unifies official government project sources, procurement data, plans, risk and hazard data, building permits, financial filings, research, news, and geospatial signals into one end-to-end, AI-ready representation of the construction and infrastructure economy.

Updated on: 25/11/2025
