Job Data Web Crawling & Legal Compliance
1. Introduction
At Mantiks, we are committed to providing transparent, compliant, and ethically sourced job market data. This page outlines the legal and operational framework that governs our collection and processing of publicly available job postings (“job data web crawling”). Its purpose is to give procurement, legal, and compliance teams a clear understanding of how Mantiks obtains, structures, enriches, and distributes job data in full alignment with European and international regulations.
Job postings are, by nature, public, factual, and voluntarily disclosed by companies to attract applicants and promote open positions. Mantiks collects only information that is intentionally made public by employers on their recruitment channels — such as corporate career pages, job boards, and other publicly accessible websites — without bypassing authentication mechanisms or accessing any private, restricted, or confidential content.
Our data infrastructure is designed around a simple principle: collect only what companies have explicitly decided to publish, transform it into structured intelligence, and deliver it securely to our clients. This page details the legal foundations, compliance safeguards, and responsible-use guidelines that underpin our daily data refresh operations and ensure full integrity of our service.
2. Legal Framework for Web Crawling in Europe
The collection of publicly accessible online information is governed in Europe by a combination of EU directives, national laws, and well-established jurisprudence. Importantly, web crawling is not prohibited by default, provided that it respects specific principles relating to access, intellectual property, and technical protections. Mantiks operates fully within this legal perimeter.
2.1. EU Directive 2001/29/EC (InfoSoc Directive)
The InfoSoc Directive defines the scope of copyright protection in the digital environment. Under this framework:
Facts are not protected by copyright, including factual elements of job postings such as job titles, company names, locations, posting dates, or contract types.
Copyright applies only to original, creative expressions (e.g., long-form descriptions or editorial content).
Extracting and structuring factual data is therefore lawful when done without reproducing copyrighted text.
Mantiks processes only factual, non-original information necessary to identify and classify job opportunities. We do not republish or redistribute copyrighted content verbatim.
2.2. European Database Directive (96/9/EC)
This directive governs the protection of databases within the EU. Under this framework:
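The sui generis database right applies only where there has been substantial investment in obtaining, verifying, or presenting the contents; databases without such investment may not be protected.
For databases not protected under the directive, restrictions written in Terms of Use (TOU) operate only as contractual terms; their effect on content that is publicly available without account creation or login depends on whether those terms were actually accepted.
Extracting individual elements of a protected database is permitted when it does not amount to extraction of a substantial part of its contents (“substantial extraction”).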
This is particularly relevant for job boards and corporate sites that publish offers freely and openly. Mantiks extracts only a limited set of factual attributes and does not perform bulk duplication of full databases.
2.3. Compliance with French National Law
France does not prohibit automated extraction of public data. Applicable rules require that:
No technical protection measures (e.g., login walls, captchas, rate limits) are circumvented.
No confidential or non-public data is accessed.
Copyright is respected (i.e., text is not copied verbatim beyond what is legally allowed).
Mantiks’ crawlers fully comply with these principles and operate in strict adherence to publicly accessible content only.
2.4. Key Jurisprudence Supporting the Legality of Public Data Crawling
Ryanair Ltd v. PR Aviation BV — Court of Justice of the EU (2015)
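The CJEU ruled that the Database Directive's mandatory user-rights provisions apply only to databases protected by copyright or by the sui generis right. For a database that enjoys neither protection, limits on extraction can rest only on the website's Terms & Conditions as a matter of contract law.
Implication: For publicly accessible job listings that do not qualify for database protection, any restriction on collection is contractual rather than statutory, so its effect depends on whether the Terms & Conditions bind the party accessing the pages.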
hiQ Labs Inc. v. LinkedIn Corp — U.S. Ninth Circuit Court of Appeals (2022)
Although a U.S. case, it is widely referenced by global compliance teams. The court held that accessing publicly available data is unlikely to constitute access “without authorization” under the Computer Fraud and Abuse Act.
Implication: Crawling public professional information is lawful as long as it does not involve breaching access controls.
Together, these rulings support a consistent principle:
accessing information intentionally made public is lawful, provided no authentication barrier or technical protection is bypassed.
3. Respect for Platforms and Job Boards
Mantiks is committed to collecting job data in a responsible, compliant, and technically respectful manner. Our crawling infrastructure is designed to ensure that we only access content that companies have intentionally made public, without interfering with the normal operation of any job board, ATS, or corporate website.
3.1. Public-Only Data Collection
Mantiks collects job postings exclusively from publicly accessible pages, meaning:
No login or account creation is required
No paywall or restricted area is accessed
No credentials are used
No automated bypass of access controls or security mechanisms
If a website requires authentication to view job postings, Mantiks does not crawl it.
3.2. No Circumvention of Technical Protections
Our systems strictly avoid any method that could be interpreted as circumventing technical barriers, such as:
CAPTCHA solving
Session hijacking
Rate-limit evasion techniques
Accessing hidden endpoints
Interacting with private APIs not intended for public use
Mantiks only retrieves information in the same way a standard user could, through normal browser-like requests.
3.3. Load-Friendly and Non-Disruptive Crawling
We implement industry-standard best practices to ensure that our activity does not impact the availability or performance of external sites:
Distributed crawling with low request frequency
Adaptive rate limiting based on site response
Respect for infrastructure constraints
Compliance with exclusion mechanisms when technically enforced
Our objective is to remain fully transparent and respectful of the ecosystem.
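As an illustration of honoring an exclusion mechanism, the sketch below checks a URL against a site's robots.txt rules using Python's standard `urllib.robotparser`. The text above does not name which exclusion mechanisms are honored, so treating robots.txt as one of them is an assumption of this example, as are the `USER_AGENT` name and the sample rules.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleJobCrawler"  # hypothetical crawler name, for illustration only


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str) -> bool:
    """Return True when the rules permit USER_AGENT to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


# Sample rules (assumed): everything is open except the /private/ area.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS_TXT)
jobs_allowed = is_allowed(parser, "https://example.com/jobs/123")
private_allowed = is_allowed(parser, "https://example.com/private/report")
```

With these sample rules, `jobs_allowed` is `True` and `private_allowed` is `False`, so a crawler built this way would skip the second URL.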
3.4. No Redistribution of Copyrighted Text
Mantiks does not redistribute long-form or creative textual content found within job descriptions.
Instead, we extract and classify only factual elements, such as:
Job title
Location
Company
Contract type
Posting date
Source URL
Metadata derived from the posting
Non-substantial text snippets or normalized summaries when needed
This approach ensures compliance with EU copyright law, which does not protect factual data but may protect creative expressions.
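The field list above can be pictured as a small record type. The following is a minimal sketch, not Mantiks' actual schema: the `JobRecord` class, the `to_record` helper, the input dictionary keys, and the 200-character snippet cap are all assumptions chosen for illustration. The point it shows is that the full description is never stored, only a short excerpt.

```python
from dataclasses import dataclass
from datetime import date

MAX_SNIPPET_CHARS = 200  # illustrative cap, not a legal threshold


@dataclass(frozen=True)
class JobRecord:
    """Factual fields of a posting plus a short, capped text snippet."""

    title: str
    company: str
    location: str
    contract_type: str
    posting_date: date
    source_url: str
    snippet: str


def to_record(raw: dict) -> JobRecord:
    """Build a record from a raw posting; the full description is not kept."""
    # Collapse whitespace, then keep only the first MAX_SNIPPET_CHARS characters.
    description = " ".join(raw.get("description", "").split())
    return JobRecord(
        title=raw["title"].strip(),
        company=raw["company"].strip(),
        location=raw["location"].strip(),
        contract_type=raw["contract_type"].strip(),
        posting_date=date.fromisoformat(raw["posting_date"]),
        source_url=raw["source_url"],
        snippet=description[:MAX_SNIPPET_CHARS],
    )
```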
3.5. Purpose of the Data Extraction
Our extraction processes serve a clear and legitimate purpose:
recruitment market intelligence,
lead identification for staffing and recruiting firms,
workforce analytics,
structured datasets for enterprise systems (ATS, CRM, BI).
The data we provide is designed to improve transparency and efficiency in the labor market while fully respecting intellectual property and platform rules.
4. Intellectual Property Compliance
Mantiks operates within a strict intellectual property (IP) framework to ensure that all collected and distributed data complies with European copyright law and database rights legislation. Our methodology focuses exclusively on factual information derived from publicly available job postings, while avoiding any reproduction of copyrighted or proprietary material.
4.1. Facts Are Not Protected by Copyright
Under EU law (Directive 2001/29/EC), copyright does not extend to facts or purely informational elements.
This means the following data points from a job posting are not subject to copyright protection:
Job title
Company name
Location
Industry or sector
Posting date
Contract type
Seniority level
Job category or keywords
Source URL
Metadata generated by Mantiks (classification, tags, labels, inferred attributes)
These elements may be freely extracted, processed, and redistributed, as they do not constitute original creative expression.
4.2. Avoidance of Creative or Copyrighted Expressions
While the factual components of job postings are not copyrighted, the creative wording of certain descriptions may be protected. Mantiks avoids this risk by:
Not redistributing full job descriptions, paragraphs, or proprietary wording
Using short, non-substantial excerpts only when necessary for context
Transforming textual content into structured fields, classifications, or summaries
Creating derived metadata that does not reproduce the original content
This ensures full compliance with copyright requirements and avoids reproducing any creative editorial content.
4.3. Creation of an Original Mantiks Database
The datasets delivered to clients are not replicas of any job board’s database. Mantiks creates a new, original database through:
Aggregation of multiple public sources
Normalization and deduplication
Enrichment with inferred metadata
Cross-referencing with company datasets
Classification via proprietary algorithms
Temporal tracking and daily refresh logic
Under the EU Database Directive (96/9/EC), the substantial investment involved in obtaining, verifying, and presenting this content can give Mantiks sui generis rights over the resulting database, which is then licensed to clients.
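To make the normalization and deduplication steps concrete, here is a minimal sketch under stated assumptions. The matching key (title, company, location), the `normalize` rules, and the `source_urls` field are illustrative choices, not a description of Mantiks' proprietary algorithms.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def dedup_key(posting: dict) -> tuple:
    """Key used to decide that two postings describe the same job (assumed rule)."""
    return (
        normalize(posting["title"]),
        normalize(posting["company"]),
        normalize(posting["location"]),
    )


def deduplicate(postings: list) -> list:
    """Keep the first posting seen per key and record every source URL for it."""
    merged = {}
    for posting in postings:
        key = dedup_key(posting)
        if key not in merged:
            merged[key] = {**posting, "source_urls": [posting["source_url"]]}
        else:
            merged[key]["source_urls"].append(posting["source_url"])
    return list(merged.values())
```

Keeping every source URL on the merged record preserves provenance even after duplicates from several sources are collapsed into one entry.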
4.4. No Substantial Extraction of Third-Party Databases
The EU Database Directive prohibits the extraction of a substantial part of a protected database (“substantial extraction”). Mantiks remains compliant by:
Extracting only factual elements, not the full textual body
Not replicating internal structures or taxonomy of any job board
Combining heterogeneous sources rather than relying on a single database
Limiting extraction to fields publicly displayed on job postings
Applying transformative processing, making the output materially different from the source
Therefore, Mantiks does not copy or reproduce third-party databases; it builds an independent and enriched dataset.
4.5. Licensing and Client Usage
All datasets provided by Mantiks are delivered under a license that specifies:
Permitted internal use by the client
Prohibition of resale or redistribution without authorization
Compliance with the IP rights retained by Mantiks for the structured dataset
Full transparency on the provenance and nature of the data
This structure ensures that clients can safely use Mantiks data within their systems (ATS, CRM, BI tools) while remaining fully compliant with IP law.
5. Distinguishing Legal Web Crawling from Illegal Scraping
Not all forms of data extraction are equivalent. Mantiks follows a strict compliance framework that clearly separates lawful, responsible web crawling from practices that may infringe on intellectual property, violate terms of access, or compromise system integrity.
This section clarifies the distinction to ensure full transparency for enterprise clients.
5.1. What Constitutes Legal Web Crawling
Legal web crawling refers to the automated extraction of publicly accessible, non-protected, and non-confidential information, provided that no technical or security measures are bypassed.
Mantiks’ operations follow all recognized criteria of lawful crawling:
Accessing only publicly available pages
No login, no account creation, no credential-based access
No circumvention of paywalls, captchas, or rate limits
Extraction limited to factual, non-copyrighted elements
No duplication of proprietary or creative content
No harmful impact on website performance
Transparency in data transformation and redistribution
Compliance with EU copyright and database laws
These principles are aligned with European jurisprudence (e.g., CJEU, Ryanair v. PR Aviation) and international case law (e.g., hiQ v. LinkedIn).
5.2. What Constitutes Illegal or Non-Compliant Scraping
Scraping becomes non-compliant when it involves any form of unauthorized access, content duplication, or harm to a third-party system. Practices generally considered illegal include:
Bypassing login or authentication mechanisms
Circumventing captchas or technical protections
Extracting content hidden behind paywalls or restricted zones
Reproducing large portions of copyrighted text
Rebuilding or duplicating a third-party database in full
Engaging in high-volume crawling that disrupts website availability
Violating computer misuse or system intrusion laws
Accessing APIs not intended for public use
Using spoofing or obfuscation techniques to hide automated access
Mantiks strictly prohibits and technically prevents all such practices.
5.3. Clear Side-by-Side Comparison
| Practice | Legal | Illegal |
|---|---|---|
| Accessing publicly visible job postings | ✔ | |
| Extracting factual elements (title, company, location) | ✔ | |
| Respecting rate limits and infrastructure constraints | ✔ | |
| No login, no bypassing of authentication | ✔ | |
| Reproducing full job descriptions or proprietary text | | ❌ |
| Circumventing CAPTCHAs or authentication barriers | | ❌ |
| Downloading entire protected databases wholesale | | ❌ |
| Accessing private APIs or hidden endpoints | | ❌ |
5.4. A Responsible and Transparent Approach
Mantiks ensures compliance by:
Designing crawling methods that access pages the way a standard browser would, without disguising automated access
Limiting request frequency to avoid system overload
Monitoring for HTTP status codes indicating restricted access
Immediately stopping crawling when encountering technical protections
Documenting data provenance with full auditability
Offering enterprise clients a clear legal framework and contractual assurance
This responsible approach ensures that our job data collection remains lawful, stable, and aligned with best practices in data ethics and compliance.
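The "monitor status codes and stop" behavior listed above could look like the following sketch. Which status codes count as a restriction is an assumption here (401 and 403 are used; others could be added), and `crawl_site` with its injected `fetch` callable is a hypothetical helper, not Mantiks' crawler.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Status codes treated as "restricted access" in this sketch (assumed set).
STOP_STATUS_CODES = frozenset({401, 403})


def crawl_site(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
) -> Tuple[List[str], Optional[str]]:
    """Fetch pages in order and stop at the first restricted response.

    Returns the page bodies retrieved so far and the URL that triggered
    the stop, or None when no restriction was encountered.
    """
    pages: List[str] = []
    for url in urls:
        status, body = fetch(url)
        if status in STOP_STATUS_CODES:
            # Do not retry or work around the restriction; stop for this site.
            return pages, url
        if status == 200:
            pages.append(body)
    return pages, None
```

Passing `fetch` in as a parameter keeps the stopping rule testable without network access; any HTTP client that returns a status code and body can be plugged in.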
6. Mantiks’ Compliance Commitments
Mantiks is built on a foundation of transparency, responsibility, and legal compliance. Our approach to collecting, structuring, and distributing job market data follows industry best practices and the strictest applicable regulations. This section outlines the commitments we make to all enterprise clients, procurement teams, and legal departments.
6.1. Compliance by Design
Mantiks’ infrastructure is engineered from the ground up to ensure lawful data collection:
Access limited to publicly available information
No circumvention of technical protection measures
No reproduction of copyrighted textual content
Full alignment with European IP and database law
Purpose limitation and minimal data capture
Daily monitoring of crawling activity to ensure ongoing compliance
Compliance is not an afterthought — it is embedded into every layer of our architecture.
6.2. Transparent Data Sourcing
We maintain full transparency regarding:
Where the data comes from (public job postings)
How it is collected (public requests only)
What transformations are applied (normalization, enrichment, classification)
What is not collected (no personal data, no private content)
Enterprise clients may request documentation describing our sources, methods, and compliance rationale at any time.
6.3. Respect for External Platforms
Mantiks follows ethical and respectful crawling practices designed to protect the stability and integrity of external systems:
Low-frequency distributed crawling
Adaptive throttling to limit impact on remote servers
Monitoring for HTTP response codes indicating rate-limiting or access restrictions
Immediate cessation of crawling upon detection of technical barriers
We aim to be a responsible actor within the digital ecosystem.
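One way to picture adaptive throttling is a per-site delay that doubles when the server signals strain and recovers slowly otherwise. The sketch below is illustrative only: the `AdaptiveThrottle` class, the delay values, the slow-response threshold, and the choice of 429 and 503 as strain signals are all assumptions, not Mantiks' actual parameters.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveThrottle:
    """Per-site delay that grows under load and shrinks gradually afterwards."""

    base_delay: float = 2.0      # seconds between requests (illustrative)
    max_delay: float = 120.0     # upper bound on the delay (illustrative)
    slow_response: float = 1.5   # response time treated as a sign of load
    delay: float = field(init=False)

    def __post_init__(self) -> None:
        self.delay = self.base_delay

    def update(self, status_code: int, elapsed_seconds: float) -> float:
        """Return the delay to wait before the next request to the same site."""
        if status_code in (429, 503) or elapsed_seconds > self.slow_response:
            # Back off multiplicatively when the site is rate limiting or slow.
            self.delay = min(self.delay * 2, self.max_delay)
        else:
            # Recover gradually, never dropping below the base delay.
            self.delay = max(self.delay * 0.9, self.base_delay)
        return self.delay
```

A caller would sleep for the returned delay before the next request to that site, for example `time.sleep(throttle.update(status, elapsed))`.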
6.4. Secure Data Processing and Delivery
All job data collected by Mantiks is processed securely using:
Encrypted data transfers (TLS 1.2+)
Secure cloud infrastructure with strict access controls
Regular security reviews and monitoring
Data isolation per client when required
Clients receive structured datasets through secure delivery mechanisms (API, S3 bucket, encrypted files, etc.).
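On the client side, a TLS 1.2+ floor can be enforced when connecting to a delivery endpoint. The snippet below is a minimal sketch using Python's standard `ssl` module; the `make_client_context` helper is a hypothetical name, and nothing here describes Mantiks' actual server configuration.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

The resulting context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`.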
6.5. Right to Exclusion and Publisher Requests
Mantiks respects the rights of organizations whose job postings appear on public sites. Any employer may request exclusion from our indexing:
We provide a simple, documented removal process
Exclusion can be applied at the domain or company level
Requests are processed swiftly and transparently
This ensures that companies retain control over how their public information is surfaced and used.
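Exclusion at the domain or company level can be sketched as a simple lookup applied before indexing. The `ExclusionList` class below is hypothetical, and its matching rules (subdomains inherit a domain exclusion; company names are compared case-insensitively) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class ExclusionList:
    """Domains and companies that asked not to be indexed."""

    domains: set = field(default_factory=set)
    companies: set = field(default_factory=set)

    def exclude_domain(self, domain: str) -> None:
        self.domains.add(domain.lower().removeprefix("www."))

    def exclude_company(self, company: str) -> None:
        self.companies.add(company.strip().lower())

    def is_excluded(self, source_url: str, company: str) -> bool:
        """Return True when the posting's domain or company is excluded."""
        host = (urlsplit(source_url).hostname or "").lower()
        # Match the excluded domain itself and any of its subdomains.
        domain_hit = any(host == d or host.endswith("." + d) for d in self.domains)
        return domain_hit or company.strip().lower() in self.companies
```

The suffix check uses a leading dot so that excluding `example.com` covers `careers.example.com` but not an unrelated host such as `notexample.com`.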
6.6. Licensing Framework for Safe Client Use
To ensure full legal clarity for our clients:
All datasets are provided under a structured licensing agreement
The agreement defines permitted uses and redistribution limits
Clients gain broad rights for internal use (ATS, CRM, analytics, BI)
Mantiks retains IP rights over its enriched database structure and derived metadata
No personal data is involved, reducing GDPR obligations for clients
Our licensing ensures that downstream usage remains safe, compliant, and unambiguous.
6.7. Commitment to Continual Compliance Monitoring
Legal frameworks and platform policies evolve. Mantiks is committed to:
Ongoing legal review
Monitoring regulatory developments
Updating internal practices as needed
Communicating transparently with clients about relevant changes
This proactive stance ensures long-term compliance and reduces legal risk for all stakeholders.
7. Client Usage Guidelines & Licensing
Mantiks provides access to its job data under a clear and structured licensing framework, designed to give clients broad operational flexibility while protecting intellectual property and ensuring compliant downstream use.
7.1. Permitted Uses
Under a standard Mantiks data license, clients are generally allowed to:
Ingest the dataset into internal systems (ATS, CRM, CDP, data warehouse, BI tools)
Use the data for recruitment, sales, market analysis, and strategic planning
Build internal dashboards, reports, and analytics based on the job data
Combine Mantiks data with internal datasets for enriched insights
Share insights and aggregated statistics internally across teams and entities
These uses are considered internal business use and are fully covered by the standard license.
7.2. Restricted Uses
To protect both Mantiks’ IP and the broader ecosystem, some uses are restricted or require a dedicated agreement. These may include:
Resale, sublicensing, or redistribution of the raw dataset to third parties
Building a competing data product or public job feed based primarily on Mantiks data
Making the raw data publicly accessible (e.g., in a public API or open dataset)
Using the data in ways that could misrepresent its origin or ownership
Where needed, Mantiks can offer custom licensing terms to accommodate specific partnership or resale models.
7.3. Data License Scope
Mantiks grants clients a non-exclusive, non-transferable license to:
Use the data for the duration of the subscription
Store historical snapshots for internal analysis and benchmarking
Maintain archives of past datasets for auditability and longitudinal studies
Mantiks retains all rights to:
The underlying database
The structure, schema, and metadata
All enrichment logic and derived classifications
7.4. Data Refresh and Retention
Job data is refreshed on a daily basis, ensuring that:
New postings are added
Expired or closed roles are updated or flagged
Historical records can be used to observe trends over time
Clients may choose to:
Maintain only the latest snapshot
Or build an internal historical repository for long-term analysis
Both approaches are compatible with Mantiks’ licensing model, provided that the data remains for internal use only, unless otherwise agreed.
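For clients who keep a historical repository, the daily refresh can be applied as a merge of today's delivery into the stored records. The sketch below is one possible approach under stated assumptions: records are keyed by source URL, and the `first_seen`, `last_seen`, and `status` fields are hypothetical names, not fields guaranteed by the Mantiks dataset.

```python
from datetime import date


def refresh(store: dict, todays_postings: dict, today: date) -> dict:
    """Merge today's postings into the store and flag missing ones as expired.

    Both dictionaries are keyed by source URL. The input store is not modified.
    """
    updated = {url: dict(record) for url, record in store.items()}
    for url, posting in todays_postings.items():
        # Preserve the original first_seen date for postings already known.
        first_seen = updated.get(url, {}).get("first_seen", today)
        updated[url] = {
            **posting,
            "first_seen": first_seen,
            "last_seen": today,
            "status": "open",
        }
    for url, record in updated.items():
        # Postings absent from today's delivery are flagged, not deleted.
        if url not in todays_postings and record["status"] == "open":
            record["status"] = "expired"
    return updated
```

Flagging rather than deleting expired postings is what keeps the historical records usable for trend analysis; a client maintaining only the latest snapshot could simply discard the expired entries instead.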
7.5. Compliance Responsibility
Mantiks ensures that the collection and delivery of the data is lawful and compliant. Clients are responsible for:
How the data is used inside their organization
Ensuring that any combination with other datasets respects applicable regulations
Respecting the license terms agreed with Mantiks
Mantiks can support legal and compliance teams with documentation, clarifications, and joint reviews where needed.
8. FAQ for Procurement, Legal & Compliance Teams
This section addresses the most common questions raised by enterprise stakeholders evaluating Mantiks job data.
8.1. Is the data collected from public sources?
Yes. Mantiks collects job postings exclusively from publicly accessible web pages, without bypassing logins, paywalls, or technical access barriers. No private, confidential, or restricted content is accessed.
8.2. Does the dataset contain personal data?
No. Mantiks focuses on job postings, which are factual, professional listings published by companies. The data does not include sensitive personal data about individuals, and does not target private profiles.
8.3. Is this compliant with EU copyright and database law?
Yes. Mantiks:
Extracts factual elements that are not protected by copyright
Does not redistribute full textual job descriptions
Does not replicate any third-party database in its entirety
Builds an original, enriched dataset that is licensed to clients
This approach is aligned with EU copyright law and the EU Database Directive.
8.4. Is this comparable to “screen scraping” or hacking?
No. Mantiks does not engage in any intrusive or deceptive techniques. We do not:
Bypass authentication mechanisms
Circumvent captchas or rate limits
Access private APIs or internal endpoints
Perform high-volume requests that could disrupt a website
Our operations are equivalent to a careful, automated reading of public pages.
8.5. Can we safely integrate Mantiks data into our ATS, CRM, or BI tools?
Yes. The dataset is designed specifically for:
Recruitment use cases
Sales and business development
Strategic workforce and market analysis
Because it does not contain sensitive personal data and is composed of public, factual information, integration into internal tools is typically low-risk from a compliance perspective, subject to your internal policies.
8.6. Can we share this documentation with our legal and procurement teams?
Yes. This page is intended precisely for procurement, legal, and compliance stakeholders. Mantiks can also provide:
A short compliance summary
A formal position paper in PDF format
Additional clarifications on request
8.7. What happens if a publisher or company asks to be excluded?
Mantiks respects such requests. We have a documented process to:
Exclude specific domains or employers from future indexing
Remove or stop refreshing their job postings in our dataset
Confirm exclusion to the requesting party
9. Conclusion & Contact
Mantiks provides a legally compliant, ethically sourced, and technically robust view of the job market, refreshed daily and ready to be integrated into enterprise systems. Our approach is built on three pillars:
Public, factual data only
Strict respect for intellectual property and technical protections
Clear licensing and transparent documentation for enterprise clients
For procurement, legal, and compliance teams, this means:
A clear understanding of how data is collected and processed
A reduced risk profile compared with unverified data providers
A solid legal and contractual framework for long-term usage
If you require additional information, a formal legal memo, or a joint review with your internal teams, Mantiks can provide tailored documentation and support.
→ For further questions or to request our detailed compliance and licensing documentation, please contact:
Email: contact@mantiks.io (example, to be adapted)
Subject: “Job Data Web Crawling – Compliance & Licensing”

