DATA

Download data

The patentCity database is available as an open access dataset (CC BY 4.0).

Download patentCity database

Citation

If you use the data, please cite Bergeaud and Verluise (2021) and De Rassenfosse, Kozak and Seliger (2019).

@techreport{bergeaudVerluise2021,
  title={One Century of Innovation in Europe and the US},
  author={Bergeaud, Antonin and Verluise, Cyril},
  year={2021}
}
@article{deRassenfosse2019,
  title={Geocoding of worldwide patent data},
  author={De Rassenfosse, Ga{\'e}tan and Kozak, Jan and Seliger, Florian},
  journal={Scientific data},
  volume={6},
  number={1},
  pages={1--15},
  year={2019},
  publisher={Nature Publishing Group}
}
Bergeaud, Antonin, and Cyril Verluise. "One Century of Innovation in Europe and the US." 2021.
De Rassenfosse, Gaétan, Jan Kozak, and Florian Seliger. "Geocoding of worldwide patent data." Scientific Data 6, no. 1 (2019): 1-15.

Database schema

The database is available in two formats: csv and jsonl. Both contain the same data and are compatible with most SQL database engines and cloud data warehouses such as BigQuery (GCP) and Athena (AWS), which should be your preferred way to work with the full patentCity database.

jsonl or csv

The jsonl format allows nested data and is therefore more compact. The csv format can be read off the shelf by any library supporting tabular/structured data (e.g. pandas for Python, dplyr for R, etc.).

Nested fields

  • Dotted variables (e.g. patentee.is_inv) indicate nested fields.
  • In the csv flavour of the data, the prefix (e.g. patentee.) is not reported. The database has already been flattened.
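
The relationship between the two formats can be illustrated with a minimal sketch that flattens one jsonl record into csv-style rows. It assumes that `patentee` is stored as a list of objects (a single object is also accepted); the helper name `flatten_record` and the sample values are hypothetical and are not part of the dataset or its tooling.

```python
import json


def flatten_record(record, nested_key="patentee"):
    """Yield one flat dict per nested entry, dropping the nested prefix."""
    # Top-level fields (publication_number, family_id, ...) are copied to every row.
    base = {k: v for k, v in record.items() if k != nested_key}
    nested = record.get(nested_key) or []
    # Assumption: the nested field is a list of objects; a single object is wrapped.
    if isinstance(nested, dict):
        nested = [nested]
    if not nested:
        # No patentee information: keep the patent-level fields only.
        yield base
        return
    for entry in nested:
        # Nested keys (name_text, is_inv, ...) become plain column names.
        yield {**base, **entry}


# Hypothetical jsonl line with made-up values, for illustration only.
line = json.dumps(
    {
        "publication_number": "XX-0000001-A",
        "patentee": [
            {"name_text": "DOE", "is_inv": True, "is_asg": False},
            {"name_text": "ACME", "is_inv": False, "is_asg": True},
        ],
    }
)
rows = list(flatten_record(json.loads(line)))
```

Each patentee becomes its own row, with the patent-level fields repeated, which is why the csv files are larger than their jsonl counterparts.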

Schema (jsonl)

| name | description | mode | type |
|---|---|---|---|
| publication_number | Publication number. | NULLABLE | STRING |
| publication_date | Publication date (yyyymmdd). | NULLABLE | INTEGER |
| country_code | Country code of the patent office. | NULLABLE | STRING |
| pubnum | Publication number. | NULLABLE | STRING |
| kind_code | Kind code. | NULLABLE | STRING |
| family_id | Family ID (DOCDB). | NULLABLE | STRING |
| cpc_code | Comma-separated list of cpc-codes. | NULLABLE | STRING |
| origin | Indicates the origin of the patentee data (PC: patentcity, WGP25: Worldwide Geocoding of Patent - slot 25, WGP45: Worldwide Geocoding of Patent - slot 45, EXP: expansion). | NULLABLE | STRING |
| patentee.is_inv | True if the patentee is an inventor, else False. | NULLABLE | BOOLEAN |
| patentee.is_asg | True if the patentee is an assignee, else False. | NULLABLE | BOOLEAN |
| patentee.name_text | Name. | NULLABLE | STRING |
| patentee.person_id | Person ID (PATSTAT). | NULLABLE | INTEGER |
| patentee.name_start | Name start. | NULLABLE | INTEGER |
| patentee.name_end | Name end. | NULLABLE | INTEGER |
| patentee.occ_text | Occupation text. | NULLABLE | STRING |
| patentee.occ_start | Occupation start. | NULLABLE | INTEGER |
| patentee.occ_end | Occupation end. | NULLABLE | INTEGER |
| patentee.cit_text | Citizenship text. | NULLABLE | STRING |
| patentee.cit_code | Citizenship code. | NULLABLE | STRING |
| patentee.cit_start | Citizenship start. | NULLABLE | INTEGER |
| patentee.cit_end | Citizenship end. | NULLABLE | INTEGER |
| patentee.loc_text | Location text. | NULLABLE | STRING |
| patentee.loc_start | Location start. | NULLABLE | INTEGER |
| patentee.loc_end | Location end. | NULLABLE | INTEGER |
| patentee.loc_addressLines | Formatted address lines built out of the parsed address components. | NULLABLE | STRING |
| patentee.loc_locationLabel | Assembled address value for displaying purposes. | NULLABLE | STRING |
| patentee.loc_country | ISO 3166-alpha-3 country code. | NULLABLE | STRING |
| patentee.loc_state | First subdivision level(s) below the country. Where commonly used, this is a state code (for instance, CA for California). | NULLABLE | STRING |
| patentee.loc_county | Second subdivision level(s) below the country. Use of this field is optional if a second subdivision level is not available. | NULLABLE | STRING |
| patentee.loc_city | Locality of the address. | NULLABLE | STRING |
| patentee.loc_district | Subdivision level below the city. Use of this field is optional if a second subdivision level is not available. | NULLABLE | STRING |
| patentee.loc_subdistrict | Subdivision level below the district. Used only for India. | NULLABLE | STRING |
| patentee.loc_postalCode | Postal code. | NULLABLE | STRING |
| patentee.loc_street | Street name. | NULLABLE | STRING |
| patentee.loc_building | Building name. | NULLABLE | STRING |
| patentee.loc_houseNumber | House number. | NULLABLE | STRING |
| patentee.loc_longitude | Longitude. | NULLABLE | FLOAT |
| patentee.loc_latitude | Latitude. | NULLABLE | FLOAT |
| patentee.loc_relevance | Indicates the relevance of the results found; the higher the score the more relevant the alternative. The score is a normalized value between 0 and 1. | NULLABLE | FLOAT |
| patentee.loc_matchType | Quality of the location match. pointAddress: Location matches exactly as point address. interpolated: Location was interpolated. | NULLABLE | STRING |
| patentee.loc_matchCode | Code indicating how well the result matches the request. Enumeration [exact, ambiguous, upHierarchy, ambiguousUpHierarchy]. | NULLABLE | STRING |
| patentee.loc_matchLevel | The most detailed address field that matched the input record. | NULLABLE | STRING |
| patentee.loc_matchQualityCountry | MatchQuality provides detailed information about the match quality of a result at attribute level. Match quality is a value between 0.0 and 1.0. 1.0 represents a 100% match. Here, matchQuality is defined at country level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityState | Same at state level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityCounty | Same at county level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityCity | Same at city level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityDistrict | Same at district level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityPostalCode | Same at postalCode level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityStreet | Same at street level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityHouseNumber | Same at houseNumber level. | NULLABLE | FLOAT |
| patentee.loc_matchQualityBuilding | Same at building level. | NULLABLE | FLOAT |
| patentee.loc_key | Key used for statistical area mapping (internal use). | NULLABLE | STRING |
| patentee.loc_statisticalArea1 | Name of the high level Statistical Area. | NULLABLE | STRING |
| patentee.loc_statisticalArea1Code | Code of the high level Statistical Area. | NULLABLE | STRING |
| patentee.loc_statisticalArea2 | Name of the mid level Statistical Area. | NULLABLE | STRING |
| patentee.loc_statisticalArea2Code | Code of the mid level Statistical Area. | NULLABLE | STRING |
| patentee.loc_statisticalArea3 | Name of the low level Statistical Area. | NULLABLE | STRING |
| patentee.loc_statisticalArea3Code | Code of the low level Statistical Area. | NULLABLE | STRING |
| patentee.loc_recId | Identifier of the input address in the response. | NULLABLE | STRING |
| patentee.loc_seqLength | Number of results for the corresponding input record. | NULLABLE | INTEGER |
| patentee.loc_seqNumber | Consecutively numbers the different results for the corresponding input record starting with 1. | NULLABLE | INTEGER |
| patentee.loc_source | Geocoding source (in [HERE, GMAPS, MANUAL]). | NULLABLE | STRING |
| patentee.is_duplicate | True if a patentee with the 'same' name has been detected in the same patent. Only one of the two is marked as duplicate. | NULLABLE | BOOLEAN |
| has_a | Whether the patent's family features an A kind code in the database [DOCDB family level]. | NULLABLE | BOOLEAN |
| has_b | Whether the patent's family features a B kind code in the database [DOCDB family level]. | NULLABLE | BOOLEAN |
| N | Number of patents in the family [DOCDB family level]. | NULLABLE | INTEGER |
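
Two of the patent-level fields are encoded compactly: `publication_date` is an integer in yyyymmdd form and `cpc_code` is a comma-separated string. The sketch below shows one way to decode them with the Python standard library. The helper names `parse_publication_date` and `split_cpc_codes` are hypothetical, and returning `None` for dates that cannot be parsed is an assumption about how you may want to treat such values, not documented behaviour of the dataset.

```python
from datetime import date, datetime


def parse_publication_date(value):
    """Convert a yyyymmdd integer (e.g. 19050321) to a date, or None if missing/invalid."""
    if value is None:
        return None
    try:
        return datetime.strptime(str(value), "%Y%m%d").date()
    except ValueError:
        # Assumption: values that are not valid calendar dates are treated as missing.
        return None


def split_cpc_codes(value):
    """Split the comma-separated cpc_code string into a clean list of codes."""
    if not value:
        return []
    return [code.strip() for code in value.split(",") if code.strip()]


# Illustrative values only; they are not taken from the database.
pub_date = parse_publication_date(19050321)
codes = split_cpc_codes("A01B1/00, B60K6/20")
```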

Schema (csv)

| name | description | mode | type |
|---|---|---|---|
| publication_number | Publication number. | NULLABLE | STRING |
| publication_date | Publication date (yyyymmdd). | NULLABLE | INTEGER |
| country_code | Country code of the patent office. | NULLABLE | STRING |
| pubnum | Publication number. | NULLABLE | STRING |
| kind_code | Kind code. | NULLABLE | STRING |
| family_id | Family ID (DOCDB). | NULLABLE | STRING |
| cpc_code | Comma-separated list of cpc-codes. | NULLABLE | STRING |
| origin | Indicates the origin of the patentee data (PC: patentcity, WGP25: Worldwide Geocoding of Patent - slot 25, WGP45: Worldwide Geocoding of Patent - slot 45, EXP: expansion). | NULLABLE | STRING |
| is_inv | True if the patentee is an inventor, else False. | NULLABLE | BOOLEAN |
| is_asg | True if the patentee is an assignee, else False. | NULLABLE | BOOLEAN |
| name_text | Name. | NULLABLE | STRING |
| person_id | Person ID (PATSTAT). | NULLABLE | INTEGER |
| name_start | Name start. | NULLABLE | INTEGER |
| name_end | Name end. | NULLABLE | INTEGER |
| occ_text | Occupation text. | NULLABLE | STRING |
| occ_start | Occupation start. | NULLABLE | INTEGER |
| occ_end | Occupation end. | NULLABLE | INTEGER |
| cit_text | Citizenship text. | NULLABLE | STRING |
| cit_code | Citizenship code. | NULLABLE | STRING |
| cit_start | Citizenship start. | NULLABLE | INTEGER |
| cit_end | Citizenship end. | NULLABLE | INTEGER |
| loc_text | Location text. | NULLABLE | STRING |
| loc_start | Location start. | NULLABLE | INTEGER |
| loc_end | Location end. | NULLABLE | INTEGER |
| loc_addressLines | Formatted address lines built out of the parsed address components. | NULLABLE | STRING |
| loc_locationLabel | Assembled address value for displaying purposes. | NULLABLE | STRING |
| loc_country | ISO 3166-alpha-3 country code. | NULLABLE | STRING |
| loc_state | First subdivision level(s) below the country. Where commonly used, this is a state code (for instance, CA for California). | NULLABLE | STRING |
| loc_county | Second subdivision level(s) below the country. Use of this field is optional if a second subdivision level is not available. | NULLABLE | STRING |
| loc_city | Locality of the address. | NULLABLE | STRING |
| loc_district | Subdivision level below the city. Use of this field is optional if a second subdivision level is not available. | NULLABLE | STRING |
| loc_subdistrict | Subdivision level below the district. Used only for India. | NULLABLE | STRING |
| loc_postalCode | Postal code. | NULLABLE | STRING |
| loc_street | Street name. | NULLABLE | STRING |
| loc_building | Building name. | NULLABLE | STRING |
| loc_houseNumber | House number. | NULLABLE | STRING |
| loc_longitude | Longitude. | NULLABLE | FLOAT |
| loc_latitude | Latitude. | NULLABLE | FLOAT |
| loc_relevance | Indicates the relevance of the results found; the higher the score the more relevant the alternative. The score is a normalized value between 0 and 1. | NULLABLE | FLOAT |
| loc_matchType | Quality of the location match. pointAddress: Location matches exactly as point address. interpolated: Location was interpolated. | NULLABLE | STRING |
| loc_matchCode | Code indicating how well the result matches the request. Enumeration [exact, ambiguous, upHierarchy, ambiguousUpHierarchy]. | NULLABLE | STRING |
| loc_matchLevel | The most detailed address field that matched the input record. | NULLABLE | STRING |
| loc_matchQualityCountry | MatchQuality provides detailed information about the match quality of a result at attribute level. Match quality is a value between 0.0 and 1.0. 1.0 represents a 100% match. Here, matchQuality is defined at country level. | NULLABLE | FLOAT |
| loc_matchQualityState | Same at state level. | NULLABLE | FLOAT |
| loc_matchQualityCounty | Same at county level. | NULLABLE | FLOAT |
| loc_matchQualityCity | Same at city level. | NULLABLE | FLOAT |
| loc_matchQualityDistrict | Same at district level. | NULLABLE | FLOAT |
| loc_matchQualityPostalCode | Same at postalCode level. | NULLABLE | FLOAT |
| loc_matchQualityStreet | Same at street level. | NULLABLE | FLOAT |
| loc_matchQualityHouseNumber | Same at houseNumber level. | NULLABLE | FLOAT |
| loc_matchQualityBuilding | Same at building level. | NULLABLE | FLOAT |
| loc_key | Key used for statistical area mapping (internal use). | NULLABLE | STRING |
| loc_statisticalArea1 | Name of the high level Statistical Area. | NULLABLE | STRING |
| loc_statisticalArea1Code | Code of the high level Statistical Area. | NULLABLE | STRING |
| loc_statisticalArea2 | Name of the mid level Statistical Area. | NULLABLE | STRING |
| loc_statisticalArea2Code | Code of the mid level Statistical Area. | NULLABLE | STRING |
| loc_statisticalArea3 | Name of the low level Statistical Area. | NULLABLE | STRING |
| loc_statisticalArea3Code | Code of the low level Statistical Area. | NULLABLE | STRING |
| loc_recId | Identifier of the input address in the response. | NULLABLE | STRING |
| loc_seqLength | Number of results for the corresponding input record. | NULLABLE | INTEGER |
| loc_seqNumber | Consecutively numbers the different results for the corresponding input record starting with 1. | NULLABLE | INTEGER |
| loc_source | Geocoding source (in [HERE, GMAPS, MANUAL]). | NULLABLE | STRING |
| is_duplicate | True if a patentee with the 'same' name has been detected in the same patent. Only one of the two is marked as duplicate. | NULLABLE | BOOLEAN |
| has_a | Whether the patent's family features an A kind code in the database [DOCDB family level]. | NULLABLE | BOOLEAN |
| has_b | Whether the patent's family features a B kind code in the database [DOCDB family level]. | NULLABLE | BOOLEAN |
| N | Number of patents in the family [DOCDB family level]. | NULLABLE | INTEGER |
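
As an illustration of how the csv columns might be used together, the sketch below reads rows with the standard `csv` module, skips patentees flagged by `is_duplicate`, and keeps those whose `loc_matchQualityCity` reaches a threshold. Several things here are assumptions rather than documented facts: that booleans are serialized as text such as "True" or "true", that missing scores appear as empty strings, and that 0.9 is a sensible cut-off. The helper names and the sample rows are hypothetical.

```python
import csv
import io


def is_true(value):
    # Assumption: booleans are serialized as text such as "True"/"true".
    return str(value).strip().lower() == "true"


def select_rows(rows, min_city_quality=0.9):
    """Keep non-duplicate patentees with a city-level match quality above the threshold."""
    kept = []
    for row in rows:
        if is_true(row.get("is_duplicate")):
            continue
        try:
            score = float(row.get("loc_matchQualityCity") or "")
        except ValueError:
            continue  # missing or unparsable score: skip the row
        if score >= min_city_quality:
            kept.append(row)
    return kept


# Made-up sample with a subset of the columns, for illustration only.
sample = io.StringIO(
    "publication_number,name_text,is_duplicate,loc_city,loc_matchQualityCity\n"
    "XX-1-A,DOE,False,Paris,1.0\n"
    "XX-1-A,DOE,True,Paris,1.0\n"
    "XX-2-A,ROE,False,Lyon,0.5\n"
    "XX-3-A,POE,False,,\n"
)
selected = select_rows(csv.DictReader(sample))
```

For the full database, the same filter is better expressed as a SQL query in BigQuery or Athena; the snippet is only meant for small extracts.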