<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.2 20190208//EN"
                  "JATS-archivearticle1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.2" article-type="other">
<front>
<journal-meta>
<journal-id></journal-id>
<journal-title-group>
</journal-title-group>
<issn></issn>
<publisher>
<publisher-name></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<permissions>
</permissions>
</article-meta>
</front>
<body>
<sec id="how-to-use-public-private-databases-in-insurance-risk-management-geography-climate-and-people-in-motor-insurance">
  <title>HOW TO USE PUBLIC-PRIVATE DATABASES IN INSURANCE RISK
  MANAGEMENT: GEOGRAPHY, CLIMATE AND PEOPLE IN MOTOR INSURANCE</title>
</sec>
<sec id="cómo-usar-bases-de-datos-público-privadas-en-la-gestión-de-riesgos-aseguradores-geografía-clima-y-personas-en-el-seguro-de-automóviles">
  <title>CÓMO USAR BASES DE DATOS PÚBLICO-PRIVADAS EN LA GESTIÓN DE
  RIESGOS ASEGURADORES: GEOGRAFÍA, CLIMA Y PERSONAS EN EL SEGURO DE
  AUTOMÓVILES</title>
  <p>Luis Enrique Cespedes Coimbra</p>
  <p>Universitat de Barcelona Business School. Barcelona, España.</p>
  <p>ORCID:
  <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0009-0002-6491-7830">https://orcid.org/0009-0002-6491-7830</ext-link></p>
  <p><email>Lcespeco7@alumnes.ub.edu</email></p>
  <p>(Corresponding author)</p>
  <p>Mercedes Ayuso Gutiérrez</p>
  <p>Departamento de Econometría, Estadística y Economía Aplicada.
  Universitat de Barcelona. Barcelona, España.</p>
  <p>ORCID:
  <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-6127-4572">https://orcid.org/0000-0001-6127-4572</ext-link></p>
  <p><email>mayuso@ub.edu</email></p>
  <p>Miguel Ángel Santolino Prieto</p>
  <p>Departamento de Econometría, Estadística y Economía Aplicada.
  Universitat de Barcelona. Barcelona, España.</p>
  <p>ORCID:
  <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-0286-3673">https://orcid.org/0000-0002-0286-3673</ext-link></p>
  <p><email>msantolino@ub.edu</email></p>
  <p>Reception date: July 13th 2024</p>
  <p>Acceptance date: December 2nd 2024</p>
  <p>ABSTRACT</p>
  <p>This work focuses on the use of public information sources in the
  application of relational models in insurance companies, for a better
  understanding of risks and assisting decision-making in new
  sustainability environments. Firstly, we propose using Eurostat's
  degree of urbanization methodology to group motor claims or policies
  into potentially more homogeneous categories in the insurance sector
  (urban / suburban / rural) for segmentation and analysis. Secondly, we
  analyze how insurance companies can use local weather information in
  conjunction with the degree of urbanization to model the number of
  motor claims in a specific geographic area. Finally, we apply
  relational models to databases with anonymized information on
  passengers in traffic accidents provided by the Spanish General
  Traffic Directorate for the purpose of better defining the
  characteristics of the claim based on the profile of the people inside
  the vehicle. The aim is to identify, for example, the profile of the
  passengers in vehicles driven by elderly people, also in conjunction
  with sex and geographical area. Insurance companies know the
  enormous potential of data analytics and must focus on the search for
  relationships using information that may be dispersed across multiple
  databases, including those that are for public use and that can
  facilitate the homogenization and comparison of results, together with
  the design of preventive and risk management policies. We also include
  the R code, making it available to the insurance sector and
  academia.</p>
  <p><bold>Keywords:</bold> Data analytics, Relational models,
  Sustainability</p>
  <p>RESUMEN</p>
  <p>Este trabajo se centra en la utilización de fuentes de información
  pública en la aplicación de modelos relacionales en las entidades
  aseguradoras, para el mejor conocimiento de las características de los
  riesgos y asistir a la toma de decisiones en nuevos entornos de
  sostenibilidad. Primero, proponemos utilizar la metodología de grado
  de urbanización de Eurostat para agrupar siniestros o pólizas de
  automóviles en categorías potencialmente más homogéneas en el sector
  asegurador (urbano / suburbano / rural) para su segmentación y
  análisis. Segundo, analizamos cómo las compañías aseguradoras pueden
  utilizar información climatológica local conjuntamente con el grado de
  urbanización para modelizar el número de siniestros de automóviles en
  una zona geográfica específica. Finalmente, aplicamos modelos
  relacionales a bases de datos con información anonimizada de pasajeros
  en accidentes de tráfico proporcionadas por la Dirección General de
  Tráfico de España con el objetivo de definir mejor las características
  de los siniestros en función del perfil de las personas que se
  encuentran dentro del vehículo. Se trata de conocer, por ejemplo, el
  perfil de los pasajeros de vehículos conducidos por personas mayores,
  también en relación con el sexo y la zona geográfica. Las compañías
  aseguradoras conocen la enorme potencialidad del análisis de datos y
  deben apostar por la búsqueda de relaciones usando información que
  puede estar dispersa en múltiples bases de datos, incluyendo aquella
  que es de uso público y que puede facilitar la homogeneización y
  comparación de resultados, junto al diseño de políticas preventivas y
  de gestión de riesgos. Incluimos los códigos en R poniéndolos a
  disposición del sector asegurador y de la academia para su uso.</p>
  <p><bold>Palabras clave:</bold> Análisis de datos, Modelos
  relacionales, Sostenibilidad</p>
  <sec id="introduction">
    <title>INTRODUCTION</title>
    <p>Data analysis has been at the core of the insurance industry
    since its inception. Insurance companies are engaged in an arms race
    to understand and apply advances in data science (Denuit et al.,
    2020; Wüthrich &amp; Merz, 2023). Data science is a field of applied
    mathematics and statistics that extracts information from large
    amounts of complex data, or big data. The volume of data worldwide
    has grown exponentially in recent years. As more and more diverse
    data sources become available to insurance companies, techniques
    that link different databases to extract useful information for the
    insurance business become more important. However, there are
    cost-effective methods that enhance the understanding and pricing of
    risks and that are not yet fully exploited. One of them is the use
    of relational databases that combine internal private data with
    publicly available data.</p>
    <p>In general, a data model is the formal way of expressing data
    relationships to a database management system (DBMS). The relational
    data model was introduced by Edgar Frank Codd (1970). This
    model describes the world as a collection of inter-related tables,
    named relations (Watt &amp; Eng, 2014). Databases that adhere to a
    relational data model are named relational databases. Therefore, a
    relational database is a database whose logical structure is made up
    of a collection of relations (Harrington, 2016). Relational
    databases work with base tables, i.e., actually stored tables, and
    virtual tables, which are a product of relational operations and
    only exist in main memory. Their structure is registered in a data
    dictionary or catalogue, which mirrors data storage relations. The
    data within the data dictionary are referred to as metadata.</p>
    <p>The aim of this study is to show that insurers have access to
    alternative data sources that are useful in pricing and risk
    management. We claim that relational models that integrate those
    data sources with internal data can be used by insurers in their
    risk analysis. There is abundant literature that provides
    interesting insights by combining various data sources with a
    relational model. Different aspects of mobility have been analyzed,
    all of which are of interest to insurance companies. There are
    several reviews that include research studies that have used a
    relational model to integrate the data. Ziakopoulos and Yannis
    (2020) wrote a literature review of spatial analysis approaches to
    road safety, where one can find studies that combine different data
    sources such as Liu &amp; Sharma (2018), Moeinaddini et al. (2014)
    and Alarifi et al. (2017), as well as the recently published
    research of Ziakopoulos (2024). In Zheng et al. (2021) we can find a
    review of
    studies modelling traffic conflicts that combine various data
    sources, including among others Xie et al. (2019) and Zheng et al.
    (2019). Finally, Wang et al. (2013) examined the impact of traffic
    and road characteristics, and referenced some research studies that
    combined alternative data sources such as Haynes et al. (2008) and
    Lord et al. (2005).</p>
    <p>To illustrate the usefulness of leveraging public data in
    combination with private data in a relational data model, this paper
    presents three applications in which they can be used for a better
    understanding of risks. The first application focuses on attributing
    a degree of urbanization to claims and/or policyholders,
    specifically using the Eurostat methodology, in line with research
    that has leveraged ZIP codes to study the relationship between crash
    characteristics and those injured (Clark &amp; Cushing, 2004; Lee et
    al., 2014; Lerner et al., 2001). The second application uses
    climatic information from AEMET (Spanish Meteorology Agency) to
    model the number of claims in a municipality (AEMET, 2024). Finally,
    we examine the characteristics of all occupants inside crashed
    vehicles, with the focus on the severity of the injuries of the
    passengers involved in a motor crash. Note that occupants are
    defined as all persons who were in the vehicle at the time of the
    accident, and passengers as the persons other than the drivers who
    were in the vehicle. Our aim is to obtain a more accurate estimate
    of the total bodily injury (BI) cost associated with an accident
    based on the profile of the driver and the expected profiles of the
    accompanying passengers.</p>
    <p>This paper is structured into five sections. First, the
    methodology of relational databases is presented. Then, the three
    proposed applications are discussed. In the first one, we use
    relational data models in spatial analysis, specifically in
    segmenting geographical locations based on the European standard
    urbanization degree categorization (also considering the relevance
    of the analysis of geographical areas and their population
    structure). In the second application, we show how to use files with
    georeferenced raster climatic data to model the number of claims in
    a geographical area. Finally, relational databases are presented as
    a tool to link more data sets with the aim of better understanding
    the characteristics of occupant BI (driver plus passengers). The
    paper ends with the main conclusions on the relevance of using
    relational models in the insurance field. We also include the R
    code, making it available to the insurance sector and
    academia.</p>
  </sec>
  <sec id="relational-databases">
    <title>RELATIONAL DATABASES</title>
    <p>Methodologically speaking, it is necessary to distinguish
    different components in a relational model, namely: a relation, also
    named table, defined as a subset of the Cartesian product of a list
    of domains and characterized by a name (Wijnen et al., 2019); its
    columns or attributes; its domains, i.e., the set of permissible
    values a column can contain; and its rows or tuples, each of which
    represents a group of related data values. The rows and columns of a
    relation have some special properties:</p>
    <list list-type="bullet">
      <list-item>
        <p>For columns: each one must have a unique name, its values
        must be drawn from only one domain, and viewing the columns in
        any order must not affect the meaning of the data.</p>
      </list-item>
      <list-item>
        <p>For rows: there must not be duplicate rows, there can only be
        one value at the intersection of a column and a row, and viewing
        the rows in any order must not affect the meaning of the
        data.</p>
      </list-item>
    </list>
    <p>Each table row is identified by a primary key, a value stored in
    a column that uniquely identifies that row. If the table is well
    specified, with unique and non-null primary keys, the table name,
    the column name and the primary key of the row suffice to retrieve
    any specific data value.</p>
    <p>The relationships between the tables that make up the database
    can be of three types: one-to-one, where each row in one table is
    linked to at most one row in another table; one-to-many, where a
    single row in Table A can be related to one or more rows in Table B
    but each row in table B is only related to one row in table A; and
    many-to-many, where multiple rows in one table can relate to
    multiple rows in another table, using a third table named junction
    or join table to manage the relationship between the two tables.
    However, these relationships are not mandatory and will not be
    enforced by the DBMS unless specified. The most common relationship
    type in relational databases is one-to-many. In this case, there are
    two tables (A and B), each containing a column with identifier
    variables from the same domain: one with primary keys, which
    uniquely identify each row within a table, and the other with
    foreign keys, which link one table to another by referencing a
    primary key. A foreign key is thus a column whose values are primary
    key values of some table in the database. The relational DBMS uses
    the relationship by matching data between primary and foreign keys
    to retrieve associated data, i.e., items from other columns. It is
    important to remark that relational data models place a constraint
    on one-to-many relationships: they require that each non-null
    foreign key value corresponds precisely to an existing primary key
    value. This is the most important constraint because it ensures the
    coherence of inter-table references.</p>
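    <p>The one-to-many relationship and its constraint on foreign keys
    can be sketched with Python's built-in sqlite3 module. This is only
    an illustration under stated assumptions: the table names
    (municipality, claim), the column names and the inserted rows are
    hypothetical and are not taken from the data used in this paper, and
    Python is used for convenience rather than R. Note that SQLite
    enforces foreign keys only when the corresponding pragma is
    enabled.</p>
    <code language="python">import sqlite3

# Hypothetical schema: one municipality (table A) relates to many claims (table B).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE municipality ("
    " muni_code TEXT PRIMARY KEY,"
    " muni_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE claim ("
    " claim_id INTEGER PRIMARY KEY,"
    " muni_code TEXT REFERENCES municipality(muni_code),"
    " cost REAL)"
)
conn.executemany(
    "INSERT INTO municipality VALUES (?, ?)",
    [("M1", "Municipality A"), ("M2", "Municipality B")],
)
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [(1, "M1", 1200.0), (2, "M1", 350.0), (3, "M2", 980.0)],
)

# Matching foreign keys to primary keys retrieves the associated columns.
claims_per_muni = conn.execute(
    "SELECT m.muni_name, COUNT(*) FROM claim AS c"
    " JOIN municipality AS m ON m.muni_code = c.muni_code"
    " GROUP BY m.muni_name ORDER BY m.muni_name"
).fetchall()

# A non-null foreign key without a matching primary key is rejected.
try:
    conn.execute("INSERT INTO claim VALUES (4, 'M9', 100.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True</code>
    <p>In this sketch, claims_per_muni holds one row per municipality
    with its number of claims, and the attempt to insert a claim whose
    municipality code has no matching primary key raises an integrity
    error, which is the behavior the constraint is meant to
    guarantee.</p>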
    <p>To represent data relationships in a relational database we use
    an entity-relationship (ER) diagram, which in practice is a diagram
    that shows the relationship types among different tables (Figure
    1).</p>
    <graphic mimetype="image" mime-subtype="jpeg" xlink:href="vertopal_4859c46cb1eb401bb81b3ee2c29ce114/media/image1.jpg" />
    <disp-quote>
      <p>Figure 1. Entity-relationship diagram in Power BI. Source: Own
      elaboration. Note: the variable codes in the figure appear solely
      for illustrative purposes.</p>
    </disp-quote>
  </sec>
  <sec id="cartographic-location-and-degree-of-urbanization-in-motor-insurance">
    <title>CARTOGRAPHIC LOCATION AND DEGREE OF URBANIZATION IN MOTOR
    INSURANCE</title>
    <p>In European countries, according to Wijnen et al. (2019), the
    total costs of road crashes are equivalent to 0.4–4.1% of GDP. In
    the case of Spain, as the authors point out, the cost of road
    crashes stands at approximately 1% of GDP. Moreover, Spanish
    insurance companies have seen their motor vehicle claims soar in
    recent years, as shown by their net combined loss ratio (Table
    1).</p>
    <disp-quote>
      <p>Table 1. Evolution of the Spanish Net Combined Loss Ratio for
      Motor Third-Party Liability, Q4-2020 to Q4-2023. Source: Own
      elaboration based on “Boletín de Información Trimestral de Seguros
      y Fondos de Pensiones Cuarto Trimestre 2023”.</p>
    </disp-quote>
    <table-wrap>
      <table>
        <colgroup>
          <col width="25%" />
          <col width="11%" />
          <col width="11%" />
          <col width="11%" />
          <col width="11%" />
          <col width="11%" />
          <col width="11%" />
          <col width="11%" />
        </colgroup>
        <thead>
          <tr>
            <th>Year</th>
            <th>2020</th>
            <th>2021</th>
            <th>2021</th>
            <th>2021</th>
            <th>2021</th>
            <th>2022</th>
            <th>2022</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Quarter</td>
            <td>4</td>
            <td>1</td>
            <td>2</td>
            <td>3</td>
            <td>4</td>
            <td>1</td>
            <td>2</td>
          </tr>
          <tr>
            <td>Net Combined LR (%)</td>
            <td>94</td>
            <td>95.6</td>
            <td>96.6</td>
            <td>98.3</td>
            <td>100.1</td>
            <td>103.8</td>
            <td>100.2</td>
          </tr>
          <tr>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
          </tr>
          <tr>
            <td>Year</td>
            <td>2022</td>
            <td>2022</td>
            <td>2023</td>
            <td>2023</td>
            <td>2023</td>
            <td>2023</td>
            <td></td>
          </tr>
          <tr>
            <td>Quarter</td>
            <td>3</td>
            <td>4</td>
            <td>1</td>
            <td>2</td>
            <td>3</td>
            <td>4</td>
            <td></td>
          </tr>
          <tr>
            <td>Net Combined LR (%)</td>
            <td>100.6</td>
            <td>103</td>
            <td>104.6</td>
            <td>105.7</td>
            <td>105.1</td>
            <td>107.1</td>
            <td></td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>The increases in their loss ratios underline the importance of
    understanding crashes better in order to restore profitability in
    Motor Third-Party Liability insurance. Incorporating additional
    spatial analysis could enhance insurers' analytical processes, allow
    for more geographically based measures and contribute to the
    improvement of risk selection and pricing.</p>
    <p>The riskiness of a driver is conditioned by the environment in
    which he or she moves (Pljakić et al., 2022). Population density and
    its corresponding infrastructure heavily determine not only drivers’
    maneuvers and behaviors, but also the expected consequences of their
    mistakes (Abdel-Aty et al., 2013; Bil et al., 2019; Prato et al.,
    2018; Keeves et al., 2019). Several studies have shown that, even if
    urban areas tend to concentrate a larger proportion of crashes,
    occupants suffer worse injuries in crashes in rural areas (Keeves et
    al., 2019; Peura et al., 2015). In Spain, the profile of the driver
    in rural areas is also different: older drivers represent a larger
    share of the census, according to the Spanish driver census for
    2017-2019.</p>
    <p>The intersection of rurality, rural depopulation and population
    ageing is important for Spain. It is simultaneously one of the
    European countries with the highest life expectancy, with the
    highest percentage of the population living in cities, and with the
    highest percentage of older people concentrated in rural areas
    (Gutiérrez et al., 2023; Casado-Sanz et al., 2019; European
    Commission, 2023). Rural municipalities represent 84% of Spain's
    surface area, but only 16% of the Spanish population lives in them.
    Of the rural population, 25% is over 65 years old, and almost a
    third of those over 65 are over 80 years old. From the point of view
    of road safety, this context represents a huge challenge when it
    comes to ensuring safe and sustainable mobility in rural areas,
    which are increasingly depopulated and where the incidence of
    population aging is higher. In Spain, the greatest number of
    traffic accidents occurs in urban environments, but 52% of traffic
    fatalities occur in rural environments (Harland et al., 2014).</p>
    <p>Incorporating these structural demographic changes into insurance
    companies’ models is important. However, adding new variables to a
    model is not trivial. It requires deciding how many resources to
    invest in making adjustments and ensuring accurate categorization.
    The difficulty increases with variables that do not have clear-cut
    classifications. Insurance companies are forced to choose between
    more customized variables and, when available, standardized
    criteria. There are situations where opting for homogeneous criteria
    offers certain advantages: it makes market comparisons easier and,
    if the criteria match, enables insurers to leverage publicly
    available data. In these situations, insurance companies could
    extract more insights about their policyholders and claims if they
    integrated their own (proprietary) data with information accessible
    from public sources.</p>
    <p>An illustrative example is separating policyholders or claims by
    degree of urbanization, i.e., by whether the location is an urban or
    a rural area. Although the use of geographical areas is very common
    in insurance, there is no harmonized, widely used criterion to
    determine the degree of urbanization of a location. Researchers and
    practitioners from different countries tend to use multiple measures
    (Harland et al., 2014), mostly related to population density or
    size, with different thresholds to determine urbanization and
    reflect their perspective on the urban-rural dichotomy (Keeves et
    al., 2019). To ease international comparison and offer standardized
    classifications, Eurostat developed its own methodology (OECD,
    2021). This European statistical institution defines cities, towns
    and rural areas based on a combination of population size, density
    and proximity, and assigns a classification at the level of the
    Local Administrative Unit (municipalities in the case of Spain).
    Using this methodology, it is possible to map the level of
    urbanization across EU member countries, enabling standardized
    comparison among them.</p>
    <p>Figure 2 shows the results of applying Eurostat's methodology to
    Spain, with two maps at the municipal level. The first map indicates
    urbanization levels, while the second uses a logarithmic scale to
    show the number of vehicle crashes involving injured victims in each
    municipality for the 2016-2019 period. Crash data were provided by
    the Spanish General Traffic Directorate (DGT). The comparison
    suggests that a higher degree of urbanization correlates with a
    higher number of traffic accidents. Mapping different factors can
    help to reveal potential relationships that provide insights into
    traffic crash dynamics, and can even help to predict crash
    occurrences based on certain criteria and to evaluate the potential
    heterogeneity of correlations across different territories. For
    example, the impact of a factor on the number of crashes may vary,
    as is the case in the maps of Figure 2, where the rural areas
    (depicted in light green) correlate with a different magnitude in
    the northern and southern halves of Spain.</p>
    <graphic mimetype="image" mime-subtype="jpeg" xlink:href="vertopal_4859c46cb1eb401bb81b3ee2c29ce114/media/image2.jpg" />
    <disp-quote>
      <p>Figure 2. Map of Spanish peninsular municipalities by degree of
      urbanization in 2018 (right) and natural logarithm of the number
      of motor vehicle crashes 2016–2019 (left). Source: Cespedes et al.
      (2024b). Own elaboration.</p>
    </disp-quote>
    <p>The analysis of drivers’ behavior patterns can also be enhanced
    by combining proprietary data with public information, such as that
    coming from Geographic Information Systems (GIS). The number of
    kilometers traveled influences the risk of accident (Boucher et al.,
    2013). Building on the segmentation into urbanization degrees,
    companies could extract the distances between the location of
    crashes and the residence of their policyholders, to study the
    differences according to the degree of urbanization, by querying
    distances between municipalities by means of the Open Source Routing
    Machine (OSRM). OSRM is an open-source routing engine that computes
    shortest routes in road networks. An interface between R and OSRM is
    available through the osrm package (Giraud, 2022). Some patterns
    that emerge from DGT data for the years 2016-2019 are that, while
    the proportion of drivers crashing in their municipality of
    residence is 63.9% in urban areas, it is only 26% in rural areas.
    Drivers who crashed outside their municipality of residence and no
    further than 150 km from their homes represent 30.1% of urban
    drivers and 68.7% of rural drivers, with similar average distances
    (urban 32.4 km, rural 32.9 km) and standard deviations (urban 31.9
    km, rural 29 km) (Cespedes et al., 2024a).</p>
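    <p>Once each crash record carries the degree of urbanization of the
    driver's residence and a road distance, the shares and average
    distances discussed above are simple grouped summaries. A minimal
    Python sketch follows. It assumes hypothetical records and
    precomputed distances (in practice these could be obtained from a
    routing engine such as OSRM), so the figures it produces are
    illustrative only and are unrelated to the DGT results.</p>
    <code language="python">from collections import defaultdict
from statistics import mean

# Hypothetical records: (degree of urbanization of residence,
# municipality of residence, municipality of crash, road distance in km).
crashes = [
    ("urban", "A", "A", 0.0),
    ("urban", "A", "B", 20.0),
    ("urban", "B", "B", 0.0),
    ("rural", "C", "A", 35.0),
    ("rural", "C", "C", 0.0),
    ("rural", "D", "B", 25.0),
    ("rural", "D", "A", 180.0),
]
MAX_KM = 150.0  # distance cap used in the text

def summarize(records, max_km=MAX_KM):
    """Summarize crash locations relative to home, by degree of urbanization."""
    groups = defaultdict(list)
    for degree, home, place, km in records:
        groups[degree].append((home == place, km))
    summary = {}
    for degree, rows in groups.items():
        n = len(rows)
        # Crashes outside the home municipality and no further than max_km;
        # min(km, max_km) == km holds exactly when km does not exceed max_km.
        nearby = [km for same, km in rows if not same and min(km, max_km) == km]
        summary[degree] = {
            "share_home": sum(same for same, _ in rows) / n,
            "share_nearby": len(nearby) / n,
            "mean_km_nearby": mean(nearby) if nearby else None,
        }
    return summary

result = summarize(crashes)</code>
    <p>With these made-up records, a quarter of the rural drivers crash
    in their own municipality and half crash elsewhere within the 150 km
    cap; the crash at 180 km is excluded from the average distance.</p>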
    <p>Using a simple relational data model that matches the postal code
    of the location of a claim or the residence of a driver to the
    degree of urbanization can give insurers an edge in characterizing
    these risks more accurately. Not only does it allow companies to
    know the share of urban/suburban/rural drivers in the portfolio, to
    adjust their risk appetite and to target a desired proportion, but
    it also helps them find commonalities in crashes or claims by
    urbanization degree, such that some of them can be bundled and
    analyzed together by pricing and reserving departments. Furthermore,
    other publicly available data, such as meteorological data, could
    also be considered to enrich our understanding, as shown in the next
    section. These analyses provide a springboard to develop the skills
    needed to exploit telematics data in the future (Ayuso et al.,
    2014).</p>
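    <p>As a minimal sketch of such a relational model, the following
    Python example links policies to a degree of urbanization through
    two lookup tables. All postal codes, municipality identifiers and
    policy records are hypothetical. Records without a match are kept
    and labelled rather than dropped, which is one possible design
    choice among several.</p>
    <code language="python">from collections import Counter

# Hypothetical lookup tables. A real analysis would load a postal code to
# municipality table and the degree of urbanization of each municipality.
postal_to_muni = {"08001": "M1", "08002": "M1", "25001": "M2", "44001": "M3"}
muni_to_degree = {"M1": "urban", "M2": "suburban", "M3": "rural"}

# Hypothetical policy records: (policy identifier, postal code of residence).
policies = [
    (101, "08001"),
    (102, "08002"),
    (103, "25001"),
    (104, "44001"),
    (105, "99999"),  # postal code missing from the lookup table
]

def degree_for(postal_code):
    """Follow postal code, then municipality, to the degree of urbanization."""
    muni = postal_to_muni.get(postal_code)
    return muni_to_degree.get(muni, "unmatched")

labelled = [(policy_id, degree_for(code)) for policy_id, code in policies]
counts = Counter(degree for _, degree in labelled)
shares = {degree: n / len(labelled) for degree, n in counts.items()}</code>
    <p>The resulting shares dictionary gives the portfolio mix by degree
    of urbanization, and the share labelled as unmatched offers a quick
    check on the coverage of the lookup tables.</p>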
  </sec>
  <sec id="climatological-information-to-explain-insurance-claims-in-municipalities">
    <title>CLIMATOLOGICAL INFORMATION TO EXPLAIN INSURANCE CLAIMS IN
    MUNICIPALITIES</title>
    <p>The climate of an area can be an important element in explaining
    the claims of insurance companies (Ashley et al., 2015; Eisenberg,
    2004; Naik et al., 2016). Lines of business in which weather may
    play a key role include motor insurance and home insurance, among
    others. Most insurers already take meteorological information into
    account in their analysis of claim frequency and severity. In this
    section we show how insurers may incorporate public meteorological
    information in their claim analysis.</p>
    <p>In this application we consider the annual number of accidents
    with victims per municipality in Spain for the year 2019. In total,
    there were 8,131 municipalities with at least 1 accident with
    victims. Figure 3 shows the number of accidents in the Spanish
    municipalities in the year 2019. Data are scaled per thousand
    inhabitants so that municipalities of different population sizes
    are comparable.</p>
    <disp-quote>
      <p><inline-graphic mimetype="image" mime-subtype="jpeg" xlink:href="vertopal_4859c46cb1eb401bb81b3ee2c29ce114/media/image3.jpg" />Figure
      3. Number of motor crashes with victims per 1,000 inhabitants in
      the Spanish municipalities in the year 2019. Source: Own
      elaboration.</p>
    </disp-quote>
    <p>Our objective in this example is to investigate whether weather
    information can be useful for insurers to explain the number of
    motor accidents with casualties. The accident data of the
    municipalities are adjusted to the number of inhabitants of the
    municipality. To avoid excessive variability in the municipal motor
    accident rate due to small population size, we select municipalities
    with more than 500 inhabitants. The resulting dataset comprises
    4,146 Spanish municipalities with more than 500 inhabitants in which
    at least one motor crash with injured victims occurred in the year
    2019. Table 3 shows descriptive statistics of the numerical
    variables of the dataset. Note that in this and the next sections we
    follow the definition of injury severity used by the DGT, in which a
    victim is seriously injured if at least one day of hospitalization
    was required. Otherwise, injured victims are classified as
    casualties with slight
    injuries.<xref ref-type="fn" rid="fn1">1</xref> The degree of
    urbanization of the municipality, following the European methodology
    used in the previous section, is also considered in the analysis.
    Table 4 shows the relative frequency of the degree of urbanization
    of the municipality (categorical variable with three
    categories).</p>
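    <p>The scaling and filtering steps described above can be sketched
    as follows in Python. The municipal records are hypothetical, and
    the threshold of 500 inhabitants is the one stated in the text.</p>
    <code language="python"># Hypothetical municipal records:
# (name, population, crashes with victims in the year).
municipalities = [
    ("M1", 250, 1),
    ("M2", 800, 2),
    ("M3", 12000, 30),
    ("M4", 500, 1),
]
MIN_POPULATION = 500

def crash_rates(records, min_population=MIN_POPULATION):
    """Return crashes per 1,000 inhabitants for municipalities above the threshold."""
    rates = {}
    for name, population, crashes in records:
        # Skip municipalities whose population does not exceed the threshold:
        # max(population, min_population) == min_population holds exactly then.
        if max(population, min_population) == min_population:
            continue
        rates[name] = 1000 * crashes / population
    return rates

rates = crash_rates(municipalities)</code>
    <p>In this sketch the two smallest municipalities are dropped, which
    illustrates why the filter matters: a single crash in a municipality
    of 250 inhabitants would translate into a rate of 4 per 1,000
    inhabitants, a value driven mostly by the small denominator.</p>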
    <p>Table 3. Descriptive statistics for numerical variables in
    municipalities with more than 500 inhabitants. Source: Own
    elaboration based on DGT data (year 2019) and section 3.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="32%" />
          <col width="11%" />
          <col width="10%" />
          <col width="11%" />
          <col width="11%" />
          <col width="11%" />
          <col width="14%" />
        </colgroup>
        <thead>
          <tr>
            <th></th>
            <th>Min.</th>
            <th>1st Qu.</th>
            <th>Median</th>
            <th>Mean</th>
            <th>3rd Qu.</th>
            <th>Max.</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Crashes</td>
            <td>0</td>
            <td>0.724</td>
            <td>1.459</td>
            <td>2.035</td>
            <td>2.574</td>
            <td>33.499</td>
          </tr>
          <tr>
            <td>Involved vehicles</td>
            <td>0</td>
            <td>1.004</td>
            <td>2.179</td>
            <td>3.177</td>
            <td>3.979</td>
            <td>58.313</td>
          </tr>
          <tr>
            <td>Casualties</td>
            <td>0</td>
            <td>0.906</td>
            <td>1.971</td>
            <td>2.962</td>
            <td>3.702</td>
            <td>52.109</td>
          </tr>
          <tr>
            <td>Fatalities</td>
            <td>0</td>
            <td>0</td>
            <td>0</td>
            <td>0.083</td>
            <td>0</td>
            <td>7.005</td>
          </tr>
          <tr>
            <td>Seriously injured casualties</td>
            <td>0</td>
            <td>0</td>
            <td>0</td>
            <td>0.291</td>
            <td>0.297</td>
            <td>16.304</td>
          </tr>
          <tr>
            <td>Slightly injured casualties</td>
            <td>0</td>
            <td>0.724</td>
            <td>1.736</td>
            <td>2.588</td>
            <td>3.299</td>
            <td>45.906</td>
          </tr>
          <tr>
            <td>Average age of vehicles (in the municipality)</td>
            <td>2.004</td>
            <td>12.363</td>
            <td>13.361</td>
            <td>13.247</td>
            <td>14.275</td>
            <td>17.037</td>
          </tr>
          <tr>
            <td>Population (in thousands)</td>
            <td>0.501</td>
            <td>1.021</td>
            <td>2.334</td>
            <td>11.276</td>
            <td>6.861</td>
            <td>3280.782</td>
          </tr>
          <tr>
            <td>Percentage of male population</td>
            <td>0.466</td>
            <td>0.562</td>
            <td>0.590</td>
            <td>0.593</td>
            <td>0.62</td>
            <td>0.780</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>Table 4. Relative frequency of the degree of urbanization for
    municipalities with more than 500 inhabitants. Source: Own
    elaboration based on DGT data (year 2019) and section 3.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="30%" />
          <col width="21%" />
          <col width="27%" />
          <col width="21%" />
        </colgroup>
        <thead>
          <tr>
            <th></th>
            <th>Urban</th>
            <th>Intermediate</th>
            <th>Rural</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Degree of urbanization</td>
            <td>5.21</td>
            <td>26.58</td>
            <td>68.21</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>We now add climatic information on the municipalities to our
    dataset, using open data from the Spanish Agency of Meteorology
    (AEMET). In particular, we use the normal climatological values
    for Spain corresponding to the period 1981-2010, which are
    available for different climatic variables (AEMET, 2024).</p>
    <p>The climatic variables of municipalities that can be consulted
    and that may be of interest to the insurance sector are:</p>
    <list list-type="bullet">
      <list-item>
        <p>Average maximum daily rainfall (mm).</p>
      </list-item>
      <list-item>
        <p>Average annual and monthly accumulated precipitation
        (mm).</p>
      </list-item>
      <list-item>
        <p>Seasonal average accumulated precipitation (mm).</p>
      </list-item>
      <list-item>
        <p>Average annual number of days with precipitation greater than
        or equal to 0.1 mm.</p>
      </list-item>
      <list-item>
        <p>Average annual number of days with precipitation greater than
        or equal to 1 mm.</p>
      </list-item>
      <list-item>
        <p>Average annual number of days with precipitation greater than
        or equal to 10 mm.</p>
      </list-item>
      <list-item>
        <p>Average annual number of days with precipitation greater than
        or equal to 30 mm.</p>
      </list-item>
      <list-item>
        <p>Average annual and monthly temperature (ºC).</p>
      </list-item>
      <list-item>
        <p>Average annual and monthly minimum temperature (ºC).</p>
      </list-item>
      <list-item>
        <p>Average annual and monthly maximum temperature (ºC).</p>
      </list-item>
      <list-item>
        <p>Köppen-Geiger climate classification (Kottek et al.,
        2006).</p>
      </list-item>
      <list-item>
        <p>Mean annual number of snow days.</p>
      </list-item>
      <list-item>
        <p>Mean annual number of storm days.</p>
      </list-item>
      <list-item>
        <p>Mean annual number of fog days.</p>
      </list-item>
      <list-item>
        <p>Mean annual number of sunshine hours (Insolation).</p>
      </list-item>
    </list>
    <p>AEMET distributes this information as GeoTIFF files (.tif
    extension), a format that stores georeferenced information in a
    TIFF image file. Each GeoTIFF file is a raster image for one
    climatological variable and one period (monthly, annual or
    seasonal). The list of available files and meteorological
    variables can be consulted in the appendix of Chazarra et al.
    (2018).</p>
    <p><inline-graphic mimetype="image" mime-subtype="jpeg" xlink:href="vertopal_4859c46cb1eb401bb81b3ee2c29ce114/media/image4.jpeg" />A
    raster image consists of a matrix of cells (pixels) organized in
    rows and columns in which each cell is represented by a color
    (Figure 4).</p>
    <disp-quote>
      <p>Figure 4. Example of a raster image. Source:
      <ext-link ext-link-type="uri" xlink:href="https://desktop.arcgis.com/es/arcmap/latest/manage-data/raster-and-images/what-is-raster-data.htm">own
      elaboration</ext-link>.</p>
    </disp-quote>
    <p>In the GeoTIFF files available from AEMET, each cell is
    georeferenced. The images use the geographic coordinate system
    EPSG:4326 (WGS 84, World Geodetic System 1984), the most commonly
    used geographic coordinate system (used in Google Earth and GPS
    systems, for instance), which gives the geographic location of
    each cell of the image. The color of a cell encodes the value of
    the meteorological variable at that location. Cell colors are
    defined on the RGBA (Red, Green, Blue, Alpha) scale. The Red,
    Green, and Blue parameters each define a color intensity between 0
    and 255. The Alpha parameter represents opacity, where 0 is fully
    transparent and 255 is fully opaque. In our application, the alpha
    parameter is particularly useful to distinguish cells representing
    land (the peninsula or the islands) from those representing the
    sea. Finally, the GeoTIFF file provides scale information that
    links the RGBA colors of the cells to the values of the
    meteorological variable of interest.</p>
    <p>To illustrate the use of meteorological information, Figure 5
    displays the map of the average maximum daily rainfall in the
    Canary Islands for the 1981-2010 period. The rainfall information
    was obtained from the GeoTIFF file downloaded from the AEMET
    website (AEMET, 2024). The equivalence between the RGBA colors and
    the intervals of numerical values of the meteorological variable is
    provided at the bottom of the image, as a footnote.</p>
    <disp-quote>
      <p><inline-graphic mimetype="image" mime-subtype="jpeg" xlink:href="vertopal_4859c46cb1eb401bb81b3ee2c29ce114/media/image5.jpg" />Figure
      5. Average maximum daily rainfall (in mm) in the Canary Islands
      for the 1981-2010 period (in RGBA space). Source: Map of the
      AEMET. Scale (rainfall class in mm: RGBA color): 140 and above:
      (255, 210, 255, 255); 120-140: (255, 138, 255, 255); 100-120:
      (162, 23, 253, 255); 80-100: (0, 106, 213, 255); 70-80: (41, 145,
      248, 255); 60-70: (130, 191, 253, 255); 50-60: (128, 255, 255,
      255); 40-50: (128, 255, 72, 255); 30-40: (201, 253, 130, 255);
      20-30: (255, 255, 164, 255).</p>
    </disp-quote>
    <p>Figure 5 was plotted by downloading the raster image and reading
    the rainfall information in RGBA space. The next step is to convert
    the RGBA information into numerical values of the meteorological
    variable that insurers can incorporate into their claim analysis.
    To do so, we first convert the RGBA scale to the HSV scale. HSV
    space is a cylindrical-coordinate representation of the points of
    an RGB color model. HSV stands for hue (type of color), saturation
    (amount of color), and value (brightness of the color). In this
    way, each color is represented by a single value, and the
    conversion of colors to meteorological values becomes easier.
    Figure 6 represents the average maximum daily rainfall (in mm) in
    the Canary Islands for the 1981-2010 period after converting the
    color scale into numerical values. The Annex provides the R code
    for downloading the georeferenced raster image of the average
    maximum daily rainfall in the Canary Islands and for converting the
    information in the GeoTIFF file into numerical values of the
    meteorological variable of interest.</p>
    <graphic mimetype="image" mime-subtype="jpeg" xlink:href="vertopal_4859c46cb1eb401bb81b3ee2c29ce114/media/image6.jpg" />
    <disp-quote>
      <p>Figure 6. Average maximum daily rainfall (in mm) in the Canary
      Islands for the 1981-2010 period (in numerical value). Source: Own
      elaboration from GeoTIFF file of AEMET.</p>
    </disp-quote>
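    <p>The color-to-value conversion described above can be sketched in
    a few lines of Python. This is a minimal illustration and not the R
    code of the Annex: it transcribes the scale given in the caption of
    Figure 5, converts each scale color to HSV with the standard-library
    function <monospace>colorsys.rgb_to_hsv</monospace>, and looks up
    the class of a cell from its HSV coordinates. Three choices are
    assumptions made only for this sketch: each class is represented by
    its midpoint, the open-ended top class is given a width of 20 mm,
    and cells with an alpha of 0 are treated as sea (no data). The names
    <monospace>SCALE</monospace>, <monospace>class_value</monospace> and
    <monospace>rainfall_mm</monospace> are hypothetical.</p>
    <preformat><![CDATA[
```python
import colorsys

# Scale transcribed from the caption of Figure 5:
# (lower bound, upper bound) in mm -> RGBA colour.
# The top class has no upper bound in the source, hence None.
SCALE = [
    ((140, None), (255, 210, 255, 255)),
    ((120, 140), (255, 138, 255, 255)),
    ((100, 120), (162, 23, 253, 255)),
    ((80, 100), (0, 106, 213, 255)),
    ((70, 80), (41, 145, 248, 255)),
    ((60, 70), (130, 191, 253, 255)),
    ((50, 60), (128, 255, 255, 255)),
    ((40, 50), (128, 255, 72, 255)),
    ((30, 40), (201, 253, 130, 255)),
    ((20, 30), (255, 255, 164, 255)),
]

# Assumption: width given to the open-ended top class (not in the source).
OPEN_CLASS_WIDTH = 20


def class_value(lower, upper):
    """Representative value of a class: its midpoint (assumed convention)."""
    if upper is None:
        upper = lower + OPEN_CLASS_WIDTH
    return (lower + upper) / 2


def to_hsv(rgba):
    """HSV coordinates (each in [0, 1]) of an RGBA colour, ignoring alpha."""
    r, g, b, _alpha = rgba
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


def hsv_key(rgba):
    """HSV triple rounded to absorb floating-point noise."""
    return tuple(round(c, 6) for c in to_hsv(rgba))


# Lookup table from HSV triple to the representative rainfall value.
HSV_TO_VALUE = {hsv_key(rgba): class_value(lo, hi) for (lo, hi), rgba in SCALE}


def rainfall_mm(rgba):
    """Rainfall value (mm) for a cell, or None if the cell carries no data."""
    if rgba[3] == 0:  # assumption: alpha 0 marks sea / no-data cells
        return None
    return HSV_TO_VALUE.get(hsv_key(rgba))  # None if colour is not in the scale
```
]]></preformat>
    <p>Under these assumptions, a cell colored (128, 255, 72, 255) maps
    to 45 mm, the midpoint of the 40-50 mm class. Matching on the full
    HSV triple rather than on hue alone is deliberate in this sketch,
    since the two highest classes of the scale share the same hue and
    differ only in saturation.</p>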
    <p>We carry out the same steps for the raster file containing
    rainfall information for the Peninsula and the Balearic Islands.
    The numerical rainfall values of the georeferenced cells
    representing the Peninsula, the Balearic Islands and the Canary
    Islands are then merged into a single file. The rainfall
    information can now be incorporated into the database of motor
    crashes in Spanish municipalities with more than 500 inhabitants.
    Since we have the longitude and latitude coordinates of each
    municipality, we assign to each municipality the meteorological
    value at its longitude-latitude coordinates. Table 5 shows
    descriptive statistics for the variable of interest
    <italic>average maximum daily rainfall (mm) in the period
    1981-2010</italic> for the municipalities with more than 500
    inhabitants in which motor crashes involving victims occurred.</p>
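    <p>Assigning a raster value to a municipality from its coordinates
    can be sketched as follows. The sketch assumes a north-up raster
    without rotation, described by the coordinates of its top-left
    corner and its pixel size in degrees; in practice these would be
    read from the GeoTIFF metadata. The function names and the toy grid
    mentioned afterwards are hypothetical.</p>
    <preformat><![CDATA[
```python
import math


def cell_index(lon, lat, x_origin, y_origin, pixel_width, pixel_height):
    """Row and column of the raster cell containing a (lon, lat) point.

    (x_origin, y_origin) is the top-left corner of the raster. Columns
    grow eastwards and rows grow southwards (north-up, no rotation).
    """
    col = math.floor((lon - x_origin) / pixel_width)
    row = math.floor((y_origin - lat) / pixel_height)
    return row, col


def value_at(grid, lon, lat, x_origin, y_origin, pixel_width, pixel_height):
    """Value of the cell containing the point, or None if it lies outside."""
    row, col = cell_index(lon, lat, x_origin, y_origin, pixel_width, pixel_height)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None
```
]]></preformat>
    <p>For a toy 2 × 2 grid with its top-left corner at longitude -18.0
    and latitude 29.0 and pixels of 0.5 degrees, the point (-17.3,
    28.4) falls in the second row and second column. Points outside the
    raster return no value, which makes unmatched municipalities easy
    to detect.</p>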
    <p>Table 5. Descriptive statistics of the meteorological variable
    <italic>Average maximum daily rainfall (mm) in the period
    1981-2010</italic> for municipalities with more than 500
    inhabitants. Source: Own elaboration.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="24%" />
          <col width="13%" />
          <col width="13%" />
          <col width="13%" />
          <col width="13%" />
          <col width="13%" />
          <col width="13%" />
        </colgroup>
        <thead>
          <tr>
            <th></th>
            <th>Min.</th>
            <th>1st Qu.</th>
            <th>Median</th>
            <th>Mean</th>
            <th>3rd Qu.</th>
            <th>Max.</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Rain precipitations</td>
            <td>25.00</td>
            <td>45.00</td>
            <td>55.00</td>
            <td>59.58</td>
            <td>75.00</td>
            <td>150.00</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>For illustrative purposes, to demonstrate the explanatory
    capacity of the meteorological information for the number of motor
    crashes with casualties, we fit a Poisson regression model. The
    dependent variable is the number of motor crashes involving victims
    in Spanish municipalities with more than 500 inhabitants. The
    regressors are the average age of vehicles in the municipality
    where the crash took place, the percentage of male population in
    the municipality, the degree of urbanization of the municipality
    following the Eurostat methodology described in the previous
    section, and the average maximum daily rainfall (mm), as calculated
    in this section. Estimated coefficients are shown in Table 6. The
    estimated coefficient of the rainfall variable is positive and
    statistically significant at the 1% level. Thus, the average
    maximum daily rainfall in a municipality appears to be positively
    related to the expected number of motor crashes involving victims
    in that municipality.</p>
    <p>Table 6. Modeling the number of motor crashes involving victims
    in Spanish municipalities with more than 500 inhabitants
    (Generalized Linear Model-Poisson regression). Note: Population (in
    thousands) is included as offset of the regression model; Null
    deviance: 45,020; Residual deviance: 35,433.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="54%" />
          <col width="11%" />
          <col width="13%" />
          <col width="11%" />
          <col width="11%" />
        </colgroup>
        <thead>
          <tr>
            <th>Variable</th>
            <th>Coef.</th>
            <th>Std. Error</th>
            <th>z value</th>
            <th>p-value</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Intercept</td>
            <td>3.539</td>
            <td>0.143</td>
            <td>24.767</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Average age of vehicles (in the municipality)</td>
            <td>-0.138</td>
            <td>0.003</td>
            <td>-52.043</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Percentage of male population</td>
            <td>-3.533</td>
            <td>0.283</td>
            <td>-12.467</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Degree of urbanization – Rural</td>
            <td>0.438</td>
            <td>0.012</td>
            <td>36.489</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Degree of urbanization – Urban</td>
            <td>0.384</td>
            <td>0.009</td>
            <td>42.008</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Rain precipitations (mm)</td>
            <td>0.006</td>
            <td>&lt;0.001</td>
            <td>39.042</td>
            <td>&lt;0.001</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
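    <p>To make the estimates in Table 6 easier to interpret, the fitted
    model can be evaluated directly. The following sketch assumes a log
    link with log(population in thousands) as offset and takes the
    intermediate degree of urbanization as the reference category,
    since it is the level without a coefficient in the table; both are
    assumptions about the specification, and the function names are
    hypothetical. Because the coefficients are rounded to three
    decimals, the results are approximate.</p>
    <preformat><![CDATA[
```python
import math

# Point estimates copied from Table 6.
INTERCEPT = 3.539
BETA_VEHICLE_AGE = -0.138
BETA_MALE_SHARE = -3.533
BETA_RAIN_MM = 0.006

# Assumption: 'intermediate' is the baseline level (no coefficient in Table 6).
BETA_URBANIZATION = {"intermediate": 0.0, "rural": 0.438, "urban": 0.384}


def expected_crashes(pop_thousands, vehicle_age, male_share, urbanization, rain_mm):
    """Expected crashes with victims under a log-link Poisson model.

    Assumption: the offset enters as log(population in thousands), so the
    expected count is the population times exp(linear predictor).
    """
    eta = (
        INTERCEPT
        + BETA_VEHICLE_AGE * vehicle_age
        + BETA_MALE_SHARE * male_share
        + BETA_URBANIZATION[urbanization]
        + BETA_RAIN_MM * rain_mm
    )
    return pop_thousands * math.exp(eta)


def rate_ratio(delta_rain_mm):
    """Multiplicative change in expected crashes for a given rainfall increase."""
    return math.exp(BETA_RAIN_MM * delta_rain_mm)
```
]]></preformat>
    <p>With these rounded estimates,
    <monospace>rate_ratio(10)</monospace> equals exp(0.06), about 1.06:
    municipalities whose average maximum daily rainfall is 10 mm higher
    have roughly 6% more expected crashes with victims, other
    regressors being equal.</p>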
    <p>Continuing with the example, a limitation of Generalized Linear
    Models is that they are not flexible in the specification of the
    linear predictor (i.e., the linear combination of parameters and
    regressors). To allow for a nonlinear effect of the rainfall
    regressor, we fit a Generalized Additive Model to investigate the
    functional form of the effect of rainfall on the (log) number of
    traffic accidents (Hastie, 1992). A penalized spline (P-spline) is
    used to estimate the smooth function associated with the rainfall
    regressor (Eilers &amp; Marx, 1996). The remaining regressors enter
    the model linearly. Estimated coefficients of the linear part of
    the predictor are shown in Table 7.</p>
    <p>Table 7. Modelling the number of motor crashes involving victims
    in Spanish municipalities with more than 500 inhabitants with a
    P-spline for the rain precipitations (Generalized Additive
    Model-Poisson regression). Note: Population (in thousands) is
    included as offset of the regression model. R-sq.(adj) = 0.896;
    Deviance explained = 23.2%.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="54%" />
          <col width="11%" />
          <col width="13%" />
          <col width="11%" />
          <col width="11%" />
        </colgroup>
        <thead>
          <tr>
            <th>Variable</th>
            <th>Coef.</th>
            <th>Std. Error</th>
            <th>z value</th>
            <th>p-value</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Intercept</td>
            <td>3.982</td>
            <td>0.148</td>
            <td>26.835</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Average age of vehicles (in the municipality)</td>
            <td>-0.133</td>
            <td>0.003</td>
            <td>-48.499</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Percentage of male population</td>
            <td>-3.886</td>
            <td>0.291</td>
            <td>-13.351</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Degree of urbanization – Rural</td>
            <td>0.433</td>
            <td>0.012</td>
            <td>35.825</td>
            <td>&lt;0.001</td>
          </tr>
          <tr>
            <td>Degree of urbanization – Urban</td>
            <td>0.352</td>
            <td>0.009</td>
            <td>38.029</td>
            <td>&lt;0.001</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
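    <p>A quick way to compare the two fits is the share of deviance
    explained. The note to Table 7 reports it for the additive model;
    for the Poisson GLM it can be recovered from the null and residual
    deviances given in the note to Table 6. The helper name below is
    hypothetical.</p>
    <preformat><![CDATA[
```python
def deviance_explained(null_deviance, residual_deviance):
    """Share of the null deviance accounted for by the fitted model."""
    return 1.0 - residual_deviance / null_deviance


# Deviances reported in the note to Table 6 (Poisson GLM).
glm_share = deviance_explained(45020, 35433)
print(f"GLM deviance explained: {glm_share:.1%}")
```
]]></preformat>
    <p>The GLM explains about 21.3% of the null deviance, against the
    23.2% reported for the additive model in the note to Table 7.</p>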
    <p><inline-graphic mimetype="image" mime-subtype="jpeg" xlink:href="vertopal_4859c46cb1eb401bb81b3ee2c29ce114/media/image7.jpg" />The
    estimated P-spline for rainfall is shown in Figure 7. The effect of
    rainfall on the (log) number of accidents increases up to
    municipalities with an average maximum daily rainfall of about 100
    mm and then decreases. It is worth mentioning that only 3.7% of
    municipalities have an average maximum daily rainfall above 100 mm.
    Even if the volume of traffic is reduced on wet days (Keay &amp;
    Simmonds, 2005), the overall effect on crash rates depends on the
    increase in the relative risk of crash (Black et al., 2017).
    Therefore, the decreasing effect of rainfall in the right tail
    should be interpreted with caution, and further analysis is
    recommended.</p>
    <disp-quote>
      <p>Figure 7. Estimated P-spline function for the rain
      precipitations to model the number of motor crashes with
      casualties in municipalities with more than 500 inhabitants
      (Generalized additive model-Poisson regression). Note: The
      effective degrees of freedom (edf) of the P-spline are equal to
      6.794.</p>
    </disp-quote>
  </sec>
  <sec id="bodily-injuries-of-all-occupants-of-the-crashed-vehicle">
    <title>BODILY INJURIES OF ALL OCCUPANTS OF THE CRASHED
    VEHICLE</title>
    <p>Insurance companies have access to many data sources and can
    analyze motor claims from multiple perspectives. They can consider
    the characteristics of the location where the crash took place, as
    well as the injuries and characteristics of all occupants of the
    vehicles involved. For claims involving vehicles with passengers,
    important information is lost when insurers only pay attention to
    aggregated costs and do not consider the variation in passengers’
    injuries, especially when there is a recurrent pattern in
    driver-passenger profiles. Given that crashes represent the subset
    of policies in the company’s portfolio where the risk has
    materialized, crash reports are useful not just to understand the
    claim, but also to infer traits of policyholders that condition
    their risk. For instance, understanding the heterogeneity in
    passenger injuries could help claims and reserving teams to set
    more tailored opening reserves that do not unduly distort their
    quarterly average cost estimates.</p>
    <p>Vehicles with passengers represent a relevant proportion of the
    total number of vehicles involved in injury crashes. In this
    section we use the dataset of motor crashes involving victims on
    Spanish roads in the period 2017-2019. According to the DGT, for
    the years 2017 to 2019, 19.6% of passenger cars involved in a crash
    in which at least one person was injured carried passengers in
    addition to the driver (Table 8). Among vehicles with at least one
    passenger, most carried 1 passenger (65.7%), followed by 2 (20.4%)
    and 3 (10.1%) passengers. Regarding the distribution of injuries by
    severity (Table 9), the proportions of serious and fatal injuries
    are very stable across the number of passengers, at around 2.1% and
    0.5% respectively, while the proportions of slight injuries and of
    no injuries change notably, as illustrated by the rejection of an
    equal distribution of injuries across the number of passengers in
    Pearson’s chi-squared test. The more passengers there are in a car,
    the less likely they are to suffer slight injuries. This
    illustrates that it is important to consider injury heterogeneity
    among passengers in order to analyze the insured risk as a whole,
    both in terms of pricing and reserving. Understanding the
    characteristics of the individuals with whom drivers tend to travel
    is a key issue and, to our knowledge, one that has received little
    attention in the automobile insurance context.</p>
    <p>Table 8. Passenger frequency in motor vehicles. Source: Own
    elaboration from DGT databases 2017-2019.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="33%" />
          <col width="33%" />
          <col width="33%" />
        </colgroup>
        <thead>
          <tr>
            <th>Passengers (driver excluded)</th>
            <th>Relative Frequency (%)</th>
            <th>Relative Frequency ex 0 (%)</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>0</td>
            <td>80.4</td>
            <td></td>
          </tr>
          <tr>
            <td>1</td>
            <td>12.9</td>
            <td>65.7</td>
          </tr>
          <tr>
            <td>2</td>
            <td>4.0</td>
            <td>20.4</td>
          </tr>
          <tr>
            <td>3</td>
            <td>2.0</td>
            <td>10.1</td>
          </tr>
          <tr>
            <td>4</td>
            <td>0.7</td>
            <td>3.5</td>
          </tr>
          <tr>
            <td>5</td>
            <td>0.1</td>
            <td>0.3</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>Table 9. Proportion of injury types by number of passengers in
    passenger vehicles (%). Source: Own elaboration from DGT databases
    2017-2019.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="34%" />
          <col width="16%" />
          <col width="16%" />
          <col width="16%" />
          <col width="17%" />
        </colgroup>
        <thead>
          <tr>
            <th></th>
            <th colspan="4">Number of passengers (driver excluded)</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Injury Type</td>
            <td>1</td>
            <td>2</td>
            <td>3</td>
            <td>4</td>
          </tr>
          <tr>
            <td>None</td>
            <td>36.9</td>
            <td>41.2</td>
            <td>46.7</td>
            <td>45.8</td>
          </tr>
          <tr>
            <td>Slight</td>
            <td>60.5</td>
            <td>56.2</td>
            <td>50.8</td>
            <td>51.0</td>
          </tr>
          <tr>
            <td>Serious</td>
            <td>2.2</td>
            <td>2.1</td>
            <td>2.1</td>
            <td>2.7</td>
          </tr>
          <tr>
            <td>Fatal</td>
            <td>0.5</td>
            <td>0.4</td>
            <td>0.5</td>
            <td>0.5</td>
          </tr>
          <tr>
            <td colspan="5">Pearson’s chi-squared test p-value &lt; 2.2
            × 10<sup>-16</sup></td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
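    <p>The test reported under Table 9 compares observed counts with
    those expected if the injury distribution did not depend on the
    number of passengers. Table 9 reports percentages only, so the test
    cannot be reproduced from it; the following sketch simply shows how
    the statistic and its degrees of freedom would be computed from the
    underlying table of counts. The function name is hypothetical, and
    the p-value, which requires the chi-squared distribution function,
    is left out.</p>
    <preformat><![CDATA[
```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table.

    `table` is a list of rows of observed counts, for example one row per
    injury type and one column per number of passengers.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) under independence.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```
]]></preformat>
    <p>For a table with four injury types and four passenger counts, as
    in Table 9, the statistic has (4 - 1) × (4 - 1) = 9 degrees of
    freedom.</p>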
    <p>Rather than solely evaluating the total cost per vehicle for the
    claim, a more nuanced analysis can be conducted by estimating the
    conditional expected cost of the claim based on both the driver’s
    characteristics and those of the expected passengers. For instance,
    if a crash is known to involve a vehicle with only one passenger
    alongside the driver, an initial reserve for bodily injury could be
    established based on the expected probabilities of the severity
    levels for that number of passengers (Table 9). Alternatively, if
    the company wanted to estimate the total expected costs of bodily
    injury claims, it could incorporate the expected probability that
    insured drivers will be accompanied by one or more passengers
    (Table 8) and the estimated severity of injuries in each case
    (Table 9). Matching all the tables can easily be done with a
    relational database, where keys are matched to relate passengers or
    all occupants to crash characteristics, enabling data manipulation
    and analysis. The composition of occupants inside the car varies,
    and multiple combinations may be considered. For instance, the
    analysis could be made by gender of driver and passengers, by age
    group, or by geographical area, linking with the research presented
    in previous sections.</p>
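    <p>The second calculation described above, which combines Table 8
    and Table 9, can be sketched as follows. The sketch assumes that
    the percentages in Table 9 apply to each passenger of a vehicle
    with the given number of passengers, and it leaves out vehicles
    with 5 passengers because Table 9 does not report them. Unit costs
    per injury type are not given in the paper, so they are passed in
    as an argument; all names are hypothetical.</p>
    <preformat><![CDATA[
```python
# Share of crashed passenger cars by number of passengers (Table 8, in %).
PASSENGER_SHARE = {0: 80.4, 1: 12.9, 2: 4.0, 3: 2.0, 4: 0.7}

# Injury distribution of passengers by number of passengers (Table 9, in %).
INJURY_SHARE = {
    1: {"None": 36.9, "Slight": 60.5, "Serious": 2.2, "Fatal": 0.5},
    2: {"None": 41.2, "Slight": 56.2, "Serious": 2.1, "Fatal": 0.4},
    3: {"None": 46.7, "Slight": 50.8, "Serious": 2.1, "Fatal": 0.5},
    4: {"None": 45.8, "Slight": 51.0, "Serious": 2.7, "Fatal": 0.5},
}


def expected_passengers_by_injury():
    """Expected number of passengers per crashed car, by injury type."""
    expected = {injury: 0.0 for injury in ("None", "Slight", "Serious", "Fatal")}
    for k, shares in INJURY_SHARE.items():
        p_k = PASSENGER_SHARE[k] / 100  # probability of k passengers
        for injury, pct in shares.items():
            # k passengers, each with probability pct/100 of this injury type
            expected[injury] += p_k * k * pct / 100
    return expected


def expected_passenger_cost(unit_cost):
    """Expected passenger bodily-injury cost per crashed car.

    `unit_cost` maps each injury type to an average cost supplied by the
    user; no cost figures are taken from the paper.
    """
    counts = expected_passengers_by_injury()
    return sum(counts[injury] * unit_cost[injury] for injury in counts)
```
]]></preformat>
    <p>Under these assumptions, a crashed passenger car carries on
    average about 0.0065 seriously injured and 0.0014 fatally injured
    passengers. Multiplying such counts by the insurer’s own average
    cost per injury type gives the expected passenger bodily-injury
    cost per crashed vehicle.</p>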
    <p>The diversity of passenger profiles is easily appreciated in the
    data, as illustrated by Tables 10 and 11, based on DGT data. Table
    10 shows the relative frequency of passenger gender by injury type
    and driver gender; e.g., of all passengers with fatal injuries
    traveling with a male driver, 46% were men, while the remaining 54%
    were women. The table shows that women tend to have relatively
    better outcomes with a woman driver, except for fatalities, where
    they represent a larger proportion of all fatalities (63.1%,
    compared with 54.0% with a male driver). Pearson’s chi-squared test
    rejects independence between the driver-passenger gender pairing
    and the passenger’s injury type (p-value &lt;0.01).</p>
    <p>Table 10. Passenger injury by driver gender (%). Source: Own
    elaboration from DGT databases 2017-2019.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="14%" />
          <col width="19%" />
          <col width="15%" />
          <col width="17%" />
          <col width="21%" />
          <col width="14%" />
        </colgroup>
        <thead>
          <tr>
            <th colspan="2">Gender</th>
            <th colspan="4">Passenger Injury</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Driver</td>
            <td>Passenger</td>
            <td>None</td>
            <td>Slight</td>
            <td>Serious</td>
            <td>Fatal</td>
          </tr>
          <tr>
            <td>Man</td>
            <td>Man</td>
            <td>44.1</td>
            <td>35.8</td>
            <td>40.5</td>
            <td>46.0</td>
          </tr>
          <tr>
            <td>Man</td>
            <td>Woman</td>
            <td>55.9</td>
            <td>64.2</td>
            <td>59.5</td>
            <td>54.0</td>
          </tr>
          <tr>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
          </tr>
          <tr>
            <td colspan="2">Gender</td>
            <td colspan="4">Passenger Injury</td>
          </tr>
          <tr>
            <td>Driver</td>
            <td>Passenger</td>
            <td>None</td>
            <td>Slight</td>
            <td>Serious</td>
            <td>Fatal</td>
          </tr>
          <tr>
            <td>Woman</td>
            <td>Man</td>
            <td>49.9</td>
            <td>38.3</td>
            <td>43.6</td>
            <td>36.9</td>
          </tr>
          <tr>
            <td>Woman</td>
            <td>Woman</td>
            <td>50.1</td>
            <td>61.7</td>
            <td>56.4</td>
            <td>63.1</td>
          </tr>
          <tr>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>Table 11 shows the relative frequency of driver-passenger age
    pairings by number of passengers in the crashed vehicles. Pearson’s
    chi-squared test rejects independence between the driver-passenger
    age pairing and the number of passengers (p-value &lt;0.01). For
    vehicles with 1 passenger, 87% were driven by a person aged 18-64,
    while the remaining 13% were driven by a person aged 65 or over.
    Also, the more passengers there are in the vehicle, the larger the
    proportion of drivers aged 18-64: 93.9% for 2 passengers, 94.6% for
    3, and 96% for vehicles with 4 passengers. A first reading would
    therefore suggest that most passengers, regardless of age, travel
    with drivers younger than 65. However, if passengers are grouped by
    age and the relative frequency by driver age is computed within
    each group, as presented in Table 12, the picture changes. Older
    adult passengers travel with drivers aged 65 or over in a
    significant proportion. In 68.5% of vehicles with 1 passenger aged
    65 or over, the driver is also an older adult. In vehicles with 2,
    3, and 4 passengers, if there is an older adult passenger, the
    likelihood of having an older driver is 35.6%, 43.3%, and 29.2%
    respectively.</p>
    <p>Table 11. Relative frequencies (%) of the passenger - driver age
    by number of passengers. Source: Own elaboration from DGT databases
    2017-2019.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="17%" />
          <col width="23%" />
          <col width="13%" />
          <col width="15%" />
          <col width="19%" />
          <col width="13%" />
        </colgroup>
        <thead>
          <tr>
            <th></th>
            <th></th>
            <th colspan="4">Passengers</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Driver age</td>
            <td>Passenger age</td>
            <td>1</td>
            <td>2</td>
            <td>3</td>
            <td>4</td>
          </tr>
          <tr>
            <td>[18, 65)</td>
            <td>[18, 65)</td>
            <td>82.9</td>
            <td>89.2</td>
            <td>91.2</td>
            <td>92.6</td>
          </tr>
          <tr>
            <td>[18, 65)</td>
            <td>[65,)</td>
            <td>4.1</td>
            <td>4.7</td>
            <td>3.4</td>
            <td>3.4</td>
          </tr>
          <tr>
            <td>[65,)</td>
            <td>[18, 65)</td>
            <td>4.1</td>
            <td>3.5</td>
            <td>2.8</td>
            <td>2.6</td>
          </tr>
          <tr>
            <td>[65,)</td>
            <td>[65,)</td>
            <td>8.9</td>
            <td>2.6</td>
            <td>2.6</td>
            <td>1.4</td>
          </tr>
          <tr>
            <td></td>
            <td>Total</td>
            <td>100%</td>
            <td>100%</td>
            <td>100%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>Table 12. Relative frequencies (%) of the passenger - driver age
    by number of passengers and driver age. Source: Own elaboration from
    DGT databases 2017-2019.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="17%" />
          <col width="24%" />
          <col width="15%" />
          <col width="15%" />
          <col width="15%" />
          <col width="15%" />
        </colgroup>
        <thead>
          <tr>
            <th>Driver age</th>
            <th>Passenger age</th>
            <th colspan="4">Number of passengers</th>
          </tr>
          <tr>
            <th></th>
            <th></th>
            <th>1</th>
            <th>2</th>
            <th>3</th>
            <th>4</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>[18, 65)</td>
            <td>[18, 65)</td>
            <td>95.3%</td>
            <td>96.2%</td>
            <td>97.0%</td>
            <td>97.3%</td>
          </tr>
          <tr>
            <td>[65,)</td>
            <td>[18, 65)</td>
            <td>4.7%</td>
            <td>3.8%</td>
            <td>3.0%</td>
            <td>2.7%</td>
          </tr>
          <tr>
            <td></td>
            <td>Total</td>
            <td>100%</td>
            <td>100%</td>
            <td>100%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>[18, 65)</td>
            <td>[65,)</td>
            <td>31.5%</td>
            <td>64.4%</td>
            <td>56.7%</td>
            <td>70.8%</td>
          </tr>
          <tr>
            <td>[65,)</td>
            <td>[65,)</td>
            <td>68.5%</td>
            <td>35.6%</td>
            <td>43.3%</td>
            <td>29.2%</td>
          </tr>
          <tr>
            <td></td>
            <td>Total</td>
            <td>100%</td>
            <td>100%</td>
            <td>100%</td>
            <td>100%</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
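    <p>The conditional frequencies in Table 12 follow from the joint
    relative frequencies in the preceding table by dividing each cell by
    the total of its passenger age group. The following R sketch
    reproduces that calculation. It assumes that the four rows of the
    joint table are ordered as (driver age, passenger age) pairs; the
    object names <monospace>joint</monospace>,
    <monospace>conditional</monospace> and
    <monospace>cond_young</monospace> are illustrative and not part of
    the original analysis.</p>
    <disp-quote>
      <preformat># Joint relative frequencies (%) by number of passengers (columns 1 to 4).
# Assumed row order: (driver age, passenger age).
joint &lt;- matrix(c(82.9, 89.2, 91.2, 92.6,   # driver [18,65), passenger [18,65)
                   4.1,  4.7,  3.4,  3.4,   # driver [18,65), passenger [65,)
                   4.1,  3.5,  2.8,  2.6,   # driver [65,),   passenger [18,65)
                   8.9,  2.6,  2.6,  1.4),  # driver [65,),   passenger [65,)
                nrow = 4, byrow = TRUE)

# Divide each column of a block by its column total and express as %
conditional &lt;- function(block) {
  round(100 * sweep(block, 2, colSums(block), &quot;/&quot;), 1)
}

cond_young &lt;- conditional(joint[c(1, 3), ])  # passengers aged [18,65)
cond_old   &lt;- conditional(joint[c(2, 4), ])  # passengers aged [65,)

cond_young
cond_old</preformat>
    </disp-quote>
    <p>In each result the first row corresponds to drivers aged [18, 65)
    and the second to drivers aged [65,), so every column sums to 100%
    up to rounding.</p>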
    <p>To illustrate how the different databases used in this paper can
    be combined, we now select passenger cars with five or fewer
    occupants. The insurer knows some characteristics of the driver,
    such as residence and gender. Applying the Eurostat methodology to
    classify the degree of urbanization of the residence, we obtain the
    relative frequencies of the driver gender-residence-age combinations
    shown in Table 13.</p>
    <p>Table 13. Relative frequencies (%) of drivers by gender, degree
    of urbanization of residence, and age group in this segmentation.
    Source: Own elaboration.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="18%" />
          <col width="39%" />
          <col width="18%" />
          <col width="25%" />
        </colgroup>
        <thead>
          <tr>
            <th>Driver Gender</th>
            <th>Residence Urb. Degree</th>
            <th>Age group</th>
            <th>Rel. Freq. (%)</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Men</td>
            <td>Rural</td>
            <td>[18, 65)</td>
            <td>10.1%</td>
          </tr>
          <tr>
            <td>Men</td>
            <td>Rural</td>
            <td>[65,)</td>
            <td>2.0%</td>
          </tr>
          <tr>
            <td>Men</td>
            <td>Urban</td>
            <td>[18, 65)</td>
            <td>46.2%</td>
          </tr>
          <tr>
            <td>Men</td>
            <td>Urban</td>
            <td>[65,)</td>
            <td>6.6%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Rural</td>
            <td>[18, 65)</td>
            <td>6.0%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Rural</td>
            <td>[65,)</td>
            <td>0.3%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Urban</td>
            <td>[18, 65)</td>
            <td>27.4%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Urban</td>
            <td>[65,)</td>
            <td>1.3%</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
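    <p>We now consider the distribution of the number of occupants for
    each combination. The proportions differ across groups (Table 14):
    for example, the share of vehicles in which the driver travels alone
    ranges from 67.9% for rural men aged [18, 65) to 79.0% for urban
    women aged 65 and over.</p>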
    <p>Table 14. Proportion of vehicles by number of occupants for each
    combination of driver gender, degree of urbanization of residence,
    and age group. Source: Own elaboration.</p>
    <table-wrap>
      <table>
        <colgroup>
          <col width="12%" />
          <col width="20%" />
          <col width="16%" />
          <col width="12%" />
          <col width="9%" />
          <col width="8%" />
          <col width="8%" />
          <col width="8%" />
          <col width="8%" />
        </colgroup>
        <thead>
          <tr>
            <th>Driver Gender</th>
            <th>Residence Urb. Degree</th>
            <th>Age group</th>
            <th colspan="5">Number of vehicle occupants</th>
            <th>Total</th>
          </tr>
          <tr>
            <th></th>
            <th></th>
            <th></th>
            <th>1</th>
            <th>2</th>
            <th>3</th>
            <th>4</th>
            <th>5</th>
            <th></th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Men</td>
            <td>Rural</td>
            <td>[18,65)</td>
            <td>67.9%</td>
            <td>20.7%</td>
            <td>6.6%</td>
            <td>3.5%</td>
            <td>1.2%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>Men</td>
            <td>Rural</td>
            <td>[65,)</td>
            <td>68.4%</td>
            <td>26.9%</td>
            <td>3.3%</td>
            <td>1.1%</td>
            <td>0.3%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>Men</td>
            <td>Urban</td>
            <td>[18,65)</td>
            <td>72.5%</td>
            <td>17.8%</td>
            <td>5.8%</td>
            <td>3.0%</td>
            <td>0.9%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>Men</td>
            <td>Urban</td>
            <td>[65,)</td>
            <td>68.3%</td>
            <td>26.1%</td>
            <td>3.6%</td>
            <td>1.6%</td>
            <td>0.3%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Rural</td>
            <td>[18,65)</td>
            <td>72.2%</td>
            <td>18.3%</td>
            <td>6.4%</td>
            <td>2.2%</td>
            <td>0.8%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Rural</td>
            <td>[65,)</td>
            <td>78.0%</td>
            <td>18.2%</td>
            <td>3.1%</td>
            <td>0.4%</td>
            <td>0.2%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Urban</td>
            <td>[18,65)</td>
            <td>77.6%</td>
            <td>14.9%</td>
            <td>4.9%</td>
            <td>2.0%</td>
            <td>0.6%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td>Women</td>
            <td>Urban</td>
            <td>[65,)</td>
            <td>79.0%</td>
            <td>16.2%</td>
            <td>3.1%</td>
            <td>1.4%</td>
            <td>0.3%</td>
            <td>100%</td>
          </tr>
          <tr>
            <td></td>
            <td></td>
            <td>Total</td>
            <td>73.2%</td>
            <td>18.1%</td>
            <td>5.4%</td>
            <td>2.6%</td>
            <td>0.8%</td>
            <td>100%</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <p>The possibilities that this opens up for segmenting risks and
    optimizing pricing and reserving processes are evident. Thanks to
    relational models and the combination of databases, risks can now
    be characterized at a much higher level of detail.</p>
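    <p>The kind of linkage behind Tables 13 and 14 can be sketched in
    base R. The three small tables below are entirely hypothetical (toy
    values and invented column names such as
    <monospace>driver_id</monospace> and
    <monospace>muni_code</monospace>); they only illustrate how in-house
    and public tables that share keys can be joined and then summarized
    by segment, and they are not the procedure or the data used in the
    paper.</p>
    <disp-quote>
      <preformat># Hypothetical in-house table: one row per insured driver
drivers &lt;- data.frame(
  driver_id = 1:6,
  gender    = c(&quot;Men&quot;, &quot;Men&quot;, &quot;Women&quot;, &quot;Women&quot;, &quot;Men&quot;, &quot;Women&quot;),
  age_group = c(&quot;[18,65)&quot;, &quot;[65,)&quot;, &quot;[18,65)&quot;, &quot;[18,65)&quot;, &quot;[18,65)&quot;, &quot;[65,)&quot;),
  muni_code = c(&quot;A&quot;, &quot;B&quot;, &quot;A&quot;, &quot;C&quot;, &quot;C&quot;, &quot;B&quot;)
)

# Hypothetical public table: degree of urbanization by municipality
municipalities &lt;- data.frame(
  muni_code  = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;),
  urb_degree = c(&quot;Urban&quot;, &quot;Rural&quot;, &quot;Urban&quot;)
)

# Hypothetical crash table: number of occupants of the vehicle in each crash
crashes &lt;- data.frame(
  driver_id = c(1, 1, 2, 3, 4, 5, 5, 6),
  occupants = c(1, 2, 1, 1, 3, 1, 1, 2)
)

# Relational joins on the shared keys
linked &lt;- merge(merge(crashes, drivers, by = &quot;driver_id&quot;),
                municipalities, by = &quot;muni_code&quot;)

# One segment per gender-residence-age combination
linked$segment &lt;- paste(linked$gender, linked$urb_degree, linked$age_group)

# Proportion of vehicles by number of occupants within each segment (rows sum to 1)
occ_by_segment &lt;- prop.table(table(linked$segment, linked$occupants), margin = 1)
round(100 * occ_by_segment, 1)</preformat>
    </disp-quote>
    <p>With real data, the same two steps (joining on a key, then
    computing row proportions) would yield a table with the layout of
    Table 14.</p>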
  </sec>
  <sec id="conclusion">
    <title>CONCLUSION</title>
    <p>Access to data has grown exponentially in recent years, and more
    diverse data sources have become available to insurers. Techniques
    for linking different databases to extract useful information are
    becoming more important in pricing and risk management. Before
    delving into complex techniques, it is crucial to explore how the
    data are interrelated in order to better understand the behaviour of
    policyholders. We have observed that insurance companies already
    have cost-effective resources at hand, such as relational databases,
    but to leverage them they must be creative and experiment until they
    find relationships that are useful to them. Relational databases are
    a valuable tool for this exploration, which can be extended to all
    areas of an insurance company.</p>
    <p>There are potential applications across the industry. Combining
    in-house data with publicly available data can give insurers an
    edge, as shown in this paper by using spatial analysis to assign a
    standard degree of urbanization and by considering climatological
    variables such as rainfall. In both cases, we were able to add
    variables that helped us to better capture the heterogeneity in our
    data, allowing for better-adjusted general analyses and making it
    possible to split the database into more homogeneous segments. In
    addition, the use of standardized categorizations can help companies
    compare themselves against industry benchmarks, making comparisons
    faster and easier by avoiding arbitrary specifications. We highlight
    the relevance of these analyses in the new sustainability framework,
    where factors such as climate change or population relocation take
    on a leading role.</p>
    <p>Nevertheless, relational databases built only on in-house data
    can be advantageous too, as long as companies are creative and
    curious, as illustrated by the study of heterogeneity among the
    occupants of the same vehicle. In this study we use motor crash data
    from the DGT, but a similar analysis can be carried out by insurers
    with in-house data. The depth of analysis can vary with the needs of
    the company, but we have seen that there is room for conditional
    analysis: some important variables, such as gender, age or degree of
    urbanization of residence, are known beforehand and can help to
    forecast crash outcomes and estimate their variability based on past
    experience.</p>
  </sec>
  <sec id="acknowledgments">
    <title>ACKNOWLEDGMENTS</title>
    <p>We are grateful to the Dirección General de Tráfico for access to
    their database. We also acknowledge the support of the Spanish
    Ministry of Science and Innovation (grant PID2019-105986GB-C21) and
    of the Departament de Recerca i Universitats, the Departament
    d'Acció Climàtica, Alimentació i Agenda Rural and the Fons Climàtic
    of the Generalitat de Catalunya (2023 CLIMA 00012).</p>
    <list list-type="order">
      <list-item>
        <label>8.</label>
        <p>REFERENCES</p>
        <p>Abdel-Aty, M., Lee, J., Siddiqui, C., &amp; Choi, K. (2013).
        Geographical unit-based analysis in the context of
        transportation safety planning. <italic>Transportation Research
        Part A: Policy and Practice</italic>, 49, 62–75</p>
        <p>AEMET (2024). Valores climatológicos normales, Agencia
        Estatal de Meteorología.
        <ext-link ext-link-type="uri" xlink:href="https://www.aemet.es/es/serviciosclimaticos/datosclimatologicos/valoresclimatologicos">https://www.aemet.es/es/serviciosclimaticos/datosclimatologicos/valoresclimatologicos</ext-link>.
        Accessed June 12 2024</p>
        <p>Alarifi, S. A., Abdel-Aty, M. A., Lee, J., &amp; Park, J.
        (2017). Crash modeling for intersections and segments along
        corridors: A Bayesian multilevel joint model with random
        parameters. <italic>Analytic Methods in Accident
        Research</italic>, 16, 48–59.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.amar.2017.08.002">https://doi.org/10.1016/j.amar.2017.08.002</ext-link></p>
        <p>Ashley, W. S., Strader, S., Dziubla, D. C., &amp; Haberlie,
        A. (2015). Driving blind: Weather-related vision hazards and
        fatal motor vehicle crashes. <italic>Bulletin of the American
        Meteorological Society</italic>, 96(5), 755-778</p>
        <p>Ayuso, M., Guillén, M., &amp; Pérez-Marín, M. (2014). Los
        hábitos de conducción al volante según el género en los seguros
        pay-as-you-drive o usage-based. <italic>Anales del Instituto de
        Actuarios Españoles</italic>, 20, 17-32</p>
        <p>Ayuso, M., Sanchez, R., &amp; Santolino, M. (2019).
        Longevidad de los conductores y antigüedad de los vehículos:
        impacto en la severidad de los accidentes. <italic>Anales del
        Instituto de Actuarios Españoles</italic>, 25, 33-53</p>
        <p>Ayuso, M., Sanchez, R., &amp; Santolino, M. (2020). Does
        longevity impact the severity of traffic crashes? A comparative
        study of young-older and old-older drivers. <italic>Journal of
        Safety Research</italic>, 73, 37-46</p>
        <p>Bil, M., Andrasik, R., &amp; Sedonik, J. (2019). Which curves
        are dangerous? A network-wide analysis of traffic crash and
        infrastructure data. <italic>Transportation Research Part A:
        Policy and Practice</italic>, 120, 252–260</p>
        <p>Black, A. W., Villarini, G., &amp; Mote, T. L. (2017).
        Effects of Rainfall on Vehicle Crashes in Six U.S. States.
        <italic>Weather, Climate, and Society</italic>, 9(1), 53–70.
        https://doi.org/10.1175/WCAS-D-16-0035.1</p>
        <p>Boucher, J. P., Pérez-Marín, A. M., &amp; Santolino, M.
        (2013). Pay-As-You-Drive insurance: the effect of the kilometers
        on the risk of accident. <italic>Anales del Instituto de
        Actuarios Españoles</italic>, 3ª época, 19, 135-154</p>
        <p>Casado-Sanz, N., Guirao, B., &amp; Gálvez-Pérez, D. (2019).
        Population ageing and rural road accidents: Analysis of accident
        severity in traffic crashes with older pedestrians on Spanish
        crosstown roads. <italic>Research in Transportation Business
        &amp; Management</italic>, 30, 100377.
        https://doi.org/10.1016/j.rtbm.2019.100377</p>
        <p>Cespedes, L. E., Ayuso, M., &amp; Santolino, M. (2024a).
        <italic>Distance between the driver's residence and the motor
        accident: is an insurance risk factor?</italic> UB RISKcenter
        Working paper, in progress</p>
        <p>Cespedes, L. E., Santolino, M., &amp; Ayuso, M. (2024b).
        <italic>Population density in aging societies and severity of
        motor vehicle crash injuries: the case of Spain</italic>.
        (Submitted)</p>
        <p>Chazarra Bernabé, A., Flórez García, E., Peraza Sánchez, B.,
        Tohá Rebull, T., Lorenzo Mariño, B., Criado Pinto, E., Moreno
        García, J.V., Romero Fresneda, R., &amp; Botey Fullat, R.
        (2018). <italic>Mapas climáticos de España (1981-2010) y ETo
        (1996-2016)</italic>. Ministerio para la Transición Ecológica,
        Agencia Estatal de Meteorología, Madrid</p>
        <p>Clark, D. E., &amp; Cushing, B. M. (2004). Rural and urban
        traffic fatalities, vehicle miles, and population density.
        <italic>Accident Analysis and Prevention</italic>, 36(6),
        967–972. https://doi.org/10.1016/j.aap.2003.10.006</p>
        <p>Codd, E. F. (1970). A relational model of data for large
        shared data banks. <italic>Communications of the ACM</italic>,
        13(6), 377-387</p>
        <p>Denuit, M., Hainaut, D., &amp; Trufin, J. (2020).
        <italic>Effective statistical learning methods for actuaries II:
        tree-based methods and extensions</italic> (1st ed.). Springer
        International Publishing.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-030-57556-4">https://doi.org/10.1007/978-3-030-57556-4</ext-link></p>
        <p>Eilers, P. H. C., &amp; Marx, B. D. (1996). Flexible
        Smoothing with B-splines and Penalties. <italic>Statistical
        Science</italic>, 11(2), 89–121.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1214/ss/1038425655">https://doi.org/10.1214/ss/1038425655</ext-link></p>
        <p>Eisenberg, D. (2004). The mixed effects of precipitation on
        traffic crashes. <italic>Accident Analysis and
        Prevention</italic>, 36(4), 637–647.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/S0001-4575(03)00085-X">https://doi.org/10.1016/S0001-4575(03)00085-X</ext-link></p>
        <p>European Commission (2023). <italic>Road safety in the EU:
        fatalities below pre-pandemic levels but progress remains too
        slow</italic>. Available at:
        <ext-link ext-link-type="uri" xlink:href="https://transport.ec.europa.eu/news-events/news/road-safety-eu-fatalities-below-pre-pandemic-levels-progress-remains-too-slow-2023-02-21_en">https://transport.ec.europa.eu/news-events/news/road-safety-eu-fatalities-below-pre-pandemic-levels-progress-remains-too-slow-2023-02-21_en</ext-link>.
        Accessed 29 February 2024</p>
        <p>Giraud, T. (2022). osrm: Interface Between R and the
        OpenStreetMap-Based Routing Service OSRM. <italic>Journal of
        Open Source Software</italic>, 7(78), 4574.
        https://doi.org/10.21105/joss.04574</p>
        <p>Gutiérrez, E., Moral‐Benito, E., Oto‐Peralías, D., &amp;
        Ramos, R. (2023). The spatial distribution of population in
        Spain: An anomaly in European perspective. <italic>Journal of
        Regional Science</italic>, 63(3), 728–750.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/jors.12638">https://doi.org/10.1111/jors.12638</ext-link></p>
        <p>Harland, K. K., Greenan, M., &amp; Ramirez, M. (2014). Not
        just a rural occurrence: Differences in agricultural equipment
        crash characteristics by rural–urban crash site and proximity to
        town. <italic>Accident Analysis and Prevention</italic>, 70,
        8–13.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2014.02.013">https://doi.org/10.1016/j.aap.2014.02.013</ext-link></p>
        <p>Harrington, J. L. (2016). <italic>Relational database design
        and implementation</italic> (Fourth edition). Morgan Kaufmann,
        an imprint of Elsevier</p>
        <p>Hastie, T. J. (1992). <italic>Generalized Additive Models.
        Statistical Models in S</italic> (1st ed.). Routledge.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1201/9780203738535">https://doi.org/10.1201/9780203738535</ext-link></p>
        <p>Haynes, R., Lake, I. R., Kingham, S., Sabel, C. E., Pearce,
        J., &amp; Barnett, R. (2008). The influence of road curvature on
        fatal crashes in New Zealand. <italic>Accident Analysis and
        Prevention</italic>, 40(3), 843–850.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2007.09.013">https://doi.org/10.1016/j.aap.2007.09.013</ext-link></p>
        <p>Keay, K., &amp; Simmonds, I. (2005). The association of
        rainfall and other weather variables with road traffic volume in
        Melbourne, Australia. <italic>Accident Analysis and
        Prevention</italic>, 37(1), 109–124.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2004.07.005">https://doi.org/10.1016/j.aap.2004.07.005</ext-link></p>
        <p>Keeves, J., Ekegren, C. L., Beck, B., &amp; Gabbe, B. J.
        (2019). The relationship between geographic location and
        outcomes following injury: A scoping review.
        <italic>Injury</italic>, 50(11), 1826–1838.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.injury.2019.07.013">https://doi.org/10.1016/j.injury.2019.07.013</ext-link></p>
        <p>Kottek, M., Grieser, J., Beck, C., Rudolf, B., &amp; Rubel,
        F. (2006). World Map of the Köppen-Geiger climate classification
        updated. <italic>Meteorologische Zeitschrift</italic>, 15(3),
        259–263</p>
        <p>Lee, J., Abdel-Aty, M., &amp; Choi, K. (2014). Analysis of
        residence characteristics of at-fault drivers in traffic
        crashes. <italic>Safety Science</italic>, 68, 6–13.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ssci.2014.02.019">https://doi.org/10.1016/j.ssci.2014.02.019</ext-link></p>
        <p>Lerner, E. B., Jehle, D. V. K., Billittier, A. J., Moscati,
        R. M., Connery, C. M., &amp; Stiller, G. (2001). The influence
        of demographic factors on seatbelt use by adults injured in
        motor vehicle crashes. <italic>Accident Analysis and
        Prevention</italic>, 33(5), 659–662.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/S0001-4575(00)00080-4">https://doi.org/10.1016/S0001-4575(00)00080-4</ext-link></p>
        <p>Liu, C., &amp; Sharma, A. (2018). Using the multivariate
        spatio-temporal Bayesian model to analyze traffic crashes by
        severity. <italic>Analytic Methods in Accident
        Research</italic>, 17, 14–31.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.amar.2018.02.001">https://doi.org/10.1016/j.amar.2018.02.001</ext-link></p>
        <p>Lord, D., Manar, A., &amp; Vizioli, A. (2005). Modeling
        crash-flow-density and crash-flow-V/C ratio relationships for
        rural and urban freeway segments. <italic>Accident Analysis and
        Prevention</italic>, 37(1), 185–199.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2004.07.003">https://doi.org/10.1016/j.aap.2004.07.003</ext-link></p>
        <p>Moeinaddini, M., Asadi-Shekari, Z., &amp; Zaly Shah, M.
        (2014). The relationship between urban street networks and the
        number of transport fatalities at the city level. <italic>Safety
        Science</italic>, 62, 114–120.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ssci.2013.08.015">https://doi.org/10.1016/j.ssci.2013.08.015</ext-link></p>
        <p>Naik, B., Tung, L. W., Zhao, S., &amp; Khattak, A. J. (2016).
        Weather impacts on single-vehicle truck crash injury severity.
        <italic>Journal of Safety Research</italic>, 58, 57–65.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.jsr.2016.06.005">https://doi.org/10.1016/j.jsr.2016.06.005</ext-link></p>
        <p>OECD (2021). <italic>Applying the Degree of Urbanisation: A
        Methodological Manual to Define Cities, Towns and Rural Areas
        for International Comparisons</italic>. 2021 edn. OECD
        Publishing, Paris</p>
        <p>Peura, C., Kilch, J. A., &amp; Clark, D. E. (2015).
        Evaluating adverse rural crash outcomes using the NHTSA State
        Data System. <italic>Accident Analysis and Prevention</italic>,
        82, 257–262.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2015.06.005">https://doi.org/10.1016/j.aap.2015.06.005</ext-link></p>
        <p>Pljakić, M., Jovanović, D., &amp; Matović, B. (2022). The
        influence of traffic-infrastructure factors on pedestrian
        accidents at the macro-level: The geographically weighted
        regression approach. <italic>Journal of Safety
        Research</italic>, 83, 248–259.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.jsr.2022.08.021">https://doi.org/10.1016/j.jsr.2022.08.021</ext-link></p>
        <p>Prato, C. G., Kaplan, S., Patrier, A., &amp; Rasmussen, T. K.
        (2018). Considering built environment and spatial correlation in
        modelling pedestrian injury severity. <italic>Traffic Injury
        Prevention</italic>, 19(1), 88–93.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/15389588.2017.1329535">https://doi.org/10.1080/15389588.2017.1329535</ext-link></p>
        <p>R Core Team (2024). <italic>R: A language and environment for
        statistical computing</italic>. R Foundation for Statistical
        Computing, Vienna, Austria.
        <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org/">https://www.R-project.org/</ext-link></p>
        <p>Wang, C., Quddus, M. A., &amp; Ison, S. G. (2013). The effect
        of traffic and road characteristics on road safety: A review and
        future research direction. <italic>Safety Science</italic>, 57,
        264–275.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ssci.2013.02.012">https://doi.org/10.1016/j.ssci.2013.02.012</ext-link></p>
        <p>Watt, A., &amp; Eng, N. (2014). <italic>Database
        Design</italic>. 2nd edn. BCcampus, British Columbia</p>
        <p>Wijnen, W., Weijermars, W., Schoeters, A., van den Berghe,
        W., Bauer, R., Carnis, L., Elvik, R., &amp; Martensen, H.
        (2019). An analysis of official road crash cost estimates in
        European countries. <italic>Safety Science</italic>, 113,
        318–327.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ssci.2018.12.004">https://doi.org/10.1016/j.ssci.2018.12.004</ext-link></p>
        <p>Wüthrich, M. V., &amp; Merz, M. (2023). <italic>Statistical
        Foundations of Actuarial Learning and its Applications</italic>
        (1st ed. 2023.). Springer Nature.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-031-12409-9">https://doi.org/10.1007/978-3-031-12409-9</ext-link></p>
        <p>Xie, K., Yang, D., Ozbay, K., &amp; Yang, H. (2019). Use of
        real-world connected vehicle data in identifying high-risk
        locations based on a new surrogate safety measure.
        <italic>Accident Analysis and Prevention</italic>, 125, 311–319.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2018.07.002">https://doi.org/10.1016/j.aap.2018.07.002</ext-link></p>
        <p>Zheng, L., Sayed, T., &amp; Essa, M. (2019). Validating the
        bivariate extreme value modeling approach for road safety
        estimation with different traffic conflict indicators.
        <italic>Accident Analysis and Prevention</italic>, 123, 314–323.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2018.12.007">https://doi.org/10.1016/j.aap.2018.12.007</ext-link></p>
        <p>Zheng, L., Sayed, T., &amp; Mannering, F. (2021). Modeling
        traffic conflicts for use in road safety analysis: A review of
        analytic methods and future directions. <italic>Analytic Methods
        in Accident Research</italic>, 29, 100142.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.amar.2020.100142">https://doi.org/10.1016/j.amar.2020.100142</ext-link></p>
        <p>Ziakopoulos, A. (2024). Analysis of harsh braking and harsh
        acceleration occurrence via explainable imbalanced machine
        learning using high-resolution smartphone telematics and traffic
        data. <italic>Accident Analysis and Prevention</italic>, 207,
        107743.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2024.107743">https://doi.org/10.1016/j.aap.2024.107743</ext-link></p>
        <p>Ziakopoulos, A., &amp; Yannis, G. (2020). A review of spatial
        approaches in road safety. <italic>Accident Analysis and
        Prevention</italic>, 135, 105323.
        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.aap.2019.105323">https://doi.org/10.1016/j.aap.2019.105323</ext-link></p>
      </list-item>
      <list-item>
        <label>9.</label>
        <p>APPENDIX</p>
      </list-item>
    </list>
    <p>R software was used for the applications (R Core Team, 2024). The
    R code used to plot Figures 3 and 4 is provided below.</p>
    <disp-quote>
      <preformat># Download georeferenced data.
# AEMET (2024), Normal climatological values, Agencia Estatal de Meteorología https://www.aemet.es/es/serviciosclimaticos/datosclimatologicos/valoresclimatologicos [Accessed: June 12, 2024].

# Description: all climatic variables can be downloaded for the Peninsula and Balearic Islands, and for the Canary Islands.
# Coordinate reference system: EPSG:4326 (WGS84 - World Geodetic System 1984).
# Download unit: each raster image corresponds to one variable and period (monthly, annual or seasonal).
# Format: GeoTIFF (.tif)
# Additional information: the &quot;SCALE&quot; field gives the correspondence between the RGBA color scale and the range of values of the meteorological variable.

# Chazarra Bernabé, A., Flórez García, E., Peraza Sánchez, B., Tohá Rebull, T.,  Lorenzo Mariño, B., Criado Pinto, E., Moreno García, J.V., Romero Fresneda, R. y Botey Fullat, R. (2018) Mapas climáticos de España (1981-2010) y ETo (1996-2016) Ministerio para la Transición Ecológica, Agencia Estatal de Meteorología, Madrid
# This publication allows each file name to be matched to its climatological variable and to the image of the Peninsula or the Canary Islands. </preformat>
      <preformat>getRversion()</preformat>
      <preformat>## [1] '4.3.0'</preformat>
      <preformat># Packages
library(raster) #read tif file</preformat>
      <preformat>library(rgdal) #from source file (tar.gz). Used to obtain scale information</preformat>
      <preformat>library(ggplot2)</preformat>
      <preformat>library(grDevices) #convert RGB (red/green/blue) values in HSV (hue/saturation/value). </preformat>
    </disp-quote>
    <p>To read the raster file and inspect the information it
    contains.</p>
    <disp-quote>
      <preformat># Working directory
setwd(&quot;~\\anales\\clima\\descarga_clima&quot;)

# Precipitation in the Canary Islands

# Scale information and other metadata
rgdal::GDALinfo(&quot;down_vn8110pxdmww13c_20170512.tif&quot;) </preformat>
      <preformat>Canarias &lt;- stack(&quot;down_vn8110pxdmww13c_20170512.tif&quot;) #create a multi-layer raster object
df &lt;- as.data.frame(Canarias, xy= TRUE) #data frame includes longitude, latitude and RGBA information
names(df)&lt;-c(&quot;long&quot;,&quot;lat&quot;,&quot;R&quot;,&quot;G&quot;,&quot;B&quot;,&quot;A&quot;) #change name</preformat>
    </disp-quote>
    <p>To plot Figure 5.</p>
    <disp-quote>
      <preformat>dfclean&lt;-df[df$A&gt;100,] # cells with alpha (A) of 100 or less are removed (they indicate the sea)

Fig3&lt;-ggplot(data = dfclean, aes(x = long, y =lat))+                   
  geom_raster(aes(fill=rgb(R,G,B, maxColorValue = 255))) +
  scale_fill_identity()+xlab(&quot;Longitude&quot;)+ylab(&quot;Latitude&quot;)</preformat>
    </disp-quote>
    <p>To convert RGBA values into numerical values of the
    meteorological variable of interest.</p>
    <disp-quote>
      <preformat># Scale in RGBA space

escala&lt;-matrix(c(
255, 210, 255, 255,
255, 138, 255, 255,
162, 23, 253, 255,
0, 106, 213, 255,
41, 145, 248, 255,
130, 191, 253, 255,
128, 255, 255, 255,
128, 255, 72, 255,
201, 253, 130, 255,
255, 255, 164, 255), ncol=4, byrow=T)

# Convert to HSV space
escalhsv&lt;-t(rgb2hsv(r=escala[,1],g=escala[,2],b=escala[,3])) 

# Function that assigns discrete numerical value of the climatic variable to the HSV color value.

Meteo&lt;-function(A){
C&lt;-t(rgb2hsv(A[,1], A[,2] , A[,3])) 
valor&lt;-cbind(C,NA)
colnames(valor)[4]&lt;-&quot;val&quot;

valor&lt;-as.matrix(as.data.frame(valor))
cond1&lt;-I(valor[,1]&lt;=0.10)
valor[(cond1==T),4]&lt;-1

cond2&lt;-I(valor[,1]&gt;0.10 &amp; valor[,1]&lt;=0.22)
valor[(cond2==T),4]&lt;-25

cond3&lt;-I(valor[,1]&gt;0.22 &amp; valor[,1]&lt;=0.28)
valor[(cond3==T),4]&lt;-35

cond4&lt;-I(valor[,1]&gt;0.28 &amp; valor[,1]&lt;=0.49)
valor[(cond4==T),4]&lt;-45

cond5&lt;-I(valor[,1]&gt;0.49 &amp; valor[,1]&lt;=0.55)
valor[(cond5==T),4]&lt;-55

cond6&lt;-I(valor[,1]&gt;0.55 &amp; valor[,1]&lt;=0.65)
cond6A&lt;-I(valor[,2]&gt;0.83)
cond6B&lt;-I(valor[,2]&gt;0.48 &amp; valor[,2]&lt;=0.83)
cond6C&lt;-I(valor[,2]&lt;=0.48)
valor[(cond6==T)&amp;(cond6A==T) ,4]&lt;-85
valor[(cond6==T)&amp;(cond6B==T),4]&lt;-75
valor[(cond6==T)&amp;(cond6C==T),4]&lt;-65

cond7&lt;-I(valor[,1]&gt;0.65 &amp; valor[,1]&lt;=0.76)
valor[(cond7==T),4]&lt;-90

cond8&lt;-I(valor[,1]&gt;0.76 &amp; valor[,1]&lt;=0.83)
valor[(cond8==T),4]&lt;-110

cond9&lt;-I(valor[,1]&gt;0.83 &amp; valor[,1]&lt;=1)
cond9A&lt;-I(valor[,2]&lt;=0.45) # saturation returned by rgb2hsv lies in [0, 1]
cond9B&lt;-I(valor[,2]&gt;0.45)
valor[(cond9==T)&amp;(cond9A==T),4]&lt;-150
valor[(cond9==T)&amp;(cond9B==T),4]&lt;-130
valor[valor[,3]&lt;0.4,4]&lt;-NA  # Values of v close to zero (black) are considered missing
return(valor)
}</preformat>
    </disp-quote>
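    <p>As a quick check on why <monospace>Meteo</monospace> looks at
    saturation as well as hue, some legend colours can be converted to HSV
    directly. The sketch below is illustrative only and is not part of the
    original workflow; <monospace>legend_rgb</monospace>,
    <monospace>hsv_vals</monospace> and <monospace>same_hue</monospace> are
    names introduced here, and the three colours are the first three rows of
    the scale above.</p>
    <disp-quote>
      <preformat># First three colours of the scale, one RGB triple per row
legend_rgb &lt;- matrix(c(
255, 210, 255,
255, 138, 255,
162,  23, 253), ncol = 3, byrow = TRUE)

# rgb2hsv() returns h, s and v in [0, 1], one column per colour;
# transpose so that each row is a colour
hsv_vals &lt;- t(grDevices::rgb2hsv(r = legend_rgb[, 1],
                                 g = legend_rgb[, 2],
                                 b = legend_rgb[, 3]))
round(hsv_vals, 3)

# The first two colours have the same hue and differ only in saturation
same_hue &lt;- isTRUE(all.equal(hsv_vals[1, &quot;h&quot;], hsv_vals[2, &quot;h&quot;]))</preformat>
    </disp-quote>
    <p>The first two colours share a hue of 5/6 and have saturations of
    about 0.18 and 0.46, so hue alone cannot separate them; a saturation
    cut-off expressed on the 0 to 1 scale is needed to tell the two classes
    apart.</p>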
    <p>To plot Figure 6.</p>
    <disp-quote>
      <preformat>A&lt;-cbind(dfclean[, 3:5]) # R, G and B color values
lluvia&lt;-Meteo(A)
dfclean$Lluvia&lt;-lluvia[,4] # numerical value of the climatic variable

Fig4&lt;-ggplot(data = dfclean, aes(x = long, y = lat))+
  geom_raster(aes(fill=Lluvia), show.legend = T) +
  xlab(&quot;Longitude&quot;)+ylab(&quot;Latitude&quot;)+
  scale_fill_gradientn(colours = terrain.colors(3))+
  labs(fill = &quot;Rainfall&quot;)</preformat>
    </disp-quote>
  </sec>
</sec>
</body>
<back>
<fn-group>
  <fn id="fn1">
    <label>1</label><p>See Ayuso et al. (2019, 2020) for applications
    using the same bodily injury severity classification.</p>
  </fn>
</fn-group>
</back>
</article>
