Use CSV and XML import methods to populate, update, and enhance your InfoSphere Business Glossary content

cuyi7076 · Published 2020-06-23 08:52:22

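InfoSphere Business Glossary lets you create, manage, and share standard definitions of business and organizational concepts using a controlled vocabulary. The Business Glossary product uses a hierarchy of categories, in which categories contain terms. You can use terms to classify the data assets in the metadata repository according to the needs of your organization.

Populating the business glossary with data is the first step in using it. In version 8.1.1, Business Glossary introduced new import and export methods that use the CSV and XML formats, so that glossary administrators can move glossary content to and from these types of external files. These methods bring more flexibility to creating glossary content and let users populate their business glossary more easily and more completely.

This tutorial describes and illustrates how to use these new import and export capabilities. It includes best practices, tips, and examples to help you use these capabilities effectively to populate a business glossary.

The examples in this tutorial use data from the IBM Industry Models Telecommunications Business Glossary content pack. IBM offers InfoSphere Business Glossary content packs for a variety of industries, including banking, insurance, telecommunications, retail, and healthcare.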

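Choosing an import method

CSV and XML are the common import methods. Which one to use to populate the business glossary depends on several factors, including the content of any existing glossary and the skill level of the people who manage the business glossary.

CSV format

The CSV format is simple. It can contain categories and terms and their property values, such as descriptions, abbreviations, and custom attribute values. It also lets you define steward relationships.

XML format

The XML format is more comprehensive and more complex. It can define every possible relationship between terms, categories, and other object types, including terms related to other terms, categories that reference terms, and terms linked to assigned assets.

You might already have a version of a glossary in some format and want to use that content to start populating a new business glossary. Or you might choose to build the business glossary from scratch. Because these scenarios start from different points, the recommended import method can differ.

If the data is in a spreadsheet, you should probably convert the existing spreadsheet to the Business Glossary CSV format. If you are starting from scratch, choose the import method based on the types of data you need to import and your level of technical skill.

Tables 1 and 2 describe the values that can be included in CSV files and in XML files.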

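Table 1. Values available for categories in CSV and XML files

Category property	CSV file	XML file
Name	Yes, but only when added for a new category; the name of an existing term or category cannot be changed.	Yes
Long description	Yes	Yes
Short description	Yes	Yes
Subcategories	Yes	Yes
Parent category	Yes, but only when added for a new category; the parent category of an existing term or category cannot be changed.	Yes
Referenced terms	No	Yes
Contained terms	Yes	Yes
Custom attributes	No	Yes
Custom attribute values	Yes, if the custom attribute already exists in the target metadata repository. If the custom attribute does not exist, the import fails.	Yes
Link to steward	Yes, if the steward already exists in the target metadata repository. If the steward does not exist, the import fails.	Yes, if the user or user group already exists in the target metadata repository. If the user or user group does not exist, the steward relationship is not created, but the rest of the content is imported successfully.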

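Table 2. Values available for terms in CSV and XML files

Term property	CSV file	XML file
Name	Yes, but only when added for a new term; the name of an existing term or category cannot be changed.	Yes
Long description	Yes	Yes
Short description	Yes	Yes
Abbreviation	Yes	Yes
Usage	Yes	Yes
Example	Yes	Yes
Status	Yes	Yes
Related terms	No	Yes
IsModifier property	Yes	Yes
Type property	Yes	Yes
Synonyms	No	Yes
Containing (parent) category	Yes, but only when added for a new term; the parent category of an existing term or category cannot be changed.	Yes
Custom attributes	No	Yes
Custom attribute values	Yes, if the custom attribute already exists in the target metadata repository. If the custom attribute does not exist, the import fails.	Yes
Link to steward	Yes, if the steward already exists in the target metadata repository. If the steward does not exist, the import fails.	Yes, if the user or user group already exists in the target metadata repository. If the user or user group does not exist, the steward relationship is not created, but the rest of the content is imported successfully.
Link to assigned assets (such as columns, jobs, tables)	No	Yes, if the asset already exists in the target metadata repository. If the asset does not exist, the relationship is not created, but the rest of the content is imported successfully.
Reference to assigned external assets (such as business process models or Web services)	No	Yes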

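Note: If you need to transfer business glossary content from a version earlier than 8.1.1 to a newer business glossary instance, the only method available is the import and export of a glossary archive (in XMI format), which transfers all or part of the glossary data between glossary instances without editing its content. In Business Glossary version 8.1, if you only want to transfer categories and terms with their basic properties, you can also use the CSV import and export.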

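Importing using the CSV format

The Business Glossary comma-separated values (CSV) format was created to provide an easy way to import basic business glossary data into the metadata repository. Tables 1 and 2 describe the category and term properties that a CSV file can contain.

Note: This CSV import is not the same as the MetaBrokers and bridges categories and terms CSV import that was released with version 8.0 of Business Glossary. They are two separate applications with different formats.

The Business Glossary CSV import is accessed from the Glossary tab in the InfoSphere Information Server Web console, as shown in Figure 1.

Figure 1. InfoSphere Information Server Web console Glossary Import CSV page

Screenshot: Glossary tab highlighted, the Import CSV pane displayed, and the sample CSV file link highlighted

Complete the following steps to import a CSV file:

1. Click the Glossary tab.
2. Click Import CSV in the Import and Export section on the left.
3. Click Browse.
4. Locate the CSV file, and click Import.

To start creating a CSV file, click the Download sample CSV file link, shown in Figure 1, to get a simple CSV file in which you can enter test data. The file contains some sample categories and terms, along with instructions on the types of data you can enter in a CSV file.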

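Using a CSV template

Two best practices for writing a CSV file are to start from the sample file described above or to export a CSV file from a business glossary that already has some initial data in it. The key reason to start from the sample file or from existing glossary content is to make sure that the file contains all of the required rows, because the import fails if any of them is missing. An advantage of starting from an export of existing content is that the file contains the existing custom attributes as importable properties. You can use a CSV file to import categories, terms, or both.

Importing categories

If you want to import only categories and no terms, the CSV file needs only a Categories section, as shown in Figure 2.

Figure 2. Categories defined in a CSV, with no terms defined

Figure 2 shows two categories in the CSV file. The first category is called Business Concepts. It has no parent category defined, which means that on import it becomes a top-level category. The second category is called Location, and it has the category Business Concepts defined as its parent category. Figure 2 also contains a Terms section with no terms defined in it, which is allowed.

When this CSV file is imported, two categories are created: a top-level category named Business Concepts, and a subcategory of Business Concepts named Location.

Importing terms

If you want to import only terms and no categories, the CSV file needs only a Terms section. You can also leave an empty Categories section in the CSV.

Figure 3. Terms defined in a CSV, with no categories defined

It is best to define a parent category for a term rather than leave it blank. If the parent category is left blank, the term is still created, but it is placed in a category that is generated automatically during import with a name that is not useful, such as Uncategorized_1273649004500, as shown in Figure 4.

Figure 4. Term with no parent category defined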

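Specifying the file format

A CSV file has the file extension .csv. If your text editor does not offer .csv in its list of file format types, either type .csv as the extension when you save the file, or save the file as .txt and change the extension afterward.

If you are using Microsoft® Excel, make sure to save the file as CSV (Comma delimited) (*.csv), and not as any other CSV type or as the standard .xls type, as shown in Figure 5.

Figure 5. Microsoft Excel CSV format to use when saving as .csv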

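Assigning stewards

A steward is a business owner or subject matter expert. A steward can be assigned to a term or to a category. Note that the steward must exist in the repository before the import, otherwise the import fails. You assign a steward to a category or term in the CSV file by typing the steward's user name in the Steward column, as shown in Figure 6.

Note: A steward can be a user or a user group.

Figure 6. Assigning a steward to categories and terms in a CSV file

Screenshot: CSV with imported terms and imported categories, with the Steward field robertj highlighted for each

The user name is not the same as the steward's first name, last name, or user group name. The value must be the exact user name, otherwise the import fails.

You can find the user name of a user or user group by viewing the stewards in the InfoSphere Information Server Web console, as shown in Figure 7.

Figure 7. Stewards page in the InfoSphere Information Server Web console

The user name value shown in this console is the value that you need to enter for the steward in the CSV file.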

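Creating custom attributes

Custom attributes are properties of categories and terms that you can create to extend the properties of the standard glossary template, such as name, short description, and long description. Table 1 lists the full set of available properties.

You define custom attributes in the InfoSphere Business Glossary administration interface of the Information Server Web console, as shown in Figure 8.

Figure 8. Custom Attributes page in the InfoSphere Information Server Web console

Screenshot: Glossary tab, with the Custom Attributes page showing Term or Category for each of four attributes

Each custom attribute is defined for use with either categories or terms. Each custom attribute is of type String or of type Enumerated. If the type is Enumerated, you need to define the possible enumerated values when you define the custom attribute.

In a CSV file, you assign custom attribute values to categories and terms by adding more columns to the file after the last required column. Each custom attribute and its associated values must be in a separate column.

In Figure 9, the Categories section of the CSV file has two new columns that correspond to custom attributes named Industry and Priority. The Terms section has two new columns that correspond to custom attributes named ID and Position.

Figure 9. Assigning custom attribute values to categories and terms in a CSV

Screenshot: CSV with Industry and Priority values for the categories, and ID and Position values for the terms

When this CSV file is imported, two categories and two terms are created, with their custom attribute values defined on import.

Custom attributes must be defined in Information Server before any values for them can be imported, otherwise the import fails. For example, if you try to import the CSV file in Figure 9 and the custom attribute Industry has not been defined for categories in Information Server, the import fails with an error message stating that the custom attribute Industry was not previously defined.

Note: If a custom attribute is of type Enumerated, the value entered in the CSV must be one of the valid values defined for that enumerated list. If an invalid value is defined in the CSV, the import fails with an error message that states that the specific value is not valid and that lists the valid values.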

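Exploring case sensitivity and merging in CSV

Category, term, and steward names in Information Server are case sensitive. This means that the category Location is not the same as the category location, the term Address is not the same as the term address, and the steward ROBERTJ is not the same as the steward robertj. If you use the wrong case for any of these values in the CSV, then depending on the situation, either the import fails or you do not get the results you expected. If the case matches, the category or term is identified as already existing in the repository, and its values are merged.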
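Because a name that differs only in case is treated as a different object, it can help to check a CSV against the names you expect to exist before importing. The following Python sketch is not part of the product: the function name and the two sample lists are hypothetical, and it assumes you already have the repository names available as plain strings.

```python
def find_case_mismatches(csv_names, repository_names):
    """Return (csv_name, repository_name) pairs that differ only by letter case."""
    by_folded = {}
    for name in repository_names:
        by_folded.setdefault(name.casefold(), []).append(name)

    mismatches = []
    for name in csv_names:
        for existing in by_folded.get(name.casefold(), []):
            if existing != name:
                mismatches.append((name, existing))
    return mismatches


# Hypothetical sample data: names already in the repository and names in the CSV.
repository = ["Business Concepts", "Location", "ROBERTJ"]
incoming = ["business concepts", "Location", "robertj"]

for csv_name, repo_name in find_case_mismatches(incoming, repository):
    print(f"'{csv_name}' differs only by case from existing '{repo_name}'")
```

A pair reported here is a likely typo: importing it as is would either create a duplicate category or term, or fail because a steward or parent category cannot be found.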

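The following sections describe the merge scenarios and their results.

Names of categories or terms

If the name of a category or term in the CSV differs in case from an existing category or term, a new category or term is created instead of the existing one being updated. Table 3 shows the basic scenarios and their results.

Table 3. Basic scenarios for case sensitivity and merging in CSV

Scenario	CSV	Repository before import	Repository after import
Category name with different case	Category: business concepts	Category: Business Concepts	Categories: Business Concepts + business concepts
Term name with different case in a category with the same case	Term: address in category Business Concepts >> Location	Term: Address in category Business Concepts >> Location	Terms: Address + address in category Business Concepts >> Location
Category name with the same case	Category: Business Concepts	Category: Business Concepts	Category: Business Concepts (the values in the CSV for the category Business Concepts overwrite any values that exist in the repository for this category)
Term name with the same case in a category with the same case	Term: Address in category Business Concepts >> Location	Term: Address in category Business Concepts >> Location	Term: Address in category Business Concepts >> Location (the values in the CSV for the term Address overwrite any values that exist in the repository for this term. If the CSV has no value for a property and the repository has a value for that property, the value is kept; values are not deleted on import.)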

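Parent categories

If the case of the parent category of a category or term in the CSV differs from the case of the existing parent category, the connection is not made on import, as shown in Table 4.

Table 4. Basic scenarios for case sensitivity and merging of parent categories in CSV

Scenario	CSV	Repository before import	Repository after import
Parent category name of a category with different case	Category: Location, parent category: business concepts	Category: Location, parent category: Business Concepts	Categories: Business Concepts >> Location + business concepts >> Location
Parent category name of a term with different case	Term: Address with its parent category defined as business concepts >> Location (if the CSV does not define the category business concepts >> Location, it is expected to exist in the repository)	Category: Business Concepts >> Location	The import fails with an error message stating that the term Address cannot be imported because its parent category business concepts >> Location cannot be found.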

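Stewards

If a category or term in the CSV has its steward defined as robertj, and the repository has a steward with the user name ROBERTJ but no steward with the user name robertj, the import fails with a message that the steward was not found, as shown in Table 5.

Table 5. Stewardship scenarios for case sensitivity and merging

Scenario	CSV	Repository before import	Repository after import
Steward name of a category with different case	Category: Location with steward robertj	Category: Location with steward ROBERTJ	The import fails with an error message stating that the category Location cannot be imported because its steward robertj cannot be found.
Steward name of a term with different case	Term: Address with steward robertj	Term: Address with steward ROBERTJ	The import fails with an error message stating that the term Address cannot be imported because its steward robertj cannot be found.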

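Importing using the XML format

Starting with version 8.1.1, the new Business Glossary XML file import can import every type of value and relationship of a category or term. Because it supports every type of value, an XML file is considerably more complex to write than a CSV file. However, the schema for this XML was designed to be as easy to read and understand as possible, so that users can realistically edit it.

The XML import process allows data to be merged. Objects in the XML file can be connected on import to objects that already exist in the repository, including references to technical assets.

The InfoSphere Business Glossary XML import is accessed from the Glossary tab in the Information Server Web console, as shown in Figure 10.

Figure 10. Information Server Web console Glossary Import XML page

Screenshot: Glossary tab, Import XML page with the File is Business Glossary 8.1 XML format option, and the download reference file link highlighted

Complete the following steps to import an XML file:

1. Click the Glossary tab.
2. Click Import XML in the Import and Export section on the left.
3. Click Browse, and select a file.
4. Select one of the four merge methods. The merge options are discussed in detail later.
5. Click Import to complete the import.

To see a sample XML file that contains many of the different types of data that the XML can contain, click the Download sample XML file link. You can also download the XML schema.

The XML format that was used to upload categories and terms in Business Glossary 8.0.1 and 8.1 is still supported in 8.1.1 and later. To use the earlier XML format in 8.1.1 or later, select the File is Business Glossary 8.1 or earlier XML format check box, as shown in Figure 10. Because none of the options in the 8.1.1 XML import are relevant to the earlier XML import, these options are not displayed when the check box is selected.

Note: The old XML format has a different schema from the new XML format. The newer XML format (8.1.1 and later) is much more comprehensive than the old one and can merge data. Use the new format when you write new business glossary XML files.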

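Understanding the XML schema

The Business Glossary 8.1.1 XML schema is designed according to XML standards, in which XML attributes define object properties and XML elements define object relationships. However, the schema departs from the standard in one respect: categories and terms are not nested hierarchically. Even though terms are contained in categories, categories and terms are defined as separate elements at the same hierarchical level. This design accommodates the partial import support of the 8.1.1 XML import. For example, you can import terms without their categories. Partial import is covered in detail in a later section.

The XML schema is designed to prevent users from making mistakes when writing an XML file. Many glossary relationships are bidirectional, so the definition could be placed on either of the two objects, and XML data that is written incorrectly can cause import errors. In most cases, the schema therefore allows only one of the two objects to define the relationship. The specific cases are covered later in this tutorial.

To help you write an XML file, examine the schema to see the different elements and attributes that you can add and to learn the requirements of each type.

Writing XML for two categories and a term

Listing 1 shows a simple example of importing two categories and one contained term.

Listing 1. Two basic categories and a term in XML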

<categories>
    <category name="Business Concepts"/>
    <category name="Location">
        <parentCategory identity="Business Concepts"/>
    </category>
</categories>
<terms>
    <term name="Address">
        <parentCategory identity="Business Concepts::Location"/>
    </term>
</terms>

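In Listing 1, the category Business Concepts has no parent category, so it is a root, or top-level, category.

Categories and terms in the business glossary are uniquely identified by their name and their parent category. In the XML schema, when a category or term refers to another object, such as a parent category, the identity attribute holds the value that uniquely identifies the referenced object.

For the parentCategory tag of a category or term, the identity value in the XML contains the full path of the parent category, starting at the top-level category and continuing down to the direct parent category. The categories in the path are separated by a double colon (::).

In Listing 1, the category Location is contained in the root category Business Concepts, so the identity value of the parent category of Location is Business Concepts. The term Address is contained in the category Location, which is itself contained in the category Business Concepts. Therefore, the identity value of the parent category of the term Address is Business Concepts::Location.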
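If you generate glossary XML from a script, the identity path can be assembled mechanically. The sketch below uses Python's standard xml.etree.ElementTree module to build the term element from Listing 1; the helper names (build_identity, split_identity, term_element) are hypothetical and are not part of any IBM tooling, and the sketch produces only the term fragment, not a complete import file.

```python
import xml.etree.ElementTree as ET

SEPARATOR = "::"  # path separator used in identity values


def build_identity(*path):
    """Join category names (top-level first) into an identity value."""
    return SEPARATOR.join(path)


def split_identity(identity):
    """Split an identity value back into its path components."""
    return identity.split(SEPARATOR)


def term_element(name, *category_path):
    """Build a <term> element whose parentCategory points at category_path."""
    term = ET.Element("term", {"name": name})
    ET.SubElement(term, "parentCategory",
                  {"identity": build_identity(*category_path)})
    return term


address = term_element("Address", "Business Concepts", "Location")
print(ET.tostring(address, encoding="unicode"))
```

Note that a simple split on the separator assumes that category names themselves never contain a double colon.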

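Importing the XML for two categories and a term

When XML is imported, the import service looks at each identity value to see whether it can find a match for an object with that identity. The service looks for the referenced object both in the XML file and in the target repository. If a match is found, the reference connection is made. If no match is found, there are two possible outcomes:

The object that makes the reference is added to the target repository. If the referenced object is not required for the new object to exist, the new object is added without the reference.
For example, a category can exist without a parent category. Therefore, if the XML defines a new category with a parent category, but that parent category does not exist in the XML or in the repository, the new category is created on import as a top-level category. The parent category defined for it in the XML is ignored.
Similarly, a term can exist without a referenced term. Therefore, if the XML defines a term with a referenced term, but that referenced term does not exist in the XML or in the repository, the new term is created on import without the referenced term.

The object that makes the reference is not added to the target repository. If the referenced object is required for the new object to exist, the new object is not added to the repository.
For example, if the XML defines a new term with a parent category, but that parent category does not exist in the XML or in the repository, the new term is not created at all on import, because a term requires its parent category to exist.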
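The two outcomes for a missing parent category can be summarized in a few lines of Python. This is only a model of the rules described above, not the import service itself: the function name, the kind labels, and the returned strings are hypothetical, and known_identities stands for every identity found in either the XML file or the target repository.

```python
def import_outcome(kind, parent_identity, known_identities):
    """Model what happens to a new category or term on import.

    kind is "category" or "term". known_identities holds the identities
    found in the XML file or in the target repository.
    """
    if parent_identity in known_identities:
        return "created under " + parent_identity
    if kind == "category":
        # A category can exist without a parent, so the missing parent is ignored.
        return "created as top-level category"
    # A term requires its parent category, so the term is skipped.
    return "not created"


known = {"Business Concepts", "Business Concepts::Location"}
print(import_outcome("category", "Customer", known))
print(import_outcome("term", "Customer", known))
print(import_outcome("term", "Business Concepts::Location", known))
```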

Import result examples

This section describes some examples of these import results.

Listing 2 adds a new term.

Listing 2. Adding a new term

<term name="Street">
    <parentCategory identity="Customer"/>
    <replacedByTerm identity="Business Concepts::Location::Address"/>
</term>

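Listing 2 contains the term Street. In this example, the parent category of Street, identified as Customer, does not exist in the XML or in the target repository. On import, the term Street is not created, because its parent category is required and cannot be found in either the XML or the repository.

In the next example, the parent category Customer does exist in the target repository, but the term Address, which is set as the replacedByTerm value of the term Street, does not exist in the XML or in the target repository. Note that the identity of this term, Business Concepts::Location::Address, includes its category path and the term itself, and that the potential reference is between two terms. In this case, the term Street is created on import with its parent category Customer. But because the term Address does not exist in the XML or in the target repository, the replacedByTerm relationship between Street and Address is not created.

It follows that if both Business Concepts::Location and Business Concepts::Location::Address exist in the XML or in the target repository, then on import the term Street is added to the target repository and its replacedByTerm relationship is created.

These same principles apply to the other types of relationships defined in the XML when the referenced object cannot be found in the XML or in the repository. If a new item in the XML requires a relationship and the referenced object is not found, the new object is not added at all. If the new object does not require the referenced object in order to exist, the import succeeds, but the relationship is not made.

Note: If some of your data was not imported, check for this situation. Misspellings, wrong letter case, or other invalid relationship definitions can also cause this kind of import failure.

Matching on import

The example in Listing 2 added a new term to the target repository. The following sections describe scenarios in which the XML import updates objects that already exist in the target repository.

During import, the service first checks whether each object in the XML already exists in the target repository. This determines whether a new object is added or an existing object is updated. To perform this check, the service first determines whether a repository ID (RID) is defined for the object in the XML. A RID is a unique ID that Information Server generates internally. If a RID is defined, the service checks whether an object with that RID exists in the target repository. If no RID is defined for the object in the XML, or if the RID is not found, the service looks at the identity of the object to see whether there is a match.

When data is exported from the business glossary in XML format, the RID is exported as an attribute of each object. Note, however, that when you write a new object in an XML file in order to import it into the repository, you should not add a RID attribute and value to that object. Only InfoSphere Information Server can generate these values internally, and you cannot create them manually. In this case, an identity value, not a RID, must be added in the relevant places.

In Listing 3, the existing term Address was exported from the business glossary. The term has a rid attribute with a value that consists of a long string of digits, letters, and other characters.

Listing 3. Matching by RID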

<term name="Address" 
    rid="b1c497ce.e1b1ec6c.38683868.73b6c473-1dc5-45f6.8e65.550faa5565f8"
    shortDescription="LOCATION ADDRESS TYPE">
    <parentCategory identity="Business Concepts::Location"/>
</term>

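In this example, the term specifies both its RID and the identity of its parent category. Therefore, on import, the check for a matching object proceeds in the following order:

1. The target repository is checked for a term with the RID value b1c497ce.e1b1ec6c.38683868.73b6c473-1dc5-45f6.8e65.550faa5565f8. If one is found, no new term is added, and the term in the target repository can be updated. In this case, because the match is made by RID, the name of the term or of its parent can also be updated.
2. If no term with that RID is found, an identity check is made, which checks whether a term named Address exists in the category Business Concepts::Location in the target repository. If one is found, no new term is added, and the term in the target repository is updated.
3. If neither the RID nor the identity is found, a new term is created in the target repository from the XML data.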
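The lookup order above can be modeled as a short function. This Python sketch is an illustration only and does not reflect how the import service is implemented: terms are represented as plain dictionaries with hypothetical keys (rid, name, parent), and the RID values are shortened placeholders.

```python
def find_existing(term, repository):
    """Return (match, how), where how is "rid", "identity", or "new"."""
    rid = term.get("rid")
    if rid is not None:
        # Step 1: look for an object with the same RID.
        for candidate in repository:
            if candidate.get("rid") == rid:
                return candidate, "rid"
    # Step 2: fall back to the identity (parent category path plus name).
    identity = (term["parent"], term["name"])
    for candidate in repository:
        if (candidate["parent"], candidate["name"]) == identity:
            return candidate, "identity"
    # Step 3: nothing matched, so the term would be created as new.
    return None, "new"


repository = [
    {"rid": "rid-1", "name": "Address", "parent": "Business Concepts::Location"},
]

renamed = {"rid": "rid-1", "name": "Postal Address", "parent": "Business Concepts::Location"}
print(find_existing(renamed, repository)[1])
```

The renamed term still matches by RID, which is why a RID match allows the name of the term or of its parent to be updated, while an identity match cannot.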

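You can refer to an object by its RID or by its identity. In Listing 3, both a RID and an identity are specified for the term. When you write an XML file, only one of the two is actually required.

Listing 4 describes a scenario in which only the identity of the term is in the XML file, with no RID.

Listing 4. Matching by identity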

<term name="Address">
    <parentCategory identity="Business Concepts::Location"/>
</term>

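In this example, no RID is specified for the term in the XML file, but there is a parent category identity. When the term Address is imported into the business glossary, the import service checks by identity whether the term already exists in the target repository, because there is no RID value in the XML. The following actions are performed on import, in this order:

1. A check is made to determine whether a term named Address exists in the category Business Concepts::Location in the target repository. If the term is found, no new term is added, and the term in the target repository is updated.
2. If the identity is not found, a new term is created in the target repository from the XML data.

Merging imported properties

Once it has been determined that an object in the XML file already exists in the target repository, the question becomes how to merge all of its property values. For a term, for example, the property values in the XML file might differ from those in the target repository. This section describes the merge process in more detail.

Listing 5 imports the term Address in Business Concepts::Location.

Listing 5. Merging in XML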

<term name="Address" 
    status="CANDIDATE"
    example="1222 Park Avenue"
    abbreviation="ADDR">
    <parentCategory identity="Business Concepts::Location"/>
</term>

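If a term that matches by RID or identity is found in the target repository, each value of the term in the XML file is compared with the value of the term in the target repository. For example, in Listing 5 the value of the abbreviation is ADDR, while in the target repository the value of the abbreviation is ADR. The question becomes: is the value in the XML file used, or the value in the target repository? And what happens if there is no value for a property in the XML or in the target repository?

The answer depends on the merge method that you choose. The InfoSphere Business Glossary XML file import has four different merge options to choose from. Besides the four merge methods, the behavior of the merge depends on whether the attribute can contain a single value or a list of many values.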

Figure 11 shows how these merge methods are presented in the InfoSphere Information Server Web console and some examples.

Figure 11. GUI of XML merge methods

Radio buttons: Ignore the imported, Overwrite the existing, Merge & Ignore imported, or Merge & Overwrite existing

Listing 6 shows the equivalent Address term as it exists in the target repository when the term from Listing 5 is imported.

Listing 6. Merging a single value attribute in XML

<term name="Address" 
    status="CANDIDATE"
    abbreviation="ADR"
    additionalAbbreviation="AD">
    <parentCategory identity="Business Concepts::Location"/>
</term>

XML and target repository both have a value, single value allowed

The abbreviation attribute can have a single value. In the XML, this abbreviation value is ADDR. In the target repository, this abbreviation value is ADR.

When importing this term, the result for the abbreviation attribute for each merge method is as follows:

Ignore

The imported value ADDR is ignored, and the value in the target repository ADR remains unchanged.

Overwrite

The imported value ADDR overwrites the value ADR in the target repository.

Merge and ignore

Because both the imported and existing values are populated, the merge is done the same way it is done for Ignore: the imported value is ignored, and the value in the target repository ADR remains.

Merge and overwrite

Because both imported and existing values are populated, the merge is done the same way it is done for Overwrite: the imported value ADDR is used.

XML has no value, and target repository has a value, single value allowed

The additionalAbbreviation attribute can also have a single value. In the XML there is no value for additionalAbbreviation, while in the target repository there is a value AD.

When importing this term, the result for the additionalAbbreviation attribute for each merge method is as follows:

Ignore

The imported value is ignored, and therefore the existing value AD remains unchanged.

Overwrite

The imported non-value overwrites the existing value AD, therefore in the result, additionalAbbreviation has no value.

Merge and ignore

Because the existing value is populated, the imported non-value is ignored, and the existing value AD remains unchanged.

Merge and overwrite

Because the imported additionalAbbreviation has no value, it does not overwrite the existing value AD.

XML has a value, and target repository has no value, single value allowed

For the example attribute, there is a value in the XML, but there is no value in the target repository. The results are as follows:

Ignore

The imported value is ignored, therefore the existing value remains unchanged, remaining with no value.

Overwrite

The imported value for the example overwrites the existing value, which is no value.

Merge and ignore

Because the existing example has no value, the imported value 1222 Park Avenue is used.

Merge and overwrite

Because the imported example has a value, it is used.
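The twelve single-value outcomes described above reduce to one small decision function. The following Python sketch is a model of the documented behavior, not product code: the method labels (ignore, overwrite, merge_ignore, merge_overwrite) are hypothetical names for the four options, and None stands for "no value".

```python
def merge_single(method, existing, imported):
    """Return the resulting value of a single-value property after import."""
    if method == "ignore":
        return existing            # the imported value is never used
    if method == "overwrite":
        return imported            # even an imported non-value wins
    if method == "merge_ignore":
        return existing if existing is not None else imported
    if method == "merge_overwrite":
        return imported if imported is not None else existing
    raise ValueError(f"unknown merge method: {method}")


methods = ("ignore", "overwrite", "merge_ignore", "merge_overwrite")

# abbreviation: repository has ADR, XML has ADDR
print([merge_single(m, "ADR", "ADDR") for m in methods])
# additionalAbbreviation: repository has AD, XML has no value
print([merge_single(m, "AD", None) for m in methods])
# example: repository has no value, XML has 1222 Park Avenue
print([merge_single(m, None, "1222 Park Avenue") for m in methods])
```

The two merge options differ from their plain counterparts only when one side has no value: in that case the populated side is always kept.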

Merging properties with multiple cardinality values

The previous section explained how single attribute values are merged in XML import. This section describes merging values that have multiple cardinality, which means containing a list of more than one value.

In general, these merges are similar to single value merges. The difference is that with the Merge and ignore option and with the Merge and overwrite option, the resulting list is a combination of the existing values and the imported values.

XML contains term Address with related term Street

In the example in Listing 7, the XML contains the term Address to merge with the related term Street.

Listing 7. XML value for a multiple value attribute

<term name="Address">
    <parentCategory identity="Business Concepts::Location"/>
    <relatedTerms>
        <termRef identity="Customer::Street"/>
    </relatedTerms>
</term>

Target repository contains term Address with related term Road

In the example in Listing 8, the target repository contains the term Address to merge with the related term Road.

Listing 8. Target repository value for a multiple value attribute

<term name="Address">
    <parentCategory identity="Business Concepts::Location"/>
    <relatedTerms>
        <termRef identity="Industry::Road"/>
    </relatedTerms>
</term>

The results for each merge method are as follows:

Ignore

The imported related term is ignored, which leaves the existing related term unchanged, remaining as Road.

Overwrite

The imported related term Street overwrites the existing related term Road.

Merge and ignore

The existing and imported related terms are combined, therefore the resulting related terms list contains both Street and Road.

Merge and overwrite

The existing and imported related terms are combined, therefore the resulting related terms list contains both Street and Road.
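For list-valued properties, the same four options can be modeled as follows. As before, this Python sketch only mirrors the behavior described in this section: the method labels are hypothetical, and the order of the combined list (existing values first) is an assumption made for the example.

```python
def merge_list(method, existing, imported):
    """Return the resulting list of a multi-value property after import."""
    if method == "ignore":
        return list(existing)
    if method == "overwrite":
        return list(imported)
    if method in ("merge_ignore", "merge_overwrite"):
        # Both merge options combine the lists, skipping duplicates.
        combined = list(existing)
        for value in imported:
            if value not in combined:
                combined.append(value)
        return combined
    raise ValueError(f"unknown merge method: {method}")


existing_related = ["Industry::Road"]     # Listing 8, target repository
imported_related = ["Customer::Street"]   # Listing 7, XML file

for method in ("ignore", "overwrite", "merge_ignore", "merge_overwrite"):
    print(method, merge_list(method, existing_related, imported_related))
```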

Understanding case sensitivity in XML

All names in the Information Server are case sensitive. This means that the term Street is not the same as the term street. When you refer to new or existing glossary data, be sure to use the right case, otherwise an incorrect glossary hierarchy might be created. For example, the target repository might have the term Business Concepts::Location::Street, but not Business Concepts::Location::street. If you are trying to update the term Street, but by mistake you type this term in the XML as Business Concepts::Location::street, then on import, the term street is added to Business Concepts::Location, and Street is not updated as you expected.

Understanding partial import support

The new XML import introduced the capability to perform a partial import. This means that objects such as categories or terms can be defined in the XML file without adding their referenced objects to the XML file. In this case, the referenced objects need to exist in the target repository before import in order for the connections defined in the XML to be made.

Listing 2 offered an example of this type of import by importing the term Street. In that case, you added a new term to the glossary in which its parent category and replaced-by term were not in the XML, but were expected to be in the target repository. The exact identity or RID for these existing referenced objects needs to be defined in the XML file.

Assigning stewards

The XML import enables you to assign a steward to manage categories or terms. Listing 9 shows an example of how to assign a steward to manage a category or term in an XML. You need to define the steward's userName and type.

Listing 9. XML to assign a steward to a category and a term

<category name="Location" longDescription="Set of Terms relating to:  Location">
    <parentCategory identity="Business Concepts"/> 
    <steward userName="analysts" type="USERGROUP"/>
</category>
		
<term name="Address" shortDescription="LOCATION ADDRESS TYPE" 
      status="CANDIDATE" type="NONE" isModifier="false">
    <parentCategory identity="Business Concepts::Location"/>
    <steward userName="robertj" type="USER"/>
</term>

The userName value is the same as was described in the CSV section about assigning stewards. The type value can be either USER or USERGROUP, which are defined as valid values in the schema.

If the USER or USERGROUP defined in the XML does not exist in the repository, the import succeeds, but the connection from the term or category to the steward is not made. This follows the previously described principle that if a non-required reference is defined in the XML for a new category or term, and that non-required reference (a steward) does not exist in the XML or in the repository, then on import, the new category or term is created, but the steward connection is ignored.

Note: If the steward for a userName in the XML does not exist in the repository, but the USER or USERGROUP for the userName in the XML does exist in the repository, then on import, the steward is created for that userName. And the connection is made to the category or term, as was defined in the XML.

Handling categories that contain terms and subcategories

This tutorial describes several examples in which a term defines its parent category (parentCategory element) in the XML. It was decided when designing the XML schema that this relationship of a category containing terms can be defined only within the term tag, but it cannot be defined from the category side. This decision was made because a term can be imported without its parent category, and a term requires exactly one parent category in order to exist, therefore a parent category needs to be defined for a term. On the other hand, a category does not require contained terms to exist. By allowing this relationship to be defined only on the term level, it minimizes the risk of possible errors in the XML. Listing 10 shows the XML code to define a parent category of a term.

Listing 10. Parent category of a term

<term name="Address">
    <parentCategory identity="Business Concepts::Location"/>
</term>

A similar principle applies to the way that subcategories are defined in the XML schema. A category in the XML can define its parent category, but it cannot define its subcategories. The difference here is that a category is not required to have a parent category or subcategories. But because a category can have only one parent category, it is easier and less problematic to just allow the definition for this relationship to be defined by the parentCategory tag and not by a subCategories tag. Listing 11 shows the XML code to define a parent category of a category.

Listing 11. Parent category of a category

<category name="Location">
    <parentCategory identity="Business Concepts"/>
</category>

Understanding how categories reference terms

In the InfoSphere Business Glossary, a category can reference terms that it does not contain. In the XML you can define this relationship on both the category level and the term level. When defining the schema, it was decided to allow this capability because a category can reference many terms. A term can be referenced by many categories. Both directions have multi-cardinality, so when writing an XML, the chance of error is very low. Listing 12 shows the XML code for a category referencing a term.

Listing 12. Category referencing a term

<category name="Industry">
    <referencedTerms>
        <termRef identity="Business Concepts::Location::Address"/>
    </referencedTerms>
</category>

Creating custom attributes

Custom attributes are properties of categories and terms that can be created to extend the standard glossary template, as described in the CSV section. You can define custom attribute definitions and custom attribute values in the XML file.

Defining custom attributes

The custom attribute definitions are defined in their own section in the XML, separated from the category and term sections. Category custom attribute definitions and term custom attribute definitions are each defined in their own separate section.

Custom attribute definitions can be of type string or enumerated. When defining a string custom attribute definition, only the name is required, and a description is optional. When defining an enumerated custom attribute definition, an enumerated list of valid values is additionally required.

Listing 13 shows an example of category and term custom attribute definition sections in the XML, where the category custom attribute definition is enumerated, and the term custom attribute definition is of type string.

Listing 13. Custom attribute definitions

<categoryCustomAttributes>
    <customAttributeDef name="Priority">
        <validValues>
            <validValue value="high"/>
            <validValue value="medium"/>
            <validValue value="low"/>
        </validValues>
    </customAttributeDef>
</categoryCustomAttributes>
<termCustomAttributes>
    <customAttributeDef name="ID"/>
</termCustomAttributes>

The category custom attribute definition has the name Priority, and it is of type enumerated. Its enumerated values are high, medium, and low.

The term custom attribute definition has the name ID, and it is of type string. Because it is of type string, it can have any value.

On import, if these custom attribute definitions do not exist, they are created. If they already exist, they will remain as they are in the repository.

If an enumerated custom attribute definition already exists in the repository and the list of valid values is different in the repository than in the XML, then the list could be changed by changing the import merge option, similar to other category and term lists.

Defining custom attribute values

Custom attribute values are defined in the XML within the specific category and term sections. Listing 14 shows a custom attribute value defined.

Listing 14. Category with a custom attribute value

<category name="Location" longDescription="Set of Terms relating to:  Location">
    <parentCategory identity="Business Concepts"/>
    <customAttributes>
        <customAttributeValue customAttribute="Priority" value="medium"/>
    </customAttributes>
</category>

In Listing 14, the category Location has a custom attribute value of medium defined for the custom attribute definition Priority. Because the custom attribute definition Priority is of type enumerated, the valid values are high, medium, and low, so the custom attribute values need to be one of those three values. If in the XML a custom attribute value for Priority is defined to be a different value than one of these three, such as very high, then the import fails with an error message that says:

Custom attribute value very high defined for Category
Location does not match its custom attribute definition
Priority. Valid values are: high, low,
medium.
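To catch this kind of error before importing, you can check each enumerated value in your own tooling. The Python sketch below is a hypothetical pre-import check, not the product's validation code; it simply builds a message in the same shape as the error shown above, and all function and parameter names are illustrative.

```python
def check_enumerated(owner_kind, owner_name, attribute, value, valid_values):
    """Return None if value is valid, otherwise a message modeled on the import error."""
    if value in valid_values:
        return None
    return (
        f"Custom attribute value {value} defined for {owner_kind} {owner_name} "
        f"does not match its custom attribute definition {attribute}. "
        f"Valid values are: {', '.join(sorted(valid_values))}."
    )


priority_values = ["high", "medium", "low"]  # from Listing 13

print(check_enumerated("Category", "Location", "Priority", "medium", priority_values))
print(check_enumerated("Category", "Location", "Priority", "very high", priority_values))
```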

Listing 15 shows a term with a custom attribute value defined.

Listing 15. Term with a custom attribute value

<term name="Address" shortDescription="LOCATION ADDRESS TYPE"
    status="CANDIDATE" abbreviation="ADDR">
    <parentCategory identity="Business Concepts::Location"/>
    <customAttributes>
        <customAttributeValue customAttribute="ID" value="2"/>
    </customAttributes>
</term>

In Listing 15, the term Address has a custom attribute value of 2 defined for the custom attribute definition ID. Because the custom attribute definition ID is of type string, the value can be any value.

The custom attribute definitions referred to by the custom attribute values need to be defined either in the XML or in the metadata repository in order for the custom attribute values to be imported. If the definitions do not exist, the import succeeds, but these custom attribute values are not created.

Defining synonym groups

In the business glossary, each term can have one or more synonyms. In the database, terms are set as synonyms by defining a synonym group, which contains the terms. Each term is only allowed to be included in a single synonym group. In order to ensure this, synonym groups are defined as separate objects in the XML, rather than being defined within the term tag like other term relationships.

Each synonym group can have up to one preferred synonym. This is also defined in the synonymGroup tag, as shown in Listing 16.

Listing 16. Synonyms and preferred synonym

<synonymGroups>
    <synonymGroup>
        <synonyms>
            <termRef identity="Business Concepts::Location::Address"/>
            <termRef identity="Customer::Street"/>
        </synonyms>
        <preferredSynonym identity="Customer::Street"/>
    </synonymGroup>
</synonymGroups>

In Listing 16, the two terms Street and Address are defined as synonyms. When an XML with synonyms is imported, if one or more of the terms in the synonym group are already in the repository and already have a different synonym, then the three synonyms are merged to be part of one synonym group. Note that for synonyms, this merge takes place in the same way when using all four XML merge options.

In Listing 16, the term Street is defined as being the preferred synonym of this synonym group.
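The synonym merge described above amounts to taking the union of every existing group that shares a term with the imported group. The following Python sketch models that rule with plain sets. It is an illustration only: the function name is hypothetical, and the existing synonym Industry::Road is invented sample data for the repository.

```python
def merge_synonym_groups(existing_groups, imported_group):
    """Merge imported_group into the existing groups it overlaps with."""
    imported = set(imported_group)
    merged = set(imported)
    untouched = []
    for group in existing_groups:
        group = set(group)
        if group & imported:
            merged |= group          # shares a term, so the groups are combined
        else:
            untouched.append(group)  # unrelated group stays as it is
    return untouched + [merged]


# Hypothetical repository state: Address already has the synonym Road.
existing = [{"Business Concepts::Location::Address", "Industry::Road"}]
imported = {"Business Concepts::Location::Address", "Customer::Street"}

print(merge_synonym_groups(existing, imported))
```

Because each term can belong to only one synonym group, the result keeps that property: Address, Road, and Street end up together in a single group.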

Assigning assets

In the Business Glossary, terms can have a list of assigned assets, which can be many different types, including databases, columns, BI reports, and so on. Assigning assets to terms is one of the important capabilities of the Business Glossary. XML import lets you assign assets to terms in a straightforward manner. Listing 17 shows an example of assigning a host called BUSINESS_SERVER to term Street.

Listing 17. Assigned asset of type host

<term name="Street">
    <parentCategory identity="Business Concepts::Location"/>
    <assignedAssets>
        <hostRef host="BUSINESS_SERVER"/>
    </assignedAssets>
</term>

On import, if the host named BUSINESS_SERVER exists in the target repository, it is assigned to the term. If BUSINESS_SERVER does not exist, the term is imported without the assignment.

In Listing 17, the host is identified by its name. Each asset type has its own set of identifying attributes, which the schema defines. Some identity attributes are required in the XML, and some are not. The key for the match to take place on import is that in the XML it is necessary to specify all the relevant attributes that are defined for this asset in the repository.

Listing 18 shows an example that assigns a database column to a term.

Listing 18. Assigned asset of type column

<term name="Street">
    <parentCategory identity="Business Concepts::Location"/>
    <assignedAssets>
        <columnRef column="CUSTOMER_NAME" table="CUSTOMER" schema="DB1_SC" 
        databaseName="BUSINESS" host="BUSINESS_SERVER" />
    </assignedAssets>
</term>

In Listing 18, for column CUSTOMER_NAME there are five attribute values defined within the columnRef tag:

column
table
schema
databaseName
host

All of these attributes are required by the XML schema for an asset of type column. If a column with all of these five attribute values exists in the target repository on import, then the match is made.

Other types of assets might have a hierarchical definition. For example, the object can be contained in a hierarchical structure that can include several instances of the same type. Listing 19 shows an example of this kind of asset.

Listing 19. Assigned asset of type BI report field

<term name="Street">
    <parentCategory identity="Business Concepts::Location"/>
    <assignedAssets>
        <BIReportFieldRef field="Total net. profit">
            <BIReportGroupRef group="Sales">
                <BIReportGroupRef group="Products">
                    <BIReportRef report="Annual Report" />
                </BIReportGroupRef>
            </BIReportGroupRef>
        </BIReportFieldRef>
    </assignedAssets>
</term>

This assigned asset is of type BI report field, which is always contained in a BI report group. The BI report group might be contained in another BI report group, or it might be contained in a BI report. The hierarchy between the BI report group instances is represented by XML nested elements. In Listing 19, the BI report field named Total net. profit is contained in a BI report group called Sales. Sales is contained in a BI report group called Products. Products is then contained in a BI report called Annual Report. If a BI report field with the same name with the same hierarchy exists in the target repository on import, the match is made.
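To see the containment chain that the nested elements express, you can walk the reference element from the outside in. The Python sketch below parses the BIReportFieldRef fragment from Listing 19 with the standard xml.etree.ElementTree module; the containment_chain helper is hypothetical and relies on each reference element in this listing carrying exactly one naming attribute and at most one child.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<BIReportFieldRef field="Total net. profit">
    <BIReportGroupRef group="Sales">
        <BIReportGroupRef group="Products">
            <BIReportRef report="Annual Report" />
        </BIReportGroupRef>
    </BIReportGroupRef>
</BIReportFieldRef>
"""


def containment_chain(element):
    """Return (tag, name) pairs from the innermost asset out to its report."""
    chain = []
    while element is not None:
        (name,) = element.attrib.values()   # one naming attribute per element
        chain.append((element.tag, name))
        element = next(iter(element), None)  # descend to the single child, if any
    return chain


for tag, name in containment_chain(ET.fromstring(FRAGMENT.strip())):
    print(f"{tag}: {name}")
```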

Some asset types have optional attributes defined by the schema.

Listing 20 shows an example of an assigned database in an XML file that has three identification attributes defined.

Listing 20. Assigned asset of type database

<term name="Street">
    <parentCategory identity="Business Concepts::Location"/>
    <assignedAssets>
        <databaseRef 
            databaseName="BUSINESS"
            host="BUSINESS_SERVER" 				
            databaseDBMS="DB2" />
    </assignedAssets>
</term>

For a database asset, according to the schema, only the databaseName and host attributes are required. A database has three other optional attributes:

databaseDBMS
databaseInstance
databasePath

In Listing 20, only one of these optional attributes is defined, databaseDBMS. When importing this XML, if a database with these three attribute values, and only these three attribute values, is found in the repository, the assignment will be made to the term.

If there is a database with the same name and host in the repository, but the databaseDBMS value is not defined there, the assignment is not made.

If there is a database with the same name, host, and databaseDBMS in the repository, but it also has a value defined for databaseInstance in the repository, then on import, the assignment to the term is not made, because the assets are not considered to be identical.
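The three database cases above follow one rule: the set of attributes that have a value must be identical on both sides. The Python sketch below models that rule with dictionaries. It is not the import service's matching code; the function name is hypothetical, and the databaseInstance value INST1 is invented sample data.

```python
REQUIRED = {"databaseName", "host"}
OPTIONAL = {"databaseDBMS", "databaseInstance", "databasePath"}


def database_ref_matches(xml_ref, repository_db):
    """Return True if the XML reference identifies exactly this repository database."""
    missing = REQUIRED - set(xml_ref)
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")

    def defined(attrs):
        # Keep only identifying attributes that actually have a value.
        return {k: v for k, v in attrs.items()
                if k in REQUIRED | OPTIONAL and v is not None}

    return defined(xml_ref) == defined(repository_db)


xml_ref = {"databaseName": "BUSINESS", "host": "BUSINESS_SERVER", "databaseDBMS": "DB2"}

print(database_ref_matches(xml_ref, dict(xml_ref)))
print(database_ref_matches(xml_ref, {"databaseName": "BUSINESS", "host": "BUSINESS_SERVER"}))
print(database_ref_matches(xml_ref, {**xml_ref, "databaseInstance": "INST1"}))
```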

Conclusion

Export

The InfoSphere Business Glossary also supports export of content in CSV and XML formats, which can be used to transfer data between environments and server instances. This exported file can be edited, adding more categories and terms, and then re-imported to the same server from which it was exported to add new data.

Once a business glossary has been richly populated, exporting can be useful to copy or transfer this content from one business glossary to another, which is a method of populating that second repository. In addition to XML and CSV formats, exporting is available in an XMI format.

Conclusion

The CSV and XML import and export features in the InfoSphere Business Glossary, introduced in version 8.1.1, provide users with powerful tools to create and update business glossary data.

Acknowledgements

The authors would like to thank Michael Fankhauser, Benny Halberstadt, Roger Hecker, Nancy Navarro, and Erel Sharf for their feedback and review of this tutorial.

Translated from: https://www.ibm.com/developerworks/data/tutorials/dm-1009infospherebusglosscsvxml/index.html
