How can I get xml and analyze tag contents? – python
I would like to parse an XML file with Python. After reading the whole file, I want to retrieve the contents of specified tags and print them in document order, starting from the top. I can already get the attributes and the text of the tags. My code currently looks like this:
sample.xml
<?xml version='1.0' encoding='utf-8'?>
<tmx version="1.4">
<body>
<tu>
<tuv xml:lang="en">
<seg>Welcome to Wikipedia</seg>
</tuv>
<tuv xml:lang="ca">
<seg>Benvinguts a la Viquipèdia</seg>
</tuv>
</tu>
<tu>
<tuv xml:lang="en">
<seg>The Free Encyclopedia</seg>
</tuv>
<tuv xml:lang="ca">
<seg>L'Enciclopèdia Lliure</seg>
</tuv>
</tu>
</body>
</tmx>
tree.py
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET

tree = ET.parse('/Users/path/Desktop/sample.xml')
root = tree.getroot()
# print(root.tag)
# print(root.attrib)
# es = root.findall('.//tu')
# for e in es:
#     print(e.tag, e.attrib)

# Print the text of every <seg> element, in document order.
segs = root.findall('.//seg')
for e in segs:
    print(e.text)

# Print the attributes of every <tuv> element (here, the xml:lang value).
tuvs = root.findall('.//tuv')
for e in tuvs:
    print(e.attrib)
output
$ python /Users/path/Desktop/tree.py
Welcome to Wikipedia
Benvinguts a la Viquipèdia
The Free Encyclopedia
L'Enciclopèdia Lliure
{'{http://www.w3.org/XML/1998/namespace}lang': 'en'}
{'{http://www.w3.org/XML/1998/namespace}lang': 'ca'}
{'{http://www.w3.org/XML/1998/namespace}lang': 'en'}
{'{http://www.w3.org/XML/1998/namespace}lang': 'ca'}
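What I think I am after is each language printed next to its text, in document order, rather than two separate lists. A minimal sketch of that idea, assuming every <tuv> contains at most one <seg>: the names XML_LANG, SAMPLE and lang_text_pairs are made up for illustration, and the sample is inlined as a string (a shortened copy of the file above) so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under its full namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical inline sample: a shortened copy of sample.xml.
SAMPLE = """<tmx version="1.4">
<body>
<tu>
<tuv xml:lang="en"><seg>Welcome to Wikipedia</seg></tuv>
<tuv xml:lang="ca"><seg>Benvinguts a la Viquipèdia</seg></tuv>
</tu>
</body>
</tmx>"""


def lang_text_pairs(root):
    """Return (language, text) tuples for every <tuv>, in document order."""
    pairs = []
    for tuv in root.iter("tuv"):
        seg = tuv.find("seg")  # direct <seg> child, or None if absent
        text = seg.text if seg is not None else None
        pairs.append((tuv.get(XML_LANG), text))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = lang_text_pairs(root)
for lang, text in pairs:
    print(lang, text)
```

With the real file, ET.parse(...).getroot() can replace ET.fromstring(SAMPLE). Walking the <tuv> elements once keeps each language code together with its own text, instead of relying on two separate findall results lining up.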