1461187976-53ac8f36-b2dd-459f-a304-9f5b4af2da93

1. A computer-implemented method for analyzing data representative of media material having a layout, comprising:
identifying block segments associated with columnar body text in the media material; and
determining which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout information,
wherein the data representative of media material comprises pixel data of an image of the media material, and the block segment identifying includes analyzing the pixel data to identify regions having similar pixel value change complexity,
wherein the data representative of media material further includes text data representing text in the media material, and the block segment identifying includes a step of associating the text data with corresponding image regions identified as having similar pixel value change complexity based on the location of the text data and the corresponding regions in the media material, and
wherein the text data associating step includes:
mapping words found in the text data to an initial set of the corresponding image regions identified as having similar pixel value change complexity; and
adjusting the initial set of image regions to obtain a final set of image regions based on the distribution of words in the word mapping.
2. The method of claim 1, wherein the pixel data analyzing comprises analyzing pixel value changes along horizontal and vertical directions from a pixel being analyzed.
3. The method of claim 1, further comprising:
identifying text sizes in the text data including a text size associated with a columnar body text in the media material.
4. A computer-implemented method for analyzing data representative of media material having a layout, comprising:
identifying block segments associated with columnar body text in the media material; and
determining which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout information,
wherein the determining includes:
calculating language statistics for candidate block segments; and
determining probabilities that compared block segments belong to a same article based on an overlap in language statistics information.
5. The method of claim 4, wherein the language statistics information comprises word frequency information, and the calculating includes calculating a match score for a pair of candidate block segments based on word frequencies in each block segment relative to an entire corpus and cosine distance similarity between the pair of candidate block segments.
6. The method of claim 5, wherein the determining probabilities step includes determining a probability that the pair of candidate block segments belong to the same article in the media material based on the calculated match score and sample data with predetermined positive and negative examples of block segments belonging and not belonging to a same article.
7. The method of claim 6, further comprising selecting the positive and negative data examples from a collection of articles in a training data set.
8. The method of claim 6, further comprising enabling a user to select the positive and negative data examples from a display of text data extracted through optical character recognition from an image of the media material.
9. The method of claim 4, wherein the determining further includes identifying whether the candidate block segments belong to a same article in the media material based on the probabilities determined based on the overlap in language statistics information.
10. The method of claim 4, wherein the determining further includes analyzing layout transition features in candidate block segments and determining whether the candidate block segments belong to a same article in the media material.
11. The method of claim 10, wherein the layout transition analyzing includes finding a pair of candidate block segments aligned in a vertical direction based on vertical layout transition features.
12. The method of claim 11, wherein the layout transition analyzing further includes finding another pair of candidate block segments aligned in a horizontal direction based on horizontal transition features in the layout.
13. A computer-implemented method for analyzing data representative of media material having a layout, comprising:
identifying block segments associated with columnar body text in the media material; and
determining which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout information,
wherein the determining includes analyzing layout transition features in candidate block segments and determining whether the candidate block segments belong to a same article in the media material, and
wherein the layout transition analyzing includes:
calculating the layout transition features from the candidate block segments; and
applying a predetermined layout transition classifier to determine whether the candidate block segments belong to the same article in the media material based on the calculated layout transition features.
14. A computer-implemented method for analyzing data representative of media material having a layout, comprising:
identifying block segments associated with columnar body text in the media material; and
determining which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout information,
wherein the determining includes analyzing layout transition features in candidate block segments and determining whether the candidate block segments belong to a same article in the media material, and
wherein the block segments comprise training data, and the layout transition analyzing includes:
calculating the layout transition features from the candidate block segments; and
building a layout transition classifier that can subsequently be used to determine whether further candidate block segments belong to a same article in the media material.
15. A computer-implemented method for analyzing data representative of media material having a layout, comprising:
identifying block segments associated with columnar body text in the media material; and
determining which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout information,
wherein the article determining comprises:
calculating language statistics for candidate block segments;
determining probabilities that candidate block segments belong to a same article based on an overlap in language statistics information;
analyzing layout transition features in candidate block segments;
determining whether the candidate block segments belong to a same article in the media material; and
identifying whether the candidate block segments belong to a same article in the media material depending upon the probabilities determined based on an overlap in language statistics information and whether the candidate block segments were determined to belong to a same article in the media material.
16. The method of claim 15, further comprising displaying text from one or more block segments determined to be in the same article.
17. A media material analyzer for analyzing data representative of media material having a layout, comprising:
a segmenter that identifies block segments associated with columnar body text in the media material; and
an article composer that determines which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout transition information,
wherein the data representative of media material comprises pixel data of an image of the media material, and the segmenter analyzes the pixel data to identify regions having similar pixel value change complexity,
wherein the data representative of media material further includes text data representing text in the media material, and the segmenter associates the text data with corresponding image regions identified as having similar pixel value change complexity (PVCC) based on the location of the text data and the corresponding regions in the media material, and
wherein the segmenter maps words found in the text data to an initial set of the corresponding image regions identified as having similar pixel value change complexity, and adjusts the initial set of image regions to obtain a final set of image regions based on the distribution of mapped words.
18. The media material analyzer of claim 17, wherein the segmenter analyzes pixel value changes along horizontal and vertical directions from a pixel being analyzed.
19. The media material analyzer of claim 17, wherein the segmenter further identifies text sizes in the text data including a text size associated with a columnar body text in the media material.
20. The media material analyzer of claim 17, wherein the article composer includes a layout transition analyzer that analyzes layout transition features in candidate block segments output by the segmenter and determines whether the candidate block segments belong to a same article in the media material.
21. The media material analyzer of claim 20, wherein the layout transition analyzer finds a pair of candidate block segments aligned in a vertical direction based on vertical layout transition features and determines whether the pair of candidate block segments belong to a same article in the media material.
22. The media material analyzer of claim 21, wherein the layout transition analyzer finds another pair of candidate block segments aligned in a horizontal direction based on horizontal transition features in the layout to determine whether the another pair of candidate block segments belong to a same article in the media material.
23. A media material analyzer for analyzing data representative of media material having a layout, comprising:
a segmenter that identifies block segments associated with columnar body text in the media material; and
an article composer that determines which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout transition information,
wherein the article composer includes a language statistics analyzer that calculates language statistics for candidate block segments output by the segmenter, and determines probabilities that candidate block segments belong to a same article based on an overlap in language statistics information.
24. The media material analyzer of claim 23, wherein the language statistics information comprises word frequency information, and wherein the language statistics analyzer calculates a match score for a pair of candidate block segments based on word frequencies in each block segment relative to an entire corpus and a cosine distance similarity between the pair of candidate block segments.
25. The media material analyzer of claim 23, wherein the language statistics analyzer determines a probability that the pair of candidate block segments belong to the same article in the media material based on the calculated match score and sample data with predetermined positive and negative examples of block segments belonging and not belonging to a same article.
26. The media material analyzer of claim 25, wherein the language statistics analyzer automatically selects the positive and negative data examples from a collection of articles in a training data set.
27. The media material analyzer of claim 25, wherein the predetermined positive and negative data examples are selected by a user at a user interface from a display of text data extracted through optical character recognition from an image of the media material.
28. The media material analyzer of claim 23, wherein the article composer further includes a combiner that identifies whether the candidate block segments belong to a same article in the media material based on the probabilities determined by the language statistics analyzer.
29. A media material analyzer for analyzing data representative of media material having a layout, comprising:
a segmenter that identifies block segments associated with columnar body text in the media material; and
an article composer that determines which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout transition information,
wherein the article composer includes a layout transition analyzer that analyzes layout transition features in candidate block segments output by the segmenter and determines whether the candidate block segments belong to a same article in the media material, and
wherein the layout transition analyzer calculates the layout transition features from the candidate block segments, and applies a predetermined layout transition classifier to determine whether the candidate block segments belong to the same article in the media material based on the calculated layout transition features.
30. A media material analyzer for analyzing data representative of media material having a layout, comprising:
a segmenter that identifies block segments associated with columnar body text in the media material; and
an article composer that determines which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout transition information,
wherein the article composer includes a layout transition analyzer that analyzes layout transition features in candidate block segments output by the segmenter and determines whether the candidate block segments belong to a same article in the media material, and
wherein the block segments comprise training data, and the layout transition analyzer calculates the layout transition features from the candidate block segments, and builds a layout transition classifier that can subsequently be used to determine whether further candidate block segments belong to a same article in the media material.
31. The media material analyzer of claim 30, wherein the training data includes labels indicating whether blocks of text belong to the same article.
32. A media material analyzer for analyzing data representative of media material having a layout, comprising:
a segmenter that identifies block segments associated with columnar body text in the media material; and
an article composer that determines which of the identified block segments belong to one or more articles in the media material based on language statistics information and layout transition information,
wherein the article composer comprises:
a language statistics analyzer that calculates language statistics for candidate block segments output by the segmenter and determines probabilities that candidate block segments belong to a same article based on an overlap in language statistics information;
a layout transition analyzer that analyzes layout transition features in candidate block segments output by the segmenter and determines whether the candidate block segments belong to a same article in the media material; and
a combiner that identifies whether the candidate block segments belong to a same article in the media material depending upon the probabilities determined by the language statistics analyzer and whether the candidate block segments belong to a same article in the media material according to the layout transition analyzer.
33. A media material analyzer for analyzing data representative of media material having a layout, comprising:
a segmenter that identifies block segments associated with columnar body text in the media material; and
a language statistics analyzer that calculates language statistics for candidate block segments output by the segmenter and determines probabilities that candidate block segments belong to a same article based on an overlap in language statistics information.
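
The language statistics steps recited above (a match score from corpus-relative word frequencies and cosine similarity, then a probability from positive and negative examples) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the claims: the tokenizer, the smoothed inverse-document-frequency weighting, the function names, and the windowed empirical estimate used for the probability. It is one possible reading, not the claimed implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a block of text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_weights(corpus_blocks):
    """Smoothed inverse document frequency of each word over the corpus.

    Assumption: each block segment counts as one 'document'.
    """
    n = len(corpus_blocks)
    doc_freq = Counter()
    for block in corpus_blocks:
        doc_freq.update(set(tokenize(block)))
    return {w: math.log((1 + n) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def weighted_vector(block, idf):
    """Word counts of one block, scaled by corpus rarity of each word."""
    counts = Counter(tokenize(block))
    default = max(idf.values(), default=1.0)  # unseen words treated as rare
    return {w: c * idf.get(w, default) for w, c in counts.items()}


def match_score(block_a, block_b, idf):
    """Cosine similarity between the weighted word vectors of two blocks."""
    va = weighted_vector(block_a, idf)
    vb = weighted_vector(block_b, idf)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_article_probability(score, positive_scores, negative_scores, width=0.1):
    """Share of positive examples among labeled examples with a nearby score.

    Hypothetical estimator: counts labeled examples whose match score lies
    within `width` of `score`; returns 0.5 when no example is nearby.
    """
    pos = sum(1 for s in positive_scores if abs(s - score) <= width)
    neg = sum(1 for s in negative_scores if abs(s - score) <= width)
    total = pos + neg
    return pos / total if total else 0.5
```

In use, `idf_weights` would be computed once over all candidate block segments, `match_score` evaluated for each pair of candidates, and `same_article_probability` fed with scores from pairs already labeled as belonging or not belonging to the same article. A fitted model such as logistic regression could replace the windowed estimate; the choice here only keeps the sketch dependency-free.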

The claims below are in addition to those above.
All references to claim(s) that appear below refer to the numbering after this sentence.

1. A back contact substrate for a photovoltaic cell, comprising a carrier substrate and an electrode, the electrode comprising:
a conductive coating comprising a metallic thin film which is an alloy thin film based on at least two elements, at least one first element MA chosen among copper (Cu), silver (Ag) and gold (Au), and at least one second element MB chosen among zinc (Zn), titanium (Ti), tin (Sn), silicon (Si), germanium (Ge), zirconium (Zr), hafnium (Hf), carbon (C) and lead (Pb);
a barrier to selenization thin film for protecting the conductive coating and based on at least one among MoxOyNz, WxOyNz, TaxOyNz, NbxOyNz, RexOyNz.
2. The back contact substrate according to claim 1, wherein the barrier to selenization thin film has a compressive stress between 0 and −10 GPa.
3. The back contact substrate according to claim 1, wherein the barrier to selenization thin film is nano-crystalline or amorphous with a grain size of at most 10 nm.
4. The back contact substrate according to claim 1, wherein the barrier to selenization thin film has a molar composition O/(O+N) of at least 1% and at most 50%.
5. The back contact substrate according to claim 1, wherein the barrier to selenization thin film has a molar composition M′/(M′+O+N) of at least 15% and at most 80%, where M′ is chosen from among Mo, W, Ta, Nb or Re.
6. The back contact substrate according to claim 1, wherein the barrier to selenization thin film has a thickness of at least 5 nm and at most 100 nm.
7. The back contact substrate according to claim 1, wherein the electrode comprises a second barrier to selenization thin film for protecting the conductive coating and based on at least one among MoxOyNz, TixOyNz, WxOyNz, TaxOyNz, NbxOyNz, RexOyNz.
8. The back contact substrate according to claim 1, wherein said electrode further comprises an interlayer thin film between the conductive coating and the barrier to selenization thin film, the interlayer thin film being based on at least one of titanium (Ti), tungsten (W), molybdenum (Mo), rhenium (Re), niobium (Nb) or tantalum (Ta).
9. The back contact substrate according to claim 1, wherein said electrode further comprises an ohmic contact thin film based on at least a metal.
10. (canceled)
11. The back contact substrate according to claim 1, wherein the metallic thin film is based on:
at least one of copper (Cu) and silver (Ag); and
on zinc (Zn).
12. The back contact substrate according to claim 1, wherein the metallic thin film is based on:
at least one of copper (Cu) and silver (Ag); and
on zinc (Zn) and titanium (Ti).
13. A photovoltaic cell comprising a back contact substrate according to claim 1 and at least a thin film of a photoactive material.
14. A process for the manufacture of a back contact substrate for a photovoltaic cell, comprising a step of making an electrode comprising steps of making:
a conductive coating comprising a metallic thin film which is an alloy thin film based on at least two elements, at least one first element MA chosen among copper (Cu), silver (Ag) and gold (Au), and at least one second element MB chosen among zinc (Zn), titanium (Ti), tin (Sn), silicon (Si), germanium (Ge), zirconium (Zr), hafnium (Hf), carbon (C) and lead (Pb);
a barrier to selenization thin film for protecting the conductive coating and based on at least one among MoxOyNz, WxOyNz, TaxOyNz, NbxOyNz, RexOyNz.
15. The process according to claim 14, comprising forming a photoactive thin film during which resistivity of the electrode is decreased, and the obtained sheet resistance after thermal annealing is below 2 Ω/□.
16. The back contact substrate according to claim 2, wherein the barrier to selenization thin film has a compressive stress between −1 and −5 GPa.
17. The back contact substrate according to claim 4, wherein the barrier to selenization thin film has a molar composition O/(O+N) of at least 2% and at most 20%.
18. The back contact substrate according to claim 6, wherein the barrier to selenization thin film has a thickness of at least 10 nm and at most 60 nm.
19. The back contact substrate according to claim 9, wherein the ohmic contact thin film is based on molybdenum (Mo) and/or tungsten (W).
20. The process according to claim 15, wherein the obtained sheet resistance after thermal annealing is below 1 Ω/□.
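
As a worked illustration of the composition limits in claims 4, 5 and 17 of this second claim set, the following sketch reads the molar compositions as the ratios O/(O+N) and M′/(M′+O+N). The function names and the sample stoichiometry are hypothetical and serve only to show the arithmetic; they are not taken from the claims.

```python
def oxygen_fraction(o, n):
    """Molar ratio O/(O+N) of an oxynitride film."""
    return o / (o + n)


def metal_fraction(m, o, n):
    """Molar ratio M'/(M'+O+N), with M' one of Mo, W, Ta, Nb or Re."""
    return m / (m + o + n)


def within(value, low, high):
    """True when low <= value <= high (bounds given as fractions, not percent)."""
    return low <= value <= high


# Hypothetical film with molar amounts Mo = 0.50, O = 0.05, N = 0.45.
o_ratio = oxygen_fraction(0.05, 0.45)       # about 0.10, i.e. 10 %
m_ratio = metal_fraction(0.50, 0.05, 0.45)  # 0.50, i.e. 50 %

meets_claim_4 = within(o_ratio, 0.01, 0.50)
meets_claim_17 = within(o_ratio, 0.02, 0.20)
meets_claim_5 = within(m_ratio, 0.15, 0.80)
```

For this assumed composition all three checks evaluate to true, since 10% lies inside both oxygen ranges and 50% lies inside the metal range.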