Table not detected with tabula and Camelot

Question 1

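I am trying to extract tables from PDF files that, I think, are not in a proper format. The tables in these PDFs look like tables but are not properly enclosed: the vertical borders are missing. I will attach a sample PDF and the output of both libraries. When I use tabula for table detection, an empty dataframe is returned for every page of the PDF.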

0 for single page, 1, 2 for specific pages: 2
Page number: 25
No table found on this page by tabula.

With Camelot I likewise get no result when I use flavor='lattice'.

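0 for single page, 1 for pages, 2 for pages where tabula detected a table, 3 for a specific page: 3
0 for lattice or 1 for stream: 0
Page number: 25
No table found on this page by Camelot.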

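When I use flavor='stream' I do get a dataframe: each line is read as a row, with the data split where it is tab-aligned, but the normal text on the page ends up in the dataframe as well.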

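0 for single page, 1 for pages, 2 for pages where tabula detected a table, 3 for a specific page: 3
0 for lattice or 1 for stream: 1
Page number: 25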

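I just need an efficient way to detect tables and extract their data when the vertical lines enclosing the table are absent. Both tabula and Camelot work fine when the table is in a proper format, enclosed by vertical and horizontal lines.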

Mahmud Alptekin · Answer 1 · 2021-11-22T15:52:19

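This approach may help: https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-column-separators

Camelot lets you specify the vertical separators by x-coordinate. First use the plot() method to view the table inside the PDF and note the x-coordinates where you want the column separators, then pass them in as follows:

import camelot

# Step 1: plot the text on the page to read off the x-coordinates.
# flavor='stream' is used here because the default (lattice) finds no tables in this PDF.
tables = camelot.read_pdf('your_pdf.pdf', flavor='stream')
camelot.plot(tables[0], kind='text').show()

# Step 2: pass the x-coordinates as one comma-separated string
# (replace 'x1,x2' with the numbers you noted from the plot).
tables = camelot.read_pdf('your_pdf.pdf', flavor='stream', columns=['x1,x2'])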
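
If reading the coordinates off the plot by hand gets tedious, one possible shortcut is to estimate the separators from the horizontal extents of the text fragments on the page: wherever there is a wide empty gap, place a separator in the middle of it. The sketch below is only an illustration of that idea, not part of Camelot. The helper names `column_separators` and `to_camelot_columns`, the `min_gap` threshold, and the sample coordinates are all made up for the example. In practice the `(x0, x1)` pairs could come from something like pdfplumber's `page.extract_words()`, and the result is worth checking against the Camelot plot before relying on it.

```python
def column_separators(word_spans, min_gap=10.0):
    """Return x-coordinates in the middle of each horizontal gap of at least min_gap.

    word_spans: iterable of (x0, x1) horizontal extents of text fragments, in PDF points.
    """
    spans = sorted(word_spans)
    if not spans:
        return []
    separators = []
    right_edge = spans[0][1]  # rightmost x covered by text so far
    for x0, x1 in spans[1:]:
        if x0 - right_edge >= min_gap:
            # Empty band between right_edge and x0: put a separator at its centre.
            separators.append((right_edge + x0) / 2)
        right_edge = max(right_edge, x1)
    return separators


def to_camelot_columns(separators):
    """Format separators as Camelot's `columns` argument expects: ['x1,x2,...']."""
    return [",".join(f"{x:g}" for x in separators)]


# Hypothetical word extents for three rows of a three-column table.
spans = [(50, 90), (150, 200), (300, 340),
         (52, 110), (155, 190), (305, 330),
         (50, 80), (150, 210), (300, 350)]
seps = column_separators(spans)
columns_arg = to_camelot_columns(seps)
print(columns_arg)
```

The resulting list can then be passed as `columns=columns_arg` together with `flavor='stream'`. A gap threshold that is too small will split on ordinary word spacing, so `min_gap` usually needs tuning per document.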

DS_ShraShetty · Answer 2 · 2021-11-27T11:30:02

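I have recently been working on extracting tables from PDFs.

Tabula and Camelot did not work for me either, but pdfplumber gave me the results I needed.

import pandas as pd
import pdfplumber

# filepath (input PDF) and outfile2 (output CSV) are paths you define yourself.
with pdfplumber.open(filepath) as pdf:
    # pages is zero-indexed, so pages[1] is the second page.
    table = pdf.pages[1].extract_table(table_settings={
        "vertical_strategy": "text", "horizontal_strategy": "text"})
# Use the first extracted row as the header and the remaining rows as data.
df = pd.DataFrame(table[1:], columns=table[0])
df.to_csv(outfile2, mode='a', index=False)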
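
Because the "text" strategies infer cells from word positions, ordinary paragraphs on the page can come back as rows too, which is the same problem the question describes with the stream flavor. A simple heuristic, sketched below under the assumption that prose lines land in a single cell while real table rows fill several, is to keep only rows with a minimum number of non-empty cells. The function name `keep_table_rows`, the `min_cells` threshold, and the sample rows are hypothetical; the input has the shape `extract_table` returns (a list of rows, each a list of strings or `None`).

```python
def keep_table_rows(rows, min_cells=2):
    """Keep rows that have at least min_cells non-empty cells."""
    kept = []
    for row in rows:
        # Count cells that hold something other than None or whitespace.
        filled = [cell for cell in row if cell is not None and str(cell).strip()]
        if len(filled) >= min_cells:
            kept.append(row)
    return kept


# Hypothetical extraction result: a prose line, a header, data rows, and an empty row.
rows = [
    ["Quarterly summary of results", "", ""],
    ["Item", "Qty", "Price"],
    ["Widget", "4", "9.50"],
    [None, "", ""],
    ["Gadget", "1", "24.00"],
]
table_rows = keep_table_rows(rows)
for row in table_rows:
    print(row)
```

This will also drop legitimate rows that have only one filled cell, so the threshold is a trade-off to adjust for the document at hand.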
