I need to get a few facts from SEC 10-K filings, such as gross revenue, gross profit, gross margin, operating expenses, etc., along with the corresponding context.
For filings like https://www.sec.gov/Archives/edgar/data/1318605/000156459018002956/tsla-20171231.xml, it seems feasible to just use XPath to find the few required elements and their values. But there are filings like https://www.sec.gov/Archives/edgar/data/19617/000001961718000057/jpm-20171231.xml where total expense is broken up into different segments using an extension taxonomy.
My question is
In any case, if doing it simply with XPath is possible, I'd prefer that. Validity of the XBRL document is not important.
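For a single filing without dimensions, the plain lookup the question has in mind might look like the minimal sketch below. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The inline instance, its values and the context id `FY2017` are hypothetical, and the `us-gaap` namespace URI is an assumption that must match the `xmlns` declaration in the actual filing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed XBRL instance for illustration only; a real
# 10-K instance declares many more namespaces, contexts, and facts.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:us-gaap="http://fasb.org/us-gaap/2017-01-31">
  <us-gaap:GrossProfit contextRef="FY2017" unitRef="usd" decimals="-6">400000000</us-gaap:GrossProfit>
  <us-gaap:Revenues contextRef="FY2017" unitRef="usd" decimals="-6">1000000000</us-gaap:Revenues>
</xbrli:xbrl>"""

# Assumed namespace URI of the 2017 US-GAAP taxonomy; check the xmlns
# declarations at the top of the real filing.
NS = {"us-gaap": "http://fasb.org/us-gaap/2017-01-31"}

def find_facts(xml_text, concept):
    """Return (contextRef, value) pairs for every element named `concept`."""
    root = ET.fromstring(xml_text)
    # ".//prefix:Name" matches the element at any depth; the prefix is
    # resolved through the NS mapping, not through the document's prefixes.
    return [(el.get("contextRef"), el.text.strip())
            for el in root.findall(f".//{concept}", NS)]

print(find_facts(SAMPLE, "us-gaap:Revenues"))  # [('FY2017', '1000000000')]
```

This only returns raw `contextRef` ids; it says nothing about which period or which dimensions each context carries.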
The most reliable way to work with XBRL files is to use an XBRL processing library. There are a few in Java, some proprietary (with a fee) and some open source.
There is a maintained list of tools and services on xbrl.org:
https://www.xbrl.org/the-standard/how/tools-and-services/
As far as I know, the SEC documents are reliable, widely consumed by a lot of people and tested on many processors. If there is a problem with UBMatrix such as a null pointer exception, I recommend reaching out to them and letting them know so they can address it.
It is definitely possible, in theory, to use XPath/XQuery/XSLT as well, since XBRL uses XML syntax. But be aware that by resolving the contexts (which is a join in relational terms), you would in fact be re-implementing an incomplete XBRL processor from scratch, with the risk of bugs and the sunk costs that go with it. There are a lot of subtleties, and an ecosystem of specifications beyond the core XBRL one (e.g., Dimensions, ...), to take into account in order not to retrieve the wrong values. By using an existing processor, you build on top of the effort other people have already invested in getting all the XBRL semantics right: this is a benefit of XBRL being a standard.
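To make the "join" concrete, here is a minimal sketch that indexes `xbrli:context` elements by id, joins each fact's `contextRef` against them, and keeps only facts whose context has no segment or scenario, i.e. the whole-entity figure rather than a per-segment breakdown like the one in the JPM filing. The instance, CIK, segment member and amounts are hypothetical, and the sketch handles just one of the subtleties mentioned above: it ignores units, decimals, duplicate facts, typed dimensions and extension concepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed instance: the same concept is reported for the whole
# entity and for one business segment (via an xbrldi:explicitMember).
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
    xmlns:us-gaap="http://fasb.org/us-gaap/2017-01-31"
    xmlns:ex="http://example.com/hypothetical-extension">
  <xbrli:context id="FY2017">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2017-01-01</xbrli:startDate>
      <xbrli:endDate>2017-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:context id="FY2017_SegmentA">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier>
      <xbrli:segment>
        <xbrldi:explicitMember dimension="us-gaap:StatementBusinessSegmentsAxis">ex:SegmentAMember</xbrldi:explicitMember>
      </xbrli:segment>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2017-01-01</xbrli:startDate>
      <xbrli:endDate>2017-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <us-gaap:NoninterestExpense contextRef="FY2017" unitRef="usd" decimals="0">500</us-gaap:NoninterestExpense>
  <us-gaap:NoninterestExpense contextRef="FY2017_SegmentA" unitRef="usd" decimals="0">120</us-gaap:NoninterestExpense>
</xbrli:xbrl>"""

NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "us-gaap": "http://fasb.org/us-gaap/2017-01-31",  # assumed; check the filing
}

def index_contexts(root):
    """Map context id -> (period end date or instant, has_dimensions)."""
    contexts = {}
    for ctx in root.findall("xbrli:context", NS):
        # Dimensions live in the entity's segment or in the context's scenario.
        has_dimensions = (
            ctx.find("xbrli:entity/xbrli:segment", NS) is not None
            or ctx.find("xbrli:scenario", NS) is not None
        )
        period_end = (ctx.findtext("xbrli:period/xbrli:endDate", namespaces=NS)
                      or ctx.findtext("xbrli:period/xbrli:instant", namespaces=NS))
        contexts[ctx.get("id")] = (period_end, has_dimensions)
    return contexts

def consolidated_facts(xml_text, concept):
    """Join facts to their contexts and drop dimensionally qualified ones."""
    root = ET.fromstring(xml_text)
    contexts = index_contexts(root)
    results = []
    for fact in root.findall(f".//{concept}", NS):
        period_end, has_dimensions = contexts[fact.get("contextRef")]
        if not has_dimensions:
            results.append((period_end, float(fact.text)))
    return results

root = ET.fromstring(SAMPLE)
print(len(root.findall(".//us-gaap:NoninterestExpense", NS)))    # 2 (naive XPath)
print(consolidated_facts(SAMPLE, "us-gaap:NoninterestExpense"))  # [('2017-12-31', 500.0)]
```

A naive XPath query sees both values; only after the join does it become clear that 120 is a segment figure, not the total.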
As a final remark: the exact XBRL tags used for gross revenue, gross profit, etc., may vary from company to company, because some companies use their own tags (extensions) instead of the US-GAAP tags. Also, some companies omit facts that consumers then need to compute from other facts. This can be addressed with mappings and formulas on top of the XBRL processor. Charles Hoffman has shared reports on the matter with a lot of useful advice, and maintains such mappings online (keywords to search for: fundamental accounting concepts, report frames).
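A mapping with a derived-fact formula could, in its simplest form, look like the sketch below. The candidate lists are a hypothetical illustration (the element names are US-GAAP concepts, but real mappings such as the fundamental accounting concepts work are far more extensive), and `facts` is assumed to be a dict of already resolved, non-dimensional values for a single period, e.g. taken from an XBRL processor.

```python
# Hypothetical mapping: fundamental concept -> candidate US-GAAP element
# names, tried in order until one is present in the filing.
CANDIDATES = {
    "Revenues": [
        "us-gaap:Revenues",
        "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
        "us-gaap:SalesRevenueNet",
    ],
    "CostOfRevenue": ["us-gaap:CostOfRevenue", "us-gaap:CostOfGoodsAndServicesSold"],
    "GrossProfit": ["us-gaap:GrossProfit"],
}

def first_available(facts, concept):
    """Return the value of the first candidate element present, else None."""
    for name in CANDIDATES[concept]:
        if name in facts:
            return facts[name]
    return None

def fundamentals(facts):
    revenue = first_available(facts, "Revenues")
    cost = first_available(facts, "CostOfRevenue")
    gross = first_available(facts, "GrossProfit")
    # Derived fact: if gross profit is not reported, compute it.
    if gross is None and revenue is not None and cost is not None:
        gross = revenue - cost
    margin = gross / revenue if gross is not None and revenue else None
    return {"Revenues": revenue, "GrossProfit": gross, "GrossMargin": margin}

example = {"us-gaap:SalesRevenueNet": 1000.0, "us-gaap:CostOfRevenue": 600.0}  # hypothetical
print(fundamentals(example))  # {'Revenues': 1000.0, 'GrossProfit': 400.0, 'GrossMargin': 0.4}
```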
Depending on what you're looking to do with the data, I would recommend looking at the XBRL US API. This provides API access to all SEC filings, and makes the data available in JSON. You can get a free API key for "private, non-commercial research and development".
I'd also look at the Arelle open source project, an XBRL processor written in Python. In particular, it has a plugin that outputs the data in xBRL-JSON format, which you will probably find much easier to work with than the raw XML files, and which takes care of the processing complexity that Ghislain refers to.
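In xBRL-JSON each fact carries its concept, period, unit and any dimensions inline, so no context join is needed. A minimal sketch, assuming a file laid out per the xBRL-JSON specification (a `facts` object keyed by fact id, each fact with `value` and `dimensions`); the content below is hypothetical, and how to produce such a file with Arelle (plugin name, command-line options) is not shown here, so check Arelle's documentation.

```python
import json

# Hypothetical, trimmed xBRL-JSON content for illustration only. Real files
# also carry documentInfo (document type, namespaces, taxonomy) and many facts.
SAMPLE = """{
  "documentInfo": {},
  "facts": {
    "f1": {"value": "1000000000",
           "dimensions": {"concept": "us-gaap:Revenues",
                          "entity": "cik:0000000000",
                          "period": "2017-01-01T00:00:00/2018-01-01T00:00:00",
                          "unit": "iso4217:USD"}},
    "f2": {"value": "250000000",
           "dimensions": {"concept": "us-gaap:Revenues",
                          "entity": "cik:0000000000",
                          "period": "2017-01-01T00:00:00/2018-01-01T00:00:00",
                          "unit": "iso4217:USD",
                          "us-gaap:StatementBusinessSegmentsAxis": "ex:SegmentAMember"}}
  }
}"""

def undimensioned_values(doc, concept):
    """Return (period, unit, value) for facts of `concept` without taxonomy-defined dimensions."""
    results = []
    for fact in doc["facts"].values():
        dims = fact["dimensions"]
        if dims.get("concept") != concept:
            continue
        # Core dimensions (concept, entity, period, unit, ...) are unprefixed
        # names; taxonomy-defined dimensions are keyed by prefixed QNames.
        if any(":" in key for key in dims):
            continue
        if fact.get("value") is None:  # skip nil facts
            continue
        results.append((dims.get("period"), dims.get("unit"), float(fact["value"])))
    return results

doc = json.loads(SAMPLE)
print(undimensioned_values(doc, "us-gaap:Revenues"))
```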
There exists a dedicated API returning company fundamentals in JSON format. You can use the getFundamentals(symbol) method provided by the eodhistoricaldata-api package (https://www.npmjs.com/package/eodhistoricaldata-api).
The method returns quarterly and yearly financials (Balance Sheet, Cash Flow, Income Statements), including gross revenue, gross profit, gross margin, operating expenses, etc.
For example:
"Highlights": {
"MarketCapitalization": 54915055616,
"MarketCapitalizationMln": "54915.0556",
"EBITDA": 616286976,
"PERatio": null,
"PEGRatio": "-1.5700",
"WallStreetTargetPrice": "321.8900",
"BookValue": "26.2790",
"DividendShare": null,
"DividendYield": null,
"EarningsShare": "-4.8500",
"EPSEstimateCurrentYear": "-6.5600",
"EPSEstimateNextYear": "-2.0000",
"EPSEstimateNextQuarter": "-1.6700",
"MostRecentQuarter": "2018-09-30",
"ProfitMargin": "-0.1022",
"OperatingMarginTTM": "-0.0710",
"ReturnOnAssetsTTM": "-0.0271",
"ReturnOnEquityTTM": "-0.3397",
"RevenueTTM": "17523644416.00",
"RevenuePerShareTTM": "103.3240",
"QuarterlyRevenueGrowthYOY": "1.2860",
"GrossProfitTTM": "2222487000.00",
"DilutedEpsTTM": "-10.5600",
"QuarterlyEarningsGrowthYOY": null
},
// ...
"Income_Statement": {
"currency_symbol": "USD",
"quarterly": {
"2018-09-30": {
"date": "2018-09-30",
"filing_date": "2018-11-02",
"researchDevelopment": "350848000.00",
"effectOfAccountingCharges": null,
"incomeBeforeTax": "271320000.00",
"minorityInterest": "1344731000.00",
"netIncome": "311516000.00",
"sellingGeneralAdministrative": "729876000.00",
"grossProfit": "1523665000.00",
"ebit": "442941000.00",
"operatingIncome": "442941000.00",
"otherOperatingExpenses": null,
"interestExpense": "-169858000.00",
"extraordinaryItems": null,
"nonRecurring": null,
"otherItems": null,
"incomeTaxExpense": "16647000.00",
"totalRevenue": "6824413000.00",
"totalOperatingExpenses": "6381472000.00",
"costOfRevenue": "5300748000.00",
"totalOtherIncomeExpenseNet": "-171621000.00",
"discontinuedOperations": null,
"netIncomeFromContinuingOps": "254673000.00",
"netIncomeApplicableToCommonShares": "311516000.00"
},
// ...
"Balance_Sheet": {
"currency_symbol": "USD",
"quarterly": {
"2018-09-30": {
"date": "2018-09-30",
"filing_date": "2018-11-02",
"intangibleAssets": "291476000.00",
"totalLiab": "23409144000.00",
"totalStockholderEquity": "4508838000.00",
"deferredLongTermLiab": "0.00",
"otherCurrentLiab": "2266778000.00",
"totalAssets": "29262713000.00",
"commonStock": "171000.00",
"otherCurrentAssets": "158627000.00",
"retainedEarnings": "-5457315000.00",
"otherLiab": "2285172000.00",
"goodWill": "65226000.00",
"otherAssets": "1233979000.00",
"cash": "2967504000.00",
"totalCurrentLiabilities": "9775324000.00",
"shortLongTermDebt": "2106538000.00",
"otherStockholderEquity": "8271000.00",
"propertyPlantEquipment": "19733969000.00",
"totalCurrentAssets": "7920491000.00",
"longTermInvestments": "17572000.00",
"netTangibleAssets": "4152136000.00",
"shortTermInvestments": "0.00",
"netReceivables": "1155001000.00",
"longTermDebt": "9726589000.00",
"inventory": "3314127000.00",
"accountsPayable": "3596984000.00",
"totalPermanentEquity": "0.00",
"noncontrollingInterestInConsolidatedEntity": "0.00",
"temporaryEquityRedeemableNoncontrollingInterests": "0.00",
"accumulatedOtherComprehensiveIncome": "0.00",
"additionalPaidInCapital": "0.00",
"commonStockTotalEquity": "0.00",
"preferredStockTotalEquity": "0.00",
"retainedEarningsTotalEquity": "0.00",
"treasuryStock": "0.00"
},
// ...
"Cash_Flow": {
"currency_symbol": "USD",
"quarterly": {
"2018-09-30": {
"date": "2018-09-30",
"filing_date": "2018-11-02",
"investments": null,
"changeToLiabilities": "895197000.00",
"totalCashflowsFromInvestingActivities": "-560965000.00",
"netBorrowings": "-221931000.00",
"totalCashFromFinancingActivities": "-84218000.00",
"changeToOperatingActivities": "98770000.00",
"netIncome": "311516000.00",
"changeInCash": "739728000.00",
"totalCashFromOperatingActivities": "1391281000.00",
"depreciation": "502825000.00",
"otherCashflowsFromInvestingActivities": "128600000.00",
"dividendsPaid": "0.00",
"changeToInventory": "-55055000.00",
"changeToAccountReceivables": "-587594000.00",
"salePurchaseOfStock": "0.00",
"otherCashflowsFromFinancingActivities": "42839000.00",
"changeToNetincome": "179168000.00",
"capitalExpenditures": "-559765000.00"
},
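As a quick sanity check of the response above, the quarterly Income_Statement figures can be recomputed from one another, and gross margin (no gross-margin field appears in the excerpt) can be derived from grossProfit and totalRevenue. This is a minimal Python sketch over values copied from the sample; the identities hold for this sample, but whether totalOperatingExpenses always includes costOfRevenue for every company is an assumption to verify.

```python
from decimal import Decimal

# Values copied from "Income_Statement" -> "quarterly" -> "2018-09-30" above.
q3 = {
    "totalRevenue": "6824413000.00",
    "costOfRevenue": "5300748000.00",
    "grossProfit": "1523665000.00",
    "researchDevelopment": "350848000.00",
    "sellingGeneralAdministrative": "729876000.00",
    "totalOperatingExpenses": "6381472000.00",
    "operatingIncome": "442941000.00",
}

def d(statement, key):
    """Parse one numeric field; the response encodes numbers as strings (or null)."""
    value = statement.get(key)
    return Decimal(value) if value is not None else None

revenue = d(q3, "totalRevenue")
cost = d(q3, "costOfRevenue")
gross_profit = d(q3, "grossProfit")
opex = d(q3, "totalOperatingExpenses")

checks = {
    "grossProfit == totalRevenue - costOfRevenue": revenue - cost == gross_profit,
    # Assumption that holds for this sample: opex = cost of revenue + R&D + SG&A.
    "totalOperatingExpenses == costOfRevenue + R&D + SG&A":
        cost + d(q3, "researchDevelopment") + d(q3, "sellingGeneralAdministrative") == opex,
    "operatingIncome == totalRevenue - totalOperatingExpenses":
        revenue - opex == d(q3, "operatingIncome"),
}
for name, ok in checks.items():
    print("OK " if ok else "BAD", name)

gross_margin = gross_profit / revenue
print(f"Gross margin: {gross_margin:.2%}")  # Gross margin: 22.33%
```

Decimal is used so that the equality checks compare exact amounts rather than rounded floats.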