Nutrola's Open Food Nutrition Dataset: 500K+ Foods Available for Download
Download Nutrola's open food nutrition dataset with 500K+ verified entries including calories, macros, micronutrients, and serving sizes. Available in CSV and JSON for research, development, and education.
Good nutrition data is hard to find. Researchers waste weeks cleaning government databases. Developers write brittle scrapers that break every month. Students writing thesis papers settle for small, outdated samples because assembling a comprehensive dataset from scratch is not realistic on an academic timeline.
We built Nutrola's food database to power our calorie tracking app, and over the past three years we have invested heavily in making that data accurate, comprehensive, and well-structured. Today we are releasing a curated subset of that database as an open dataset: over 500,000 verified food entries available for free download in CSV and JSON formats.
This post covers everything you need to know about the dataset — what is in it, how to download it, the schema, licensing, quality methodology, and how it compares to other publicly available nutrition data sources.
What Is in the Dataset
The Nutrola Open Food Nutrition Dataset contains 500,000+ food entries spanning raw ingredients, generic foods, branded consumer products, and common restaurant items. Every entry has been verified through our multi-layer quality control pipeline, the same system described in detail in our post on how we built our food database.
Each food entry includes the following data points:
- Food name — the common name of the food item in English, with brand names where applicable
- Calories — energy content in kilocalories (kcal) per 100 grams and per serving
- Macronutrients — protein, total fat, saturated fat, trans fat, total carbohydrates, dietary fiber, total sugars, and added sugars, all in grams
- Micronutrients — 30+ vitamins and minerals including vitamin A, vitamin C, vitamin D, vitamin E, vitamin K, thiamin, riboflavin, niacin, vitamin B6, folate, vitamin B12, calcium, iron, magnesium, phosphorus, potassium, sodium, zinc, copper, manganese, selenium, and more
- Serving sizes — standard serving size description (e.g., "1 medium apple," "1 cup cooked"), serving weight in grams, and up to three alternative serving sizes per food
- Food category — hierarchical classification using our internal taxonomy (e.g., Dairy > Cheese > Hard Cheese)
- Country of origin — the primary country or region where the food product is sold or the ingredient is commonly consumed
- Barcode (where available) — UPC or EAN codes for branded products
- Data source tags — provenance indicators showing whether the entry originated from government databases, manufacturer data, laboratory analysis, or our internal verification team
Sample Data
Here is a selection of entries from the dataset to give you a sense of the structure and detail:
| food_id | food_name | category | country | calories_per_100g | protein_g | fat_g | carbs_g | fiber_g | serving_desc | serving_g |
|---|---|---|---|---|---|---|---|---|---|---|
| NF-001247 | Chicken Breast, Raw, Skinless | Poultry > Chicken | US | 120 | 22.5 | 2.6 | 0.0 | 0.0 | 1 breast (174g) | 174 |
| NF-008391 | Fage Total 0% Greek Yogurt | Dairy > Yogurt > Greek | GR | 54 | 10.3 | 0.0 | 3.0 | 0.0 | 1 container (150g) | 150 |
| NF-014205 | Basmati Rice, White, Cooked | Grains > Rice | IN | 130 | 2.7 | 0.3 | 28.2 | 0.4 | 1 cup (158g) | 158 |
| NF-022876 | Avocado, Hass, Raw | Fruits > Tropical | MX | 160 | 2.0 | 14.7 | 8.5 | 6.7 | 1/2 avocado (68g) | 68 |
| NF-031560 | Barilla Penne Rigate, Dry | Pasta > Dried | IT | 359 | 12.5 | 2.0 | 71.2 | 3.0 | 2 oz (56g) | 56 |
| NF-045892 | Kimchi, Traditional Napa Cabbage | Vegetables > Fermented | KR | 15 | 1.1 | 0.5 | 2.4 | 1.6 | 1/2 cup (75g) | 75 |
| NF-053714 | Salmon, Atlantic, Raw, Farmed | Fish > Salmon | NO | 208 | 20.4 | 13.4 | 0.0 | 0.0 | 1 fillet (113g) | 113 |
| NF-067283 | Chickpeas, Canned, Drained | Legumes > Beans | US | 119 | 6.3 | 2.0 | 18.2 | 5.4 | 1/2 cup (120g) | 120 |
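The full dataset includes many more columns for micronutrients, alternative serving sizes, barcode data, and source tags. The table above shows only the core nutritional fields, and some of its column headers are shortened for display (for example, fat_g for fat_total_g and serving_g for serving_1_g). The exact field names are listed in the Data Schema section.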
Data Formats
The dataset is available in two formats:
CSV
The CSV file uses UTF-8 encoding with comma delimiters. The first row contains column headers. Fields that contain commas are enclosed in double quotes. Null values are represented as empty fields.
The CSV format is ideal for spreadsheet tools like Excel and Google Sheets, statistical software like R and SPSS, and quick data exploration with command-line tools like csvkit or xsv.
File: nutrola-open-food-dataset-v3.csv (approximately 210 MB uncompressed, 48 MB gzipped)
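The CSV conventions described above (header row, quoted fields, empty fields for nulls) can be handled with Python's standard library alone. The following is a minimal sketch, not official tooling: the `parse_row` and `read_foods` helpers and the `NUMERIC_FIELDS` subset are illustrative names, and the sample row is made up.

```python
import csv
import io

# Illustrative subset of numeric columns; extend with the other float fields as needed.
NUMERIC_FIELDS = {"calories_per_100g", "protein_g", "fat_total_g", "carbs_total_g", "serving_1_g"}


def parse_row(row):
    """Convert empty fields to None and known numeric fields to float."""
    parsed = {}
    for key, value in row.items():
        if value == "":
            parsed[key] = None  # an empty field represents a null value
        elif key in NUMERIC_FIELDS:
            parsed[key] = float(value)
        else:
            parsed[key] = value
    return parsed


def read_foods(text_stream):
    """Yield one parsed dict per CSV row; the csv module handles quoted commas."""
    for row in csv.DictReader(text_stream):
        yield parse_row(row)


# Tiny in-memory example (values are made up for illustration).
sample = io.StringIO(
    "food_id,food_name,brand,calories_per_100g,protein_g\n"
    'NF-000001,"Example Food, Raw",,120,22.5\n'
)
foods = list(read_foods(sample))
print(foods[0])
```

For the real file, pass `open("nutrola-open-food-dataset-v3.csv", newline="", encoding="utf-8")` to `read_foods` instead of the in-memory sample.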
JSON
The JSON file contains an array of objects, one per food entry. Nested objects are used for structured fields like serving sizes (which contain a description, gram weight, and milliliter equivalent where applicable) and micronutrient profiles.
The JSON format is better suited for application development, database imports, and any workflow where you need to preserve the hierarchical structure of serving sizes and nutrient groups.
File: nutrola-open-food-dataset-v3.json (approximately 340 MB uncompressed, 62 MB gzipped)
Both files are also available as gzip-compressed archives to reduce download times.
Data Schema
Here is the full schema with descriptions for every field in the dataset:
| Field Name | Type | Description |
|---|---|---|
| food_id | string | Unique Nutrola identifier for the food entry (format: NF-XXXXXX) |
| food_name | string | Common name of the food, including brand where applicable |
| category_l1 | string | Top-level food category (e.g., Dairy, Grains, Fruits) |
| category_l2 | string | Second-level category (e.g., Cheese, Rice, Tropical) |
| category_l3 | string | Third-level category where applicable (e.g., Hard Cheese, Brown Rice) |
| country | string | ISO 3166-1 alpha-2 country code indicating primary market |
| brand | string | Brand name for branded products; null for generic foods |
| barcode | string | UPC/EAN barcode; null if not applicable |
| calories_per_100g | float | Energy in kcal per 100 grams |
| protein_g | float | Protein in grams per 100g |
| fat_total_g | float | Total fat in grams per 100g |
| fat_saturated_g | float | Saturated fat in grams per 100g |
| fat_trans_g | float | Trans fat in grams per 100g |
| carbs_total_g | float | Total carbohydrates in grams per 100g |
| fiber_g | float | Dietary fiber in grams per 100g |
| sugars_total_g | float | Total sugars in grams per 100g |
| sugars_added_g | float | Added sugars in grams per 100g |
| sodium_mg | float | Sodium in milligrams per 100g |
| cholesterol_mg | float | Cholesterol in milligrams per 100g |
| vitamin_a_mcg | float | Vitamin A in micrograms RAE per 100g |
| vitamin_c_mg | float | Vitamin C in milligrams per 100g |
| vitamin_d_mcg | float | Vitamin D in micrograms per 100g |
| calcium_mg | float | Calcium in milligrams per 100g |
| iron_mg | float | Iron in milligrams per 100g |
| potassium_mg | float | Potassium in milligrams per 100g |
| magnesium_mg | float | Magnesium in milligrams per 100g |
| zinc_mg | float | Zinc in milligrams per 100g |
| phosphorus_mg | float | Phosphorus in milligrams per 100g |
| selenium_mcg | float | Selenium in micrograms per 100g |
| vitamin_b6_mg | float | Vitamin B6 in milligrams per 100g |
| vitamin_b12_mcg | float | Vitamin B12 in micrograms per 100g |
| folate_mcg | float | Folate in micrograms DFE per 100g |
| vitamin_e_mg | float | Vitamin E in milligrams per 100g |
| vitamin_k_mcg | float | Vitamin K in micrograms per 100g |
| thiamin_mg | float | Thiamin (B1) in milligrams per 100g |
| riboflavin_mg | float | Riboflavin (B2) in milligrams per 100g |
| niacin_mg | float | Niacin (B3) in milligrams per 100g |
| copper_mg | float | Copper in milligrams per 100g |
| manganese_mg | float | Manganese in milligrams per 100g |
| serving_1_desc | string | Primary serving size description (e.g., "1 cup cooked") |
| serving_1_g | float | Primary serving size weight in grams |
| serving_2_desc | string | Alternative serving size description; null if not available |
| serving_2_g | float | Alternative serving size weight in grams |
| serving_3_desc | string | Second alternative serving size description; null if not available |
| serving_3_g | float | Second alternative serving size weight in grams |
| data_source | string | Provenance tag: "government", "manufacturer", "laboratory", or "verified_community" |
| last_verified | string | ISO 8601 date when the entry was last verified (YYYY-MM-DD) |
| dataset_version | string | Dataset version identifier (e.g., "v3.0") |
All nutrient values are expressed per 100 grams to allow consistent comparisons. To calculate nutrients per serving, multiply the per-100g value by the serving weight in grams and divide by 100.
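As a quick illustration of that conversion, here is a small sketch. The `per_serving` helper is a hypothetical name, and the numbers come from the chicken breast row in the sample table.

```python
def per_serving(value_per_100g, serving_g):
    """Scale a per-100g nutrient value to a serving of the given weight in grams."""
    return value_per_100g * serving_g / 100


# Chicken breast from the sample table: 120 kcal per 100 g, 174 g serving.
kcal = per_serving(120, 174)
print(round(kcal, 1))  # 208.8
```

The same function works for any per-100g field, such as protein_g or sodium_mg, paired with serving_1_g, serving_2_g, or serving_3_g.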
How to Download
The dataset is hosted on our public GitHub repository:
github.com/nutrola/open-food-nutrition-dataset
You can download the files directly from the GitHub Releases page, or clone the repository:
git clone https://github.com/nutrola/open-food-nutrition-dataset.git
For the compressed versions:
# Download CSV (gzipped)
wget https://github.com/nutrola/open-food-nutrition-dataset/releases/latest/download/nutrola-open-food-dataset-v3.csv.gz
# Download JSON (gzipped)
wget https://github.com/nutrola/open-food-nutrition-dataset/releases/latest/download/nutrola-open-food-dataset-v3.json.gz
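If you would rather not unpack the archive first, the gzipped CSV can be read as a stream. The sketch below uses only the Python standard library; the helper names `iter_gzipped_csv` and `count_by_country` are illustrative, and the `country` column name is taken from the schema above.

```python
import csv
import gzip


def iter_gzipped_csv(path):
    """Yield rows as dicts from a gzip-compressed CSV without unpacking it to disk."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def count_by_country(path):
    """Count entries per `country` code while streaming the file."""
    counts = {}
    for row in iter_gzipped_csv(path):
        code = row["country"]
        counts[code] = counts.get(code, 0) + 1
    return counts
```

Calling `count_by_country("nutrola-open-food-dataset-v3.csv.gz")` after the download above would return a dictionary mapping country codes to entry counts, holding only one row in memory at a time.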
The repository also contains:
- A detailed README.md with quickstart instructions
- A CHANGELOG.md documenting changes between dataset versions
- A scripts/ directory with Python and R example scripts for loading, filtering, and analyzing the data
- A schema/ directory with JSON Schema and CSV dialect definitions
If you need the full 3 million+ entry database with real-time updates rather than periodic snapshots, see our Nutrition Data API for developer access.
Use Cases
Academic Research
Nutrition researchers can use the dataset for dietary pattern analysis, epidemiological modeling, and nutrient density studies without spending weeks cleaning and merging government data files. The hierarchical category system makes it straightforward to filter by food groups, and the country field enables cross-cultural comparisons.
Published research using the dataset should cite it as: Nutrola Open Food Nutrition Dataset, v3.0 (2026). Available at github.com/nutrola/open-food-nutrition-dataset. Licensed under CC BY-SA 4.0.
Application Development
Developers building health, fitness, or food-related applications can use the dataset as a local food database. The consistent schema and serving size data mean you can build a functional food logging feature without relying on a live API connection. This is particularly useful for offline-first mobile apps, prototyping, and hackathon projects.
The CSV format loads directly into SQLite, PostgreSQL, or any relational database. The JSON format maps cleanly to document stores like MongoDB or Firestore.
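As a rough sketch of the SQLite route, the snippet below builds a table from the CSV header and inserts every row using Python's built-in `sqlite3` module. The `load_into_sqlite` helper, the `foods` table name, and the two sample rows are assumptions made for illustration; columns are created without declared types to keep the example short.

```python
import csv
import io
import sqlite3


def load_into_sqlite(conn, text_stream, table="foods"):
    """Create a table from the CSV header and bulk-insert every row."""
    reader = csv.reader(text_stream)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # Store empty fields as NULL, matching the CSV null convention.
    rows = ([value if value != "" else None for value in row] for row in reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
sample = io.StringIO(
    "food_id,food_name,brand,calories_per_100g\n"
    "NF-000001,Example Oats,,380\n"
    "NF-000002,Example Yogurt,ExampleBrand,60\n"
)
load_into_sqlite(conn, sample)

# Generic (unbranded) foods have a NULL brand.
generic = conn.execute("SELECT food_name FROM foods WHERE brand IS NULL").fetchall()
print(generic)
```

Because the values are inserted as text, numeric comparisons in queries need a cast, for example `CAST(calories_per_100g AS REAL) < 150`. For a production import you would likely declare column types up front based on the schema.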
Data Science and Machine Learning
The dataset is well-suited for training and evaluating machine learning models related to food and nutrition. Common applications include:
- Food classification models — use the category hierarchy as training labels to build classifiers that predict food categories from names or nutrition profiles
- Nutrition estimation — train regression models that predict calorie or macro content from partial information (e.g., estimating calories from protein, fat, and carb ratios)
- Recommendation systems — build food recommendation engines that suggest nutritionally similar alternatives
- Anomaly detection — identify unusual nutrition profiles that might indicate data quality issues in other datasets
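To make the recommendation idea concrete, here is a deliberately simple sketch that ranks foods by Euclidean distance between their protein, fat, and carbohydrate values per 100 g. The macro values are copied from the sample table earlier in this post; the `most_similar` function and the choice of unweighted Euclidean distance are assumptions for illustration, not a description of how Nutrola's own recommendations work.

```python
import math

# (protein_g, fat_g, carbs_g) per 100 g, copied from the sample table above.
FOODS = {
    "Chicken Breast, Raw, Skinless": (22.5, 2.6, 0.0),
    "Fage Total 0% Greek Yogurt": (10.3, 0.0, 3.0),
    "Basmati Rice, White, Cooked": (2.7, 0.3, 28.2),
    "Avocado, Hass, Raw": (2.0, 14.7, 8.5),
    "Barilla Penne Rigate, Dry": (12.5, 2.0, 71.2),
    "Kimchi, Traditional Napa Cabbage": (1.1, 0.5, 2.4),
    "Salmon, Atlantic, Raw, Farmed": (20.4, 13.4, 0.0),
    "Chickpeas, Canned, Drained": (6.3, 2.0, 18.2),
}


def most_similar(name, foods, k=3):
    """Return the k foods whose macro profile is closest to the named food."""
    target = foods[name]
    distances = [
        (math.dist(target, macros), other)
        for other, macros in foods.items()
        if other != name
    ]
    return [other for _, other in sorted(distances)[:k]]


print(most_similar("Chicken Breast, Raw, Skinless", FOODS, k=2))
```

A real system would probably normalize each nutrient and include more fields (fiber, sodium, micronutrients) so that no single macro dominates the distance.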
Education
Nutrition science students and educators can use the dataset for coursework, labs, and assignments. The breadth of the data — covering foods from dozens of countries and spanning every major food group — makes it useful for teaching concepts like macronutrient ratios, micronutrient density, and how nutrition profiles vary across cuisines and food processing levels.
Public Health and Policy
Public health organizations can use the data to analyze the nutritional landscape of specific food categories or markets. The country field allows filtering by region, and the brand field enables analysis of branded vs. generic food nutrition quality.
Data Quality Methodology
Releasing an open dataset means nothing if the data is not trustworthy. Here is how we ensure quality across the 500,000+ entries in this release.
Multi-Source Verification
Every entry in the dataset has been verified against at least two independent sources. Our primary data sources include:
- Government nutrition databases — USDA FoodData Central (United States), CoFID (United Kingdom), NUTTAB (Australia), CNF (Canada), and equivalent databases from 20+ countries
- Manufacturer-provided data — nutrition facts panels submitted directly by food manufacturers through our brand partnership program
- Laboratory analysis — independent lab testing conducted by our team for high-volume foods where source data is conflicting or outdated
- Verified community submissions — user-submitted entries that have passed our three-step verification process (automated cross-referencing, expert review, and statistical outlier detection)
Automated Quality Checks
Every entry passes through a battery of automated checks before it enters the dataset:
- Energy balance validation — the calorie count is cross-checked against the Atwater calculation (4 kcal/g protein + 9 kcal/g fat + 4 kcal/g carbohydrate). Entries where the stated calories deviate from the calculated value by more than 10% are flagged for manual review.
- Range checks — every nutrient value is validated against physiologically plausible ranges for the food category. A cheese entry claiming 0 grams of fat or a fruit entry claiming 50 grams of protein gets flagged immediately.
- Cross-entry consistency — similar foods are compared statistically. If a new chicken breast entry has significantly different values from the existing cluster of chicken breast entries, it is held for review.
- Serving size validation — serving weights are checked against known standard portions. A "1 medium apple" claiming to weigh 500 grams does not pass.
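The energy balance check in the first bullet can be sketched in a few lines. This is an illustration of the rule as described, not our production pipeline: the function names are hypothetical, the deviation is measured relative to the calculated value, and the example numbers are made up.

```python
def atwater_kcal(protein_g, fat_g, carbs_g):
    """Estimate energy from macros: 4 kcal/g protein, 9 kcal/g fat, 4 kcal/g carbohydrate."""
    return 4 * protein_g + 9 * fat_g + 4 * carbs_g


def needs_energy_review(stated_kcal, protein_g, fat_g, carbs_g, tolerance=0.10):
    """Return True when stated calories deviate from the Atwater estimate by more than `tolerance`."""
    calculated = atwater_kcal(protein_g, fat_g, carbs_g)
    if calculated == 0:
        # No macros at all: any non-zero calorie claim is suspicious.
        return stated_kcal != 0
    return abs(stated_kcal - calculated) / calculated > tolerance


# Hypothetical entries: 10 g protein, 4 g fat, 6 g carbs gives 100 kcal by Atwater.
print(needs_energy_review(100, 10, 4, 6))  # False: matches the estimate
print(needs_energy_review(150, 10, 4, 6))  # True: 50% above the estimate
```

A flag from a check like this only routes the entry to manual review; foods high in fiber or alcohol can legitimately differ from the simple 4/9/4 estimate.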
Human Review
Entries flagged by automated checks go through manual review by our data team, which includes credentialed nutritionists and food scientists. Approximately 12% of entries require some form of manual correction before they are approved.
Ongoing Maintenance
The dataset is not a one-time dump. We re-verify entries on a rolling basis, prioritizing high-volume foods (those most frequently logged by Nutrola users) and entries whose source data has been updated. When a food manufacturer reformulates a product, we catch the change through our barcode monitoring system and update the entry accordingly.
Update Frequency
We publish new versions of the open dataset quarterly. Each release includes:
- New food entries added since the previous version
- Corrections to existing entries identified through our quality monitoring
- Updated nutrition data for reformulated products
- Expanded micronutrient coverage where new source data becomes available
The current version is v3.0, released in March 2026. Version history and changelogs are available in the GitHub repository.
If you need data that is updated more frequently than quarterly, our Nutrition Data API reflects changes within 48 hours.
License
The Nutrola Open Food Nutrition Dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
This means you are free to:
- Share — copy and redistribute the dataset in any medium or format
- Adapt — remix, transform, and build upon the dataset for any purpose, including commercial use
Under the following terms:
- Attribution — you must give appropriate credit to Nutrola, provide a link to the license, and indicate if changes were made
- ShareAlike — if you remix, transform, or build upon the dataset, you must distribute your contributions under the same CC BY-SA 4.0 license
We chose CC BY-SA 4.0 because it strikes the right balance between openness and ensuring that improvements flow back to the community. If you build a better version of this data, the license ensures that your improvements remain available to everyone else too.
How It Compares to Other Datasets
There are several publicly available nutrition datasets. Here is how the Nutrola Open Food Nutrition Dataset compares to the two most widely used alternatives.
vs. USDA FoodData Central
USDA FoodData Central is the gold standard for nutrition data in the United States. It is thorough, well-documented, and backed by laboratory analysis. However, it has limitations that the Nutrola dataset addresses:
| Dimension | USDA FoodData Central | Nutrola Open Dataset |
|---|---|---|
| Total entries | ~400,000 (Foundation, SR Legacy, Branded combined) | 500,000+ |
| Geographic coverage | Primarily United States | 47 countries |
| Branded products | US brands only, often outdated | International brands, verified quarterly |
| Data format | Multiple incompatible file formats, complex relational structure | Single CSV or JSON file, flat structure |
| Serving sizes | Inconsistent across sub-databases | Standardized format with up to 3 servings per food |
| Ease of use | Requires significant data engineering to merge sub-databases | Download one file and start working |
| Update frequency | Varies by sub-database (annually for some) | Quarterly |
If your work is focused exclusively on US foods and you need the deepest possible nutrient profile (USDA covers 150+ nutrients for Foundation foods), FoodData Central is the better choice. If you need international coverage, consistent formatting, and a dataset that works out of the box, the Nutrola dataset is the stronger option.
The two datasets are complementary. Many researchers use USDA Foundation data for detailed US nutrient analysis and supplement it with Nutrola data for international coverage and branded products.
vs. Open Food Facts
Open Food Facts is a crowdsourced database with over 3 million entries. It has impressive scale and covers products from many countries. However, its crowdsourced nature introduces data quality challenges:
| Dimension | Open Food Facts | Nutrola Open Dataset |
|---|---|---|
| Total entries | 3M+ | 500,000+ |
| Data quality | Variable — crowdsourced with automated checks | Verified — multi-source, human-reviewed |
| Completeness | Many entries missing macro/micro data | All entries have complete macro data; 90%+ have full micro profiles |
| Serving sizes | Inconsistent, often missing | Standardized, always present |
| Category taxonomy | Crowdsourced tags, inconsistent | Hierarchical, curated taxonomy |
| Nutrient coverage | Varies widely per entry | Consistent 40+ nutrients across all entries |
| Data format | MongoDB dump, complex nested JSON | Clean CSV and JSON |
| License | Open Database License (ODbL) | CC BY-SA 4.0 |
Open Food Facts excels at breadth — if you need to look up a specific obscure product by barcode, they likely have it. The Nutrola dataset excels at depth and consistency — every entry meets the same quality bar, making it more reliable for quantitative analysis where data gaps or errors can skew results.
If you are building a barcode scanner app and need maximum product coverage, Open Food Facts is a good starting point. If you are training a machine learning model, conducting statistical research, or building an app where nutrition accuracy matters, the Nutrola dataset's verified data will give you a stronger foundation.
Getting Started
Once you have downloaded the dataset, here is a quick example of loading and exploring it in Python:
import pandas as pd
# Load the dataset (read barcodes as strings so leading zeros are preserved)
df = pd.read_csv("nutrola-open-food-dataset-v3.csv", dtype={"barcode": str})
# Basic overview
print(f"Total entries: {len(df):,}")
print(f"Countries covered: {df['country'].nunique()}")
print(f"Food categories (L1): {df['category_l1'].nunique()}")
# Find high-protein, low-calorie foods
high_protein = df[
    (df["protein_g"] > 20) &
    (df["calories_per_100g"] < 150)
].sort_values("protein_g", ascending=False)
print(high_protein[["food_name", "calories_per_100g", "protein_g"]].head(10))
# Analyze average macros by food category
category_macros = df.groupby("category_l1").agg({
    "calories_per_100g": "mean",
    "protein_g": "mean",
    "fat_total_g": "mean",
    "carbs_total_g": "mean",
}).round(1)
print(category_macros.sort_values("calories_per_100g", ascending=False))
More examples — including R scripts, SQL import guides, and Jupyter notebooks — are available in the scripts/ directory of the GitHub repository.
Frequently Asked Questions
Is the dataset really free to use?
Yes. The Nutrola Open Food Nutrition Dataset is released under the CC BY-SA 4.0 license, which permits commercial and non-commercial use. The only requirements are that you credit Nutrola as the source and that any derivative datasets you distribute use the same license. There are no API keys, no usage limits, and no registration required to download the files.
How often is the dataset updated?
We publish new versions quarterly. Each release adds new food entries, corrects any errors identified since the previous version, and updates entries for products that have been reformulated. The GitHub repository's Releases page has the full version history, and you can watch the repository to be notified when new versions are published.
Can I use this dataset to build a commercial app?
Yes. The CC BY-SA 4.0 license explicitly allows commercial use. You can use the data in a paid app, a SaaS product, or any other commercial context. You must include attribution to Nutrola in your app or documentation, and if you distribute a modified version of the dataset itself, the modified version must also be licensed under CC BY-SA 4.0. Using the data within your app (without redistributing the raw dataset) does not trigger the ShareAlike requirement.
Why only 500K entries when Nutrola's full database has 3 million+?
The open dataset contains entries that we can release under an open license without restrictions. Our full database includes data from proprietary sources — direct manufacturer partnerships, licensed laboratory data, and other sources with contractual limitations on redistribution. The 500K entries in the open dataset come from government databases, our own laboratory analysis, and community submissions where contributors agreed to open licensing. If you need access to the full database, our Nutrition Data API provides it under separate commercial terms.
What should I do if I find an error in the dataset?
Open an issue on the GitHub repository with the food_id of the affected entry and a description of the error. Include a source link if you have one (e.g., a manufacturer's website showing different nutrition facts). Our data team reviews reported issues weekly, and confirmed corrections are included in the next quarterly release. For urgent corrections, we may push a patch release between quarterly updates.
How does this relate to the Nutrola Nutrition Data API?
The open dataset is a static quarterly snapshot of a curated subset of our database. The API provides real-time access to the full 3 million+ entry database with search, filtering, barcode lookup, and other features. Think of the open dataset as the foundation for offline or batch use cases, and the API as the solution for production applications that need live data. Many developers start with the open dataset for prototyping and migrate to the API when they go to production.