HF Space stuck at Starting

My private Hugging Face Space got stuck at Starting. I tried restarting it more than 6-7 times and even factory rebuilt a few times. Still the same issue. It's still stuck at Starting. However, the logs show this:

===== Application Startup at 2025-11-27 21:10:32 =====

2025-11-27 21:10:52,072 | INFO | Scientry._init_logger | Logger initialized — writing logs to logs/scientry_2025-11-27_21-10-52.log

2025-11-27 21:10:52,073 | INFO | Scientry.config.env_config.validate | :white_check_mark: Environment configuration validated successfully.

* Running on local URL: http://127.0.0.1:7860

* To create a public link, set `share=True` in `launch()`.

I’m not sure how this is happening


Judging solely from the visible symptoms, the Space process has started normally but communication between the process and the GUI is not working. Probably an address or port configuration error.


The short explanation in your specific case:

  • Your app inside the container is starting correctly (hence the logs and 127.0.0.1:7860).

  • Hugging Face’s health checker sits outside the container and cannot see a healthy HTTP endpoint, so the Space never flips from “Starting” to “Running”.

  • The strongest hint is that it says:

    Running on local URL: http://127.0.0.1:7860

    On Spaces, that should effectively be 0.0.0.0:7860, not 127.0.0.1.

Below is a detailed breakdown: what “Starting” really means, how to interpret your logs, the main causes seen in the wild, and concrete code-level fixes/workarounds.


1. What “Starting” means for a Space

When Hugging Face runs a Space, there are two layers:

  1. Your process inside a container

    • Runs your Python app, Gradio/Streamlit server, FastAPI, etc.
    • Its logs are what you see in the “Logs” tab.
  2. The platform health checker

    • Starts the container.
    • Probes a specific port (usually 7860) and path (usually /).
    • Only when it gets a non-404 HTTP response does it mark the Space as “Running”.(Hugging Face)

This means you can have:

  • Perfectly normal logs inside the container (like yours),
  • But the external checker still never sees a valid HTTP server,
  • So the UI shows “Starting…” for a long time and may eventually show a timeout error (space was not healthy after 30 min).(Hugging Face Forums)

Your log snippet is exactly that pattern.


2. Interpreting your log lines

You wrote:

===== Application Startup at 2025-11-27 21:10:32 =====

2025-11-27 21:10:52,072 | INFO | Scientry._init_logger | Logger initialized — writing logs to logs/scientry_2025-11-27_21-10-52.log

2025-11-27 21:10:52,073 | INFO | Scientry.config.env_config.validate | :white_check_mark: Environment configuration validated successfully.

* Running on local URL: http://127.0.0.1:7860

* To create a public link, set `share=True` in `launch()`.

What each part means:

  • Application Startup + Scientry._init_logger + env_config.validate
    → Your code starts, config is validated, no obvious crash yet.

  • Running on local URL: http://127.0.0.1:7860
    → This is the standard Gradio-style message (or something Gradio-like under your Scientry wrapper):
    → It started an HTTP server bound to loopback only (127.0.0.1) on port 7860.

  • On a local machine, 127.0.0.1:7860 is fine. On Spaces, however, the app should bind to 0.0.0.0 so the health checker and HF proxy can reach it.

There is a very close forum report where a user’s Space was also stuck with the same 127.0.0.1 “local URL” line in its logs. Another user replied:

I think this is the cause. The contents of Gradio (server process) are waiting at a different address.(Hugging Face Forums)

Exactly the same symptom: app listens on 127.0.0.1, HF expects 0.0.0.0.

So:

Everything in your snippet says the app is alive but reachable only from inside the container, which is why the external HF health check sees nothing and leaves the Space in “Starting”.


3. The main causes that fit your case

3.1 Bound to 127.0.0.1 instead of 0.0.0.0 ← most likely for you

Inside a container, these addresses are different:

  • 127.0.0.1 = “accept connections only from inside the container itself (loopback)”
  • 0.0.0.0 = “listen on all interfaces”, including the one HF uses to probe your app

On Spaces, HF expects your web server to listen on 0.0.0.0, not just loopback. Forum answers explicitly point to the wrong listening address (127.0.0.1) as the reason a Space stays stuck.(Hugging Face Forums)

If Gradio decides to bind to 127.0.0.1 (the “local URL” default), the app will work if you exec into the container and curl 127.0.0.1:7860, but HF’s health checker, which connects from outside, will never see it.

Fix (conceptual):
Change your app so that it binds to 0.0.0.0 explicitly.

Typical fixes for common stacks:

  • Gradio / Blocks / Interface

    import gradio as gr
    
    with gr.Blocks() as demo:
        # your UI here
        ...
    
    if __name__ == "__main__":
        demo.launch(
            server_name="0.0.0.0",  # critical
            server_port=7860,       # HF default
        )
    

    Or rely on env vars:

    export GRADIO_SERVER_NAME=0.0.0.0
    python app.py
    

    Multiple users on the HF forums report that switching to 0.0.0.0 and port 7860 resolved Spaces that kept timing out while logs looked fine.(Hugging Face Forums)

  • FastAPI + uvicorn

    import uvicorn
    from app import app
    
    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=7860)
    
  • Flask

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=7860)
    

Wherever you see 127.0.0.1, localhost, or no host specified, adjust to 0.0.0.0.
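One way to keep the same script working both locally and on Spaces is to derive the bind address from the environment. A sketch: `pick_server_name` is a hypothetical helper of mine, though `GRADIO_SERVER_NAME` is the real Gradio override and `SPACE_ID` is an environment variable Spaces sets.

```python
import os

def pick_server_name() -> str:
    # Hypothetical helper: an explicit GRADIO_SERVER_NAME wins,
    # a Spaces environment (detected via the SPACE_ID variable HF
    # sets) gets 0.0.0.0, and local dev falls back to loopback.
    if "GRADIO_SERVER_NAME" in os.environ:
        return os.environ["GRADIO_SERVER_NAME"]
    if "SPACE_ID" in os.environ:
        return "0.0.0.0"
    return "127.0.0.1"
```

You would then pass `server_name=pick_server_name()` to `launch()`, keeping loopback for local dev without risking it on Spaces.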


3.2 Port mismatch: app not really on 7860

Hugging Face’s default expectation is:

  • Managed Spaces (Gradio/Streamlit etc.): app listens on port 7860.
  • Docker Spaces: app listens on the port specified as app_port in the README YAML (default 7860).(Hugging Face)

If your app runs on, say, port 8000 or 8501, but HF probes port 7860, it will:

  • See nothing on 7860,
  • Treat the Space as unhealthy and eventually time out.

There are several concrete reports where the only fix was to add or correct app_port in README:

  • Node.js Space that kept timing out until they set:

    ---
    sdk: docker
    app_port: 8080
    ---
    

    After that, it worked.(Hugging Face Forums)

  • A Streamlit/FastAPI Docker Space where the HF engineer explicitly said:

    maybe you’re missing app_port: 8501 on the README.md YAML?(Hugging Face Forums)

Fix:

  • If this is a non-Docker (managed) Space, make sure your app uses server_port=7860 or equivalent.

  • If this is a Docker Space:

    • Ensure README front matter contains:

      ---
      sdk: docker
      app_port: 7860  # or whatever port your container uses
      ---
      
    • Ensure your Dockerfile exposes and runs the app on that same port:

      EXPOSE 7860
      CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
      

If your logs show 127.0.0.1:7860, you are probably already on port 7860, but it is worth checking you haven’t changed it elsewhere.
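Before pushing, you can sanity-check from inside the container (or locally) that something is actually accepting TCP connections on the expected port. A minimal probe using only the standard library (`port_is_listening` is my own name, not an HF utility):

```python
import socket

def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    # Returns True if something completes a TCP handshake
    # on (host, port) within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_listening("127.0.0.1", 7860)` should be True inside the container once the server is up; if it is False, the port in your code and the port HF probes have diverged.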


3.3 Root path / returns 404 or never responds

For non-Gradio apps (FastAPI, Flask, Node, custom servers), HF’s health checker very likely calls GET / on the monitored port and expects:

  • A response with any 2xx/3xx status
  • Not a 404 or repeated error.

In at least one “Launch timed out, space was not healthy” thread, users confirm that the fix was:

  • Run the app on port 7860,
  • And ensure / returns a valid response (not 404).(Hugging Face Forums)

If your app only defines /api/... or similar and nothing at /, HF may consider the Space unhealthy even though the app is technically running.

Fix:

  • For FastAPI:

    from fastapi import FastAPI
    
    app = FastAPI()
    
    @app.get("/")
    def root():
        return {"status": "ok"}
    
  • For Flask:

    @app.route("/")
    def root():
        return "OK"
    

Gradio already serves a root page, so this mainly affects custom backends, not pure Gradio apps.
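If you want to approximate the probe yourself, here is a rough sketch with only the standard library. The exact path and status rules HF uses are an assumption on my part: GET the root URL and treat any non-error status as healthy.

```python
import urllib.request
import urllib.error

def root_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    # Roughly mimics an external health probe: GET the URL and
    # accept any 2xx/3xx status; errors and 4xx/5xx are unhealthy.
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 400
    except urllib.error.HTTPError:
        return False  # 4xx/5xx response
    except OSError:
        return False  # connection refused, DNS failure, timeout
```

Running `root_is_healthy("http://127.0.0.1:7860/")` inside the container should return True before you expect the Space to flip to “Running”.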


3.4 Space starts but then dies (crash or permission error)

Another family of “Starting forever” issues: the server starts, prints a normal log, but then:

  • Crashes due to a filesystem or permission problem, or
  • Exits because the main thread stops.

Examples:

  • Docker + Streamlit Space with PermissionError: [Errno 13] Permission denied: '/.streamlit'. The app tried to write to / (root) which is not allowed in Spaces. The solution was to refactor Dockerfile to use a writable workdir (/app) and follow the official example.(Hugging Face Forums)

  • Spaces where Dockerfile sets an unwritable working directory, or users mismatch container user / file ownership; logs show almost nothing, but the health check never sees a stable server.(Hugging Face Forums)

In your snippet, nothing yet indicates a crash, but it is worth:

  • Scrolling further down in the logs to see if any traceback appears.
  • Making sure you are not writing logs or data to forbidden paths like / instead of /data, /app, or /tmp.
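A defensive pattern for the log-directory case (a sketch; `writable_log_dir` is a hypothetical helper, not part of any HF API): try the intended directory first, and fall back to a temp dir instead of crashing at startup if the filesystem refuses writes.

```python
import tempfile
from pathlib import Path

def writable_log_dir(preferred: str = "logs") -> Path:
    # Hypothetical helper: create the intended directory and verify
    # it is writable; on any OSError (read-only fs, bad path,
    # permission denied) fall back to a fresh temp directory.
    candidate = Path(preferred)
    try:
        candidate.mkdir(parents=True, exist_ok=True)
        probe = candidate / ".write_test"
        probe.touch()
        probe.unlink()
        return candidate
    except OSError:
        return Path(tempfile.mkdtemp(prefix="scientry_logs_"))
```

This keeps the Space alive even when the preferred path is unwritable, while still using the normal location everywhere else.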

3.5 App takes too long to warm up

If your app downloads huge models (tens of GB) or does expensive work before starting the HTTP server, Spaces can:

  • Stay in “Starting” for a long time,
  • End with: Launch timed out, space was not healthy after 30 min.(Hugging Face Forums)

This is particularly common with big LLMs or large DB downloads.

The pattern in your log, with instant startup and a visible server line, does not look like this, so I would treat slow startup as a secondary suspicion, not the primary one.
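For completeness: if you do have heavy startup work, the usual mitigation is to start the HTTP server immediately and load the heavy parts in a background thread, so the health check can pass while the model is still warming up. A minimal sketch (the class and names are my own, not an HF or Gradio API):

```python
import threading

class LazyModel:
    # Sketch: kick off loading in a background thread at startup;
    # request handlers call get() and can show a "warming up"
    # message on TimeoutError instead of blocking the health check.
    def __init__(self, loader):
        self._loader = loader
        self._model = None
        self._ready = threading.Event()
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self):
        self._model = self._loader()
        self._ready.set()

    def get(self, timeout=None):
        if not self._ready.wait(timeout):
            raise TimeoutError("model still loading")
        return self._model
```

In a Gradio handler you would call `model.get(timeout=...)` and return a friendly “still warming up” response when it raises.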


3.6 Platform/infrastructure issue on HF’s side

There are also real cases where everything is configured correctly and yet Spaces get stuck in “Starting”:

  • Example: the popular multimodalart/dreambooth-training Space had duplicates where logs showed:

    Running on local URL: http://0.0.0.0:7860

    But the Space always timed out as unhealthy. The maintainer later said this was due to an HF problem with mounting new Spaces, which was fixed on their side.(Hugging Face)

  • Other threads mention prolonged periods where many Spaces fail due to capacity or abuse (e.g. cryptominers draining CPU), again resolved by HF infra changes.(Hugging Face Forums)

When to suspect this:

  • A brand new minimal template Space (pure Gradio) also gets stuck on your account.
  • Your app is definitely binding to 0.0.0.0:7860 and / responds 200.
  • It used to work and suddenly doesn’t, without code changes.

In that scenario, opening a topic on the HF forum with:

  • Space URL (even if private, you can at least share the name with HF staff),
  • Startup logs showing correct host/port,
  • Note that a minimal template also fails,

is typically the right escalation path.


4. Concrete checklist for your Space

Here is an actionable sequence you can follow.

Step 1 – Fix host and port in your app code

  1. Find the place where you call launch() (for Gradio) or start your server.

  2. Change it to something like:

    If you use Gradio (most likely):

    demo.launch(
        server_name="0.0.0.0",   # not 127.0.0.1
        server_port=7860,
    )
    

    or:

    import os
    demo.launch(
        server_name="0.0.0.0",
        server_port=int(os.environ.get("PORT", 7860)),
    )
    
  3. Commit and push to the Space, then restart once.

This directly addresses the 127.0.0.1 symptom that has been explicitly identified as the cause in a similar forum thread.(Hugging Face Forums)

Step 2 – If Docker: align app_port, EXPOSE, and CMD

If your Space uses sdk: docker:

  1. Make sure the top of README.md looks like:

    ---
    sdk: docker
    app_port: 7860
    ---
    

    (or the port you actually use).

  2. In Dockerfile:

    EXPOSE 7860
    
    CMD ["python", "app.py"]
    # and app.py must run on host 0.0.0.0, port 7860
    

Examples (Streamlit, Node, etc.) and docs show that forgetting app_port is enough to cause “Starting”/timeout behavior, even when the app looks healthy in logs.(Hugging Face)

Step 3 – Ensure / returns something (for non-Gradio apps)

If you are not using Gradio for the external page:

  • Add a small root route that returns HTTP 200 at /.
  • Verify locally: curl http://127.0.0.1:7860/ returns something other than 404.

Then push and restart.

Step 4 – Check for silent crashes and permission issues

After you fix host/port:

  • If the Space is still “Starting”, scroll further down in the logs:

    • Look for tracebacks (Traceback (most recent call last)) or PermissionError.
    • Check you aren’t writing configuration or cache to /.something; use /tmp or your working directory instead.(Hugging Face Forums)

If you see a permission error, move that folder under /app, /data, or /tmp, and adjust your code.

Step 5 – Cross-check with a minimal template Space

Create a new, minimal Space (public or private) with:

  • The default Gradio “hello world” from HF’s UI.
  • No extra dependencies.

If that new Space:

  • Quickly reaches “Running” → your account and infra are fine; the issue is in your code/config.
  • Gets stuck at “Starting” too → you likely hit a platform issue and should contact HF with both Space names and logs.(Hugging Face)

5. Summary

  • Your logs show a server running on http://127.0.0.1:7860. That is fine locally but wrong for Spaces: HF’s health checker sits outside the container and can only reach the app if it binds to 0.0.0.0:7860. If it cannot, the Space stays stuck on “Starting”. Similar issues have been reported and confirmed on the HF forums.(Hugging Face Forums)

  • Main causes and fixes:

    • Wrong host: change launch() / server to host="0.0.0.0" or server_name="0.0.0.0".
    • Wrong port: ensure you use port 7860 or set the same value in both your app and README’s app_port (for Docker).(Hugging Face)
    • Root path: for non-Gradio apps, make sure GET / returns 200, not 404.(Hugging Face Forums)
    • Crashes/permissions: avoid writing to restricted paths like /.streamlit; use writable dirs.(Hugging Face Forums)
    • Infra issues: if a minimal template Space also hangs, it may be an HF-side problem (as in the DreamBooth and other incidents).(Hugging Face)

Thank you so much @John6666, the 0.0.0.0 worked. That was causing all the issues. Thanks a lot


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.